Clinical Epidemiology: The Essentials
Fifth Edition
Copyright © 2014, 2005, 1996, 1988, 1982 Lippincott Williams & Wilkins, a Wolters Kluwer business.
Printed in China
All rights reserved. This book is protected by copyright. No part of this book may be reproduced or transmitted
in any form or by any means, including as photocopies or scanned-in or other electronic copies, or utilized
by any information storage and retrieval system without written permission from the copyright owner, except
for brief quotations embodied in critical articles and reviews. Materials appearing in this book prepared by
individuals as part of their official duties as U.S. government employees are not covered by the above-mentioned
copyright. To request permission, please contact Lippincott Williams & Wilkins at 2001 Market Street,
Philadelphia, PA 19103, via email at [email protected], or via website at lww.com (products and services).
98 7 6 5 4 3 2 1
Fletcher, Robert H.
Clinical epidemiology : the essentials / Robert H. Fletcher, Suzanne
W. Fletcher, Grant S. Fletcher. – 5th ed.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-1-4511-4447-5 (alk. paper)
I. Fletcher, Suzanne W. II. Fletcher, Grant S. III. Title.
[DNLM: 1. Epidemiologic Methods. WA 950]
614.4–dc23
2012022346
DISCLAIMER
Care has been taken to confirm the accuracy of the information present and to describe generally accepted
practices. However, the authors, editors, and publisher are not responsible for errors or omissions or for any
consequences from application of the information in this book and make no warranty, expressed or implied,
with respect to the currency, completeness, or accuracy of the contents of the publication. Application of this
information in a particular situation remains the professional responsibility of the practitioner; the clinical
treatments described and recommended may not be considered absolute and universal recommendations.
The authors, editors, and publisher have exerted every effort to ensure that drug selection and dosage set
forth in this text are in accordance with the current recommendations and practice at the time of
publication. However, in view of ongoing research, changes in government regulations, and the constant flow of
information relating to drug therapy and drug reactions, the reader is urged to check the package insert for each
drug for any change in indications and dosage and for added warnings and precautions. This is particularly
important when the recommended agent is a new or infrequently employed drug.
Some drugs and medical devices presented in this publication have Food and Drug Administration (FDA)
clearance for limited use in restricted research settings. It is the responsibility of the health care provider to
ascertain the FDA status of each drug or device planned for use in their clinical practice.
To purchase additional copies of this book, call our customer service department at (800) 638-3030 or fax orders
to (301) 223-2320. International customers should call (301) 223-2300.
Visit Lippincott Williams & Wilkins on the Internet: http://www.lww.com. Lippincott Williams & Wilkins
customer service representatives are available from 8:30 am to 6:00 pm, EST.
Preface
Modern research design and analyses, supported by powerful computers, make it possible to answer clinical questions with a level of validity and generalizability not dreamed of just a few years ago. However, this often comes at the cost of complexity, placing readers at a distance from the actual data and their meaning. Many of us may be confused as highly specialized research scientists debate alternative meanings of specific terms or tout new approaches to study design and statistical analyses, some of which seem uncomfortably like black boxes no matter how hard we try to get inside them. In such situations, it is especially valuable to remain grounded in the basics of clinical research. We have tried to do just that with the understanding that readers may well want to go on to learn more about this field than is possible from an introductory textbook alone.
Clinical epidemiology is now considered a central part of a broader movement, evidence-based medicine. This is in recognition of the importance, in addition to judging the validity and generalizability of clinical research results, of asking questions that can be answered by research, finding the available evidence, and using the best of that evidence in the care of patients. We have always considered these additional competencies important, and we give them even more attention in this edition of the book.
We hope that readers will experience as much enjoyment and understanding in the course of reading this book as we have in writing it.
Robert H. Fletcher
Suzanne W. Fletcher
Grant S. Fletcher
Acknowledgment
We are fortunate to have learned clinical epidemiology from its founders. Kerr White was Bob and Suzanne’s mentor during postgraduate studies at Johns Hopkins and convinced us that what matter are “the benefits of medical interventions in relation to their hazards and costs.” Alvan Feinstein taught a generation of young clinician–scholars about the “architecture of clinical research” and the dignity of clinical scholarship. Archie Cochrane spent a night at our home in Montreal when Grant was a boy and opened our eyes to “effectiveness and efficiency.” David Sackett asserted that clinical epidemiology is a “basic science for clinical medicine” and helped the world to understand. Many others have followed. We are especially grateful for our work in common with Brian Haynes, founding editor of ACP Journal Club; Ian Chalmers, who made the Cochrane Collaboration happen; Andy Oxman, leader of the Rocky Mountain Evidence-Based Healthcare Workshop; Peter Tugwell, a founding leader of the International Clinical Epidemiology Network (INCLEN); and Russ Harris, our long-time colleague at the interface between clinical medicine and public health at the University of North Carolina. These extraordinary people and their colleagues have created an exciting intellectual environment that led to a revolution in clinical scholarship, bringing the evidence base for clinical medicine to a new level.
Like all teachers, we have also learned from our students, clinicians of all ages and all specialties who wanted to learn how to judge the validity of clinical observations and research for themselves. Bob and Suzanne are grateful to medical students at McGill University (who first suggested the need for this book), the University of North Carolina and Harvard Medical School; fellows in the Robert Wood Johnson Clinical Scholars Program, the International Clinical Epidemiology Network (INCLEN), and the Harvard General Medicine Fellowship; CRN Scholars in the Cancer Research Network, a consortium of research institutes in integrated health systems; and participants in the Rocky Mountain Evidence-Based Healthcare Workshops. They were our students and now are our colleagues; many teach and do research with us.
Over the years, Grant has met many of these people and now learns from medical students, residents, and faculty colleagues as he teaches them about the care for patients at Harborview Hospital and the University of Washington.
While editors of the Journal of General Internal Medicine and Annals of Internal Medicine, Bob and Suzanne learned from fellow editors, including members of the World Association of Medical Editors (WAME), how to make reports of research more complete, clear, and balanced so that readers can understand the message with the least effort. With our colleagues at UpToDate, the electronic information source for clinicians and patients, we have been developing new ways to make the best available evidence on real-world, clinical questions readily accessible during the care of patients and to make that evidence understandable not just to academicians and investigators, but to full-time clinicians as well.
Ed Wagner was with us at the beginning of this project. With him, we developed a new course in clinical epidemiology for the University of North Carolina School of Medicine and wrote the first edition of this book for it. Later, that course was Grant’s introduction to this field, to the extent he had not already been introduced to it at home. Ed remained a coauthor through three editions and then moved on to leadership of Group Health Research Institute and other responsibilities based in Seattle. Fortunately, Grant is now on the writing team and contributed his expertise with the application of clinical epidemiology to the current practice of medicine, especially the care of very sick patients.
We are grateful to members of the team, led by Lippincott Williams & Wilkins, who translated word processed text and hand-drawn figures into an attractive, modern textbook. We got expert, personal attention from Catherine Noonan, who guided us in the preparation of this book throughout; Jonathan Dimes, who worked closely with us in preparing illustrations; and Jeri Litteral, who collaborated with us in the copy editing phase of this project.
We are especially grateful to readers all over the world for their encouraging comments and practical suggestions. They have sustained us through the rigors of preparing this, the fifth edition of a textbook first published 30 years ago.
Contents in Brief
1. Introduction
2. Frequency
3. Abnormality
7. Prognosis
8. Diagnosis
9. Treatment
Contents

CHAPTER 7: PROGNOSIS
Differences in Risk and Prognostic Factors
The Patients Are Different
The Outcomes Are Different
The Rates Are Different
The Factors May be Different
Clinical Course and Natural History of Disease
Elements of Prognostic Studies
Patient Sample
Zero Time
Follow-Up
Outcomes of Disease
Describing Prognosis
A Trade-Off: Simplicity versus More Information
Survival Analysis
Survival of a Cohort
Survival Curves
Interpreting Survival Curves
Identifying Prognostic Factors
Case Series
Clinical Prediction Rules
Bias in Cohort Studies
Sampling Bias
Migration Bias
Measurement Bias
Bias from “Non-differential” Misclassification
Bias, Perhaps, but does it Matter?
Sensitivity Analysis

CHAPTER 8: DIAGNOSIS
Simplifying Data
The Accuracy of a Test Result
The Gold Standard
Lack of Information on Negative Tests
Lack of Information on Test Results in the Nondiseased
Lack of Objective Standards for Disease
Consequences of Imperfect Gold Standards
Sensitivity and Specificity
Definitions
Use of Sensitive Tests
Use of Specific Tests
Trade-Offs between Sensitivity and Specificity
The Receiver Operator Characteristic (ROC) Curve
Establishing Sensitivity and Specificity
Spectrum of Patients
Bias
Chance
Predictive Value
Definitions
Determinants of Predictive Value
Estimating Prevalence (Pretest Probability)
Increasing the Pretest Probability of Disease
Specifics of the Clinical Situation
Selected Demographic Groups
Referral Process
Implications for Interpreting the Medical Literature
Likelihood Ratios
Odds
Definitions
Use of Likelihood Ratios
Why Use Likelihood Ratios?
Calculating Likelihood Ratios
Multiple Tests
Parallel Testing
Clinical Prediction Rules
Serial Testing
Serial Likelihood Ratios
Assumption of Independence

CHAPTER 9: TREATMENT
Ideas and Evidence
Ideas
Testing Ideas
Studies of Treatment Effects
Observational and Experimental Studies of Treatment Effects
Randomized Controlled Trials
Ethics
Sampling
Intervention
Comparison Groups
Allocating Treatment
Differences Arising after Randomization
Patients May Not Have the Disease Being Studied
Compliance
Cross-over
Cointerventions
Introduction
We should study “the benefits of medical interventions in relation to their hazards
and costs.”
—Kerr L. White
1992
KEY WORDS
Clinical epidemiology
Clinical sciences
Population sciences
Epidemiology
Evidence-based medicine
Health services research
Quantitative decision making
Cost-effectiveness analyses
Decision analyses
Social sciences
Biologic sciences
Variables
Independent variable
Dependent variable
Extraneous variables
Covariates
Populations
Sample
Inference
Bias
Selection bias
Measurement bias
Confounding
Chance
Random variation
Internal validity
External validity
Generalizability
Shared decision making

Example
A 51-year-old man asks to see you because of chest pain that he thinks is “indigestion.” He was well until 2 weeks ago [...] and sometimes at rest. He gave up smoking one pack of cigarettes per day 3 years ago and has been told that his blood pressure is “a little high.” He is otherwise well and takes no medications, but he is worried about his health, particularly about heart disease. He lost his job 6 months ago and has no health insurance. A complete physical examination and resting electrocardiogram are normal except for a blood pressure of 150/96 mm Hg.

This patient is likely to have many questions. Am I sick? How sure are you? If I am sick, what is causing my illness? How will it affect me? What can be done about it? How much will it cost?
As the clinician caring for this patient, you have the same kinds of questions, although yours reflect greater understanding of the possibilities. Is the probability of serious, treatable disease high enough to proceed immediately beyond simple explanation and reassurance to diagnostic tests? How well do various tests distinguish among the possible causes of chest pain: angina pectoris, esophageal spasm, muscle strain, anxiety, and the like? For example, how accurate will an exercise stress test be in either confirming or ruling out coronary artery disease? If coronary artery disease is found, how long can the patient expect to have the pain? How likely is it that other complications (congestive heart failure, myocardial infarction, or atherosclerotic disease of other organs) will occur? Will the condition shorten his
life? Will reduction of his risk factors for coronary artery disease (from cigarette smoking and hypertension) reduce his risk? Should other possible risk factors be sought? If medications control the pain, would a coronary revascularization procedure add benefit by preventing a future heart attack or cardiovascular death? Since the patient is unemployed and without health insurance, can less expensive diagnostic workups and treatments achieve the same result as more expensive ones?

Clinical Questions and Clinical Epidemiology
The questions confronting the patient and doctor in the example are the types of clinical questions at issue in most doctor–patient encounters: What is “abnormal”? How accurate are the diagnostic tests we use? How often does the condition occur? What are the risks for a given disease, and how do we determine the risks? Does the medical condition usually get worse, stay the same, or resolve (prognosis)? Does treatment really improve the patient or just the test results? Is there a way to prevent the disease? What is the underlying cause of the disease or condition? And how can we give good medical care most efficiently? These clinical questions and the epidemiologic methods to answer them are the bedrock of this book. The clinical questions are summarized in Table 1.1. Each is also the topic of specific chapters in the book.

Table 1.1
Clinical Issues and Questions (a)

Issue                  Question
Frequency (Ch. 2)      How often does a disease occur?
Abnormality (Ch. 3)    Is the patient sick or well?
Risk (Chs. 5 and 6)    What factors are associated with an increased risk of disease?
Prognosis (Ch. 7)      What are the consequences of having a disease?
Diagnosis (Ch. 8)      How accurate are tests used to diagnose disease?
Treatment (Ch. 9)      How does treatment change the course of disease?
Prevention (Ch. 10)    Does an intervention on well people keep disease from arising? Does early detection and treatment improve the course of disease?
Cause (Ch. 12)         What conditions lead to disease? What are the origins of the disease?

(a) Four chapters (Risk: Basic Principles [4], Chance [11], Systematic Reviews [13], and Knowledge Management [14]) pertain to all of these issues.

Clinicians need the best possible answers to these kinds of questions. They use various sources of information: their own experiences, the advice of their colleagues, and reasoning from their knowledge of the biology of disease. In many situations, the most credible source is clinical research, which involves the use of past observations on other similar patients to predict what will happen to the patient at hand. The manner in which such observations are made and interpreted determines whether the conclusions reached are valid, and thus how helpful the conclusions will be to patients.

Health Outcomes
The most important events in clinical medicine are the health outcomes of patients, such as symptoms (discomfort and/or dissatisfaction), disability, disease, and death. These patient-centered outcomes are sometimes referred to as “the 5 Ds” (Table 1.2). They are the health events patients care about. Doctors should try to understand, predict, interpret, and change these outcomes when caring for patients. The 5 Ds can be studied directly only in intact humans and not in parts of humans (e.g., humoral transmitters, tissue cultures, cell membranes, and genetic sequences) or in animals. Clinical epidemiology is the science used to study the 5 Ds in intact humans.
In modern clinical medicine, with so much ordering and treating of lab test results (for such things as plasma glucose levels, hematuria, troponins, etc.), it is difficult to remember that laboratory test results are not the important events in clinical medicine. It becomes easy to assume that if we can change abnormal lab tests toward normal, we have helped the patient. This is true only to the extent that careful study has demonstrated a link between laboratory test results and one of the 5 Ds.

Table 1.2
Outcomes of Disease (the 5 Ds) (a)

Death             A bad outcome if untimely
Disease (b)       A set of symptoms, physical signs, and laboratory abnormalities
Discomfort        Symptoms such as pain, nausea, dyspnea, itching, and tinnitus
Disability        Impaired ability to go about usual activities at home, work, or recreation
Dissatisfaction   Emotional reaction to disease and its care, such as sadness or anger

(a) Perhaps a sixth D, destitution, belongs on this list because the financial cost of illness (for individual patients or society) is an important consequence of disease.
(b) Or illness, the patient’s experience of disease.

[...] aggressively lowering levels of blood sugar does not protect against heart disease.) Establishing improved health outcomes in patients is particularly important with new drugs because usually pharmacologic interventions have several clinical effects rather than just one.
[...] to obtaining the kind of information clinicians need to make good decisions in the care of patients.
The term “clinical epidemiology” is derived from its two parent disciplines: clinical medicine and epidemiology. It is “clinical” because it seeks to answer clinical questions and to guide clinical decision making with the best available evidence. It is “epidemiology” because many of the methods used to answer questions about how to best care for patients have been developed by epidemiologists and because the care of individual patients is seen in the context of the larger population of which the patient is a member.
Clinical sciences provide the questions and approach that can be used to care for individual patients. Some biologic sciences, such as anatomy and physiology, are “clinical” to the extent that they provide sound information to guide clinical decisions. For example, knowing the anatomy of the body helps determine possibilities for diagnosis and treatment of many symptoms.
The population sciences study large groups of people. Epidemiology is the “study of disease occurrence in human populations” (4) by counting health-related events in people in relation to the naturally occurring groups (populations) of which they are members. The results of many such studies are directly applicable to the care of individual patients. For example, epidemiology studies are used as the basis for advice about avoiding behaviors such as smoking and inactivity that place patients at increased risk. Other epidemiologic studies, such as those showing harmful effects of passive smoking and other environmental and occupational hazards, are the basis for public health recommendations. Clinical epidemiology is a subset of the population sciences useful in the care of patients.
Clinicians have long depended on research evidence to some extent, but understanding clinical evidence is more important in modern times than it was in the past for several reasons. An extraordinary amount of information must be sorted through. Diagnostic and therapeutic interventions have the potential for great effectiveness, as well as risk and cost, so the stakes in choosing among them are high. Clinical research at its best has become stronger and, thus, can be a sounder basis for clinical decisions. Nevertheless, the credibility of clinical research continues to vary from study to study, so clinicians need to have a method for sorting out strong from weak evidence.
Evidence-based medicine is a modern term for the application of clinical epidemiology to the care of patients. It includes formulating specific “answerable” clinical questions, finding the best available research
beliefs and patients’ cooperation) affect
Table 1.3
Factors Other Than Evidence-Based
Medicine That May Influence
Clinical Decisions
Example
In the 1980s, clinicians in San Francisco noticed unusual infect
BASIC PRINCIPLES
The purpose of clinical epidemiology is to foster methods of clinical observation and interpretation that lead to valid conclusions and better patient care. The most credible answers to clinical questions are based on a few basic principles. Two of these (that observations should address questions facing patients and clinicians, and results should include patient-centered health outcomes [the 5 Ds]) have already been covered. Other basic principles are discussed below.

Variables
Researchers call the attributes of patients and clinical events variables: things that vary and can be measured. In a typical study, there are three main kinds of variables. One is a purported cause or predictor variable, sometimes called the independent variable. Another is the possible effect or outcome variable, sometimes called the dependent variable. Still other variables may be part of the system under study and may affect the relationship between the independent and dependent variables. These are called extraneous variables (or covariates) because they are extraneous to the main question, though perhaps very much a part of the phenomenon under study.

Numbers and Probability
Clinical science, like all other sciences, depends on quantitative measurements. Impressions, instincts, and beliefs are important in medicine too, but only when added to a solid foundation of numerical information. This foundation allows better confirmation, more precise communication among clinicians and between clinicians and patients, and estimation of error. Clinical outcomes, such as occurrence of disease, death, symptoms, or disability, can be counted and expressed as numbers.
In most clinical situations, the diagnosis, prognosis, and results of treatment are uncertain for an individual patient. An individual will either experience a clinical outcome or will not, and predictions can seldom be so exact. Therefore, a prediction must be expressed as a probability. The probability for an individual patient is best estimated by referring to past experience with groups of similar patients: for example, that cigarette smoking more than doubles the risk of dying among middle-aged adults, that blood tests for troponins detect about 99% of myocardial [...] within 30 days of the procedure, as opposed to 40% to 80% when emergency repair is necessary.

Populations and Samples
Populations are all people in a defined setting (such as North Carolina) or with certain defined characteristics (such as being age 65 years or having a thyroid nodule). Unselected people in the community are the usual population for epidemiologic studies of cause. On the other hand, clinical populations include all patients with a clinical characteristic such as all those with community-acquired pneumonia or aortic stenosis. Thus, one speaks of the general population, a hospitalized population, or a population of patients with a specific disease.
Clinical research is ordinarily carried out on a sample or subset of people in a defined population. One is interested in the characteristics of the defined population but must, for practical reasons, estimate them by describing the characteristics of people in a sample (Fig. 1.2). One then makes an inference, a reasoned judgment based on data, that the characteristics of the sample resemble those of the parent population.
The extent to which a sample represents its population, and thus is a fair substitute for it, depends on how the sample was selected. Methods in which every member of the population has an equal (or known) chance of being selected can produce samples that are extraordinarily similar to the parent population, at least in the long run and for large samples. An everyday example is opinion polls using household sampling based on census data. In our own clinical research, we often use a computer to select a representative sample from all patients in our large, multispecialty group practice, each of whom has the same chance of being selected. On the other hand, samples taken haphazardly or for convenience (i.e., by selecting patients who are easy to work with or happen to be visiting the clinic when data are being collected) may misrepresent their parent population and be misleading.

[Figure 1.2 diagram labels: SAMPLING, SAMPLE, INFERENCE]
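
The difference between random and convenience sampling described under Populations and Samples can be made concrete with a small simulation. This is only an illustrative sketch, not the authors' method: the population of 20,000 ages, the assumption that older patients visit the clinic more often, and all function names are hypothetical.

```python
import random
import statistics

def simulate_population(n, seed=0):
    """Hypothetical population: each person has an age and a flag for
    whether they happen to be visiting the clinic."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        age = rng.uniform(20, 90)
        # Assumption for illustration: older people visit the clinic more often.
        visits_clinic = rng.random() < (age - 20) / 70
        population.append((age, visits_clinic))
    return population

def random_sample_mean(population, k, seed=1):
    """Every member has the same chance of being selected."""
    rng = random.Random(seed)
    sample = rng.sample(population, k)
    return statistics.mean(age for age, _ in sample)

def convenience_sample_mean(population, k):
    """Take the first k people who happen to be at the clinic."""
    visitors = [age for age, visits in population if visits]
    return statistics.mean(visitors[:k])

population = simulate_population(20000)
true_mean = statistics.mean(age for age, _ in population)
random_mean = random_sample_mean(population, 2000)
convenience_mean = convenience_sample_mean(population, 2000)

print(f"population mean age:      {true_mean:.1f}")
print(f"random sample mean age:   {random_mean:.1f}")
print(f"convenience sample mean:  {convenience_mean:.1f}")
```

Under these assumptions the random sample should track the population mean closely, while the convenience sample overstates age because clinic attendance is itself related to age.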
Selection Bias
Selection bias occurs when comparisons are
made between groups of patients that differ in
ways other than the main factors under study,
ones that affect the outcome of the study. Groups
of patients often differ in many ways—age, sex,
severity of disease, the presence of other diseases,
the care they receive, and so on. If one compares
the experience of two groups that differ on a
specific characteristic of interest (e.g., a treatment
or a suspected cause of disease) but are dissimilar
in these other ways and the differences are
themselves related to outcome, the comparison is
biased and little can be concluded about the independent
effects of the characteristic of interest. In
the herniorrhaphy example, selection bias would
have occurred if patients receiving the laparoscopic
procedure were healthier than those who had open
surgery.
Measurement Bias
Example
Blood pressure levels are powerful predictors of cardiovascular disease. However, multiple studies have shown that taking
[Figure 1.3 plot: curves labeled Doctor and Nurse; x-axis: Duration of visit (minutes)]
Figure 1.3 ■ White coat hypertension. Increase in systolic pressure, determined by continuous intraarterial monitoring, as the blood pressure is taken with a sphygmomanometer by an unfamiliar doctor or nurse. (Redrawn with permission from Mancia G, Parati G, Pomidossi G, et al. Alerting reaction and rise in blood pressure during measurement by physician and nurse. Hypertension 1987;9:209–215.)
Example
Supplements of antioxidants, such as vitamins A, C, and E, are po
Confounding
Confounding can occur when one is trying to find
out whether a factor, such as a behavior or drug
exposure, is a cause of disease in and of itself. If
the factor of interest is associated or “travels
together” with another factor, which is itself related
to the outcome, the effect of the factor under study
can be confused with or distorted by the effect of
the other.
[Figure 1.4 diagram. Main question: antioxidants intake and cardiovascular disease prevention. Potentially confounding factors: age, aspirin use, physical activity, body mass index, cigarette smoking, family history, diet.]
Figure 1.4 ■ Confounding. The relationship between antioxidant intake and cardiovascular risk is potentially confounded by patient characteristics and behaviors related to both antioxidant use and development of cardiovascular disease.
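
A short simulation may help show how a confounder can create an apparent effect. The sketch below is hypothetical and not taken from the book: it assumes antioxidants have no effect at all, that physically active people are both more likely to take them and less likely to develop disease, and the probabilities (0.6, 0.2, 0.05, 0.15) are invented for illustration.

```python
import random

def simulate(n=200000, seed=42):
    """Return rows of (active, takes_antioxidants, disease)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        active = rng.random() < 0.5
        # Assumed: active people are more likely to take antioxidants.
        takes = rng.random() < (0.6 if active else 0.2)
        # Assumed: disease risk depends only on activity, not on antioxidants.
        disease = rng.random() < (0.05 if active else 0.15)
        rows.append((active, takes, disease))
    return rows

def risk(rows, takes, active=None):
    """Proportion with disease among people with the given exposure,
    optionally restricted to one stratum of physical activity."""
    group = [d for a, t, d in rows
             if t == takes and (active is None or a == active)]
    return sum(group) / len(group)

rows = simulate()

# Crude comparison ignores physical activity.
crude_rr = risk(rows, True) / risk(rows, False)

# Stratified comparisons hold physical activity constant.
rr_active = risk(rows, True, True) / risk(rows, False, True)
rr_inactive = risk(rows, True, False) / risk(rows, False, False)

print(f"crude relative risk:          {crude_rr:.2f}")
print(f"relative risk, active only:   {rr_active:.2f}")
print(f"relative risk, inactive only: {rr_inactive:.2f}")
```

In this sketch the crude relative risk falls well below 1, suggesting protection, while the relative risk within each activity stratum stays near 1. The apparent benefit comes entirely from the confounder.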
Chance
[...] amount of pain caused, each followed by pain in 10% of patients. Because of chance alone, a single study with small numbers of patients in each treatment group might easily find that patients do better with laparoscopy than with open surgery (or vice versa).
Chance can affect all the steps involved in clinical observations. In the assessment of the two ways of repairing inguinal hernia, random variation occurs in the sampling of patients for the study, the selection of
[Figure 1.5 plot: x-axis: Diastolic blood pressure (mm Hg); label: Bias]
Figure 1.5 ■ Bias and chance. True blood pressure by intra-arterial cannula and clinical measurement by sphygmomanometer.
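
The point that small studies can differ by chance alone lends itself to a quick simulation. The following is a sketch under stated assumptions, not an analysis from the book: both procedures are assumed to have a true pain rate of exactly 10%, and the arm sizes (20 and 1,000), the 10-percentage-point threshold, and the function name are hypothetical choices.

```python
import random

def fraction_of_trials_with_gap(n_per_arm, gap_points, true_rate=0.10,
                                trials=1000, seed=7):
    """Simulate many two-arm studies in which both arms share the same true
    pain rate, and return the fraction whose observed pain rates differ by
    at least `gap_points` percentage points."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        pain_a = sum(rng.random() < true_rate for _ in range(n_per_arm))
        pain_b = sum(rng.random() < true_rate for _ in range(n_per_arm))
        # Integer arithmetic avoids floating-point edge cases at the threshold.
        if 100 * abs(pain_a - pain_b) >= gap_points * n_per_arm:
            count += 1
    return count / trials

small = fraction_of_trials_with_gap(n_per_arm=20, gap_points=10)
large = fraction_of_trials_with_gap(n_per_arm=1000, gap_points=10)

print(f"studies with 20 per arm showing a 10-point gap:   {small:.0%}")
print(f"studies with 1000 per arm showing a 10-point gap: {large:.0%}")
```

With only 20 patients per arm, a sizable share of simulated studies show one procedure looking clearly better even though the true rates are identical. With 1,000 per arm such gaps essentially disappear, which is how larger samples reduce the influence of chance.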
1 Clinical Epidemiology: The
an earlier example, the simpler instrument is prone to All patients with the INTERNAL
error or deviations from the true value. In the figure, condition of interest VALIDITY
the error is represented by all of the
sphygmomanometer readings falling to the right of
Sampling
the true value. The devia- tion of sphygmomanometer SAMPLE SAMPLE
readings to higher values (bias) may have several
explanations (e.g., the wrong cuff size, patient
Selection
anxiety, or “white coat hypertension”). Individual bias
blood pressure readings are also subject to error
because of random variation in measurement, as
illustrated by the spread of the sphygmomanometer
Measurement and
readings around the mean value (90 mm Hg). confounding bias
The main reason for distinguishing between bias ?
and chance is that they are handled differently. In ?? Chance
theory, bias can be prevented by conducting clini-
cal investigations properly or can be corrected dur- EXTERNAL CONCLUSIO
ing data analysis. If not eliminated, bias often can be VALIDITY
detected by the discerning reader. Most of this (generalizability)
book is about how to recognize, avoid, or
minimize bias.
Chance, on the other hand, cannot be eliminated, they apply to my patients as well?” Generalizability
but its influence can be reduced by proper design expresses the
of research, and the remaining effect can be
estimated by statistics. No amount of statistical
treatment can correct for unknown biases in data.
Some statisticians would go so far as to suggest that
statistics should not be applied to data that are
vulnerable to bias because of poor research design, for
fear of giving false respect- ability to fundamentally
misleading work.
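
The different handling of bias and chance can be sketched numerically. This example is hypothetical: the true diastolic pressure of 80 mm Hg, the systematic offset of 10 mm Hg (giving the mean reading of 90 mm Hg mentioned in the text), and the spread of 4 mm Hg are assumptions chosen for illustration.

```python
import random
import statistics

TRUE_DIASTOLIC = 80.0  # assumed true intra-arterial value (mm Hg)
BIAS = 10.0            # assumed systematic offset of the cuff readings
NOISE_SD = 4.0         # assumed random variation around the biased mean

def readings(n, seed=3):
    """Simulated sphygmomanometer readings: true value + bias + random noise."""
    rng = random.Random(seed)
    return [TRUE_DIASTOLIC + BIAS + rng.gauss(0, NOISE_SD) for _ in range(n)]

few = readings(5)
many = readings(5000)

# Averaging more readings shrinks the effect of chance on the mean...
error_few = statistics.mean(few) - TRUE_DIASTOLIC
error_many = statistics.mean(many) - TRUE_DIASTOLIC
# ...but the spread of individual readings and the bias both remain.
spread_many = statistics.stdev(many)

print(f"mean error with 5 readings:    {error_few:.1f} mm Hg")
print(f"mean error with 5000 readings: {error_many:.1f} mm Hg")
print(f"spread of individual readings: {spread_many:.1f} mm Hg")
```

Taking more readings brings the average closer to 90 mm Hg, not to the true 80 mm Hg. More data reduce random error but leave the systematic error untouched, which is why bias has to be addressed through study design or correction rather than sample size.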
Example
[Figure 1.7 plot: y-axis, mortality (scale 0 to 16); bars for 42 clinic-based studies and a population-based study; plotted values 15 and 7.]
Figure 1.7 ■ Sampling bias. Thirty-year mortality from all causes in patients with anorexia nervosa. Comparison of a synthesis of 42 published studies, mainly from referral centers, and a study of all patients with anorexia in the population. (Data from Sullivan PF. Mortality in anorexia nervosa. Am J Psychiatry 1995;152:1073–1074; and Korndorfer SR, Lucas AR, Suman VJ, et al. Long-term survival of patients with anorexia nervosa: a population-based study in Rochester, Minn. Mayo Clin Proc 2003;78:278–284.)

The generalizability of clinical observations, even those with high internal validity, is a matter of personal judgment about which reasonable people might disagree. A situation often occurs when clinicians must decide whether to use the results of a well-done study for a patient who is older than those in the study, a different gender, or sicker. It might be that a treatment that works well in young healthy men does more harm than good in older, sicker women.
Generalizability can rarely be dealt with satisfactorily in any one study. Even a defined, geographically based population is a biased sample of other populations. For example, hospital patients are biased samples of county residents, counties of states, states of regions, and so on. The best a researcher can do about generalizability is to ensure internal validity, have the study population fit the research question, describe the study patients carefully, and avoid studying patients who are so unusual that experience with them generalizes to few others. It then remains for other studies, in other settings, to extend generalizability.

INFORMATION AND DECISIONS
The primary concerns of this book are the quality of clinical information and its correct interpretation. Making decisions is another matter. True, good decisions depend on good information, but they involve a great deal more as well, including value judgments and weighing competing risks and benefits.
In recent years, medical decision making has become a valued discipline in its own right. The field includes qualitative studies of how clinicians make decisions and how the process might be biased and can be improved. It also includes quantitative methods such as decision analysis, cost-benefit analysis, and cost-effectiveness analysis that present the decision-making process in an explicit way so that its components and the consequences of assigning various probabilities and values to them can be examined.
Patients and clinicians make clinical decisions. At best, they make decisions together, a process called shared decision making, recognizing that their expertise is complementary. Patients are experts in what they hope to achieve from medical care, given their unique experiences and preferences. They may have found a lot of information about their condition (e.g., from the Internet) but are not grounded in how to sort out credible from fallacious claims. Doctors are experts in whether and how likely patients’ goals can be achieved and how to achieve them. For this, they depend on the body of research evidence and the ability, based on the principles of clinical epidemiology, to distinguish stronger from weaker evidence. Of course, clinicians also bring to the encounter experience in how disease presents and the human consequences of care, such as what it is like to be intubated or to have an amputation, with which patients may have little experience. For clinicians to play their part on this team, they need to be experts in the interpretation of clinically relevant information.
Patients’ preferences and sound evidence are the basis for choosing among care options. For example, a patient with valvular heart disease may prefer the possibility of long-term good health that surgery offers, even though surgery is associated with discomfort and risk of death in the short term. A clinician armed with critical reading and communication skills can help the patient understand how big those potential benefits and risks are and how surely they have been established.
Some aspects of decision analysis, such as evaluation of diagnostic tests, are included in this book. However, we have elected not to go deeply into medical decision making itself. Our reason is that decisions are only as good as the information used to make them, and we have found enough to say about the essentials of collecting and interpreting clinical information to fill a book.

ORGANIZATION OF THIS BOOK
In most textbooks on clinical medicine, information about each disease is presented as answers to traditional clinical questions: diagnosis, clinical course, treatment, and the like. However, most epidemiology books are organized around research strategies such as clinical trials, surveys, case-control studies, and the like. This way of organizing a book may serve those who perform clinical research, but it is often awkward for clinicians.
We have organized this book primarily according to the questions clinicians encounter when caring for patients (Table 1.1). Figure 1.8 illustrates how these questions correspond to the book’s chapters, taking HIV infection as an example. The questions relate to the entire natural history of disease, from the time people without HIV infection are first exposed to risk, to when some acquire the disease and emerge as patients, through complications of the disease, AIDS-defining illness, to survival or death.

[Figure 1.8 labels: population at risk; infection; sick with AIDS; well; death; outcomes. Chapter topics shown: frequency, abnormality, diagnosis, prevention, prognosis.]
Figure 1.8 ■ Organization of this book in relation to the natural history of human immunodeficiency virus (HIV) infection. Chapters 11, 13, and 14 describe cross-cutting issues related to all points in the natural history of disease.

In each chapter, we describe research strategies used to answer that chapter’s clinical questions. Some strategies, such as cohort studies, are useful for answering several different kinds of clinical questions. For the purposes of presentation, we have discussed each strategy primarily in one chapter and have simply referred to the discussion when the method is relevant to other questions in other chapters.
Review Questions

Questions 1.1–1.6 are based on the following clinical scenario.

A 37-year-old woman with low back pain for the past 4 weeks wants to know if you recommend surgery. You prefer to base your treatment recommendations on research evidence whenever possible. In the strongest study you can find, investigators reviewed the medical records of 40 consecutive men with low back pain under care at their clinic; 22 had been referred for surgery, and the other 18 patients had remained under medical care without surgery. The study compared rates of disabling pain after 2 months. All of the surgically treated patients and 10 of the medically treated patients were still being seen in the clinic throughout this time. Rates of pain relief were slightly higher in the surgically treated patients.

For each of the following statements, circle the one response that best represents the corresponding threat to validity.

1.1. Because there are relatively few patients in this study, it may give a misleading impression of the actual effectiveness of surgery.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.2. The results of this study may not apply to your patient, a woman, because all the patients in the study were men.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.3. Fewer patients who did not have surgery remained under care at the clinic 2 months after surgery.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.4. The patients who were referred for surgery were younger and fitter than those who remained under medical care.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.5. Compared with patients who had medical care alone, patients who had surgery might have been less likely to report whatever pain they had and the treating physicians might have been less inclined to record pain in the medical record.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.6. Patients without other medical conditions were both more likely to recover and more likely to be referred for surgery.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)
For questions 1.7–1.11, select the best answer.

1.7. Histamine is a mediator of inflammation in patients with allergic rhinitis (“hay fever”). Based on this fact, which of the following is true?
A. Drugs that block the effects of histamines will relieve symptoms.
B. A fall in histamine levels in the nose is a reliable marker of clinical success.
C. Antihistamines may be effective, and their effects on symptoms (e.g., itchy nose, sneezing, and congestion) should be studied in patients with allergic rhinitis.
D. Other mediators are not important.
E. If laboratory studies of disease are convincing, clinical research is unnecessary.

1.8. Which of the following statements about samples of populations is incorrect?
A. Samples of populations may have characteristics that differ from the population even though correct sampling procedures were followed.
B. Samples of populations are the only feasible way of studying the population.
C. When populations are correctly sampled, external validity is ensured.
D. Samples of populations should be selected in a way that every member of the population has an equal chance of being chosen.

1.9. You are making a treatment decision with a 72-year-old man with colon cancer. You are aware of several good studies that have shown that a certain drug combination prolongs the life of patients with colon cancer. However, all the patients in these studies were much younger. Which of the statements below is correct?
A. Given these studies, the decision about this treatment is a matter of personal judgment.
B. Relying on these studies for your patient is called internal validity.
C. The results in these studies are affected by chance but not bias.

1.10. A study was done to determine whether regular exercise lowers the risk of coronary heart disease (CHD). An exercise program was offered to employees of a factory, and the rates of subsequent coronary events were compared in employees who volunteered for the program and those who did not volunteer. The development of CHD was determined by means of regular voluntary checkups, including a careful history, an electrocardiogram, and a review of routine health records. Surprisingly, the members of the exercise group developed higher rates of CHD even though fewer of them smoked cigarettes. This result is least likely to be explained by which of the following?
A. The volunteers were at higher risk for developing CHD than those not volunteering before the study began.
B. The volunteers did not actually increase their exercise and the amount of exercise was the same in the two groups.
C. Volunteers got more check-ups, and silent myocardial infarctions were, therefore, more likely to have been diagnosed in the exercise group.

1.11. Ventricular premature depolarizations are associated with an increased risk of sudden death from a fatal arrhythmia, especially in people with other evidence of heart disease. You have read there is a new drug for ventricular premature depolarizations. What is the most important thing you would like to know about the drug before prescribing it to a patient?
A. The drug’s mechanism of action.
B. How well the drug prevents ventricular premature depolarizations in people using the drug compared to those who do not use the drug.
C. The rate of sudden death in similar people who do and do not take the drug.

Questions 1.12–1.15 are based on the following clinical scenario.

Because reports suggested estrogens increase the risk of clotting, a study compared the frequency of oral contraceptive use among women admitted to a hospital with thrombophlebitis and a group of women admitted for other reasons. Medical records were reviewed for indication of oral contraceptive use in the two groups. Women with thrombophlebitis were found to have been using oral contraceptives more frequently than the women admitted for other reasons.
1.12. Women with thrombophlebitis may have reported the use of contraceptives more completely than women without thrombophlebitis because they remembered hearing of the association.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.13. Doctors may have questioned women with thrombophlebitis more carefully about contraceptive use than they did those without thrombophlebitis (and recorded the information more carefully in the medical record) because they were aware that estrogen could cause clotting.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.14. The number of women in the study was small.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.15. The women with thrombophlebitis were admitted to the hospital by doctors working in different neighborhoods than the physicians of those that did not have thrombophlebitis.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)
REFERENCES
1. Home PD, Pocock SJ, Beck-Nielsen H, et al. Rosiglitazone evaluated for cardiovascular outcomes in oral agent combination therapy for type 2 diabetes (RECORD): a multicentre, randomized, open-label trial. Lancet 2009;373:2125–2135.
2. Lipscombe LL, Gomes T, Levesque LE, et al. Thiazolidinediones and cardiovascular outcomes in older patients with diabetes. JAMA 2007;298:2634–2643.
3. Nissen SE, Wolski K. Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. N Engl J Med 2007;356:2457–2471.
4. Friedman GD. Primer of Epidemiology, 5th ed. New York: Appleton and Lange; 2004.
5. Straus SE, Richardson WS, Glasziou P, et al. Evidence-Based Medicine: How to Practice and Teach EBM, 4th ed. New York: Churchill Livingstone; 2011.
6. Stuebe AM. Level IV evidence—adverse anecdote and clinical practice. N Engl J Med 2011;365(1):8–9.
7. Murphy EA. The Logic of Medicine. Baltimore: Johns Hopkins University Press; 1976.
8. Porta M. A Dictionary of Epidemiology, 5th ed. New York: Oxford University Press; 2008.
9. McCormack K, Scott N, Go PM, et al. Laparoscopic techniques versus open techniques for inguinal hernia repair. Cochrane Database Systematic Review 2003;1:CD001785. Publication History: Edited (no change to conclusions) 8 Oct 2008.
10. Neumayer L, Giobbie-Hurder A, Jonasson O, et al. Open mesh versus laparoscopic mesh repair of inguinal hernia. N Engl J Med 2004;350:1819–1827.
11. Sackett DL. Bias in analytic research. J Chronic Dis 1979;32:51–63.
12. Pickering TG, Hall JE, Appel LJ, et al. Recommendations for blood pressure in humans and experimental animals. Part 1: Blood pressure measurement in humans. A statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research. Circulation 2005;111:697–716.
13. Bjelakovic G, Nikolova D, Gluud LL, et al. Mortality in randomized trials of antioxidant supplements for primary and secondary prevention: systematic review and meta-analysis. JAMA 2007;297(8):842–857.
14. Vevekananthan DP, Penn MS, Sapp SK, et al. Use of antioxidant vitamins for the prevention of cardiovascular disease: meta-analysis of randomized trials. Lancet 2003;361:2017–2023.
15. Norman RJ, Nisenblat V. The effects of caffeine on fertility and on pregnancy outcomes. In: Basow DS, ed. UpToDate. Waltham, MA: UpToDate; 2011.
16. Savitz DA, Chan RL, Herring AH, et al. Caffeine and miscarriage risk. Epidemiology 2008;19:55–62.
17. Sullivan PF. Mortality in anorexia nervosa. Am J Psychiatry 1995;152:1073–1074.
18. Korndorfer SR, Lucas AR, Suman VJ, et al. Long-term survival of patients with anorexia nervosa: a population-based study in Rochester, Minn. Mayo Clin Proc 2003;78:278–284.
Chapter 2
Frequency
Here, it is necessary to count.
—P.C.A. Louis†
1787–1872

KEY WORDS
Numerator
Denominator
Prevalence
Point prevalence
Period prevalence
Incidence
Duration of disease
Case fatality rate
Survival rate
Complication rate
Infant mortality rate
Perinatal mortality rate
Prevalence studies
Cross-sectional studies
Surveys
Cohort
Cohort studies
Cumulative incidence
Incidence density
Person-time
Dynamic population
Population at risk
Random sample
Probability sample
Sampling fraction
Oversample
Convenience samples
Grab samples
Epidemic
Pandemic
Epidemic curve
Endemic

Chapter 1 outlined the questions that clinicians need to answer as they care for patients. Answers are usually in the form of probabilities and only rarely as certainties. Frequencies obtained from clinical research are the basis for probability estimates for the purposes of patient care. This chapter describes basic expressions of frequency, how they are obtained from clinical research, and how to recognize threats to their validity.

Example
A 72-year-old man presents with slowly progressive urinary frequency, hesitancy, and dribbling. A digital rectal examination reveals a symmetrically enlarged prostate gland and no nodules. Urinary flow measurements show a reduction in flow rate, and his serum prostate-specific antigen (PSA) is not elevated. The clinician diagnoses benign prostatic hyperplasia (BPH). In deciding on treatment, the clinician and patient must weigh the benefits and hazards of various therapeutic options. To simplify, let us say the options are medical therapy with drugs or surgery. The patient might choose medical treatment but runs the risk of worsening symptoms or obstructive renal disease because the treatment is less immediately effective than surgery. Or he might choose surgery, gaining immediate relief of symptoms but at the risk of operative mortality and long-term urinary incontinence and impotence.

Decisions such as the one this patient and clinician face have traditionally relied on clinical judgment based on experience at the bedside and in the clinics. In modern times, clinical research has become sufficiently strong and extensive that it is possible to ground clinical judgment in research-based probabilities (frequencies). Probabilities of disease, improvement, deterioration, cure, side effects, and death are the basis for answering most clinical questions. For this patient, sound clinical decision making requires accurate estimates of how his symptoms and complications of treatment will change over time according to which treatment is chosen.

† A 19th Century physician and proponent of the “numerical method” (relying on counts, not impressions) to understand the natural history of diseases such as typhoid fever.

ARE WORDS SUITABLE SUBSTITUTES FOR NUMBERS?
Clinicians often communicate probabilities as words (e.g., usually, sometimes, rarely) rather than as numbers. Substituting words for numbers is convenient and avoids making a precise statement when one is uncertain about a probability. However, words are a poor substitute for numbers because there is little agreement about the meanings of commonly used adjectives describing probabilities.

Example
Physicians were asked to assign percentage values to 13 expressions of probability (1). These physicians generally agreed o

event could have occurred (population). The two basic measures of frequency are prevalence and incidence.

Prevalence
Prevalence is the fraction (proportion or percent) of a group of people possessing a clinical condition or outcome at a given point in time. Prevalence is measured by surveying a defined population and counting the number of people with and without the condition of interest. Point prevalence is measured at a single point in time for each person (although actual measurements need not necessarily be made at the same point in calendar time for all the people in the population). Period prevalence describes cases that were present at any time during a specified period of time.

Incidence
Incidence is the fraction or proportion of a group of people initially free of the outcome of interest that develops the condition over a given period of time. Incidence refers then to new cases of disease occurring in a population initially free of the disease or new outcomes such as symptoms or complications occurring in patients with a disease who are initially free of these problems.
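The definitions of point prevalence, period prevalence, and incidence can be made concrete with a few lines of code. This is a minimal sketch under stated assumptions: the ten people and their onset and end times are hypothetical, the `Person` class and function names are invented for illustration, and each person is assumed to have at most one episode of disease.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    onset: Optional[float]  # year the condition began (None = never observed)
    end: Optional[float]    # year the condition ended (None = still present)

def has_condition_at(p: Person, t: float) -> bool:
    """True if the person has the condition at time t."""
    return p.onset is not None and p.onset <= t and (p.end is None or p.end > t)

def point_prevalence(people, t):
    """Fraction of the group with the condition at a single point in time."""
    return sum(has_condition_at(p, t) for p in people) / len(people)

def period_prevalence(people, start, stop):
    """Fraction with the condition present at any time from start to stop."""
    def present(p):
        return (p.onset is not None and p.onset <= stop
                and (p.end is None or p.end > start))
    return sum(present(p) for p in people) / len(people)

def cumulative_incidence(people, start, stop):
    """New cases between start and stop among those free of it at start."""
    at_risk = [p for p in people if not has_condition_at(p, start)]
    new_cases = [p for p in at_risk
                 if p.onset is not None and start < p.onset <= stop]
    return len(new_cases) / len(at_risk)

# Hypothetical group of 10 people followed from year 0 to year 1.
people = (
    [Person(-1.0, None), Person(-1.0, None), Person(-2.0, 0.5)]  # existing cases
    + [Person(0.3, None), Person(0.8, None)]                     # new cases
    + [Person(1.5, None)]                                        # onset after follow-up
    + [Person(None, None)] * 4                                   # never affected
)

print(point_prevalence(people, 0.0))
print(period_prevalence(people, 0.0, 1.0))
print(round(cumulative_incidence(people, 0.0, 1.0), 3))
```

Note that the denominators differ: prevalence divides by everyone in the group, whereas incidence divides only by the seven people who were free of the condition at the start.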
Figure 2.1 illustrates the differences between incidence and prevalence. It shows the occurrence of
Table 2.1
Characteristics of Incidence and Prevalence

study would, therefore, miss nearly all these events and underestimate the true burden of coronary heart disease in the community. In contrast, diseases of long duration are well represented in prevalence surveys, even when their incidence is low. The incidence of inflammatory bowel disease in North America is only about 2 to 14 per 100,000/year, but its prevalence is much higher, 37 to 246/100,000, reflecting the chronic nature of the disease (4).
The relationship among incidence, prevalence, and duration of disease in a steady state, in which none of the variables is changing much over time, is approximated by the following expression:

$$\text{Prevalence} \approx \text{Incidence} \times \text{Average duration of the disease}$$

Alternatively,

$$\text{Prevalence} / \text{Incidence} \approx \text{Duration}$$

Example
The incidence and prevalence of ulcerative colitis were measured in Olmsted County, Minnesota, from 1984 to 1993 (5). In

Similarly, the prevalence of prostate cancer on autopsy is so much higher than its incidence that the majority of these cancers must never become symptomatic enough to be diagnosed during life.

SOME OTHER RATES
Table 2.2 summarizes some rates used in health care. Most of them are expressions of events over time. For example, a case fatality rate (or alternatively, the survival rate) is the proportion of people having a disease who die of it (or who survive it). For acute diseases such as Ebola virus infection, follow-up time may be implicit, assuming that deaths are counted over a long enough period of time (in this case, a few weeks) to account for all of them that might have occurred. For chronic diseases such as cardiovascular disease or cancer, it is more usual to specify the period of observation (e.g., the 5-year survival rate). Similarly, com-
Table 2.2
Some Commonly Used Rates

[Figure labels: population at risk; sample; yes; no.]

STUDIES OF PREVALENCE AND INCIDENCE
Prevalence and incidence are measured by entirely different kinds of studies.

Prevalence Studies
In prevalence studies, people in a population are examined for the presence of the condition of interest. Some members of the population have the condition at that point in time, whereas others do not (Fig. 2.2). The fraction or proportion of the population that has the condition (i.e., cases) constitutes the prevalence of the disease.
Another term for prevalence studies is cross-sectional studies because people are studied at a “cross-section” of time. Prevalence studies are also called surveys if the main measurement is a questionnaire.
The following is an example of a typical prevalence study.

Example
The World Health Organization created a research consortium to study the cross-national prevalence of depression.
period prevalence but a good estimate of point prevalence because of the narrow time window) ranged from a high of 4.6% in the United States to a low of 0.9% in Japan. Period prevalence was higher; for example, in the United States, the 12-month prevalence was 10.0% and the lifetime prevalence was 16.9%. The authors concluded that “major depressive episodes are a commonly occurring disorder that usually has a chronic-intermittent course”

Incidence Studies
The population under examination in an incidence study is a cohort, which is defined as a group of people having something in common when they are first assembled and are then followed over time for the development of outcome events. For this reason, incidence studies are also called cohort studies. A sample of people free of the outcome of interest is identified and observed over time to see whether an outcome event occurs. Members of the cohort may be healthy at first and then followed forward in time for the emergence of disease, for example, from being cancer-free until the onset (or not) of pancreatic cancer. Or, all of them may have a recently diagnosed disease (such as pancreatic cancer) and then be followed forward in time to outcomes such as recurrence or death. Incidence studies will be discussed in greater detail in Chapters 5 and 7.

Cumulative Incidence
To this point, the term “incidence” has been used to describe the rate of new events in a group of people of fixed size, all members of which are observed over
Example
A study of the incidence of herpes zoster infections (“shingles”) and its complications provides an example of both incid

[Figure 2.3 labels: community; population; die; move out.]
Figure 2.3 ■ A dynamic population.
BASIC ELEMENTS OF FREQUENCY STUDIES
To make sense of a study reporting prevalence, one needs careful definition of both the numerator and the denominator.

What Is a Case?
Defining the Numerator
Cases might be people in the general population who develop a disease or patients in clinical settings with disease who develop an outcome event such as recurrence, complication of treatment, or death. In either situation, the way in which a case is defined affects rates.
Most clinical phenomena (serum cholesterol, serum calcium, thyroid hormone levels, etc.) exist on a continuum from low to high. The cutoff point defining a case can be placed at various points and this can have large effects on the resulting prevalence. We will discuss some of the reasons why one would place a cutoff at one or another point in Chapter 3 and the consequences for diagnostic test performance in Chapter 8.
Rates may also be affected by how aggressively one looks for cases. For example, aspirin can induce asthma in some people. How often does this occur? It depends on the definition of a case. When people are simply asked whether they have a breathing problem after taking aspirin, rates are relatively low, about 3% in adults. When a case is defined more rigorously, by giving aspirin and measuring whether this was followed by bronchoconstriction, the prevalence of aspirin-induced asthma is much higher, about 21% in adults (10). The lower rate
[Figure 2.4 plot: x-axis, body mass index (10 to 55); y-axis, population; regions labeled underweight, normal weight, overweight, and obese (Class I, Class II, Class III).]
Figure 2.4 ■ The prevalence of overweight and obesity in men, 2007 to 2008. (Data from Flegal KM, Carroll MD, Ogden CL, et al. Prevalence and trends in obesity among US adults, 1999–2008. JAMA 2010;303(3):235–241.)
Example
Many cases of prostate cancer remain indolent and are not detect

Table 2.3
Classification of Obesity According to the U.S. National Institutes of Health and World Health Organization

Classification                              Body Mass Index (kg/m2)
Underweight                                 <18.5
Normal weight                               18.5–24.9
Overweight                                  25.0–29.9
Obesity                                     ≥30
  Obesity Class I                           30.0–34.9
  Obesity Class II                          35.0–39.9
  Obesity Class III                         ≥40
  (“severe,” “extreme,” or “morbid”)

Data from Flegal KM, Carroll MD, Ogden CL et al. Prevalence and trends in obesity among US adults, 1999–2008. JAMA 2010;303:235–241.
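The effect of the cutoff point on prevalence can be shown with the body mass index categories in Table 2.3. The sketch below is illustrative only: the ten body mass index values are hypothetical, the function names are invented, and the category boundaries are treated as half-open intervals (for example, 25.0 up to but not including 30.0), which is an assumption about how values between the printed ranges would be assigned.

```python
def classify_bmi(bmi):
    """Assign a Table 2.3 category to a body mass index in kg/m2."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Normal weight"
    if bmi < 30.0:
        return "Overweight"
    if bmi < 35.0:
        return "Obesity Class I"
    if bmi < 40.0:
        return "Obesity Class II"
    return "Obesity Class III"

def prevalence_at_or_above(values, cutoff):
    """Fraction of the group counted as cases when a case is value >= cutoff."""
    return sum(v >= cutoff for v in values) / len(values)

# Hypothetical body mass index values for 10 people (not survey data).
bmis = [17.9, 21.4, 23.0, 24.8, 26.1, 27.5, 29.9, 31.2, 36.0, 41.3]

for cutoff in (25.0, 30.0, 40.0):
    print(f"case defined as BMI >= {cutoff}: "
          f"prevalence = {prevalence_at_or_above(bmis, cutoff):.0%}")
```

The same ten people yield very different prevalences depending only on where the line defining a case is drawn.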
[Figure 2.5 plot: x-axis, year of diagnosis (1975 to 2007); y-axis, age-adjusted incidence; annotation, PSA approval.]
Figure 2.5 ■ Incidence depends on the intensity of efforts to find cases. Incidence of prostate cancer in the United States during the widespread use of screening with prostate-specific antigen (PSA). (Redrawn with permission from Wolf AMD, Wender RC, Etzioni RB et al. American Cancer Society guideline for the early detection of prostate cancer: Update 2010. CA Cancer Journal for Clinicians 2010;60:70–98.)
Time
An epidemic is a concentration of new cases in time. The term pandemic is used when a disease is especially widespread, such as a global epidemic of particularly severe influenza (e.g., the one in 1918–1919) and the more slowly developing but worldwide rise in HIV infection/AIDS. The existence of an epidemic is recognized by an epidemic curve that shows the rise and fall of cases of a disease over time in a population.

Example
Figure 2.6 shows the epidemic curve for Severe Acute Respiratory Syndrome (SARS), in Beijing, People’s Republic of China
and signs of a febrile respiratory illness, chest radiograph changes, lack of response to antibiotics, and normal or decreased white blood cell count. Later, as more became known about this new disease, laboratory testing for the responsible coronavirus could be used to define a case. Cases were called “reported” to make clear that there was no assurance that all cases in the Beijing community were detected.
Figure 2.6 also indicates when major control measures were instituted. The epidemic declined in relation to aggressive quarantine measures involving the closing of public gathering places, identifying new cases early in their course, removing cases from the community, and isolating cases in facilities specifically for SARS. It is possible that the epidemic abated for reasons other than these control measures, but it is unlikely given that similar control measures in other places were also followed by a resolution of the epidemic. Whatever the cause, the decline in new cases allowed the World Health Organization to lift its advisory against travel to Beijing so that the city could reopen public places and resume normal international business and tourism.
[Figure 2.6 plot: x-axis, date of hospitalization (Mar 7 to May 30); annotated control measures: SARS made reportable; contact tracing begins; fever checks at airports begins; quarantine of close contacts; start to group patients with SARS in designated war; all patients with SARS in designated hospitals.]
Figure 2.6 ■ An epidemic curve. Probable cases of severe acute respiratory syndrome in Beijing March 2003 through May 2003, in relation to control measures. (Adapted with permission from Pang X, Zhu Z, Xu F, et al. Evaluation of control measures implemented in the severe acute respiratory syndrome outbreak in Beijing, 2003. JAMA 2003;290:3215–3221.)
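An epidemic curve is, at bottom, a tally of new cases per interval of time. The following minimal sketch shows one way such a tally might be built. The case dates are hypothetical and are not the Beijing data, and the function name and 7-day interval are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta

def epidemic_curve(case_dates, start, bin_days=7):
    """Count new cases in consecutive intervals of bin_days, beginning at start."""
    counts = Counter((d - start).days // bin_days for d in case_dates if d >= start)
    n_bins = max(counts) + 1 if counts else 0
    # Include intervals with zero cases so gaps in the curve remain visible.
    return [(start + timedelta(days=i * bin_days), counts.get(i, 0))
            for i in range(n_bins)]

# Hypothetical dates of hospitalization, given as days after the start date.
start = date(2001, 1, 1)
days_after_start = (0, 9, 15, 16, 22, 23, 24, 25, 30, 31, 38, 50)
cases = [start + timedelta(days=d) for d in days_after_start]

for week_start, n in epidemic_curve(cases, start):
    # A simple text histogram: one mark per case in that week.
    print(f"{week_start:%b %d}  {'#' * n} ({n})")
```

Plotting the weekly counts against time gives the rise and fall that identifies an epidemic; dates of control measures could then be marked on the same axis.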
Example
The incidence of colorectal cancer is very different in different parts of the world. Rates, even when adjusted for di
5.0 cases/10,000/year
Figure 2.7 ■ Colorectal cancer incidence for men according to area of the globe. (Data from Center MM, Jemal
A, Smith RA, et al. Worldwide variations in colorectal cancer. CA Cancer J Clin 2009;59:366–378.)
by an infectious agent transmitted in semen and blood. Laboratory studies confirmed this hypothesis and discovered the human immunodeficiency virus. Identification of the kinds of people most affected also led to special efforts to prevent spread of the disease in them, for example, by targeting education about safe sex to those communities, closing public bathhouses, and instituting safe-needle programs.
USES OF PREVALENCE STUDIES
Properly performed prevalence studies are the very best ways of answering some important questions and are a weak way of answering others.
What Are Prevalence Studies Good For?
Prevalence studies provide valuable information about what to expect in different clinical situations.
Example
The approach to cervical lymphadenopathy depends on where and in whom it is seen. Children with persistent cervical aden
Hodgkin disease, aplastic anemia, or systemic lupus erythematosus. In contrast, some referral hospitals are well prepared for just these diseases, and appropriately so.
What Are Prevalence Studies Not Particularly Good For?
Prevalence studies provide only weak evidence of cause and effect. Causal questions are inherently about new events arising over time; that is, they are about incidence. One of the other limitations of prevalence studies, for this purpose, is that it may be difficult to know whether the purported cause actually preceded or followed the effect because the two are measured at the same point in time. For example, if inpatients with hyperglycemia are more often infected, is it because hyperglycemia impairs immune function leading to infection or has the infection caused the hyperglycemia? If a risk factor (e.g., family history or a genetic marker) is certain to have preceded the onset of disease or outcome, interpretation of the cause-and-effect sequence is less worrisome.
Another limitation is that prevalence may be
the result of incidence of disease, the main
consideration in causal questions, or it may be
related to duration of disease, an altogether
different issue. With only information about
prevalence, one cannot determine how much each of
the two, incidence and duration, contributes.
Nevertheless, cross-sectional studies can provide
compelling hypotheses about cause and effect to be
tested by stronger studies.
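The way incidence and duration are entangled in a single prevalence figure can be illustrated with the steady-state approximation, prevalence ≈ incidence × average duration. The sketch below is illustrative only: the function name and all numbers are hypothetical, and the approximation assumes an uncommon disease with stable rates.

```python
def steady_state_prevalence(incidence_per_year, mean_duration_years):
    """Approximate prevalence (a proportion) as incidence x average duration."""
    return incidence_per_year * mean_duration_years

# Two hypothetical diseases: one frequent but brief, one rare but long-lasting.
acute = steady_state_prevalence(incidence_per_year=0.010, mean_duration_years=0.5)
chronic = steady_state_prevalence(incidence_per_year=0.0005, mean_duration_years=10)

# Both come out near 0.005 (5 per 1,000), so prevalence alone cannot
# tell the two situations apart.
print(f"acute: {acute:.4f}, chronic: {chronic:.4f}")
```

A 20-fold difference in incidence is hidden here by a 20-fold difference in duration, which is why prevalence alone says little about cause.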
The underlying message is that a well-performed cross-sectional study, or any other research design, is not inherently strong or weak; it is strong or weak only in relation to the question it is intended to answer.
Example
Children living on farms are less likely to have asthma than childr
Review Questions
Read the following statements and mark the best answer.
2.1. Cancer registries report 40 new cases of bladder cancer per 100,000 men per year. Cases were from a complete count of all patients who developed bladder cancer in several regions of the United States, and the number of men at risk was estimated from the census data in those regions. Which rate is this an example of?
A. Point prevalence
B. Period prevalence
C. Incidence density
D. Cumulative incidence
E. Complication rate
2.2. Sixty percent of adults in the U.S. population have a serum cholesterol 200mg/dL (5.2 mmol/L). Which rate is this an example of?
A. Point prevalence
B. Complication rate
C. Incidence density
D. Cumulative incidence
E. Period prevalence
2.3. You are reading a study of the prevalence of uterine cervix infections and want to decide if the study is scientifically sound. Which of the following is not important?
A. Participants are followed up for a sufficient period of time for anemia to occur.
B. The study is done on a representative sample of the population.
C. All members of the population are women.
D. Cervical infection is clearly defined.
E. The study is done on a sample from a defined population.
2.4. A probability sample of a defined population:
A. Is invalidated by oversampling.
B. Is inferior to a random sample.
C. Is not representative of the population.
D. Results in a representative sample of the population only if there are enough people in the sample.
2.5. The incidence of rheumatoid arthritis is about 40/100,000/year and the prevalence is about 1/100 persons. On average, how many years does the disease persist?
A. 10
B. 25
C. 33
D. 40
E. 50
2.6. Which of the following studies is not a cohort study?
A. The proportion of patients with stomach cancer who survive 5 years
B. The risk of developing diabetes mellitus in children according to their weight
C. Complications of influenza vaccine among children vaccinated in 2011
D. The earlier course of disease in a group of patients now under care in a clinic
E. Patients admitted to an intensive care unit and followed up for whether they are still alive at the time of hospital discharge
2.7. A sample for a study of incidence of medication errors is obtained by enrolling every 10th patient admitted to a hospital. What kind of sample is this?
A. Stratified sample
B. Probability sample
C. Convenience sample
D. Random sample
E. Oversample
2.8. Cohort studies of children with a first febrile seizure have shown that they have a one in three chance of having another seizure during childhood. What kind of rate is this?
A. Point prevalence
B. Complication rate
C. Cumulative incidence
D. Period prevalence
E. Incidence density
2.9. Which of the following would not increase the observed incidence of disease?
A. More aggressive efforts to detect the disease
B. A true increase in incidence
C. A more sensitive way of detecting the disease
D. A lowering of the threshold for diagnosis of disease
E. Studying a larger sample of the population
2.10. Infection with a fungus, coccidioidomycosis,
is common in the deserts of the southwestern
United States and in Mexico, but
uncommon elsewhere. Which of the
following best describes this infection?
A. Endemic
B. Pandemic
C. Incident
D. Epidemic
E. Prevalent
Abnormality
. . . the medical meaning of “normal” has been lost in the shuffle of statistics.
—Alvan Feinstein
1977
Figure 3.1 layout: four panels (A to D) showing frequency against measurement. Columns: validity (accuracy), high and low. Rows: reliability (precision), high and low.
Figure 3.1 ■ Validity and reliability. A. High validity and high reliability. B. Low
validity and high reliability. C. High validity and low reliability. D. Low validity and
low reliability. The white lines represent the true values.
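The distinction drawn in Figure 3.1 can be made concrete by summarizing repeated measurements with two numbers: the average error (a rough index of validity) and the spread (a rough index of reliability). The following is a minimal sketch using invented data; the true value, the four sets of readings, and the helper name `summarize` are all hypothetical.

```python
from statistics import mean, stdev

TRUE_VALUE = 100.0  # hypothetical true value (the white line in each panel)

# Hypothetical repeated measurements for the four panels of Figure 3.1.
panels = {
    "A": [99, 100, 101, 100, 100],   # high validity, high reliability
    "B": [109, 110, 111, 110, 110],  # low validity, high reliability
    "C": [85, 115, 95, 105, 100],    # high validity, low reliability
    "D": [95, 125, 105, 115, 110],   # low validity, low reliability
}

def summarize(values, true_value=TRUE_VALUE):
    """Return (bias, spread): mean error and sample standard deviation."""
    return mean(values) - true_value, stdev(values)

for name, values in panels.items():
    bias, spread = summarize(values)
    print(f"Panel {name}: bias = {bias:+.1f}, SD = {spread:.1f}")
```

In this toy data, panels A and C are centered on the true value while B and D are shifted by 10 units; A and B are tightly clustered while C and D are widely scattered.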
Example
Findings on chest radiographs are used as part of the diagnosis of Acute Lung Injury and Acute Respiratory Distress Syndrom
Figure 3.2 ■ Observer variability. Variability among 21 specialists reading chest x-rays for acute lung injury and acute respiratory distress syndrome. The percentage of radiographs read as positive for the diagnosis varied from 36% to 71% among the experts. The horizontal axis shows readings by 21 experts. (Data from Rubenfeld GD, Caldwell E, Granton J, et al. Interobserver variability in applying a radiographic definition for ARDS. Chest 1999;116:1347–1353.)
Variations in measurements also arise because they are made on only a sample of the phenomenon being described, which may misrepresent the whole. Often, the sampling fraction (the fraction of the whole that is included in the sample) is very small. For example, a liver biopsy represents only about 1/100,000 of the liver. Because such a small part of the whole is examined, there is room for considerable variation from one sample to another.
If measurements are made by several different methods, such as different laboratories, technicians, or instruments, some of the measurements may be unreliable or may produce results that are systematically different from the correct value, which could contribute to the spread of values obtained.
Example
Clinicians estimate the frequency of ventricular premature beats (VPBs) to help determine the need for and effectiveness of treatment. For practical reasons, they may do so by making relatively brief observations, perhaps feeling a pulse for 1 minute or reviewing an electrocardiogram recording lasting several seconds. However, the frequency of VPBs in a given patient varies over time. To obtain a larger sample to estimate the VPB rate, a portable monitor was developed that tracks ventricular premature depolarizations (VPDs) electrocardiographically. Early studies found monitoring even for extended periods of time can be misleading. Figure 3.3 shows observations on one patient with VPDs, similar to other patients studied (4). VPDs per hour varied from 20 to 380 during a 3-day period, according to day and time of day. The
Variation Resulting from Biologic Differences
Variation also arises because of biologic changes within
Chapter 3: Abnormality 39
individuals over time. Most biologic phenomena
Figure 3.3 plot: vertical axis scaled from 0 to 400 (label begins "Number of"); horizontal axis is time of day (noon, 6 P.M., midnight, 6 A.M.); separate curves for Day 1, Day 2, and Day 3.
Figure 3.4 plot: horizontal axis is diastolic blood pressure (mm Hg), 60 to 110; rows are labeled by measurement conditions, including "Simultaneous, same observer."
Figure 3.4 ■ Sources of variation in the measurement of diastolic (phase V) blood pressure. The dashed line indicates the true blood pressure. Multiple sources of variation, including within and among patients as well as intra- and inter-observer variation, all contribute to blood pressure measurement results.
Table 3.4
Expressions of Central Tendency and Dispersion
a $\sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}}$, where $X$ = each observation; $\bar{X}$ = mean of all observations; and $N$ = number of observations.
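The standard deviation formula in the footnote above can be checked numerically. This is a small sketch with made-up readings; `sample_sd` and `readings` are hypothetical names, and the result is compared with `statistics.stdev` from the Python standard library, which uses the same N − 1 denominator.

```python
from math import sqrt
from statistics import stdev

def sample_sd(observations):
    """Square root of the summed squared deviations from the mean, over N - 1."""
    n = len(observations)
    x_bar = sum(observations) / n
    return sqrt(sum((x - x_bar) ** 2 for x in observations) / (n - 1))

readings = [72, 76, 80, 84, 88]  # hypothetical measurements

print(sample_sd(readings))  # hand-written formula
print(stdev(readings))      # standard library, same definition
```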
Figure 3.6 panels: serum potassium, alkaline phosphatase, plasma glucose (mg/100 mL), and hemoglobin (g/100 mL).
Figure 3.6 ■ Actual clinical distributions. (Data from Martin HF, Gudzinowicz BJ, Fanger H. Normal Values in Clinical Chemistry. New York: Marcel Dekker; 1975.)
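Normal distribution figure: horizontal axis in standard deviations from –3 to +3. Percentage of the area in successive bands of 1 standard deviation: 2.14, 13.59, 34.13, 34.13, 13.59, 2.14. Within 2 standard deviations of the mean: 95.44%; within 3 standard deviations: 99.72%.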
in this way.
1 standard deviation of the mean, and about 95%,
within 2 standard deviations.
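The areas under the normal curve quoted above can be reproduced from the error function. The sketch below uses only the Python standard library; `fraction_within` is a hypothetical helper name. Results may differ from the figure in the last digit because the figure sums rounded band values.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * fraction_within(k):.2f}%")
```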
Although clinical distributions often resemble a normal distribution, the resemblance is superficial. As summarized in a perspective, “The experimental fact is that for most physiologic variables the distribution is smooth, unimodal, and skewed, and that mean ± 2 standard deviations does not cut off the desired 95%. We have no mathematical, statistical, or other theorems that enable us to predict the shape of the distributions of physiologic measurements” (5).
The shapes of clinical distributions differ from one another because many differences among people, other than random variation, contribute to distributions of clinical measurements. Therefore, if distributions of clinical measurements resemble normal curves, it is largely by accident. Even so, it is often assumed, as a matter of convenience (because means and standard deviations are relatively easy to calculate and manipulate mathematically), that clinical measurements are “normally” distributed.
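To see why mean ± 2 standard deviations can mislead for a skewed variable, consider a purely hypothetical measurement that follows a lognormal distribution. The distribution, the value of `SIGMA`, and the helper names below are assumptions chosen for illustration; they do not describe any particular laboratory test.

```python
from math import erf, exp, log, sqrt

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

SIGMA = 0.5  # hypothetical skewness parameter: ln(X) ~ Normal(0, SIGMA^2)

# Mean and standard deviation of the lognormal variable X itself.
mean = exp(SIGMA ** 2 / 2)
sd = sqrt((exp(SIGMA ** 2) - 1) * exp(SIGMA ** 2))
lower, upper = mean - 2 * sd, mean + 2 * sd

def lognormal_cdf(x):
    """P(X <= x); zero for non-positive x because X is always positive."""
    return normal_cdf(log(x) / SIGMA) if x > 0 else 0.0

below = lognormal_cdf(lower)      # fraction under mean - 2 SD
above = 1 - lognormal_cdf(upper)  # fraction over mean + 2 SD
print(f"below: {100 * below:.1f}%, above: {100 * above:.1f}%")
```

In this sketch the lower cutoff falls below zero, so no values are flagged as low, while well over 2.5% are flagged as high. The two tails are not the symmetric 2.5% each that the normal model promises.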
Example
Phenylketonuria (PKU) is an inherited disease characterized by
It is common practice to screen newborns for PKU with a blo
Example
At what point does higher than average weight for height be
Figure 3.9 axes: horizontal, usual systolic blood pressure (mm Hg), 115 to 180; vertical, mortality.
Figure 3.9 ■ Ischemic heart disease mortality for people ages 40 to 49 years is related to systolic blood pressure throughout the range of values occurring in most people. There is no threshold between normal and abnormal. “Mortality” is presented as a multiple of the baseline rate. (Data from Prospective Studies Collaboration. Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet 2002;360:1903–1913.)
Abnormal = Treating the Condition Leads to a Better Clinical Outcome
It makes intuitive sense to define a clinical condition or finding as “abnormal” if treatment of it leads to a better outcome. This approach makes particularly good sense for asymptomatic conditions. If a condition is causing no trouble, and treatment makes no difference, why try to treat it? However, even for symptomatic patients, it is sometimes difficult to
Figure 3.10 panels: (A) mortality rate per 100,000 person- (label truncated), scaled from 4,000 to 20,000; (B) men with functional decline, scaled from 10 to 40. Horizontal axis in both panels: body mass index (kg/m2) in categories <18.5, 18.5–21.9, 22.0–24.9, 25.0–27.4, 27.5–29.9, 30.0–34.9, ≥35.0.
Figure 3.10 ■ Abnormal as associated with disease and other patient outcomes. The relationship between body mass index and (A) total mortality and (B) functional decline in men age 65 and older on Medicare. Body mass index is weight in kilograms divided by height in meters squared. Mortality rates are adjusted for age and smoking. (Redrawn with permission from Wee CC, Huskey KW, Ngo LH, et al. Obesity, race and risk for death or functional decline among Medicare beneficiaries. A cohort study. Ann Intern Med 2011;154:645–655.)
distinguish between clinical findings that will and will not improve with treatment. Modern technology, especially newer imaging techniques, is now able to detect abnormalities in patients so well that it is not always clear whether what is found is related to the patient’s complaint. The result is an increasingly common dilemma for both patients and clinicians.
Example
Magnetic resonance imaging (MRI) of the knee is frequently performed in middle-aged and elderly patients presenting
What is considered treatable changes with time. At their best, therapeutic decisions are grounded on evidence from well-conducted clinical trials (Chapter 9). As new knowledge is acquired from the results of clinical trials, the level at which treatment is considered useful may change.
Example
Folic acid, a vitamin that occurs mainly in green leafy veget
Review Questions
For each of the numbered clinical scenarios below (3.1–3.5), select from the lettered options the most appropriate term for the type of data.
3.1. Deep tendon reflex grade 0 (no response), 1 (somewhat diminished), 2 (normal), 3 (brisker than average), and 4 (very brisk).
A. Interval (continuous)
B. Dichotomous
C. Nominal
D. Ordinal
E. Interval (discrete)
3.2. Cancer recurrent/not recurrent 5 years after initial treatment.
A. Interval (continuous)
B. Dichotomous
C. Nominal
D. Ordinal
E. Interval (discrete)
3.3. Serum sodium 139 mg/dL.
A. Interval (continuous)
B. Dichotomous
C. Nominal
D. Ordinal
E. Interval (discrete)
3.4. Three seizures per month.
A. Interval (continuous)
B. Dichotomous
C. Nominal
D. Ordinal
E. Interval (discrete)
3.5. Causes of upper gastrointestinal bleeding: duodenal ulcer, gastritis, esophageal or other varices.
A. Interval (continuous)
B. Dichotomous
C. Nominal
D. Ordinal
E. Interval (discrete)
For questions 3.6–3.10, choose the best answer.
3.6. When it is not possible to verify measurement of a phenomenon, such as itching, by the physical senses, which of the following can be said about its validity?
A. It is questionable, and one should rely on “hard” measures such as laboratory tests.
B. It can be established by showing that the same value is obtained when the measurement is repeated by many different observers at different times.
C. It can be supported by showing that the measurement is related to other measures of phenomena such as the presence of diseases that are known to cause itching.
D. It can be established by showing that measurement results in a broad range of values.
E. It cannot be established.
3.7. A physician or nurse measures a patient’s heart rate by feeling the pulse for 10 seconds each time she comes to clinic. The rates might differ from visit to visit because of all the following except:
A. The patient has a different pulse rate at different times.
B. The measurement may misrepresent the true pulse by chance because of the brief period of observation.
C. The physician and nurse use different techniques (e.g., different degrees of pressure on the pulse).
D. The pulse rate varies among patients.
E. An effective treatment was begun between visits.
3.8. “Abnormal” is commonly defined by all of the following except:
A. The level at which treatment has been shown to be effective
B. The level at which death rate is increased
C. Statistically unusual values
D. Values that do not correspond to a normal distribution
E. The level at which there is an increased risk of symptoms
3.9. All of the following statements are true except:
A. The normal distribution describes the distribution of most naturally occurring phenomena.
B. The normal distribution includes 2.5% of people in each tail of the distribution (beyond 2 standard deviations from the mean).
C. The normal distribution is unimodal and symmetrical.
D. The normal distribution is the most common basis for defining abnormal laboratory tests measured on interval scales.
3.10. You see a new patient who is a 71-year-old woman on no medicines and without history of heart disease in herself or her family. She has never smoked and is not diabetic. Her blood pressure is 115/75 mm Hg, and she is about 15 pounds overweight. A total cholesterol test done 2 days ago was high at 250 mg/dL and the HDL cholesterol was 59 mg/dL. The Framingham risk calculator estimates that the patient’s risk of developing general cardiovascular disease in the next 10 years is 9%. You know that treating cholesterol at the level found reduces cardiovascular risk. The patient wants to know if she should start taking a statin. Which of the following statements is least correct?
A. The patient is likely to have a lower serum cholesterol the next time it is measured.
B. The estimation of a 9% probability of cardiovascular disease in the next 10 years could be influenced by chance.
C. The patient should be given a prescription for a statin to lower her risk of coronary heart disease.
Birthweight figure: horizontal axis, birthweight (g), 0 to 8,000; vertical axis, proportion of infants.
3.11. Which statement about the central tendency is incorrect?
A. The mean birthweight is below 4,000 g.
B. There is more than one mode birth weight.
C. Mean and median birth weights are similar.
3.12. Which statement about dispersion is most correct?
A. The range is the best way to describe the babies’ birth weights.
B. Standard deviations should not be calculated because the distribution is skewed.
C. Ninety-five percent of the birth weights will fall within about 2 standard deviations of the mean.
3.13. One standard deviation of babies’ birth weights encompasses approximately:
A. Weights from 2,000 to 4,000 g
B. Weights from 3,000 to 4,000 g
C. Weights from 2,000 to 6,000 g
Figure 3.13 panels: (A) monitored fetal heart rate 130–150; (B) monitored fetal heart rate <130. Vertical axis label begins "Number of".
Figure 3.13 ■ Observer variability. Comparing fetal heart auscultation to electronic monitoring of fetal heart rate. (Redrawn with permission from Day E, Maddem L, Wood C. Auscultation of foetal heart rate: an assessment of its error and significance. Br Med J 1968;4:422–424.)
Figure 3.13 compares fetal heart rates measured by electronic monitoring (white middle bar) to measurements by hospital staff in three different circumstances: when the fetal heart rate, in beats per minute, by electronic monitoring was normal (130–150), low (<130), and high (>150). Questions 3.14–3.16 relate to the figure. For each question, choose the best answer.
3.14. The distribution of hospital staff measurements around the electronic monitor measurement in Panel A of Figure 3.13 could be due to:
A. Chance
B. Inter-observer variability
C. Biased preference for normal results
D. A and B
E. A and C
F. B and C
G. A, B, and C
3.15. The distribution of hospital staff measurements around the electronic monitor measurement in Panel B of Figure 3.13 could be due to:
A. Chance
B. Inter-observer variability
C. Biased preference for normal results
D. A and B
E. A and C
F. B and C
G. A, B, and C
3.16. The distribution of hospital staff measurements around the electronic monitor measurement in Panel C of Figure 3.13 could be due to:
A. Chance
B. Inter-observer variability
C. Biased preference for normal results
D. A and B
E. A and C
F. B and C
G. A, B, and C
REFERENCES
1. Sharma P, Parekh A, Uhl K. An innovative approach to determine fetal risk: the FDA Office of Women’s Health pregnancy exposure registry web listing. Womens Health Issues 2008;18:226–228.
2. Feinstein AR. The need for humanized science in evaluating medication. Lancet 1972;2:421–423.
3. Rubenfeld GD, Caldwell E, Granton J, et al. Interobserver variability in applying a radiographic definition for ARDS. Chest 1999;116:1347–1353.
4. Morganroth J, Michelson EL, Horowitz LN, et al. Limitations of routine long-term electrocardiographic monitoring to assess ventricular ectopic frequency. Circulation 1978;58:408–414.
5. Elveback LR, Guillier CL, Keating FR. Health, normality, and the ghost of Gauss. JAMA 1970;211:69–75.
6. Prospective Studies Collaboration. Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet 2002;360:1903–1913.
7. Wee CC, Huskey KW, Ngo LH, et al. Obesity, race and risk for death or functional decline among Medicare beneficiaries. A cohort study. Ann Intern Med 2011;154:645–655.
8. Englund M, Guermazi A, Gale D. Incidental meniscal findings on knee MRI in middle-aged and elderly persons. N Engl J Med 2008;359:1108–1115.
9. Lazo M, Selvin E, Clark JM. Brief communication: clinical implications of short-term variability in liver function test results. Ann Intern Med 2008;148:348–352.
Chapter 4
RECOGNIZING RISK
Risk factors associated with large effects that occur
rapidly after exposure are easy for anyone to recognize.
It is not difficult to appreciate the relationship
Long Latency
Many chronic diseases have a long latency period between exposure to a risk factor and the first manifestations of disease. Radiation exposure in childhood, for example, increases the risk for thyroid cancer in adults decades later. Similarly, hypertension precedes heart disease by decades, and calcium intake in young and middle-aged women affects osteoporosis and fracture rates in old age. When patients experience the consequence of exposure to a risk factor years later, the original exposure may be all but forgotten and the link between exposure and disease obscured.
Common Exposure to Risk Factors
Many risk factors, such as cigarette smoking or eating a diet high in sugar, salt, and fat, have become so common in Western societies that for many years their dangers went unrecognized. Only by comparing patterns of disease among people with and without these risk factors, using cross-national studies or investigating special subgroups (Mormons, for example, who do not smoke, or vegetarians who eat diets low in cholesterol), were risks recognized that were, in fact, large. It is now clear that about half of lifetime users of tobacco will die because of their habit; if current smoking patterns persist, it is predicted that in the 21st century, more than 1 billion deaths globally will be attributed to smoking (1).
A relationship between the sleeping position of babies and the occurrence of sudden infant death syndrome (SIDS) is another example of a common exposure to a risk factor and the dramatic effect associated with its frequency, an association that went unrecognized until relatively recently.
Example
SIDS, the sudden, unexplained death of an in- fant younger than 1 year of age, is a leading cause of infant mortality. Studie
Low Incidence of Disease
The incidence of most diseases, even ones thought to be “common,” is actually low. Thus, although lung cancer is the most common cause of cancer deaths in Americans, and people who smoke are as much as 20 times more likely to develop lung cancer than those who do not smoke, the yearly incidence of lung cancer in people who have smoked heavily for 30 years is 2 to 3 per 1,000. In the average physician’s practice, years may pass between new cases of lung cancer. It is difficult for the average clinician to draw conclusions about risks from such infrequent events.
Small Risk
The effects of many risk factors for chronic disease are small. To detect a small risk, a large number of people must be studied to observe a difference in disease rates between exposed and unexposed persons. For example, drinking alcohol has been known to increase the risk of breast cancer, but it was less clear whether low levels of consumption, such as drinking
Multiple Causes and Multiple Effects
There is usually not a close, one-to-one relationship between a risk factor and a particular disease.
Chapter 4: Risk: Basic Principles 55
Example
DISEASES Homocystinuria, a rare pediatric disease caused by autosomal
disease occurs quickly after an unusual exposure, but most diseases and most exposures do not conform to such a pattern. For accurate information about risk, clinicians must turn to the medical literature, particularly to carefully constructed studies that involve a large number of patients.
PREDICTING RISK
Combining Multiple Risk Factors to Predict Risk
Because most chronic diseases are caused by several relatively weak risk factors acting together, statistically combining their effects can produce a more powerful prediction of risk than considering one risk factor at a time. Statistically combining risk factors produces a risk prediction model or a risk prediction tool (also sometimes called a clinical prediction tool or a risk assessment tool). Risk prediction tools are increasingly common in clinical medicine; well-known models used for long-term predictions include the Framingham Risk Score for predicting cardiovascular events and the National Cancer Institute’s Breast Cancer Risk Assessment Tool for predicting breast cancer occurrence. Shorter-term hospital risk prediction tools include the Patient At Risk of Re-admission Scores (PARR) and the Critical Care Early Warning Scores. Prediction tools have also combined diagnostic test results, for example, to diagnose acute myocardial infarction when a patient presents with chest pain, or for diagnosing the occurrence of pulmonary embolism. The statistical methods used to combine multiple risk factors are discussed in Chapter 11.
Risk prediction models help with two important clinical activities. First, a good risk prediction model aids risk stratification, dividing groups of people into subgroups with different risk levels (e.g., low, medium, and high). Using the risk stratification approach can also help determine whether adding a newly proposed risk factor for a disease improves the ability to predict disease, that is, improves risk stratification.
Example
CVD is the most common cause of death glob- ally. In the United
Even if a risk factor improves a risk prediction model, a clinical trial is necessary that demonstrates lowering or removing the risk factor protects patients. In the case of CRP, such a trial has not yet been reported, so it is possible that it is a marker rather than a causal factor for CVD.
Risk Prediction in Individual Patients and Groups
Risk prediction tools are often used to predict the future for individuals, with the hope that each person will know his or her risks, a hope summarized by the term “personalized medicine.” As an example, Table 4.1 summarizes the information used in
Figure 4.2 plot: paired bars for the models with CRP and without CRP in each stratum of cardiovascular risk over 10 years (%): <5, 5 to <10, 10 to <20, ≥20. Number of women in each stratum: 6,965; 633; 248; 65.
Figure 4.2 ■ Effect of adding a new risk factor to a risk prediction model. Comparison of risk prediction models for CVD over 10 years among 7,911 non-diabetic women, with and without CRP as a risk factor. Adding CRP into the risk model improved risk stratification of the women, especially to strata at higher risk by the model without CRP. (Data from Ridker PM, Buring JE, Rifai N, et al. Development and validation of improved algorithms for the assessment of global cardiovascular risk in women. JAMA 2007;297:611–619.)
compared with non-smokers. Even so, the smoker has about a 1 in 10 chance of developing lung cancer in the next 10 years. Most risk factors (and risk prediction tools) for most diseases are much weaker than the risk of lung cancer with smoking.
EVALUATING RISK PREDICTION TOOLS
Determining how well a particular risk prediction tool works is done by asking two questions: (i) how accurately does the tool predict the proportion of different groups of people who will develop the disease (calibration), and (ii) how accurately does it identify individuals who will and will not develop the disease (discrimination)? To answer these questions, the tool is tested on a large group of people who have been followed for several years (sometimes, decades) with known outcomes of disease for each person in the group.
Calibration
Calibration, determining how well a prediction tool correctly predicts the proportion of a group who will develop disease, is conceptually and operationally simple. It is measured by comparing the number of people in a group predicted or estimated (E) by the prediction tool to develop disease to the number who are observed (O) to develop the disease. Ratios of E/O close to 1.0 mean the risk tool is well calibrated: it predicts a proportion of people that is very close to the actual proportion that develops the disease. Evaluations of the NCI breast cancer risk assessment tool have found it is highly accurate in predicting the proportion of women in a group who will develop breast cancer in the next 5 years, with E/O ratios close to 1.0.
Discrimination
Discriminating among individuals in a group who will and will not develop disease is difficult, even for well-calibrated risk tools. The most common method used to measure discrimination accuracy is to calculate a concordance statistic (often shortened to c-statistic). It estimates how often in pairs of randomly selected individuals, one of whom went on to develop the disease of interest and one of whom did not, the risk prediction score was higher for the one who developed disease. If the risk prediction tool did not improve prediction at all, the resulting estimate would be like a coin toss and the c-statistic would be 0.50. If the risk prediction tool worked perfectly, so that in every pair the diseased individual had a higher score than the non-diseased individual, the c-statistic would be 1.0. In one study assessing discrimination of the NCI breast cancer risk tool, the c-statistic was calculated as 0.58 (9). It is clear that this is not a high c-statistic, but just what values between 0.5 and 1.0 mean is difficult to understand clinically.
The clearest (although rarest) method to understand how well a risk prediction model discriminates is to compare visually the predictions for individuals to the observed results for all individuals in the study. Figure 4.3A illustrates perfect discrimination by a hypothetical risk prediction tool; the tool completely separates people destined to develop disease from those destined not to develop disease. Figure 4.3B illustrates the ability of the NCI breast cancer risk prediction tool to discriminate between women who subsequently did and did not develop breast cancer over a 5-year period and visually shows what a c-statistic of 0.58 means. Although the average risk scores are slightly higher for the women who developed breast cancer, and their curve on the graph is slightly to the right of those who did not develop breast cancer, the individual risk prediction scores of the two groups overlap substantially; there is no place along the x-axis of risk that separates women into groups who did and did not develop breast cancer. This is so even though the calibration of the model was very good.
Sensitivity and Specificity of a Risk Prediction Tool
Yet another way to assess a risk prediction tool’s ability to distinguish who will and will not develop disease is to determine its sensitivity and specificity (a topic that will be discussed more thoroughly in Chapters 8 and 10). Sensitivity of a risk prediction tool is the ability of the tool to identify those individuals destined to develop a disease and is expressed as the percentage of people who the tool correctly identifies will develop the disease. A tool’s specificity is the ability to identify individuals who will not develop the disease, expressed as the percentage of people the tool correctly identifies who will not develop the disease. Looking at Figure 4.3, a 5-year risk of 1.67% was chosen as a cut point between “low” and “high” risk. Using that cut point, the sensitivity was estimated as 44% (44% of women who developed breast cancer had a risk score ≥1.67%) and specificity was estimated as 66% (66% of women who did not develop breast cancer had a risk score <1.67%). In other words, the risk prediction tool missed more than half the women who developed breast cancer over a 5-year period,
analysis that combines the results of sensitivity and specificity and can be
used to compare different tools. ROCs are discussed in detail in Chapter 8.

Risk Stratification
As already mentioned, and as shown in Figure 4.2, risk stratification can be
used to assess how well a risk prediction tool works and to determine whether
adding a new risk factor improves the tool's ability to classify people
correctly into clinically meaningful risk groups. Better risk stratification
improves a tool's calibration. Risk stratification may not dramatically affect
the tool's discrimination ability. For example, examining Figure 4.2, the risk
tool that included CRP correctly assigned 99% of 6,965 women to the lowest
risk stratum (<5% CVD events over 10 years). The study found that CVD events
occurred

[Figure 4.3, Panels A and B: distributions of women developing and women not
developing breast cancer (y-axis: each group of women, 0.00 to 0.25) by 5-year
risk of breast cancer diagnosis (x-axis: low risk to high risk, divided at cut
point x).]
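The discrimination measures described in this section can be illustrated with a short sketch. The risk scores below are invented for illustration and do not come from the breast cancer study; the function names and the convention of counting tied pairs as one-half are assumptions, not part of the text.

```python
from itertools import product


def c_statistic(scores_cases, scores_noncases):
    """Share of case/non-case pairs in which the case has the higher score.

    Tied pairs count as one-half (an assumed, commonly used convention).
    """
    pairs = list(product(scores_cases, scores_noncases))
    concordant = sum(
        1.0 if case > noncase else 0.5 if case == noncase else 0.0
        for case, noncase in pairs
    )
    return concordant / len(pairs)


def sensitivity_specificity(scores_cases, scores_noncases, cut_point):
    """Treat a score at or above cut_point as 'high risk'."""
    sensitivity = sum(s >= cut_point for s in scores_cases) / len(scores_cases)
    specificity = sum(s < cut_point for s in scores_noncases) / len(scores_noncases)
    return sensitivity, specificity


# Hypothetical 5-year risk scores (%), invented for illustration only.
cases = [0.9, 1.2, 1.8, 2.5]      # people who went on to develop disease
noncases = [0.5, 1.0, 1.5, 2.0]   # people who did not

c = c_statistic(cases, noncases)
sens, spec = sensitivity_specificity(cases, noncases, cut_point=1.67)
print(f"c-statistic = {c:.3f}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A c-statistic of 0.50 corresponds to the coin toss described above, and 1.0 to a tool that ranks every case above every non-case.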
Review Questions
For questions 4.1–4.10, select the best answer.

4.1. In the mid-20th century, chest surgeons in Britain were impressed that
they were operating on more men with lung cancer, most of whom were smoking.
How might the surgeons' impression that smoking was a risk factor for
developing lung cancer have been wrong?
A. Smoking had become so common that more men would have a history of
smoking, regardless of whether they were undergoing operations for lung
cancer.
B. Lung cancer is an uncommon cancer, even among smokers.
C. Smoking confers a low risk of lung cancer.
D. There are other risk factors for lung cancer.

4.2. Risk factors are easier to recognize:
A. When exposure to a risk factor occurs a long time before the disease.
B. When exposure to the risk factor is associated with a new disease.
C. When the risk factor is a marker rather than a cause of disease.

4.3. Risk prediction models are useful for:
A. Predicting onset of disease
B. Diagnosing disease
C. Predicting prognosis
D. All of the above

4.4. Figure 4.2 shows:
A. The risk model incorporating CRP results assigned too many women to the
intermediate risk strata.
B. The risk model incorporating CRP results predicts which individual women
will develop CVD better than the risk model without CRP results.
C. The number of women developing CVD over 10 years is likely highest in the
group with a risk of 5%.

4.5. A risk model for colon cancer estimates that one of your patients has a
2% chance of developing colorectal cancer in the next 5 years. In explaining
this to your patient, which of the following statements is most correct?
A. Because colorectal cancer is the second most common non-skin cancer in
men, he should be concerned about it.
B. The model shows that your patient will not develop colorectal cancer in
the next 5 years.
C. The model shows that your patient is a member of a group of people in whom
a very small number will develop colorectal cancer in the next 5 years.

4.6. In general, risk prediction tools are best at:
A. Predicting future disease in a given patient.
B. Predicting future disease in a group of patients.
C. Predicting which individuals will and will not develop disease.
6 Clinical Epidemiology: The
4.7. When a risk factor is a marker for future disease:
A. The risk factor can help identify people at increased risk of developing
the disease.
B. Removing the risk factor can help prevent the disease.
C. The risk factor is not confounding a true causal relationship.

4.8. A risk factor is generally least useful in:
A. The risk stratification process
B. Diagnosing a patient's complaint
C. Preventing disease

4.9. Figure 4.3B shows that:
A. The risk model is well calibrated.
B. The risk model works well at stratifying women into different risk groups.
C. Most women developing breast cancer over 5 years are at higher risk.
D. The risk model does not discriminate very well.

4.10. It is difficult for risk models to determine which individuals will and
will not develop disease for all of the following reasons except:
A. The combination of risk factors is not strongly related to disease.
B. The risk factors are common throughout a population.
C. The model is well calibrated.
D. Most people destined to develop the disease are not at high risk.

Answers are in Appendix A.
to impose possible risk factors on a group of healthy people for the purposes
of scientific research. Second, most people would balk at having their diets
and behaviors constrained by others for long periods of time. Finally, the
experiment would have to go on for many years, which is difficult and
expensive. As a result, it is usually necessary to study risk in less
obtrusive ways.
Clinical studies in which the researcher gathers data by simply observing
events as they happen, without playing an active part in what takes place, are
called observational studies. Most studies of risk are observational studies
and are either cohort studies, described in the rest of this chapter, or
case-control studies, described in Chapter 6.

Cohorts
As defined in Chapter 2, the term cohort is used to describe a group of people
who have something in common when they are first assembled and who are then
observed for a period of time to see what happens to them. Table 5.1 lists
some of the ways in which cohorts are used in clinical research. Whatever
members of a cohort have in common, observations of them should fulfill three
criteria if the observations are to provide sound information about risk of
disease.

1. They do not have the disease (or outcome) in question at the time they are
assembled.
2. They should be observed over a meaningful period of time in the natural
history of the disease in question so that there will be sufficient time for
the risk to be expressed. For example, if one wanted to learn whether neck
irradiation during childhood results in thyroid neoplasms, a 5-year follow-up
would not be a fair test of this hypothesis, because the usual time period
between radiation exposure and the onset of disease is considerably longer.
3. All members of the cohort should be observed over the full period of
follow-up or methods must be used to account for dropouts. To the extent that
people drop out of the study and their reasons for dropping out are related in
some way to the outcome, the information provided by an incomplete cohort can
misrepresent the true state of affairs.

Table 5.1
Cohorts and Their Purposes
Characteristic in Common   To Assess Effect of   Example
Age                        Age                   Life expectancy for people age 70 (regardless of birth date)
Date of birth              Calendar time         Tuberculosis rates for people born in 1930
Exposure                   Risk factor           Lung cancer in people who smoke
Disease                    Prognosis             Survival rate for patients with brain cancer
Therapeutic intervention   Treatment             Improvement in survival for patients with Hodgkin lymphoma given combination chemotherapy
Preventive intervention    Prevention            Reduction in incidence of pneumonia after pneumococcal vaccination

Cohort Studies
The basic design of a cohort study is illustrated in Figure 5.1. A group of
people (a cohort) is assembled, none of whom has experienced the outcome of
interest, but all of whom could experience it. (For example, in a study of
risk factors for endometrial cancer, each member of the cohort should have an
intact uterus.) Upon entry into the study, people in the cohort are classified
according to those characteristics (possible risk factors) that might be
related to outcome. For each possible risk factor, members of the cohort are
classified either as exposed (i.e., possessing the factor in question, such as
hypertension) or unexposed. All the members of the cohort are then observed
over time to see which of them experience the outcome, say, cardiovascular
disease, and the rates of the outcome events are compared in the exposed and
unexposed groups. It is then possible to see whether potential risk factors
are related to subsequent outcome events. Other names for cohort studies are
longitudinal studies, which emphasize that patients are followed over time;
prospective studies, which imply the forward direction in which the patients
are pursued; and incidence studies, which call attention to the basic measure
of new disease events over time.
The following is a description of a classic cohort study that has made
important contributions to our understanding of cardiovascular disease risk
factors and to modern methods of conducting cohort studies.
Chapter 5: Risk: Exposure to Disease 63
[Figure 5.1 diagram: a cohort is divided into Exposed and Not exposed groups;
each group is followed over time to disease YES or NO.]
Figure 5.1 ■ Design of a cohort study of risk. Persons without disease are divided into two groups—
those exposed to a risk factor and those not exposed. Both groups are followed over time to determine what
proportion of each group develops disease.
[Figure 5.2 diagram: timelines for a retrospective cohort and a prospective
cohort, each marked "Cohort assembled" followed by "Follow-up".]
Figure 5.2 ■ Retrospective and prospective cohort studies. Prospective cohorts
are assembled in the present and followed forward into the future. In
contrast, retrospective cohorts are made by going back into the past and
assembling the cohort, for example, from medical records, then following the
group forward to the present.
Historical Cohort Studies Using Medical Databases
Historical cohort studies can take advantage of computerized medical databases
and population registries that are used primarily for patient care or to track
population health. The major advantages of historical cohort studies over
classical prospective cohort studies are that they take less time, are less
expensive, and are much easier to do. However, they cannot undertake studies
of factors not recorded in computerized databases, so patients' lifestyle,
social

Example
The incidence of autism increased sharply in the 1990s, coinciding with
increasing vaccination of young children for measles, mumps, and rubella
(MMR). A report linking MMR vaccination and autism in several children caused
widespread alarm that vaccination (or the vaccine preservative, thimerosal)
was responsible for the increasing incidence of autism. In some countries, MMR
vaccination rates among young children dropped, resulting in new outbreaks and
even deaths from measles. Because of the seriousness of the situation, several
studies were undertaken to evaluate MMR vaccine as a possible risk factor. In
Denmark, a retrospective cohort study included all children (537,303) born
from January 1991 through December 1998 (4). The investigators reviewed the
children's countrywide health records and determined that 82% received the MMR
vaccine (physicians
† Strictly speaking, the study was a modification of a standard case-cohort
design that would have included all cases, not just 1%, of 26,800 breast
cancers that developed in the comparison group of women not undergoing
prophylactic mastectomy. However, because breast cancer occurs commonly, a
random sample of the group sufficed.
Table 5.2
Advantages and Disadvantages of Cohort Studies
All Cohort Study Types
Advantages:
  The only way of establishing incidence (i.e., absolute risk) directly
  Follows the same logic as the clinical question: If persons are exposed, do
  they get the disease?
  Exposure can be elicited without the bias that might occur if outcome were
  known before documentation of exposure
Disadvantages:
  Susceptible to confounding and other biases
Table 5.3
Measures of Effect
Table 5.4
Calculating Measures of Effect: Cigarette Smoking and Death from Lung Cancer
in Men (a)

Simple Risks
Death rate (absolute risk or incidence) from lung cancer in smokers       341.3/100,000/yr
Death rate (absolute risk or incidence) from lung cancer in non-smokers   14.7/100,000/yr
Prevalence of cigarette smoking                                           32.1%
Lung cancer mortality rate in population                                  119.4/100,000/yr

Compared Risks
Attributable risk                  = 341.3/100,000/yr − 14.7/100,000/yr = 326.6/100,000/yr
Relative risk                      = 341.3/100,000/yr ÷ 14.7/100,000/yr = 23.2
Population-attributable risk       = 326.6/100,000/yr × 0.321 = 104.8/100,000/yr
Population-attributable fraction   = 104.8/100,000/yr ÷ 119.4/100,000/yr = 0.88

(a) Data from Thun MJ, Day-Lally CA, Calle EE, et al. Excess mortality among
cigarette smokers: Changes in a 20-year interval. Am J Public Health
1995;85:1223–1230.
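The arithmetic in Table 5.4 can be reproduced with a short sketch. The function and variable names are illustrative assumptions; the four inputs are the simple risks listed in the table, with rates expressed per 100,000 per year.

```python
def measures_of_effect(rate_exposed, rate_unexposed, prevalence, rate_population):
    """Compute the four compared risks from Table 5.4.

    Rates are per 100,000 per year; prevalence is a proportion (0 to 1).
    """
    attributable_risk = rate_exposed - rate_unexposed
    relative_risk = rate_exposed / rate_unexposed
    population_attributable_risk = attributable_risk * prevalence
    population_attributable_fraction = (
        population_attributable_risk / rate_population
    )
    return {
        "attributable_risk": attributable_risk,
        "relative_risk": relative_risk,
        "population_attributable_risk": population_attributable_risk,
        "population_attributable_fraction": population_attributable_fraction,
    }


# Inputs taken from the "Simple Risks" rows of Table 5.4.
effects = measures_of_effect(
    rate_exposed=341.3,      # lung cancer death rate in smokers
    rate_unexposed=14.7,     # lung cancer death rate in non-smokers
    prevalence=0.321,        # prevalence of cigarette smoking
    rate_population=119.4,   # lung cancer mortality rate in the population
)
for name, value in effects.items():
    print(f"{name}: {value:.2f}")
```

Rounded as in the table, the results are 326.6, 23.2, 104.8, and 0.88.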
100,000 (3 to 4 lung cancer deaths per 1,000 smokers per year).

Attributable Risk
One might ask, "What is the additional risk (incidence) of disease following
exposure, over and above that experienced by people who are not exposed?" The
answer is expressed as attributable risk, the absolute risk (or incidence) of
disease in exposed persons minus the absolute risk in non-exposed persons. In
Table 5.4, the attributable risk of lung cancer death in smokers is calculated
as 326.6 per 100,000 per year. Attributable risk is the additional incidence
of disease related to exposure, taking into account the background incidence
of disease from other causes. Note that this way of comparing rates implies
that the risk factor is a cause and not just a marker. Because of the way it
is calculated, attributable risk is also called risk difference, the
difference between two absolute risks.

Relative Risk
On the other hand, one might ask, "How many times more likely are exposed
persons to get the disease relative to non-exposed persons?" To answer this
question, one calculates relative risk, or risk ratio: the ratio of incidence
in exposed persons to incidence in non-exposed persons, estimated in Table 5.4
as 23.2. Relative risk (or an estimate of relative risk, odds ratios,
discussed in Chapter 6) is the most commonly reported result in studies of
risk, not only because of its computational convenience but also because it is
a common metric in studies with similar risk factors but with different
baseline incidence rates. Because relative risk indicates the strength of the
association between exposure and disease, it is a useful measure of effect for
studies of disease etiology.

Interpreting Attributable and Relative Risk
Although attributable and relative risk are calculated from the same two
components (the incidence, or absolute risk, of an outcome in an exposed and
an unexposed group), the resulting size of the risk may appear to be quite
different depending on whether attributable or relative risk is used.

Example
Suppose a risk factor doubles the chance of dying of a certain dise
Table 5.5
Comparing Relative Risk and Attributable Risk in the Relationship of Bone Mineral
Density (BMD) T-scores, Fractures, and Age
[Figure 5.3, three panels by blood pressure level in mm Hg (systolic 130–139,
140–159, ≥160; diastolic 85–89, 90–99, ≥100). Panel A: excess coronary heart
disease attributable to elevated blood pressure. Panel B: prevalence of
elevated blood pressure at various levels in the population. Panel C: percent
excess coronary heart disease attributable to various levels of hypertension.]
Figure 5.3 ■ Relationships among attributable risk, prevalence of risk factor, and population risk for
coronary heart disease (CHD) death due to hypertension. Panel A shows that the attributable risk for CHD increases as
blood pressure levels increase. However, because mild and moderate hypertension are more prevalent than severe hypertension (Panel
B), most excess CHD deaths caused by hypertension are not due to the highest levels of blood pressure (Panel C). (Data from Wilson
PWF, Agostino RB, Levy D, et al. Prediction of coronary heart disease using risk factor categories. Circulation 1998;97:1837–
1847.)
Potential Confounders
How does one decide which variables should be considered potential
confounders? One approach is to identify all the variables that are known,
from other studies, to be associated with either exposure or disease. Age is
almost always a candidate, as are known risk factors for the disease in
question. Another approach is to screen variables in the study data for
statistical associations with exposure and disease, using liberal criteria for
"association" so as to err on the side of not missing potential confounding.
Investigators may also consider variables that just make sense according to
their clinical experience or the biology of disease, regardless of whether
there are strong research studies linking them to exposure or disease. The
intention is to cast a broad net so as not to miss possible confounders. This
is because of the possibility that a variable may confound the
exposure–disease relationship by chance, because of the particular data at
hand, even though it is not a confounder in nature.

Confirming Confounding
How does one decide whether a variable that might confound the relationship
between exposure and disease actually does so? One approach is to simply show
that the variable is associated with exposure and (separately) to show that it
is associated with disease.

CONTROL OF CONFOUNDING
To determine whether a factor is independently related to risk or prognosis,
it is ideal to compare cohorts with and without the factor, everything else
being equal. But in real life, "everything else" is usually not equal in
observational studies.
What can be done about this problem? There are several possible ways of
controlling† for differences between groups. Controlling is a general term for
any process aimed at removing the effects of extraneous variables while
examining the independent effects of individual variables. A variety of
methods can be applied during the design or analysis of research (summarized
in Table 5.6 and described in the following text). One or more of these
strategies should be applied in any observational study that attempts to
describe the effect of one variable independent of other variables that might
affect the outcome. The

† Unfortunately, the term "control" also has several other meanings: the
non-exposed people in a cohort study, the patients in a clinical trial who do
not receive the experimental treatment, and non-diseased people (non-cases) in
a case-control study.

[Figure 5.4 diagram: confounding variables shown are saturated fats, trans
fatty acids, smoking, and exercise; data sources are labeled "This study" and
"Other studies".]
Figure 5.4 ■ Example of confounding. The relationship between folate intake
and incidence of stroke was confounded by several cardiovascular risk and
protective factors.

Randomization
The best way to balance all extraneous variables between groups is to randomly
assign patients to groups so that each patient has an equal chance of falling
into the exposed or unexposed group (see Chapter 9). A special feature of
randomization is that it balances not only variables known to affect outcome
and included in the study, but also unknown or unmeasured confounders.
Unfortunately, it is usually not possible to study risk or prognostic factors
with randomized trials.

Restriction
Patients who are enrolled in a study can be confined to only those possessing
a narrow range of characteristics, a strategy called restriction. When this is
done, certain characteristics can be made similar in the groups being
compared. For example, the effect of prior cardiovascular disease on prognosis
after acute myocardial infarction could be studied in patients who had no
history of cigarette smoking or hypertension. However, this approach is
limiting. Although restriction on entry to a study can certainly produce
homogeneous groups of patients, it does so at the expense of generalizability.
In the course of excluding
Table 5.6
Methods for Controlling Confounding
Phase of Study
Table 5.7
Example of Stratification: Hypothetical Death Rates after Coronary Bypass
Surgery in Two Hospitals, Stratified by Preoperative Risk
                     Hospital A                     Hospital B
Preoperative Risk    Patients  Deaths  Rate (%)     Patients  Deaths  Rate (%)
High                 500       30      6            400       24      6
Medium               400       16      4            800       32      4
Low                  300       2       0.67         1,200     8       0.67
Total                1,200     48      4            2,400     64      2.7
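The rates in Table 5.7 can be recomputed with a brief sketch that contrasts crude rates with rates standardized to a common set of stratum weights. The data are the hypothetical counts from the table; the function names and the choice of equal weights of 1/3 per stratum are illustrative assumptions.

```python
# (patients, deaths) per preoperative risk stratum, from Table 5.7.
table = {
    "A": {"high": (500, 30), "medium": (400, 16), "low": (300, 2)},
    "B": {"high": (400, 24), "medium": (800, 32), "low": (1200, 8)},
}


def crude_rate(hospital):
    """Overall deaths divided by overall patients."""
    patients = sum(n for n, _ in hospital.values())
    deaths = sum(d for _, d in hospital.values())
    return deaths / patients


def standardized_rate(hospital, weights):
    """Sum of stratum-specific rates, each multiplied by a shared weight."""
    return sum(weights[s] * deaths / n for s, (n, deaths) in hospital.items())


# Any weights work as long as both hospitals use the same ones.
equal_weights = {"high": 1 / 3, "medium": 1 / 3, "low": 1 / 3}

for name, hospital in table.items():
    crude = crude_rate(hospital)
    adjusted = standardized_rate(hospital, equal_weights)
    print(f"Hospital {name}: crude = {crude:.3f}, standardized = {adjusted:.3f}")
```

Because the stratum-specific rates are the same in both hospitals, the standardized rates coincide even though the crude rates differ.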
potential subjects, cohorts may no longer be representative of most patients
with the condition. Also, after restriction, it is no longer possible, in that
study, to learn anything more about the effects of excluded variables.

Matching
Matching is another way of making patients in two groups similar. In its
simplest form, for each patient in the exposure group, one or more patients
with the same characteristics (except for the factor of interest) would be
selected for a comparison group. Matching is typically done for variables that
are so strongly related to outcome that investigators want to be sure they are
not different in the groups being compared. Often, patients are matched for
age and sex because these variables are strongly related to risk or prognosis
for many diseases, but matching for other variables, such as stage or severity
of disease and prior treatments, may also be useful.
Although matching is commonly done and can be very useful, it has limitations.
Matching controls bias only for those variables involved in the match. Also,
it is usually not possible to match for more than a few variables because of
practical difficulties in finding patients who meet all of the matching
criteria. Moreover, if categories for matching are relatively crude, there may
be room for substantial differences between matched groups. For example, if
women in a study of risk for birth of a child with Down syndrome were matched
for maternal age within 10 years, there could be a nearly 10-fold difference
in frequency related to age if most of the women in one group were 30 years
old and most in the other 39 years old. Finally, as with restriction, once one
matches on a variable, its effects on outcomes can no longer be evaluated in
the study. For these reasons, although matching may be done for a few
characteristics that are especially strongly related to outcome, investigators
rely on other ways of controlling for bias as well.

Stratification
With stratification, data are analyzed and results presented according to
subgroups of patients, or strata, of similar risk or prognosis (other than the
exposure of interest). An example of this approach is the analysis of
differences in hospital mortality for a common surgical procedure, coronary
bypass surgery (Table 5.7). This is especially relevant today because of
several high-profile examples of "report cards" for doctors and hospitals, and
the concern that the reported differences may be related to patient rather
than surgeon or hospital characteristics.
Suppose we want to compare the operative mortality rates for coronary bypass
surgery at Hospitals A and B. Overall, Hospital A noted 48 deaths in 1,200
bypass operations (4%), and Hospital B experienced 64 deaths in 2,400
operations (2.7%).
The crude rates suggest that Hospital B is superior. But is it really superior
if everything else is equal? Perhaps the preoperative risk among patients in
Hospital A was higher than in Hospital B and that, rather than hospital care,
accounted for the difference in death rates. To see if this possibility
accounts for the observed difference in death rates, patients in each of these
hospitals are grouped into strata of similar underlying preoperative risk
based on age, prior myocardial function, extent of occlusive disease, and
other characteristics. Then the operative mortality rates within each stratum
of risk are compared.
Table 5.7 shows that when patients are divided by preoperative risk, the
operative mortality rates in each risk stratum are identical in the two
hospitals: 6% in high-risk patients, 4% in medium-risk patients, and 0.67% in
low-risk patients. The crude rates were misleading because of important
differences in the risk
characteristics of the patients treated at the two hospitals: 42% of Hospital
A's patients and only 17% of Hospital B's patients were high risk.
An advantage of stratification is that it is a relatively transparent way of
recognizing and controlling for bias.

Standardization
If an extraneous variable is especially strongly related to outcomes, two
rates can be compared without bias related to this variable if they are
adjusted to equalize the weight given to that variable. This process, called
standardization (or adjustment), shows what the overall rate would be for each
group if strata-specific rates were applied to a population made up of similar
proportions of people in each stratum.
To illustrate this process, suppose the operative mortality in Hospitals A and
B can be adjusted to a common distribution of risk groups by giving each risk
stratum the same weight in the two hospitals. Without adjustment, the risk
strata receive different weights in the two hospitals. The mortality rate of
6% for high-risk patients receives a weight of 500/1,200 in Hospital A and a
much lower weight of 400/2,400 in Hospital B. The other risk strata were also
weighted differently in the two hospitals. The result is a crude rate for
Hospital A, which is the sum of the rate in each stratum times its weight:
(500/1,200 × 0.06) + (400/1,200 × 0.04) + (300/1,200 × 0.0067) = 0.04.
Similarly, the crude rate for Hospital B is
(400/2,400 × 0.06) + (800/2,400 × 0.04) + (1,200/2,400 × 0.0067) = 0.027.
If equal weights were used when comparing the two hospitals, the comparison
would be fair (free of the effect of different proportions in the various risk
groups). The choice of weights does not matter as long as it is the same in
the two hospitals. Weights could be based on those existing in either of the
hospitals or any reference population. For example, if each stratum were
weighted 1/3, then the standardized rate for Hospital A =
(1/3 × 0.06) + (1/3 × 0.04) + (1/3 × 0.0067) = 0.036, which is exactly the
same as the standardized rate for Hospital B. The consequence of giving equal
weight to strata in each hospital is to remove the apparent excess risk of
Hospital A.
Standardization is commonly used in relatively crude comparisons to adjust for
a single variable such as age that is obviously different in groups being
compared. For example, the crude results of the folate/stroke example were
adjusted for age, as discussed earlier. Standardization is less useful as a
stand-alone strategy to control for confounding when multiple variables need
to be considered.

Multivariable Adjustment
In most clinical situations, many variables act together to produce effects.
The relationships among these variables are complex. They may be related to
one another as well as to the outcome of interest. The effect of one might be
modified by the presence of others, and the joint effects of two or more might
be greater than their individual effects taken together.
Multivariable analysis makes it possible to consider the effects of many
variables simultaneously. Other terms for this approach include mathematical
modeling and multivariable adjustment. Modeling is used to adjust (control)
for the effects of many variables simultaneously to determine the independent
effects of one. This method also can select, from a large set of variables,
those that contribute independently to the overall variation in outcome.
Modeling can also arrange variables in order of the strength of their
contribution. There are several kinds of prototypic models, according to the
design and data in the study. Cohort and case-control studies typically rely
on logistic regression, which is used specifically for dichotomous outcomes. A
Cox proportional hazard model is used when the outcome is the time to an
event, as in survival analyses (see Chapter 7).
Multivariable analysis of observational studies is the only feasible way of
controlling for many variables simultaneously during the analysis phase of a
study. Randomization also controls for multiple variables, but during the
design and conduct phases of a study. Matching can account for only a few
variables at a time, and stratified analyses of many variables run the risk of
having too few patients in some strata. The disadvantage of modeling is that
for most of us it is a "black box," making it difficult to recognize where the
method might be misleading. At its best, modeling is used in addition to, not
in place of, matching and stratified analysis.

Overall Strategy for Control of Confounding
Except for randomization, all ways of dealing with extraneous differences
between groups share a limitation: They are effective only for those variables
that are singled out for consideration. They do not deal with risk or
prognostic factors that are not known at the time of the study or those that
are known but not taken into account.
OBSERVATIONAL STUDIES AND CAUSE
The end result of a careful observational study, controlling for a rich array
of extraneous variables, is to come as close as possible to describing a truly
independent effect, one that is separate from all the other variables that
confound the exposure–disease relationship. However, it is always possible
that some important variables were not taken into account, either because
their importance was not known or because they were not or could not be
measured. The consequence of unmeasured confounders is residual confounding.
For this reason, in single studies the results should be thought of (and
investigators should describe their results) as "independent associations" and
not necessarily as establishing cause. Chapter 12 describes how to build a
case for a causal association.

[Figure 5.5 bar chart: upper gastrointestinal complications per 1,000
person-years (y-axis, 0 to 120), with and without aspirin, by history of
ulcers (No/Yes) and age in years (<50, 60–69, ≥80).]
Figure 5.5 ■ Example of effect modification. The additional risk of
gastrointestinal complications from aspirin is modified by age and history of
peptic ulcer disease. (Data from Patrono C, Rodriguez LAG, Landolfi R, et al.
Low-dose aspirin for the prevention of atherothrombosis. N Engl J Med
2005;353:2373–2383.)
Example
Aspirin has been shown to prevent cardiovascular events. Whether it should be
recommended depends on a patient's risk of cardiovascular events and of
complications of aspirin, mainly upper gastrointestinal bleeding. Figure 5.5
shows that the additional rate of gastrointestinal complications (the
incidence attributable to aspirin) depends on two other factors, age and prior
history of peptic ulcer disease (12). In men younger than 50 years old with no
history of ulcers, the rate of complications is about 1/1,000 person-years,
and there is virtually no additional risk related to aspirin. Risk increases a
little with age among men without a history of ulcers. However, among men with
a history of ulcer disease, the rate of complications rises sharply with age,
and even more with aspirin use. At the highest level of risk, in men older
than age 80 and with an ulcer history, aspirin doubles risk from 60 to
120/1,000 person-years.

This example shows that age and history of peptic ulcer disease modify the
effect of aspirin on gastrointestinal complications. The additional
information provided by effect modification enables clinicians to tailor their
recommendations about the use of aspirin more closely to the characteristics
of an individual patient.
Confounding and effect modification are independent of each other. A given
variable might be a confounder, effect modifier, both, or neither, depending
on the research question and data.
Review Questions
For question 5.1, select the best answer.

5.1. Which of the following statements is not correct for both prospective and
retrospective cohort studies?
A. They measure incidence of disease directly.
B. They allow assessment of possible associations between exposure and many
diseases.
C. They allow investigators to decide beforehand what data to collect.
D. They avoid bias that might occur if measurement of exposure is made after
the outcome of interest is known.

Questions 5.2–5.4 are based on the following example:
A study was done examining the relationship of smoking, stroke, and age (13).
The 12-year incidence per 1,000 persons (absolute risk) of stroke according to
age and smoking status was:

Age      Non-smokers   Smokers
45–49    7.4           29.7
65–69    80.2          110.4

5.2. What was the relative risk of stroke of smokers compared to non-smokers
in their 40s?
A. 1.4
B. 4.0
C. 22.3
D. 30.2
E. 72.8
F. 80.7

5.3. What was the attributable risk per 1,000 people of stroke among smokers
compared to non-smokers in their 60s?
A. 1.4
B. 4.0
C. 22.3
D. 30.2
E. 72.8
F. 80.7

5.4. Which of the following statements about the study results is incorrect?
A. To calculate population-attributable risk of smoking among people in their
60s, additional data are needed.
Chapter 5: Risk: Exposure to Disease 79
B. More cases of stroke due to smoking occurred in people in their 60s than in their 40s.
C. When relative risk is calculated, the results reflect information about the incidence in exposed and unexposed persons, whereas the results for attributable risk do not.
D. The calculated relative risk is a stronger argument for smoking as cause of stroke for persons in their 40s than the calculated risk for persons in their 60s.
E. Depending on the question asked, age could be considered either a confounding variable or an effect modifier in the study.

Questions 5.5–5.11 are based on the following example:

Deep venous thrombosis (DVT) is a serious condition that occasionally can lead to pulmonary embolism and death (14). The incidence of DVT is increased by several genetic and environmental factors, including oral contraceptives (OCs) and a genetic mutation, factor V Leiden. These two risk factors, OCs and factor V Leiden, interact. Heterozygotes for factor V Leiden have 4 to 10 times the risk of DVT of the general population. In women without the genetic mutation, incidence of DVT rises from about 0.8/10,000 women/yr among those not on OCs to 3.0/10,000 women/yr for those taking the pill. The baseline incidence of DVT in heterozygotes for factor V Leiden is 5.7/10,000 women/yr, rising to 28.5/10,000 women/yr among those taking OCs. Mutations for factor V Leiden occur in about 5% of whites but are absent in Africans and Asians.

For questions 5.5–5.10, select the one best answer.

5.5. What is the absolute risk of DVT in women who do not have the mutation and do not take OCs?
A. 0.8/10,000/yr
B. 1.3/10,000/yr
C. 2.2/10,000/yr
D. 9.5/10,000/yr
E. 25.5/10,000/yr

5.6. What is the attributable risk of DVT for women taking OCs who do not carry the mutation for factor V Leiden compared to those not taking OCs and not carrying the mutation?
A. 0.8/10,000/yr
B. 1.3/10,000/yr
C. 2.2/10,000/yr
D. 9.5/10,000/yr
E. 25.5/10,000/yr

5.7. What is the attributable risk of DVT for women taking OCs who carry factor V Leiden compared to women on OCs but not carrying the mutation?
A. 0.8/10,000/yr
B. 1.3/10,000/yr
C. 2.2/10,000/yr
D. 9.5/10,000/yr
E. 25.5/10,000/yr

5.8. In a population with 100,000 white women, all of whom take OCs, what is the population-attributable risk for DVT in women who are heterozygous for factor V Leiden?
A. 0.8/10,000/yr
B. 1.3/10,000/yr
C. 2.2/10,000/yr
D. 9.5/10,000/yr
E. 25.5/10,000/yr

5.9. What is the relative risk of DVT in women taking OCs and heterozygous for factor V Leiden compared to women who take OCs but do not carry the mutation?
A. 3.8
B. 7.1
C. 9.5
D. 28.5
E. 35.6

5.10. What is the relative risk of DVT in women taking OCs and without the mutation compared to women without the mutation who are not taking OCs?
A. 3.8
B. 7.1
C. 9.5
D. 28.5
E. 35.6
5.11. Given the information in this study and calculations for questions 5.5–5.10, which of the following statements about risk of developing DVT is incorrect?
A. Factor V Leiden modifies the effect of OCs on the annual risk of developing DVT by increasing risk from about 3 per 10,000 to about 30 per 10,000 women.
B. Being heterozygous for factor V Leiden confers about twice the risk for DVT as taking OCs.
C. Women heterozygous for factor V Leiden should be advised against taking OCs because of the high relative risk for DVT among such women.

For question 5.12, select the best answer

users died as often as non-users. However, aspirin users were sicker and had illnesses more likely to be treated with aspirin. Which of the following methods is the best way to account for the propensity of people to take aspirin?
A. Calculate the absolute risk of cardiovascular death in the two groups and the risk difference attributable to using aspirin.
B. Create subgroups of aspirin users and non-users with similar indications for using the medication and compare death rates among the subgroups.
C. For each person using aspirin, match a non-user on age, sex, and comorbidity and compare death rates in the two groups.
REFERENCES
1. Dawber TR. The Framingham Study: The Epidemiology of Atherosclerotic Disease. Cambridge, MA: Harvard University Press; 1980.
2. Kannel WB, Feinleib M, McNamara PM, et al. An investigation of coronary heart disease in families. The Framingham Offspring Study. Am J Epidemiol 1979;110:281–290.
3. Wen CP, Wai JPM, Tsai MK, et al. Minimum amount of physical activity for reduced mortality and extended life expectancy: a prospective cohort study. Lancet 2011;378:1244–1253.
4. Madsen KM, Hviid A, Vestergaard M, et al. A population-based study of measles, mumps, and rubella vaccination and autism. N Engl J Med 2002;347:1477–1482.
5. The Editors of The Lancet. Retraction—ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. Lancet 2010;375:445.
6. Geiger AM, Yu O, Herrinton LJ, et al. (on behalf of the CRN PROTECTS Group). A case-cohort study of bilateral prophylactic mastectomy efficacy in community practices. Am J Epidemiol 2004;159:S99.
7. Siris ES, Brenneman SK, Barrett-Connor E, et al. The effect of age and bone mineral density on the absolute, excess, and relative risk of fracture in postmenopausal women aged 55–99: results from the National Osteoporosis Risk Assessment (NORA). Osteoporosis Int 2006;17:565–574.
8. Hofman A, Vandenbroucke JP. Geoffrey Rose’s big idea. Br Med J 1992;305:1519–1520.
9. Reulen RC, Winter DL, Frobisher C, et al. Long-term cause-specific mortality among survivors of childhood cancers. JAMA 2010;304:172–179.
10. Al-Delaimy WK, Rexrode KM, Hu FB, et al. Folate intake and risk of stroke among women. Stroke 2004;35:1259–1263.
11. Rothman KJ. Modern Epidemiology. Boston: Little Brown and Co.; 1986.
12. Patrono C, Rodriguez LAG, Landolfi R, et al. Low-dose aspirin for the prevention of atherothrombosis. N Engl J Med 2005;353:2373–2383.
13. Psaty BM, Koepsell TD, Manolio TA, et al. Risk ratios and risk differences in estimating the effect of risk factors for cardiovascular disease in the elderly. J Clin Epidemiol 1990;43:961–970.
14. Vandenbroucke JP, Rosing J, Bloemenkemp KW, et al. Oral contraceptives and the risk of venous thrombosis. N Engl J Med 2001;344:1527–1535.
Chapter 6
Example
Head injuries are relatively common among alpine skiers an
Figure 6.1 ■ Design of case-control studies. (Diagram labels: cases (have disease) and controls (do not have disease), each classified YES or NO; time; research; estimate of relative risk.)

Figure 6.2 ■ A case-control study of helmet use and head injuries among skiers and snowboarders. (Diagram labels: skiers and snowboarders at 8 major Norwegian ski resorts; head injury and no head injury groups, each divided into used helmet and did not use helmet; controlled: age, sex, nationality, skill level, equipment used, ski school attendance, rented or owned equipment; estimate of relative risk.) (Summary of Sulheim S, Holme I, Ekeland A, et al. Helmet use and risk of head injuries in alpine skiers and snowboarders. JAMA 2006;295:919–924.)
Chapter 6: Risk: From Disease to Exposure 83
of disease in the cases.

The word control comes up in other situations, too. It is used in experimental studies to refer to people, animals, or biologic materials that have not been exposed to the study intervention. In diagnostic laboratories, “controls” refer to specimens that have a known amount of the material being tested for. As a verb, control is used to describe the process of taking into account, neutralizing, or subtracting the effects of variables that are extraneous to the main research question. Here, the term is used in the context of case-control studies to refer to people who do not have the disease or outcome under study.

DESIGN OF CASE-CONTROL STUDIES
The validity of case-control studies depends on the care with which cases and controls are selected, how well exposure is measured, and how completely potentially confounding variables are controlled.

Selecting Cases
The cases in case-control research should be new (incident) cases, not existing (prevalent) ones. The reasons are based on the concepts discussed in Chapter 2. The prevalence of a disease at a point in time is a function of both the incidence and duration of that disease. Duration is in turn determined by the rate at which patients leave the disease state (because of recovery or death) or persist in it (because of a slow course or successful palliation). It follows from these relationships that risk factors for prevalent disease may be risk factors for incidence, duration, or both; the relative contributions of the two cannot be determined. For example, if prevalent cases were studied, an exposure that caused a rapidly lethal form of the disease would result in fewer cases that were exposed, reducing relative risk and thereby suggesting that exposure is less harmful than it really is or even that it is protective.

At best, a case-control study should include all the cases or a representative sample of all cases that arise in a defined population. For example, the bisphosphonates study included all residents of Sweden in 2008 and the helmets study all skiers and snowboarders in eight major resorts in Norway (accounting for 55% of all ski runs in the country).

Some case-control studies, especially older ones, have identified cases in hospitals and referral centers where uncommon diseases are most likely to be found. This way of choosing cases is convenient, but it raises validity problems. These centers may attract particularly severe or atypical cases or those with unusual exposures, the wrong sample if the underlying research question in case-control studies is about ordinary occurrences of disease and exposures. Also, it is difficult in this situation to be confident that controls, however they are chosen, are truly similar to cases in all ways other than exposure, which is critical to the validity of this kind of study (see the Selecting Controls section). Fortunately, it is rarely necessary to take this scientific risk because there are many databases that make true population sampling possible.

However the cases might be identified, it should be possible for both them and controls to be exposed to the risk factor and to experience the outcome. For example, in a case-control study of exercise and sudden death, cases and controls would have to be equally able to exercise (if they chose to) to be eligible.
It goes without saying that diagnosis should
be rigorously confirmed for cases (and excluded
for controls), and the criteria made explicit. In the
bisphosphonates study, investigators agreed on
explicit criteria for atypical fractures of the femur
and reviewed all radiographs, not just reports of
them, to classify fracture type. One investigator then
reviewed a random sample of radiographs for a
second time without knowing how each had been
classified, and there was complete agreement
between the original and the second
classifications.
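
The warning earlier in this section about studying prevalent rather than incident cases can be made concrete with a small numerical sketch. It relies on the steady-state approximation that, for a rare disease, prevalence is roughly incidence multiplied by average duration. All numbers and names below are hypothetical and chosen only for illustration: the exposure doubles incidence but causes a rapidly lethal form of the disease, so exposed cases do not remain in the prevalent pool for long.

```python
def prevalence(incidence_per_year, mean_duration_years):
    """Steady-state approximation for a rare disease:
    prevalence is roughly incidence x average duration."""
    return incidence_per_year * mean_duration_years


# Hypothetical values, chosen only for illustration.
inc_exposed, inc_unexposed = 2 / 1000, 1 / 1000   # true relative risk = 2
dur_exposed, dur_unexposed = 0.5, 2.0             # exposed cases die quickly

incidence_ratio = inc_exposed / inc_unexposed
prevalence_ratio = (prevalence(inc_exposed, dur_exposed)
                    / prevalence(inc_unexposed, dur_unexposed))

print(f"Ratio based on incident cases:  {incidence_ratio:.1f}")
print(f"Ratio based on prevalent cases: {prevalence_ratio:.1f}")
```

Under these assumed numbers, the comparison based on prevalent cases falls below 1 even though the exposure doubles the incidence of disease, which is the distortion the text describes.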
Selecting Controls
Above all, the validity of case-control studies depends on the comparability of cases and controls. To be comparable, cases and controls should be members of the same base population and have an equal opportunity of being exposed. The best approach to meeting these requirements is to ensure that controls are a random sample of all non-cases in the same population or cohort that produced the cases.
Example
In the helmets and head injury example (2), the main control gro
might be highly related to each other because they have similar root causes; education,
Multiple Controls per Case
Having several control groups per case group should not be confused with having several controls for each case. If the number of cases is limited, as is often the case with rare diseases, the study can provide more information if there is more than one control per case. More controls produce a gain in the ability to detect an increase or decrease in risk if it exists, a property of a study called “statistical power” (see Chapter 11). As a practical matter, the gain is worthwhile up to about three or four controls per case, after which little is gained by including even more controls.
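
The diminishing return from adding controls can be illustrated with a commonly cited approximation that is not part of this chapter: with a fixed number of cases and an odds ratio near 1, a design with m controls per case has roughly m/(m + 1) of the statistical efficiency that an unlimited supply of controls would give. The sketch below, with a hypothetical function name, simply tabulates that ratio.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of m controls per case relative to an
    unlimited number of controls: m / (m + 1). Assumes a fixed number
    of cases and an odds ratio near 1 (an assumption of this sketch)."""
    m = controls_per_case
    return m / (m + 1)


for m in range(1, 7):
    print(f"{m} control(s) per case: "
          f"{relative_efficiency(m):.0%} of the maximum efficiency")
```

Under this approximation, going from one to four controls per case raises efficiency from 50% to 80%, whereas going from four to six adds only a few percentage points, consistent with the practical guidance above.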
Matching
If some characteristics seem especially strongly
related to either exposure or disease, such that one
would want to be sure that they occur similarly in
cases and controls, they can be matched. With
matching, for each case with a set of
characteristics, the study includes one or more
controls that possess the same characteristics.
Researchers commonly match for age and sex,
because these are often strongly related to both
exposure and disease, but matching may extend
beyond these demographic characteristics (e.g., to
risk profile or disease severity) when other factors
are known to be strongly associated with an
exposure or outcome. Matching increases the
useful information obtainable from a set of cases and
controls by reducing differences between groups in
determinants of disease other than the one being
considered, thereby allowing a more powerful
(sensitive) measure of association.
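
As a rough sketch of how individual matching might be carried out in practice, the following code draws, for each case, controls from a pool of non-cases that share the case's age band and sex. The record layout, the field names (`id`, `age_band`, `sex`), and the function `match_controls` are all hypothetical; real studies would also need rules for cases with too few eligible controls.

```python
import random


def match_controls(cases, pool, keys=("age_band", "sex"), per_case=2, seed=0):
    """For each case, draw up to `per_case` unused controls from `pool`
    that share the case's values on every matching key."""
    rng = random.Random(seed)
    available = list(pool)
    matched = {}
    for case in cases:
        candidates = [c for c in available
                      if all(c[k] == case[k] for k in keys)]
        chosen = rng.sample(candidates, min(per_case, len(candidates)))
        for control in chosen:
            available.remove(control)  # each control is used at most once
        matched[case["id"]] = chosen
    return matched


# Hypothetical records, for illustration only.
cases = [{"id": "case1", "age_band": "60-69", "sex": "F"}]
pool = [
    {"id": "p1", "age_band": "60-69", "sex": "F"},
    {"id": "p2", "age_band": "60-69", "sex": "M"},
    {"id": "p3", "age_band": "60-69", "sex": "F"},
    {"id": "p4", "age_band": "40-49", "sex": "F"},
]
result = match_controls(cases, pool)
print(sorted(c["id"] for c in result["case1"]))
```

Only the two women aged 60 to 69 in the pool are eligible for this case, so both are selected; the man and the younger woman are never candidates.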
Sometimes, cases and controls are made comparable by umbrella matching, matching on a variable such as hospital or community that is a proxy for many other variables that could confound the exposure–disease relationship and would be difficult to measure one at a time, if that were possible at all. Examples of variables that might be captured under an umbrella include social disadvantage related to income, education, race, and ethnicity; propensity to seek health care or follow medical advice; and local patterns of health care.
Matching can be overdone, biasing study results. Overmatching can occur if investigators match on variables so closely related to exposure that exposure rates in cases and controls become more similar than they are in the population. The result is to make the observed estimate of relative risk closer to 1 (no effect). There are many reasons why the matching variable might be related to exposure. It may be part of the chain of events leading from exposure to disease. Other variables
Measuring Exposure
The validity of case-control studies also depends on avoiding misclassification when measuring exposure. The safest approach is to depend on complete, accurate records that were collected before disease developed. Examples include pharmacy records for studies of prescription drug risks, surgical records for studies of surgical complications, and stored blood specimens for studies of risk related to biomolecular abnormalities. With such records, knowledge of disease status cannot bias reporting of exposure.
However, many important exposures can only be measured by asking cases and controls or their proxies about them. Among these are exercise, diet, and over-the-counter and recreational drug use. The following example illustrates how investigators can “harden” data from interviews that are inherently vulnerable to bias.
Example
What are the risk factors for suicide in China? Investigators stu
Multiple Exposures
Thus far, we have described case-control studies of a single, dichotomous exposure, but case-control studies are an efficient means of examining a far richer array of exposures: the effects of multiple exposures, various doses of the same exposure, and exposures that are early symptoms (not risk factors) of disease.

Example
Ovarian cancer is notoriously difficult to diagnose early in its course when treatment might be more effective. Investigators in England did a case-control study of symptoms of ovarian cancer in primary care (6). Cases were 212 women over 40 years of age diagnosed with primary ovarian cancer in 39 general practices in Devon, England, 2000–2007; 1,060 controls without ovarian cancer were matched to cases by age and practice. Symptoms were abstracted from medical records for the year before diagnosis. Seven symptoms were independently associated with ovarian cancer: abdominal distension, postmenopausal bleeding, loss of appetite, increased urinary frequency, abdominal pain, rectal bleeding, and abdominal bloating. After excluding symptoms reported in the 180 days before diagnosis (to get a better estimate of “early” symptoms), three remained independently associated with ovarian cancer: abdominal distension, urinary

THE ODDS RATIO: AN ESTIMATE OF RELATIVE RISK
Figure 6.3 shows the dichotomous classification of exposure and disease typical of both cohort and case-control studies and compares how risk is calculated differently for the two. These concepts are illustrated with the bisphosphonates study, which had both a cohort and a case-control component.
In the cohort study, participants were divided into two groups at the outset: exposed to bisphosphonates (a + b) and not exposed to bisphosphonates (c + d). Cases of atypical fracture emerged naturally over time in the exposed group (a) and the unexposed group (c). This provides appropriate numerators and denominators to calculate the incidence of atypical fracture in the exposed
               Cases    Noncases
Exposed        a        b          a + b
Not exposed    c        d          c + d
               a + c    b + d

Odds ratio (case-control study):
\[ \frac{\dfrac{a/(a+c)}{c/(a+c)}}{\dfrac{b/(b+d)}{d/(b+d)}} = \frac{a/c}{b/d} = \frac{ad}{bc} \]

Figure 6.3 ■ Calculation of relative risk from a cohort study and odds ratio (estimated relative risk) from a case-control study.
[a/(a + b)] and unexposed [c/(c + d)] members of the cohort. It was also possible to calculate the relative risk:

\[ \text{Relative risk} = \frac{\text{Incidence of disease in the exposed}}{\text{Incidence of disease in the unexposed}} = \frac{a/(a+b)}{c/(c+d)} \]

The case-control study, on the other hand, began with the selection of a group of cases of atypical fracture (a + c) and a group of controls with other fractures (b + d). There is no way of knowing disease rates because these groups are determined not by nature but by the investigators’ selection criteria. Therefore, it is not possible to compute the incidence of disease among people exposed and not exposed to bisphosphonates, and it is not possible to calculate a relative risk. What does have meaning, however, is the relative frequency of exposure to bisphosphonates among cases and controls. One approach to comparing the frequency of exposure among cases and controls provides a measure of risk that is conceptually and mathematically similar to relative risk. The odds ratio is defined as the odds that a case is exposed divided by the odds that a control is exposed:

\[ \frac{\dfrac{a/(a+c)}{c/(a+c)}}{\dfrac{b/(b+d)}{d/(b+d)}} \]

Which simplifies to:

\[ \frac{ad}{bc} \]

The meaning of the odds ratio is analogous to that of relative risk obtained from cohort studies. If the frequency of exposure is higher among cases, the odds ratio will exceed 1, indicating increased risk. The stronger the association is between the exposure and disease, the higher the odds ratio. Conversely, if the frequency of exposure is lower among cases, the odds ratio will be <1, indicating protection. Because of the similarity of the information conveyed by an odds ratio and the relative risk, and the meaning more readily attached to relative risk, odds ratios are often reported as estimated relative risks.
An odds ratio is approximately equal to a relative risk when the incidence of disease is low. To see this mathematically, look at the formula for relative risk in Figure 6.3. If the number of cases in the exposed group (a) is small relative to the number of non-cases in that group (b), then a/(a + b) is approximately equal to a/b. Similarly, if the number of cases in the non-exposed group (c) is small relative to non-cases in that group (d), then c/(c + d) is approximated by c/d. Then, relative risk is approximately a/b divided by c/d, which simplifies to ad/bc, the odds ratio.
How low must the rates be for the odds ratio to be an accurate estimate of relative risk? The answer depends on the size of the relative risk (7). In general, bias in the estimate of relative risk becomes large enough to matter as disease rates in unexposed people become greater than about 1/100 or perhaps 5/100. As outcomes become more frequent, the odds ratio tends to overestimate the relative risk when it is >1 and underestimate the relative risk when it is <1. Fortunately, most diseases, particularly those examined by means of case-control studies, have considerably lower rates.
Earlier in this chapter, we described why case-control studies should be about incident (new onset) cases, not prevalent ones. Nevertheless, prevalence odds ratios are commonly calculated for prevalence studies and reported in the medical literature. The prevalence odds ratio is a measure of association but not a very informative one, not only because of
Example
A large outbreak of gastroenteritis, with many cases complicated by hemolytic-uremic syndrome (a potentially fatal co
This example also illustrates how case-control and
cohort studies, laboratory studies of the responsible
organism, and “shoe-leather” epidemiology during
trace-back acted in concert to identify the underlying
cause of the epidemic.
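
Returning to the odds ratio discussed earlier in this chapter, the claim that it approximates relative risk only when disease is uncommon can be checked numerically. The sketch below uses two hypothetical cohorts of 1,000 exposed and 1,000 unexposed people, each constructed so that the true relative risk is exactly 2. The cell labels a, b, c, and d follow the 2 x 2 layout of Figure 6.3; the counts and function names are assumptions made for illustration.

```python
def relative_risk(a, b, c, d):
    """Cohort data: a, b = exposed cases and non-cases;
    c, d = unexposed cases and non-cases."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc from the same 2 x 2 table."""
    return (a * d) / (b * c)


# Hypothetical cohorts, each built so that the true relative risk is 2.
tables = {
    "rare outcome (0.5% in unexposed)": (10, 990, 5, 995),
    "common outcome (20% in unexposed)": (400, 600, 200, 800),
}

for label, (a, b, c, d) in tables.items():
    rr = relative_risk(a, b, c, d)
    or_ = odds_ratio(a, b, c, d)
    print(f"{label}: relative risk = {rr:.2f}, odds ratio = {or_:.2f}")
```

With the rare outcome the two measures nearly coincide, whereas with the common outcome the odds ratio overstates a relative risk above 1, in line with the rule of thumb about disease rates in unexposed people.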
Figure 6.4 ■ Epidemic curve of an outbreak of Shiga-toxin-producing Escherichia coli infection in Germany. (Plot: number of cases by date of disease onset, May through July.) (Redrawn with permission from Frank C, Werber D, Cramer JP, et al. Epidemic profile of shiga-toxin-producing Escherichia coli O104:H4 outbreak in Germany. N Engl J Med 2011;365:1771–1780.)
Review Questions

Read the following and select the best response.

6.1. In a case-control study of oral contraceptives and myocardial infarction (heart attack), exposure to birth control pills was abstracted from medical records at the time of the myocardial infarction. Results might be biased toward finding an association by all of the following except:
A. Physicians might have asked about birth control pill use more carefully in cases.
B. Having a myocardial infarction might have led to oral contraceptive use.
C. Physicians might have been more likely to record birth control use in cases.
D. Medical record abstractors might have looked for evidence of oral contraceptive use more carefully if they knew a patient had had a myocardial infarction.
E. Patients might have recalled exposure more readily when they had a heart attack.

6.2. Investigators in Europe did a case-control study, nested in a multicountry cohort of more than 520,000 participants, of vitamin D concentration and the risk of colon cancer. They studied 1,248 cases of incident colon cancer arising in the cohort and an equal number of controls, sampled from the same cohort and matched by age, sex, and study center. Vitamin D was measured in blood samples taken years before diagnosis. Vitamin D levels were lower in patients with colon cancer, independent of a rich array of potentially confounding variables. The study results could be described by any of the following except:
A. Vitamin D levels were associated with colorectal cancer.
B. Vitamin D deficiency was a risk factor for colorectal cancer.
C. Nesting the study in a large cohort was a strength of the study.
D. The results might have been confounded with unmeasured variables related to vitamin D levels and colorectal cancer.
E. Vitamin D deficiency was a cause of colorectal cancer.
systematically different from cases (other than on the exposure of interest).

6.3. Which of the following is the most direct result of a case-control study?
A. Prevalence
B. Risk difference
C. Relative risk
D. Incidence
E. Odds ratio
6.4. The epidemic curve for an acute infectious
disease describes:
A. The usual incubation period for the causal
agent
B. A comparison of illness over time in
exposed versus non-exposed people
C. The onset of illness in cases over time
D. The duration of illness, on average, in
affected individuals
E. The distribution of time from infection
to first symptoms
6.5. Which of the following is the best reason for
doing a case-control analysis of a cohort
study?
A. Case-control studies are a feasible way
of controlling for confounders not
found in the cohort dataset.
B. Case-control studies can provide all the
same information more easily.
C. Case-control studies can determine
incidence of disease in exposed and non-
exposed members of the cohort.
D. Case-control studies are in general
stronger than cohort studies.
6.6. The best way to identify cases is to obtain
them from:
A. A sample from the general (dynamic)
population
B. Primary care physicians’ offices
C. A community
D. A cohort representative of the population
E. A hospital
6.13. In an outbreak of acute gastroenteritis, a case-control study would be especially useful for identifying:
A. Characteristics of the people affected
B. The number of people affected over time
C. The microbe or toxin causing the outbreak
D. The mode of transmission
E. Where the causative agent originated

6.14. Sampling cases and controls from a defined population or cohort accomplishes which of the following?
A. It is the only way of including incident (new) cases of disease.
B. It avoids the need for inclusion and exclusion criteria.
C. It tends to include cases and controls that are similar to each other except for exposure.
D. It matches cases and controls on important variables.
E. It ensures that the results are generalizable.

6.15. Case-control studies would be useful for answering all of the following questions except:
A. Do cholesterol-lowering drugs prevent coronary heart disease?
B. Are complications more common with fiberoptic cholecystectomy than with conventional (open) surgery?
C. Is drinking alcohol a risk factor for breast cancer?
D. How often do complications occur after fiberoptic cholecystectomy?
E. How effective are antibiotics for otitis media?

6.16. In a case-control study of airplane flight and thrombophlebitis, all of the following conditions should be met for the odds ratio to be a reasonable estimate of relative risk except:
A. Controls were sampled from the same population as cases.
B. Cases and controls met the same inclusion and exclusion criteria.
C. Other variables that might be related to air travel and thrombosis were controlled for.
D. Cases and controls were equally susceptible to developing thrombophlebitis (e.g., were equally mobile, weight, recent trauma, previous VTE) other than for air travel.
E. The incidence of thrombophlebitis was more than 5/100.

Answers are in Appendix A.
REFERENCES
1. Schilcher J, Michaelsson K, Aspenberg P. Bisphosphonate use and atypical fractures of the femoral head. New Engl J Med 2011;364:1728–1737.
2. Sulheim S, Holme I, Ekeland A, et al. Helmet use and risk of head injuries in alpine skiers and snowboarders. JAMA 2006;295:919–924.
3. Knol MJ, Vandenbroucke JP, Scott P, et al. What do case-control studies estimate? Survey of methods and assumptions in published case-control research. Am J Epidemiol 2008;168:1073–1081.
4. Phillips MR, Yang G, Zhang Y, et al. Risk factors for suicide in China: a national case-control psychological autopsy study. Lancet 2002;360:1728–1736.
5. Psaty BM, Koepsell TD, LoGerfo JP, et al. Beta-blockers and primary prevention of coronary heart disease in patients with high blood pressure. JAMA 1989;261:2087–2094.
6. Hamilton W, Peters TJ, Bankhead C, et al. Risk of ovarian cancer in women with symptoms in primary care: population-based case-control study. BMJ 2009;339:b2998. doi:10.1136/bmj.b2998
7. Feinstein AR. The bias caused by high value of incidence for p1 in the odds ratio assumption that 1-p1 is approximately equal to 1. J Chron Dis 1986;39:485–487.
8. Frank C, Werber D, Cramer JP, et al. Epidemic profile of shiga-toxin-producing Escherichia coli O104:H4 outbreak in Germany. N Engl J Med 2011;365:1771–1780.
9. Buchholz U, Bernard H, Werber D, et al. German outbreak of Escherichia coli O104:H4 associated with sprouts. N Engl J Med 2011;365:1763–1770.
Chapter 7
Prognosis
He, who would rightly distinguish those that will survive or die, as well as those that
will be subject to disease a longer or shorter time, ought, from his knowledge and
attention, to be able to form an estimate of all symptoms, and rationally to weigh their
powers by comparison.
—Hippocrates
460–375 B.C.
The Patients Are Different
Studies of risk factors usually deal with healthy people, whereas studies of prognostic factors are of sick people.

The Outcomes Are Different
For risk, the event being counted is usually the onset of disease. For prognosis, consequences of disease are counted, including death, complications, disability, and suffering.

The Rates Are Different
Risk factors are usually for low-probability events. Yearly rates for the onset of various diseases are on the order of 1/1,000 to 1/100,000 or less. As a result, relationships between exposure and disease are difficult to confirm in the course of day-to-day clinical experiences, even for astute clinicians. Prognosis, on the other hand, describes relatively frequent events. For example, several percent of patients with acute myocardial infarction die before leaving the hospital.

The Factors May be Different
Variables associated with an increased risk are not necessarily the same as those marking a worse prognosis. Often, they are considerably different for a given disease. For example, the number of well-established risk factors for cardiovascular disease (hypertension, smoking, dyslipidemia, diabetes, and family history of coronary heart disease) is inversely related to the risk of dying in the hospital after a first myocardial infarction (Fig. 7.1) (1).

Clinicians can often form good estimates of short-term prognosis from their own personal experience. However, they may be less able to sort out, without the assistance of research, the various factors that are related to long-term prognosis or the complex ways in which prognostic factors are related to one another.

CLINICAL COURSE AND NATURAL HISTORY OF DISEASE
Prognosis can be described as either the clinical course or natural history of disease. The term clinical course describes the evolution (prognosis) of a disease that has come under medical care and has been treated in a variety of ways that affect the subsequent course of events. Patients usually receive medical care at some time in the course of their illness when they have diseases that cause symptoms such as pain, failure to thrive, disfigurement, or unusual behavior. Examples include type 1 diabetes mellitus, carcinoma of the lung, and rabies. After such a disease is recognized, it is likely to be treated.
The prognosis of disease without medical intervention is termed the natural history of disease. Natural history describes how patients fare if nothing is done about their disease. A great many health conditions do not come under medical care, even in countries with advanced health care systems. They remain unrecognized because they are asymptomatic (e.g., many cancers of the prostate are occult and slow growing) and are, therefore, unrecognized in life. For others, such as osteoarthritis, mild depression, or low-grade anemia, people may consider their symptoms to be one of the ordinary discomforts of daily living, not a disease and, therefore, not seek medical care for them.
Figure 7.2 ■ Design of a cohort study of risk. [Diagram: a cohort divided into exposed and not exposed groups, followed over time for the outcome (yes or no).]
ELEMENTS OF PROGNOSTIC STUDIES
Figure 7.2 shows the basic design of a cohort study of prognosis. At best, studies of prognosis are of a defined clinical or geographic population, begin observation at a specified point in time in the course of disease, follow up all patients for an adequate period of time, and measure clinically important outcomes.
Patient Sample
The purpose of representative sampling from a defined population is to assure that study results have the greatest possible generalizability. It is sometimes possible to study prognosis in a complete sample of patients with new-onset disease in large regions. In some countries, the existence of national medical records makes population-based studies of prognosis possible.
Example
Dutch investigators studied the risk of complications of pregnancy in women with type 1 diabetes mellitus (4). The sa […] less, complication rates in newborns were much higher than in the general population. Neonatal morbidity (one or more complications) occurred in 80% of infants and rates of congenital malformations and unusually large newborns (macrosomia) were three-fold to 12-fold higher than in the general population. This study suggests that good control of blood sugar alone was not sufficient to prevent complications of pregnancy in women with type 1
Even without national medical records, population-based studies are possible. In the United States, the Network of Organ Sharing collects data on all patients with transplants, and the Surveillance, Epidemiology, and End Results (SEER) program collects incidence and survival data on all patients with new-onset cancers in several large areas of the country, comprising 28% of the
Example
Positron emission tomography (PET) scans, a sensitive test for metastases, are now used to stage non–small cell lung canc
Figure 7.3 ■ A limitation of 5-year survival rates: Four conditions with the same 5-year survival rate of 10%. [Four panels; vertical axis: percent; horizontal axis: years, 0 to 5.]
Figure 7.4 ■ Survival of two cohorts, small and large, when all members are observed for the full period of follow-up. [Two panels; horizontal axis: time in years, 0 to 5.]
Figure 7.5 ■ Example of a survival curve, with detail for one part of the curve. [Vertical axis: probability (%); horizontal axis: time in years. The detail panel, covering years 3 to 5, annotates the numbers of patients at risk, died, censored, and still alive at each step.]
Chapter 7: Prognosis 101
no one dies and the probability of surviving is 1. When a patient dies, the probability of surviving at that moment is calculated as the ratio of the number of patients surviving to the number at risk of dying at that point in time. Patients who have already died, dropped out, or have not yet been followed up to that point are not at risk of dying and are, therefore, not used to estimate survival for that time. The probability of surviving does not change during intervals in which no one dies, so it is recalculated only when there is a death. Although the probability at any given interval is not very accurate, because either nothing has happened or there has been only one event in a large cohort, the overall probability of surviving up to each point in time (the product of all preceding probabilities) is remarkably accurate. When patients are lost from the study at any point in time, they are referred to as censored and are no longer counted in the denominator from that point forward.
A part of the survival curve in Figure 7.5 (from 3 to 5 years after zero time) is presented in detail to illustrate the data used to estimate survival: patients at risk, patients no longer at risk (censored), and patients experiencing outcome events at each point in time.
Variations on basic survival curves increase the amount of information they convey. Including the numbers of patients at risk at various points in time gives some idea of the contribution of chance to the observed rates, especially toward the end of follow-up. The vertical axis can show the proportion with, rather than without, the outcome event; the resulting curve will sweep upward and to the right. The precision of survival estimates, which declines with time because fewer and fewer patients are still under observation, can be shown by confidence intervals at various points in time (see Chapter 11). Tics are sometimes added to the survival curves to indicate each time a patient is censored.
Interpreting Survival Curves
[…] that the estimates on the left-hand side of the curve are sound, because more patients are at risk early in follow-up. But on the right-hand side, at the tail of the curve, the number of patients on whom estimates of survival are based may become relatively small because of deaths, dropouts, and late entrants to the study, so that fewer and fewer patients are followed for that length of time. As a result, estimates of survival toward the end of the follow-up period are imprecise and can be strongly affected by what happens to relatively few patients. For example, in Figure 7.5, only one patient was under observation at year 5. If that one remaining patient happened to die, the probability of surviving would fall from 8% to zero. Clearly, this would be a too literal reading of the data. Therefore, estimates of survival at the tails of survival curves must be interpreted with caution.
Finally, the shape of many survival curves gives the impression that outcome events occur more frequently early in follow-up than later on, when the slope approaches a plateau. But this impression is deceptive. As time passes, rates of survival are being applied to a diminishing number of patients, causing the slope of the curve to flatten even if the rate of outcome events did not change.
As with any estimate, Kaplan-Meier estimates of time to event depend on assumptions. It is assumed that being censored is not related to prognosis. To the extent that this is not true, a survival analysis may yield biased estimates of survival in cohorts. The Kaplan-Meier method may not be accurate enough if there are competing risks (more than one kind of outcome event) and the outcomes are not independent of each other such that one event changes the probability of experiencing the other. For example, patients with cancer who develop an infection related to aggressive chemotherapy and drop out for that reason may have had a different chance of dying of the cancer. There are other methods for estimating cumulative incidence in the presence of competing risks.
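The product-limit calculation described in this section (recalculate the probability of surviving only when a death occurs, use only patients still at risk in the denominator, and multiply the successive probabilities) can be illustrated with a short Python sketch. This is a minimal illustration, not the method of any particular software package. The function name `kaplan_meier` and the six-patient data set are hypothetical, and the sketch follows the common convention, assumed here, that a patient censored at a given time is still counted as at risk at that time.

```python
from collections import Counter

def kaplan_meier(times, died):
    """Product-limit estimate of survival.

    times: follow-up time for each patient
    died:  True if the patient died at that time, False if censored
    Returns a list of (time, estimated probability of surviving) pairs,
    one for each time at which at least one death occurred.
    """
    deaths_at = Counter(t for t, d in zip(times, died) if d)
    survival = 1.0
    curve = []
    for t in sorted(deaths_at):
        # Patients who died or were censored before t are no longer at risk.
        at_risk = sum(1 for u in times if u >= t)
        survival *= (at_risk - deaths_at[t]) / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up data (years) for six patients; not from the text.
times = [1, 2, 2, 3, 4, 5]
died = [True, False, True, True, False, False]
curve = kaplan_meier(times, died)
for t, s in curve:
    print(f"year {t}: estimated probability of surviving = {s:.2f}")
```

With so few patients, a single death late in follow-up changes the estimate considerably, which is the reason the tails of survival curves call for cautious interpretation.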
Example
Physicians caring for children with rattlesnake bites would be grateful for this and other case series if there were no better information to guide their care, but that is not to say that the case series provided a complete and fully reliable picture of snake-bite care. Of all children bitten in that region, some may have been doing so well after a bite that they were not sent to the referral center. Others might have been doing so badly that they were rushed to the nearest hospital or even died before reaching a hospital at all. In other words, the case series does not describe the clinical course of all children from
Figure 7.6 ■ Example of prognostic stratification. Survival from surgery in a patient with renal cell cancer according to prognostic strata. (Redrawn with permission from Zisman A, Pantuck AJ, Dorey F, et al. Improved prognostification for renal cell carcinoma using an integrated staging system. J Clin Oncol 2001;19:1649–1657.) [Vertical axis: survival; horizontal axis: months, 0 to 96; one curve per prognostic group.]
They are a separate consideration from confounding and effect modification (Chapter 5).
There is an almost infinite variety of systematic errors, many of which are given specific names, but some are more basic. They can be recognized more easily when one knows where they are most likely to occur in the course of a study. With that in mind, we describe some possibilities for bias in cohort studies and discuss them in relation to the following study of prognosis.
Example
Bell’s palsy is the sudden, one-sided, unexplained onset of weakness of the face in the area innervated by the facial ne
Sampling bias can also be misleading when prognosis is compared across groups and sampling has produced groups that are systematically different with respect to prognosis, even before the factor of interest is considered. In the Bell palsy example, older patients might have had a worse prognosis because they are the ones who had an underlying herpes virus infection, not because of their age.
Is this not confounding? Strictly speaking, it is not because the study is for the purpose of prediction, not to identify independent “causes” of recovery. Also, it
Measurement Bias
Measurement bias is present when members of the cohort are not all assessed similarly for outcome. In the Bell palsy study, all members of the cohort were examined by a common protocol every month until they were no longer improving, ruling out this possibility. If it had been left to individual patients and physicians whether, when, and how they were examined, this would have diminished confidence in the description of time to and completeness of recovery. Measurement bias also comes into play if prognostic groups are compared and patients in one group have a systematically better chance of having outcomes detected than those in another. Some outcomes, such as death, cardiovascular catastrophes, and major cancers, are so obvious that they are unlikely to be missed. But for less clear-cut outcomes, including specific cause of death, subclinical disease, side effects, or disability, measurement bias can occur because of differences in the methods with which the outcome is sought or classified.
Measurement bias can be minimized in three general ways: (i) examine all members of the cohort equally for outcome events; (ii) if comparisons of prognostic groups are made, ensure that researchers are unaware of the group to which each patient belongs; and (iii) set up careful rules for deciding if an outcome event has occurred (and follow the rules). To help readers understand the extent of these kinds of biases in a given study, it is usual practice to include, with reports of the study, a flow diagram describing how the number of participants changed as the study progressed and why. It is also helpful to compare the characteristics of patients in and out of the study after sampling and follow-up.
Bias from “Non-differential” Misclassification
Until now, we have been discussing how the results of a study can be biased when there are systematic differences in how exposure or disease groups are classified. But bias can also result if misclassification is “non-differential,” that is, it occurs similarly in the groups being compared. In this case, the bias is toward finding no effect.
Example
When cigarette smoking is assessed by simply asking people whether they smoke, there is substantial misclassification re […] cigarette products in saliva. Yet in a cohort study of cigarette smoking and coronary heart disease (CHD), misclassification of smoking could not be different in people who did or did not develop CHD because the outcome was not known at the time the exposure was assessed. Even so, to the extent that smoking is incorrectly classified, it reduces whatever differences in CHD rates in smokers and nonsmokers that might have existed if all patients had been correctly classified, making a “null” effect more likely. At the extreme, if classifying smoking status were totally at random, there could be no association between smoking and CHD.
BIAS, PERHAPS, BUT DOES IT MATTER?
Clinical epidemiology is not an error-finding game. Rather, it is meant to characterize the credibility of a study so that clinicians can decide how much to rely on its results when making high-stakes decisions about patients. It would be irresponsible to ignore results of studies that meet high standards, just as clinical decisions need not be bound by the results of weak studies.
With this in mind, it is not enough to recognize that bias might be present in a study. One must go on to determine if bias is actually present in the particular study. Beyond that, one must decide whether the consequences of bias are sufficiently large that they change the conclusions in a clinically important way. If damage to the study’s conclusions is not very great, then the presence of bias is of little practical consequence and the study is still useful.
SENSITIVITY ANALYSIS
One way to decide how much bias might change the conclusions of a study is to do a sensitivity analysis, that is, to show how much larger or smaller the observed results might have been under various assumptions about the missing data or potentially biased measurements. A best-case/worst-case analysis tests the effects of the most extreme possible assumptions but is an unreasonably severe test for the effects of bias in most situations. More often, sensitivity analyses test the effects of somewhat unlikely values, as in the following example.
Example
Poliomyelitis has been eradicated in many parts of the world but late effects of infection continue. Some patients deve […]
If the missing members of the cohort had different rates of post-polio syndrome from those who were followed-up, how […] syndrome as those who were, the true rate would have been (137 + 194)/939 = 35%. (That is, all 137 patients known to have the syndrome plus 50% of those who were not followed up divided by all members of the original cohort.) If the missing patients were half as likely to get the syndrome, the true rate would have been (137 + 48)/939 = 20%. Thus, even with an improbably large difference in post-polio syndrome rates in missing patients, the true rate would still have been in the 20% to 35% range, a useful
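The arithmetic of this sensitivity analysis can be reproduced with a short Python sketch. The function name `adjusted_rate` is hypothetical. The counts of 551 patients followed up and 388 not followed up do not appear in the passage; they are inferred here from its arithmetic (194 is stated to be 50% of those not followed up, and the original cohort numbered 939), so they should be read as assumptions.

```python
def adjusted_rate(observed_cases, followed, missing, relative_rate):
    """Overall rate if patients lost to follow-up developed the outcome at
    `relative_rate` times the rate observed in those who were followed."""
    observed_rate = observed_cases / followed
    assumed_missing_cases = missing * observed_rate * relative_rate
    return (observed_cases + assumed_missing_cases) / (followed + missing)

# 137 known cases; 551 followed and 388 missing are inferred, not quoted.
twice_as_likely = adjusted_rate(137, 551, 388, relative_rate=2.0)
half_as_likely = adjusted_rate(137, 551, 388, relative_rate=0.5)
print(f"missing patients twice as likely: {twice_as_likely:.0%}")
print(f"missing patients half as likely:  {half_as_likely:.0%}")
```

Varying `relative_rate` over a plausible range shows directly how far the overall estimate could move under different assumptions about the patients who were not followed up.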
Review Questions
Read the following and select the best response.
7.1. For a study of the risk of esophageal cancer in patients with Barrett esophagus (a precancerous lesion), which of the following times in the course of disease is the best example of zero time?
A. Diagnosis of Barrett esophagus for each patient
B. Death of each patient
C. Diagnosis of esophageal cancer for each patient
D. Calendar time when the first patient is enrolled in the study
E. Calendar time when no patient remains in the study
7.2. A cohort study describes the recurrence of seizure within 1 year in children hospitalized with a first febrile seizure. It compared recurrence in children who had infection versus immunization as an underlying cause for fever at the time of the first seizure. Some of the children were no longer in the study when outcome (a second seizure) was assessed at 1 year. Which of the following would have the greatest effect on study results?
A. Why the children dropped out
B. When in the course of follow-up the children left the study
C. Whether dropping out was related to prognosis
D. Whether the number of children dropping out is similar in the groups
7.3. A cohort study of prostate cancer care compares rates of incontinence in patients who were treated with surgery versus medical care alone. Incontinence is assessed from review of medical records. Which of the following is not an example of measurement bias?
A. Men were more likely to tell their surgeon about incontinence.
B. Surgeons were less likely to record complications of their surgery in the record.
C. Men who got surgery were more likely to have follow-up visits.
D. Chart abstractors used their judgment in deciding whether incontinence was present or not.
E. Rates of incontinence were higher in the men who got surgery.
7.4. A clinical prediction rule has been developed to classify the prognosis of community-acquired pneumonia. Which of the following is most characteristic of such a clinical prediction rule?
A. Calculating a score is simple.
B. The clinical data are readily available.
C. Multiple prognostic factors are included.
D. The results are used to guide further management of the patient.
E. All of the above.
7.5. A study describes the clinical course of patients who have an uncommon neurologic disease. Patients are identified at a referral center that specializes in this disease. Their medical records are reviewed for patients’ characteristics, treatments, and their current disease status. Which of the following best describes this kind of study?
A. Cohort study
B. Case-control study
C. Case series
D. Cross-sectional study
E. A randomized controlled trial
7.6. A study used time-to-event analysis to describe the survival from diagnosis of 100 patients with congestive heart failure. By the third year, 60 patients have been censored. Which of the following would not be a reason for one of these patients being censored?
A. The patient died of another cause before year 3.
B. The patient decided not to continue in the study.
C. The patient developed another disease that could be fatal.
D. The patient had been enrolled in the study for less than 3 years.
7.7. Which of the following kinds of studies cannot be used to identify prognostic factors?
A. Prevalence study
B. Time-to-event analysis
C. Case-control study
D. Cohort study
7.8. Which of the following best describes the information in a survival curve?
A. An unbiased estimate of survival even if some patients leave the study
B. The estimated probability of survival from zero time
C. The proportion of a cohort still alive at the end of follow-up
D. The rate at which original members of the cohort leave the study
E. The cumulative survival of a cohort over time
7.9. Which of the following is the most appropriate sample for a study of prognosis?
A. Members of the general population
B. Patients in primary care in the community
C. Patients admitted to a community hospital
D. Patients referred to a specialist
E. It depends on who will use the results of the study.
7.10. Investigators wish to describe the clinical course of multiple sclerosis. They take advantage of a clinical trial, already completed, in which control patients received usual care. Patients in the trial had been identified at referral centers, had been enrolled at the time of diagnosis, and had met rigorous entry criteria. After 10 years, all patients had been examined yearly and remained under observation, and 40% were still able to walk. Which of the following most limits the credibility of this study?
A. Inconsistent zero time
B. Generalizability
C. Measurement bias
D. Migration bias
E. Failure to use time-to-event methods
7.11. Many different clinical prediction rules have been developed to assess the severity of community-acquired pneumonia. Which of the following is the most important reason for choosing one to use?
A. The prediction rule classifies patients into groups with very different prognosis.
B. The prediction rule has been validated in different settings.
C. Many variables are included in the rule.
D. Prognostic factors include state-of-the-science diagnostic tests.
E. The score is calculated using computers.
7.12. In a study of patients who smoke and developed peripheral arterial disease, the hazard ratio for amputation in patients who continued to smoke, relative to those who quit smoking, is 5. Which of the following best characterizes the hazard ratio?
A. In this study, it is the rate of continued smoking divided by the rate of quitting.
B. It can be estimated from a case-control study of smoking and amputation.
C. It cannot be adjusted for the presence or absence of other factors related to prognosis.
D. It conveys information similar to relative risk.
E. It is calculated from the cumulative incidence of amputation in smokers and quitters.
7.13. In a time-to-event analysis, the event:
A. Can occur only once
B. Is dichotomous
C. Both A and B
D. Neither A nor B
Answers are in Appendix A.
REFERENCES
1. Canto JG, Kiefe CI, Rogers WJ, et al. Number of coronary heart disease risk factors and mortality in patients with first myocardial infarction. JAMA 2011;306:2120–2127.
2. Talley NJ, Zinsmeister AR, Van Dyke C, et al. Epidemiology of colonic symptoms and the irritable bowel syndrome. Gastroenterology 1991;101:927–934.
3. Ford AC, Forman D, Bailey AG, et al. Irritable bowel syndrome: A 10-year natural history of symptoms and factors that influence consultation behavior. Am J Gastroenterol 2007;103:1229–1239.
4. Evers IM, de Valk HW, Visser GHA. Risk of complications of pregnancy in women with type 1 diabetes: Nationwide prospective study in the Netherlands. BMJ 2004;328:915–918.
5. Feinstein AR, Sosin DM, Wells CK. The Will Rogers phenomenon: stage migration and new diagnostic techniques as a source of misleading statistics for survival in cancer. N Engl J Med 1985;312:1604–1608.
6. Chee KG, Nguyen DV, Brown M, et al. Positron emission tomography and improved survival in patients with lung cancer. The Will Rogers phenomenon revisited. Arch Intern Med 2008;168:1541–1549.
7. Zisman A, Pantuck AJ, Dorey F, et al. Improved prognostification for renal cell carcinoma using an integrated staging system. J Clin Oncol 2001;19:1649–1657.
8. Shaw BA, Hosalkar HS. Rattlesnake bites in children: antivenin treatment and surgical indications. J Bone Joint Surg Am 2002;84-A(9):1624.
9. Gage BF, Waterman AD, Shannon W, et al. Validation of clinical classification schemes for predicting stroke. Results from the National Registry of Atrial Fibrillation. JAMA 2001;285:2864–2870.
10. Go AS, Hylek EM, Chang Y, et al. Anticoagulation therapy for stroke prevention in atrial fibrillation. How well do randomized trials translate into clinical practice? JAMA 2003;290:2685–2692.
11. Peitersen E. Bell’s palsy: the spontaneous course of 2,500 peripheral facial nerve palsies of different etiologies. Acta Otolaryngol 2002;(549):4–30.
12. Ramlow J, Alexander M, LaPorte R, et al. Epidemiology of post-polio syndrome. Am J Epidemiol 1992;136:769–785.
Chapter 8
Diagnosis
Appearances to the mind are of four kinds. Things either are what they appear to be;
or they neither are, nor appear to be; or they are, and do not appear to be; or they
are not, yet appear to be. Rightly to aim in all these cases is the wise man’s task.
—Epictetus†
2nd century A.D.
Example
The use of blood pressure data to decide about therapy is an example of how information can be simplified for practic
                     DISEASE
                     Present               Absent
TEST   Positive      True positive (a)     False positive (b)
       Negative      False negative (c)    True negative (d)
and two that are wrong (false). The test has given the correct result when it is positive in the presence of disease (true positive) or negative in the absence of the disease (true negative). On the other hand, the test has been misleading if it is positive when the disease is absent (false positive) or negative when the disease is present (false negative).
Lack of Information on Negative Tests
The goal of all clinical studies aimed at describing the value of diagnostic tests should be to obtain data for all four of the cells shown in Figure 8.1. Without all these data, it is not possible to fully evaluate the accuracy of the test. Most information about the value of a diagnostic test is obtained from clinical, not research, settings. Under these circumstances, physicians are using the test in the care of patients. Because of ethical concerns, they usually do not feel justified in proceeding with more exhaustive evaluation when preliminary diagnostic tests are negative. They are naturally reluctant to initiate an aggressive workup, with its associated risk and expense, unless preliminary tests are positive. As a result, data on the number of true negatives and false negatives generated by a test (cells c and d in Fig. 8.1) tend to be much less complete in the medical literature than data collected about positive test results.
This problem can arise in studies of screening tests because individuals with negative tests usually are not subjected to further testing, especially if the testing involves invasive procedures such as biopsies. One method that can get around this problem is to make use of stored blood or tissue banks. An investigation of prostate-specific antigen (PSA) testing for prostate cancer examined stored blood from men who subsequently developed prostate cancer and men who did not develop prostate cancer (2). The results showed that for a PSA level of 4.0 ng/mL, sensitivity over the […] without requiring further testing on people with negative test results. (See the following text for definitions of sensitivity and specificity.)
Lack of Information on Test Results in the Nondiseased
Some types of tests are commonly abnormal in people without disease or complaints. When this is so, the test’s performance can be grossly misleading when the test is applied to patients with the condition or complaint.
Example
Magnetic resonance imaging (MRI) of the lumbar spine is used in
Lack of Objective Standards for Disease
For some conditions, there are simply no hard-and-fast criteria for diagnosis. Angina pectoris is one of these. The clinical manifestations were described nearly a century ago, yet there is still no better way to substantiate the presence of angina pectoris than a carefully taken history. Certainly, a great many objectively measurable phenomena are related to this clinical syndrome, for example, the presence of coronary artery stenosis on angiography, delayed perfusion on a thallium stress test, and characteristic abnormalities on electrocardiograms both at rest and with exercise. All are more commonly found in patients believed to have angina pectoris, but none is so closely tied to the clinical syndrome that it can serve as the standard by which the condition is considered present or absent.
Other examples of medical conditions difficult to diagnose because of the lack of simple gold standard
Example
Computed tomographic (“virtual”) colonoscopy was compared to traditional (optical) colonoscopy in screening for c
                     DISEASE
                     Present    Absent
TEST   Positive      a          b          a + b            +PV = a / (a + b)
       Negative      c          d          c + d            –PV = d / (c + d)
                     a + c      b + d      a + b + c + d
Se = a / (a + c)     Sp = d / (b + d)     P = (a + c) / (a + b + c + d)
LR+ = [a / (a + c)] / [b / (b + d)]     LR– = [c / (a + c)] / [d / (b + d)]
Se = Sensitivity; Sp = Specificity; P = Prevalence; LR = Likelihood ratio; PV = Predictive value
                     DISEASE
                     DVT according to gold standard (compression ultrasonography and/or 3-month follow-up)
                     Present    Absent
TEST   Positive      55         198        253              +PV = 55 / 253 = 22%
(D-dimer assay for diagnosis of DVT)
       Negative      1          302        303              –PV = 302 / 303 = 100%
LR+ = [55 / (55 + 1)] / [198 / (198 + 302)] = 2.5     LR– = [1 / (55 + 1)] / [302 / (302 + 198)] = 0.03
Se = Sensitivity; Sp = Specificity; P = Prevalence; LR = Likelihood ratio; PV = Predictive value
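The quantities defined in the 2 × 2 table can all be computed from the four cell counts a, b, c, and d. The following Python sketch applies the formulas to the D-dimer counts shown above (55, 198, 1, and 302). The function name `two_by_two` and the dictionary keys are hypothetical names chosen for illustration.

```python
def two_by_two(a, b, c, d):
    """Test characteristics from a 2 x 2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "prevalence": (a + c) / (a + b + c + d),
        "positive_pv": a / (a + b),
        "negative_pv": d / (c + d),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

# D-dimer assay for DVT: counts taken from the table above.
stats = two_by_two(a=55, b=198, c=1, d=302)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Note that `1 - specificity` equals b / (b + d), so the likelihood ratios computed here are the same ratios written out in the table.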
Table 8.1
Trade-Off between Sensitivity and Specificity When Using BNP Levels to Diagnose Congestive Heart Failure
BNP Level (pg/mL)    Sensitivity (%)    Specificity (%)
50                   97                 62
80                   93                 74
100                  90                 76
125                  87                 79
150                  85                 83
Adapted with permission from Maisel AS, Krishnaswamy P, Nowak RM, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347:161–167.
Figure 8.4 ■ A receiver operator characteristic (ROC) curve. The accuracy of B-type natriuretic peptide (BNP) in the emergency diagnosis of heart failure with various cutoff levels of BNP between dyspnea due to congestive heart failure and other causes. (Adapted with permission from Maisel AS, Krishnaswamy P, Nowak RM, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347:161–167.) [Vertical axis: sensitivity (%), the true-positive rate; horizontal axis: 1 − specificity (%), the false-positive rate; cutoff points of 50, 80, 100, 125, and 150 pg/mL are marked on the curve.]
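The points on an ROC curve come directly from a table of cutoffs such as Table 8.1: each cutoff contributes one point whose horizontal coordinate is 1 − specificity and whose vertical coordinate is sensitivity. The Python sketch below converts the five BNP cutoffs to ROC coordinates and, as a rough illustration only, approximates the area under the curve with the trapezoidal rule. Joining the points to the corners (0, 0) and (100, 100) with straight lines is an assumption of this sketch, so the result should not be read as the area reported by the original study.

```python
# (BNP cutoff in pg/mL, sensitivity %, specificity %) from Table 8.1.
table_8_1 = [
    (50, 97, 62),
    (80, 93, 74),
    (100, 90, 76),
    (125, 87, 79),
    (150, 85, 83),
]

# One ROC point per cutoff: (false-positive rate %, true-positive rate %).
roc_points = sorted((100 - sp, se) for _, se, sp in table_8_1)

# Assumption: close the curve with the two corner points.
path = [(0, 0)] + roc_points + [(100, 100)]

# Trapezoidal rule; dividing by 100 * 100 expresses the area as a fraction.
auc = sum(
    (x2 - x1) * (y1 + y2) / 2
    for (x1, y1), (x2, y2) in zip(path, path[1:])
) / 10000

for fpr, tpr in roc_points:
    print(f"1 - specificity = {fpr}%, sensitivity = {tpr}%")
print(f"approximate area under the curve: {auc:.2f}")
```

Lowering the cutoff moves the point up and to the right (higher sensitivity, lower specificity), which is the trade-off the table describes.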
Figure 8.5 ■ ROC curves for the BNP and left ventricular ejection fractions by echocardiograms in the emergency diagnosis of congestive heart failure in patients presenting with acute dyspnea. Overall, BNP is more sensitive and more specific than ejection fractions, resulting in more area under the curve. (Redrawn with permission from Steg PG, Joubin L, McCord J, et al. B-type natriuretic peptide and echocardiographic determination of ejection fraction in the diagnosis of congestive heart failure in patients with acute dyspnea. Chest 2005;128:21–29.) [Vertical axis: sensitivity; horizontal axis: 1 − specificity (%); cutoff points are marked on the BNP curve (50 to 150) and on the ejection fraction (EF) curve (35 to 60).]
Example
A study was made of the sensitivity and speci- ficity of the clin
Some people in whom disease is suspected may have other conditions that cause a positive test, thereby increasing the false-positive rate and decreasing specificity. In the example of guidelines for ovarian cancer evaluation, specificity was low for all cancer stages (60%). One reason for this is that levels of the cancer marker, CA-125, recommended by guidelines, are elevated by many diseases and conditions
Bias
The sensitivity and specificity of a test should be established independently of the means by which the true diagnosis is established. Otherwise, there could be a biased assessment of the test’s properties. As already pointed out, if the test is evaluated using data obtained during the course of a clinical evaluation of patients suspected of having the disease in question, a positive test may prompt the clinician to continue pursuing
a test are being assessed, the test result should not be part of the information used to establish the diagnosis. In studying DVT diagnosis by D-dimer assay (discussed earlier), the investigators made sure that the physicians performing the gold standard tests (ultrasonography and follow-up assessment) were unaware of the results of the D-dimer assays so that the results of the D-dimer assays could not influence (bias) the interpretation of ultrasonography (10).
In the course of routine clinical care, this kind of bias can be used to advantage, especially if the test result is subjectively interpreted. Many radiologic imaging interpretations are subjective, and it is easy to be influenced by the clinical information provided. All clinicians have experienced having imaging studies overread because of a clinical impression or, conversely, of going back over old studies in which a finding was missed because a clinical fact was not communicated at the time and, therefore, attention was not directed to a particular area. Both to minimize and to take advantage of these biases, some radiologists prefer to read imaging studies twice, first without, then with the clinical information.
All the biases discussed tend to increase the agreement between the test and the gold standard. That is, they tend to make the test seem more accurate than it actually is.
Chance
Values for sensitivity and specificity are usually estimated from observations on relatively small samples of people with and without the disease of interest. Because of chance (random variation) in any one sample, particularly if it is small, the true sensitivity and specificity of the test can be misrepresented, even if there is no bias in the study. The particular values observed are compatible with a range of true values, typically characterized by the “95% confidence intervals” (see Chapter 12).† The width of this range of values defines the precision of the estimates of sensitivity and specificity. Therefore, reported values for sensitivity and specificity should not be taken too literally if a small number of patients is studied.
Figure 8.6 shows how the precision of estimates of sensitivity increases as the number of people on which the estimate is based increases. In this particular example, the observed sensitivity of the diagnostic test is 75%. Figure 8.6 shows that if this estimate is based on only 10 patients, by chance alone, the true sensitivity could be as low as 45% and as high as nearly 100%. When more patients are studied, the 95% confidence interval narrows and the precision of the estimate increases.
Figure 8.6 ■ The precision of an estimate of sensitivity. The 95% confidence interval for an observed sensitivity of 75%, according to the number of people observed. (Vertical axis: sensitivity; horizontal axis: number of people observed.)
PREDICTIVE VALUE
Sensitivity and specificity are properties of a test that should be taken into account when deciding whether to use the test. However, once the results of a diagnostic test are available, whether positive or negative, the sensitivity and specificity of the test are no longer relevant because these values are obtained in persons known to have or not to have the disease. But if one knew the disease status of the patient, it would not be necessary to order the test! For the clinician, the dilemma is to determine if the patient has the disease, given the results of a test. (In fact, clinicians are
† The 95% confidence interval of a proportion is easily estimated by the following formula, based on the binomial theorem:
$$p \pm 2\sqrt{\frac{p(1-p)}{N}}$$
where p is the observed proportion and N is the number of people observed. To be more exact, multiply by 1.96 instead of 2.
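The footnote's approximation (the observed proportion plus or minus 2 standard errors, where the standard error is the square root of p(1 − p)/N) can be sketched in a few lines of Python. This is only an illustration: the function name `approx_ci` and the clipping of the interval to the range 0 to 1 are assumptions added here, not part of the text, and the approximation is rough for small samples, so it will not exactly reproduce the interval plotted in Figure 8.6.

```python
import math

def approx_ci(p, n, z=2.0):
    """Approximate 95% confidence interval for an observed proportion p
    based on n people, using p +/- z * sqrt(p * (1 - p) / n).
    Use z=1.96 to be more exact, as the footnote suggests."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clipping to [0, 1] is an assumption made here so that the
    # interval stays a valid range of proportions.
    return max(0.0, p - half_width), min(1.0, p + half_width)

# An observed sensitivity of 75% at several (illustrative) sample sizes.
for n in (10, 20, 50):
    low, high = approx_ci(0.75, n)
    print(f"N={n}: {low:.2f} to {high:.2f}")
```

The interval narrows as N grows, which is the point the text makes about precision.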
Definitions
The probability of disease, given the results of a test,
is called the predictive value of the test (see Fig.
8.2).
Example
Referral Process
Referral to teaching hospital wards, clinics,
and emergency departments increases the
chance that significant disease will underlie
[Figure: curves for a positive stress test and a negative stress test; label: 45-year-old asymptomatic man with no risk factors.]
the intensity of their diagnostic evaluations may need to be adjusted to suit the specific setting.
Implications for Interpreting the Medical Literature
These terms should be familiar to most readers because they are used in everyday conversation. For example, we may say that the odds are 4:1 that the New England Patriots football team will win tonight or that they have an 80% probability of winning.
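The equivalence between odds of 4:1 and a probability of 80% follows from the standard relationships odds = probability / (1 − probability) and probability = odds / (1 + odds). A minimal sketch, with function names chosen here for illustration:

```python
def probability_to_odds(probability):
    """Convert a probability (between 0 and 1, exclusive of 1) to odds."""
    return probability / (1 - probability)

def odds_to_probability(odds):
    """Convert odds (e.g., 4.0 for odds of 4:1) to a probability."""
    return odds / (1 + odds)

# Odds of 4:1 correspond to an 80% probability, and vice versa.
print(odds_to_probability(4.0))
print(probability_to_odds(0.8))
```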
Table 8.2
Distribution of Ratios for Pleural Fluid Protein to Serum Protein in Patients with Exudates and Transudates, with Calculation of Likelihood Ratios

Ratio of Pleural Fluid      Number of Patients with Test Result
Protein to Serum Protein    Exudates      Transudates      Likelihood Ratio
>0.70                       475           1                168.65
0.66–0.70                   150           1                53.26
0.61–0.65                   117           6                6.92
0.56–0.60                   102           12               3.02
0.50–0.55                   70            14               1.78
0.46–0.50                   47            34               0.49
0.41–0.45                   27            34               0.28
0.36–0.40                   13            37               0.12
0.31–0.35                   8             44               0.06
≤0.30                       19            182              0.04
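The likelihood ratios in Table 8.2 can be reproduced from the counts: for each range of test results, the likelihood ratio is the proportion of patients with exudates who have that result divided by the proportion of patients with transudates who have that result. The sketch below uses the counts from the table; the variable names and the short range labels are illustrative.

```python
# (label, exudates, transudates) for each row of Table 8.2, top to bottom.
rows = [
    ("highest", 475, 1), ("0.66-0.70", 150, 1), ("0.61-0.65", 117, 6),
    ("0.56-0.60", 102, 12), ("0.50-0.55", 70, 14), ("0.46-0.50", 47, 34),
    ("0.41-0.45", 27, 34), ("0.36-0.40", 13, 37), ("0.31-0.35", 8, 44),
    ("lowest", 19, 182),
]

total_exudates = sum(exudates for _, exudates, _ in rows)
total_transudates = sum(transudates for _, _, transudates in rows)

def likelihood_ratio(exudates, transudates):
    """Proportion of diseased patients (exudates) with this result divided
    by the proportion of nondiseased patients (transudates) with it."""
    return (exudates / total_exudates) / (transudates / total_transudates)

likelihood_ratios = {
    label: likelihood_ratio(exudates, transudates)
    for label, exudates, transudates in rows
}

for label, lr in likelihood_ratios.items():
    print(f"{label}: LR = {lr:.2f}")
```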
Figure 8.9 ■ B. An example: Calculating the posttest probability of a positive D-dimer assay test for DVT (see Fig. 8.3). (Nomogram scales: pretest probability [prevalence], likelihood ratio, posttest probability [predictive value].) (Adapted with permission from Fagan TJ. Nomogram for Bayes’s theorem. N Eng J Med 1975;293:257.)

Likelihood Ratio      Approximate Change in Probability (%)
7                     —
6                     35
5                     30
4                     25
3                     20
2                     15
1                     No Change
0.5                   –15
0.4                   –20
0.3                   –25
0.2                   –30
0.1                   –45
Adapted with permission from McGee S. Simplifying likelihood ratios. J Gen Intern Med 2002;17:646–649.

MULTIPLE TESTS
Because clinicians commonly use imperfect diagnostic tests with less than 100% sensitivity and specificity and intermediate likelihood ratios, a single test frequently results in a probability of disease that is neither high nor low enough for managing the patient (e.g., somewhere between 10% and 90%). Usually, it is not acceptable to stop the diagnostic process at such a point. Would a physician or patient be satisfied with the conclusion that the patient has even a 20% chance of having carcinoma of the colon? Or that an asymptomatic, 45-year-old man with multiple risk factors has about a 30% chance of coronary heart disease after a positive ECG stress test? Even for less deadly diseases, tests resulting in intermediate posttest probabilities require more investigation. The physician is ordinarily bound to raise or lower the probability of disease substantially in such situations unless, of course, the diagnostic possibilities are all trivial, nothing could be done about the result, or the risk of proceeding further is prohibitive. When these exceptions do not apply, the clinician will want to find a way to rule in or rule out the disease more decisively.
Figure 8.10 ■ Parallel and serial testing. In parallel testing, all tests are done at the
same time. In serial testing, each subsequent test is done only when the previous test
result is positive.
When multiple different tests are performed and all are positive or all are negative, the interpretation is straightforward. All too often, however, some are positive and others are negative. Interpretation is then more complicated. This section discusses the principles by which multiple tests are applied and interpreted.
Multiple diagnostic tests can be applied in two basic ways (Fig. 8.10). They can be used in parallel testing (i.e., all at once), and a positive result of any test is considered evidence for disease. Or they can be done in serial testing (i.e., consecutively), with the decision to order the next test in the series based on the results of the previous test. For serial testing, all tests must give a positive result in order for the diagnosis to be made because the diagnostic process is stopped with a negative result.
Parallel Testing
Physicians usually order tests in parallel when rapid assessment is necessary, as in hospitalized or emergency patients, or for ambulatory patients who cannot return easily because they are not mobile or have come from a long distance for evaluation.
Multiple tests in parallel generally increase the sensitivity and, therefore, the negative predictive value for a given disease prevalence, above those of each individual test. On the other hand, specificity and positive predictive value are lower than for each individual test. That is, disease is less likely to be missed. (Parallel testing is probably one reason why referral centers seem to diagnose disease that local physicians miss.) However, false-positive diagnoses are also more likely to be made (thus, the propensity for overdiagnosing in such centers as well). In summary, parallel testing is particularly useful when the clinician is faced with the need for a very sensitive testing strategy but has available only two or more relatively insensitive tests that measure different clinical phenomena. By using the tests in parallel, the net effect is a more sensitive diagnostic strategy. The price, however, is further evaluation or treatment of some patients without the disease.
Example
Neither of two tests used for diagnosing ovarian cancer, CA-125 and transvaginal ultrasound, is a sensitive test when used alone. A study was done in which 28,506 women underwent both tests in a trial of ovarian cancer screening (15). If either test result was abnormal (a parallel testing strategy), women were referred for further evaluation. Positive predictive values were determined for each test individually as well as when both tests were abnormal (a serial testing strategy). The authors did not calculate sensitivities and specificities of the tests; to do so, they would have needed to include any cases of ovarian cancer that occurred in the
Table 8.4
Test Characteristics of CA-125 and Transvaginal Ultrasound (TVU)
Figure 8.11 ■ Use of likelihood ratios in serial testing. As each test is completed, its posttest odds become the pretest odds for the subsequent test. (Diagram labels: pretest probability, posttest probability.)
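The chaining described in the caption of Figure 8.11 can be sketched as follows: convert the pretest probability to odds, multiply by the likelihood ratio of each test result in turn, and convert the final odds back to a probability. The function names and the numbers in the usage lines are hypothetical, chosen only for illustration, and the calculation assumes that each test contributes independent information.

```python
def probability_to_odds(probability):
    return probability / (1 - probability)

def odds_to_probability(odds):
    return odds / (1 + odds)

def serial_posttest_probability(pretest_probability, likelihood_ratios):
    """Apply a series of likelihood ratios, one per test result.
    The posttest odds of each test become the pretest odds of the next.
    Assumes the tests contribute independent information."""
    odds = probability_to_odds(pretest_probability)
    for likelihood_ratio in likelihood_ratios:
        odds *= likelihood_ratio
    return odds_to_probability(odds)

# Hypothetical example: pretest probability 10%, then two positive
# results with likelihood ratios of 4.0 and 3.0.
posttest = serial_posttest_probability(0.10, [4.0, 3.0])
print(f"Posttest probability: {posttest:.2f}")
```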
account the information contributed by all the tests in the series.
Assumption of Independence
When multiple tests are used, the accuracy of the final result depends on whether the additional information contributed by each test is somewhat independent of that already available from the preceding ones so that the next test does not simply duplicate known information. For example, in the diagnosis of endocarditis, it is likely that fever (an indication of inflammation), a new heart murmur (an indication of valve destruction), and Osler nodes (an indication of emboli) each add independent, useful information. In the example of pharyngitis, the investigators used statistical techniques to ensure that each diagnostic test included in the decision rule contributed independently to the diagnosis. To the degree that the two tests are not contributing independent information, multiple testing is less useful. For example, if two tests are used in parallel with 60% and 80% sensitivities, and the better test identifies all the cases found by the less sensitive test, the combined sensitivity cannot be higher than 80%. If they are completely independent of each other, then the sensitivity of parallel testing would be 92% (80% + [60% × 20%]).
The premise of independence underlies the entire approach to the use of multiple tests. However, it seems unlikely that tests for most diseases are fully independent of each other. If the assumption that the tests are completely independent is wrong, calculation of the probability of disease from several tests would tend to overestimate the tests’ value.
Review Questions
Questions 8.1–8.10 are based on the following example. For each question, select the best answer.
A study was made of symptoms and physical findings in 247 patients evaluated for sinusitis. The final diagnosis was made according to x-ray findings (gold standard) (17). Ninety-five patients had sinusitis, and 49 of them also had facial pain. One hundred fifty-two did not have sinusitis, and 79 of these patients had facial pain.
8.1. What is the sensitivity of facial pain for sinusitis in this study?
A. 38%
B. 48%
C. 52%
D. 61%
8.2. What is the specificity?
A. 38%
B. 48%
C. 52%
D. 61%
8.3. If the doctor thought the patient had sinusitis because the patient had facial pain, for what percent of patients would she be correct?
A. 38%
B. 48%
C. 52%
D. 61%
8.4. If the doctor thought the patient did not have sinusitis because the patient did not have facial pain, for what percent of patients would she be correct?
A. 38%
B. 48%
C. 52%
D. 61%
8.5. How common was sinusitis in this study?
A. 38%
B. 48%
C. 52%
D. 61%
8.6. What is the posttest probability of sinusitis in patients with facial pain in this study?
A. 38%
B. 48%
C. 52%
D. 61%
Both the positive and negative likelihood ratios for facial pain were 1.0. When clinicians asked several other questions about patient symptoms and took physical examination results into account, the likelihood ratios for their overall clinical impressions as to whether patients had sinusitis were as follows: “high probability,” LR 4.7; “intermediate probability,” LR 1.4; and “low probability,” LR 0.4.
8.7. What is the probability of sinusitis in patients assigned a “high probability” by clinicians?
A. 10%
B. 20%
C. 45%
D. 75%
E. 90%
8.8. What is the probability of sinusitis in patients assigned an “intermediate probability” by clinicians?
A. 10%
B. 20%
C. 45%
D. 75%
E. 90%
8.9. What is the probability of sinusitis in patients assigned a “low probability” by clinicians?
A. 10%
B. 20%
C. 45%
D. 75%
E. 90%
8.10. Given the answers for questions 8.7–8.9, which of the following statements is incorrect?
A. A clinical impression of “high probability” of sinusitis is more useful in the management of patients than one of “low probability.”
B. A clinical impression of “intermediate probability” is approximately equivalent to a coin toss.
C. Predictive value of facial pain will be higher in a clinical setting in which the prevalence of sinusitis is 10%.
For questions 8.11 and 8.12, select the best answer.
8.11. Which of the following statements is most correct?
A. Using diagnostic tests in parallel increases specificity and lowers positive predictive value.
B. Using diagnostic tests in series increases sensitivity and lowers positive predictive value.
C. Using diagnostic tests in parallel increases sensitivity and positive predictive value.
D. Using diagnostic tests in series increases specificity and positive predictive value.
8.12. Which of the following statements is most correct?
A. When using diagnostic tests in parallel or series, each test should contribute information independently.
B. When using diagnostic tests in series, the test with the lowest sensitivity should be used first.
C. When using diagnostic tests in parallel, the test with the highest specificity should be used first.
Answers are in Appendix A.
REFERENCES
1. Chobanian AV, Bakris GL, Black HR, et al. The Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure: The JNC 7 Report. JAMA 2003;289:2560–2571.
2. Gann PH, Hennekens CH, Stampfer MJ. A prospective evaluation of plasma prostate-specific antigen for detection of prostatic cancer. JAMA 1995;273:289–294.
3. Wheeler SG, Wipf JE, Staiger TO, et al. Approach to the diagnosis and evaluation of low back pain in adults. In: Basow DS, ed. UpToDate. Waltham, MA; UpToDate; 2011.
4. Pickhardt PJ, Choi JR, Hwang I, et al. Computed tomographic virtual colonoscopy to screen for colorectal neoplasia in asymptomatic adults. N Engl J Med 2003;349:2191–2200.
5. Bates SM, Kearon C, Crowther M, et al. A diagnostic strategy involving a quantitative latex D-dimer assay reliably excludes deep venous thrombosis. Ann Intern Med 2003;138:787–794.
6. Maisel AS, Krishnaswamy P, Nowak RM, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347:161–167.
7. Steg PG, Joubin L, McCord J, et al. B-type natriuretic peptide and echocardiographic determination of ejection fraction in the diagnosis of congestive heart failure in patients with acute dyspnea. Chest 2005;128:21–29.
8. Dearking AC, Aletti GD, McGree ME, et al. How relevant are ACOG and SGO guidelines for referral of adnexal mass? Obstet Gynecol 2007;110:841–848.
9. Bobo JK, Lee NC, Thames SF. Findings from 752,081 clinical breast examinations reported to a national screening program from 1995 through 1998. J Natl Cancer Inst 2000;92:971–976.
10. Bates SM, Grand’Maison A, Johnston M, et al. A latex D-dimer reliably excludes venous thromboembolism. Arch Intern Med 2001;161:447–453.
11. Wells PS, Owen C, Doucette S, et al. Does this patient have deep vein thrombosis? The rational clinical examination. JAMA 2006;295:199–207.
12. Garber AM, Hlatky MA. Stress testing for the diagnosis of coronary heart disease. In: Basow DS, ed. UpToDate. Waltham, MA; UpToDate; 2012.
13. Heffner JE, Sahn SA, Brown LK. Multilevel likelihood ratios for identifying exudative pleural effusions. Chest 2002;121:1916–1920.
14. McGee S. Simplifying likelihood ratios. J Gen Intern Med 2002;17:646–649.
15. Buys SS, Partridge E, Greene M, et al. Ovarian cancer screening in the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial: findings from the initial screen of a randomized trial. Am J Obstet Gynecol 2005;193:1630–1639.
16. McIsaac WJ, Kellner JD, Aufricht P, et al. Empirical validation of guidelines for the management of pharyngitis in children and adults. JAMA 2004;291:1587–1595.
17. Williams JW, Simel DL, Roberts L, et al. Clinical evaluation for sinusitis. Making the diagnosis by history and physical examination. Ann Intern Med 1992;117:705–710.
Chapter 9
Treatment
Treatments should be given “not because they ought to work, but because they do work.”
—L.H. Opie
1980
Case reports
Tests of hypotheses
Biology
Imagination
Reasoning
Testing Ideas
Some treatment effects are so prompt and
powerful that their value is self-evident even
without formal testing. Clinicians do not have
reservations about the effectiveness of antibiotics for
bacterial meningitis, or diuretics for edema. Clinical
experience is sufficient.
In contrast, many diseases, including most chronic diseases, involve treatments that are considerably less dramatic. The effects are smaller, especially when an effective treatment is tested against another effective treatment. Also, outcomes take longer to develop. It is then necessary to put ideas about treatments to a formal test, through clinical research, because a variety of circumstances, such as coincidence, biased comparisons, spontaneous changes in the course of disease, or wishful thinking, can obscure the true relationship between treatment and outcomes.
When knowledge of the pathogenesis of disease, based on laboratory models or physiologic studies in humans, has become extensive, it is tempting to predict effects in humans on this basis alone. However, relying solely on current understanding of mechanisms without testing ideas using strong clinical research on intact humans can lead to
Example
Control of elevated blood sugar has been a keystone in the care of patients with diabetes mellitus, in part to prevent cardio
Figure 9.2 ■ The structure of a randomized controlled trial. (Diagram labels: Sample; Randomization; Treated group [yes/no]; Control group [yes/no].)
4. Patients who refuse to participate in a trial are excluded, for ethical reasons described earlier in the chapter.
5. Patients who do not cooperate during the early stages of the trial are also excluded. This avoids wasted effort and the reduction in internal validity that occurs when patients do not take their assigned intervention, move in and out of treatment groups, or leave the trial altogether.
For these reasons, patients in clinical trials are usually a highly selected, biased sample of all patients with the condition of interest. As heterogeneity is restricted, the internal validity of the study is improved; in other words, there is less opportunity for differences in outcome that are not related to treatment itself. However, exclusions come at the price of diminished generalizability: Patients in the trial are not like most other patients seen in day-to-day care.
Example
Figure 9.3 summarizes how patients were selected for a randomized controlled trial of asthma management (3). Investigators invited 1,410 patients with asthma in 81 general practices in Scotland to participate. Only 458 of those invited, about one-third, agreed to participate and could be contacted. An additional 199 were excluded, mainly because they did not meet eligibility criteria, leaving 259 patients (18% of those invited) to be randomized. Although the study invited patients from community practices, those who actually participated in the trial were highly selected and perhaps unlike most patients in the community.
Because of the high degree of selection in trials, it may require considerable faith to generalize the results of clinical trials to ordinary practice settings.
If there are not enough patients with the disease of interest, at one time and place, to carry out a scientifically sound trial, then sampling can be from multiple sites with common inclusion and exclusion criteria. This is done mainly to achieve adequate sample size, but it also increases generalizability, to the extent that the sites are somewhat different from each other.
Large simple trials are a way of overcoming the generalizability problem. Trial entry criteria are simplified so that most patients developing the study condition are eligible. Participating patients have to have accepted random allocation of treatment, but their care is otherwise the same as usual, without a great deal of extra testing that is part of some trials. Follow-up is for a simple, clinically important outcome, such as discharge from the hospital alive. This approach not only improves generalizability, it also makes it easier to recruit large numbers of participants at a reasonable cost so that moderate effect sizes (large effects are unlikely for most clinical questions) can be detected.
Practical clinical trials (also called pragmatic clinical trials) are designed to answer real-world questions in the actual care of patients by including the kinds of patients and interventions found in ordinary patient care settings.
Example
Severe ankle sprains are a common problem among patients visiting emergency departments. Various treatments are in common use. The Collaborative Ankle Support Trial Group enrolled 584 patients with severe ankle sprain in eight emergency departments in the United Kingdom in a randomized trial of four commonly used treatments: tubular compression bandage and three types of mechanical support (4). Quality of ankle function at 3 months was best after a below-the-knee cast was used for 10 days and worst when a tubular compression bandage was used; the tubular compression bandage was being used in 75% of centers in the United Kingdom at the time. Two less effective forms of mechanical support were several times more expensive than the cast. All treatment groups improved over time and there was no difference in outcome among them at 9 months.
Practical trials are different from typical efficacy trials where, in an effort to increase internal validity, severe restrictions are applied to enrollment, intervention, and adherence, limiting the relevance of their results for usual patient care decisions. Large simple trials may be of practical questions too, but practical trials need not be so large.
Intervention
The intervention can be described in relation to three general characteristics: generalizability, complexity, and strength.
First, is the intervention one that is likely to be implemented in usual clinical practice? In an effort
Figure 9.3 ■ Sampling of patients for a randomized controlled trial of asthma management. (Data from Hawkins G, McMahon AD, Twaddle S, et al. Stepping down inhaled corticosteroids in asthma: randomized controlled trial. BMJ 2003;326:1115–1121.)
to standardize the intervention so that it can be easily described and reproduced in other settings, investigators may cater to their scientific, not their clinical, colleagues by studying treatments that are not feasible in usual practice.
Second, does the intervention reflect the normal complexity of real-world treatment? Clinicians regularly construct treatment plans with many components. Single, highly specific interventions make for tidy science because they can be described precisely and applied in a reproducible way, but they may have weak effects. Multifaceted interventions, which are often more effective, are also amenable to careful evaluation as long as their essence can be communicated and applied in other settings. For example, a randomized trial of fall prevention in acute care hospitals studied the effects of a fall risk assessment scale, with interventions tailored to each patient’s specific risks (5).
Third, is the intervention in question sufficiently different from alternative managements that it is reasonable to expect that the outcome will be affected? Some diseases can be reversed by treating a single, dominant cause. Treating hyperthyroidism with radioisotope ablation or surgery is one example. However, most diseases arise from a combination of factors acting in concert. Interventions that change only one of them, and only a small amount, cannot be expected to result in strong treatment effects. If the conclusion of a trial evaluating such interventions is that a new treatment is not effective when used alone, it should come as no surprise. For this reason, the first trials of a new treatment tend to enroll those patients who are most likely to respond to treatment and to maximize dose and compliance.
Comparison Groups
The value of an intervention is judged in relation to some alternative course of action. The question is not only whether a comparison is used, but also how appropriate it is for the research question. Results can be measured against one or more of several kinds of comparison groups.
■ No Intervention. Do patients who are offered [...]
[...] special interest and attention because of the study, regardless of the specific nature of the intervention they might be receiving. This phenomenon is called the Hawthorne effect. The reasons are not clear, but some seem likely: Patients want to please them and make them feel successful. Also, patients who volunteer for trials want to do their part to see that “good” results are obtained.
■ Usual Care. Do patients given the experimental treatment do better than those receiving usual care (whatever individual doctors and patients decide)? This is the only meaningful (and ethical) question if usual care is already known to be effective.
■ Placebo Treatment. Do treated patients do better than similar patients given a placebo, an intervention that is intended to be indistinguishable (in physical appearance, color, taste, or smell) from the active treatment but that does not have a specific, known mechanism of action? Sugar pills and saline injections are examples of placebos. It has been shown that placebos, given with conviction, relieve severe, unpleasant symptoms, such as postoperative pain, nausea, or itching, in about one-third of patients, a phenomenon called the placebo effect. Placebos have the added advantage of making it difficult for study patients to know which intervention they have received (see “Blinding” in the following text).
■ Another Intervention. The comparator may be the current best treatment. The point of a “comparative effectiveness” study is to find out whether a new treatment is better than the one in current use.
Changes in outcome related to these comparators are cumulative, as diagrammed in Figure 9.4.
[Figure 9.4: surviving diagram labels: New intervention; Placebo effect; Improvement.]
Allocating Treatment
To study the effects of a clinical intervention free of confounding, the best way to allocate patients to treatment groups is by means of random allocation (also referred to as randomization). Patients are assigned to either the experimental or the control treatment by one of a variety of disciplined procedures (analogous to flipping a coin) whereby each patient has an equal (or at least known) chance of being assigned to any one of the treatment groups.
Random allocation of patients is preferable to other methods of allocation because only randomization has the ability to create truly comparable groups. All factors related to prognosis, regardless of whether they are known before the study takes place or have been measured, tend to be equally distributed in the comparison groups.
In the long run, with a large number of patients in a trial, randomization usually works as just described. However, random allocation does not guarantee that the groups will be similar; dissimilarities can arise by chance alone, particularly when the number of patients randomized is small. To assess whether “bad luck” has occurred, authors of randomized controlled trials often present a table comparing the frequency in the treated and control groups of a variety of characteristics, especially those known to be related to outcome. These are called baseline characteristics because they are present before randomization and, therefore, should be equally distributed in the treatment groups.
Example
Table 9.1 shows some of the baseline characteristics for a study of liberal versus restrictive blood transfusion in high-r

Table 9.1
Example of a Table Comparing Baseline Characteristics: A Randomized Trial of Liberal versus Restrictive Transfusion in High-Risk Patients after Hip Surgery

                              Percent with Characteristic for Each Group
Characteristics               % Liberal (1,007 Patients)    % Restricted (1,009 Patients)
Age (mean)                    81.8                          81.5
Male                          24.8                          23.7
Any cardiovascular disease    63.3                          62.5
Tobacco use 600 mg/d          11.6                          11.3
Anesthesiology risk score     3.0                           2.9
General anesthesia            54.0                          56.2
Lived in nursing home         10.3                          10.9
Data from Carson JL, Terrin ML, Noveck H, et al. Liberal or restrictive transfusion in high-risk patients after hip surgery. N Engl J Med 2011;365:2453–2462.

It is reassuring to see that important prognostic variables are nearly equally distributed in the groups being compared. If the groups are substantially different in a large trial, it suggests that something has gone wrong with the randomization process. Smaller differences, which are expected because of chance, can be controlled for during data analyses (see Chapter 5).
In some situations, especially small trials, to reduce the risk of bad luck, it is best to make sure that at least some of the characteristics known to be strongly associated with outcome occur equally in treated and control patients. Patients are gathered into groups (strata) that have similar levels of a prognostic factor (e.g., age for most chronic diseases) and are randomized separately within each of the strata, a process called stratified randomization (Fig. 9.5). The final groups are sure to be comparable, at least for the characteristics that were used to create the strata. Some investigators do not favor stratified randomization, arguing that whatever differences arise from bad luck are unlikely to be large and can be adjusted for mathematically after the data have been collected.
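Stratified randomization, as described above, can be illustrated with a small sketch: patients are first gathered into strata by a prognostic factor, and allocation to the treated (T) or control (C) group is then randomized separately within each stratum. Everything specific in the example is an assumption made for illustration, including the patient list, the age cutoffs, and the allocation procedure (shuffle each stratum, then alternate T and C); real trials use a variety of allocation procedures.

```python
import random
from collections import defaultdict

def stratified_randomize(patients, stratum_of, seed=0):
    """Randomize patients to "T" (treated) or "C" (control) separately
    within each stratum. Within a stratum the order is shuffled and the
    groups are assigned alternately, so group sizes differ by at most 1."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    strata = defaultdict(list)
    for patient in patients:
        strata[stratum_of(patient)].append(patient)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        for position, patient in enumerate(members):
            assignment[patient] = "T" if position % 2 == 0 else "C"
    return assignment

def age_stratum(patient):
    """Hypothetical strata based on age, the prognostic factor."""
    _, age = patient
    if age < 50:
        return "under 50"
    if age < 70:
        return "50 to 69"
    return "70 and over"

# Hypothetical eligible patients as (identifier, age) pairs.
ages = [34, 41, 47, 52, 58, 63, 66, 71, 75, 82, 88, 90]
patients = [(f"patient{i}", age) for i, age in enumerate(ages)]

assignment = stratified_randomize(patients, age_stratum, seed=1)
for patient in patients:
    print(patient, age_stratum(patient), assignment[patient])
```

Because allocation is balanced within each stratum, the final treated and control groups have nearly the same age distribution, at least with respect to the strata used.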
Differences Arising after Randomization
Not all patients in clinical trials participate as originally planned. Some are found to not have the disease they were thought to have when they entered the trial. Others drop out, do not take their medications, are taken out of the study because of side effects or other illnesses, or somehow obtain the other study treatment or treatments that are not part of the study at all. In this way, treatment groups that might have been comparable just after randomization become less so as time passes.
Figure 9.5 ■ Diagram of stratified randomization. T = treated group, C = control group, and R = randomization. (Diagram labels: Eligible patients; Stratification; Strata 1, 2, 3; Randomization; Treatment and control groups; Final study groups.)
Patients May Not Have the Disease Being Studied
It is sometimes necessary (both in clinical trials and in practice) to begin treatment before it is certain whether the patient actually has the disease for which the treatment is designed.
Example
Hydrocortisone may improve survival from septic shock, especially in patients with abnormal adrenal response to shock, as measured by an inappropriately small rise in plasma cortisol after administration of corticotropin. Treatment must begin before the results of the test are available. Investigators in Israel and Europe studied whether hydrocortisone improved survival to 28 days in patients with septic shock (7). Patients were randomized to hydrocortisone or placebo, and treatment was begun before results of the corticotropin test were available. Of 499 patients with septic shock enrolled in the trial, 233 (47%) had adrenal insufficiency. The main analysis was of this subgroup, those who it was believed might respond to hydrocortisone, not of all patients randomized. Among patients with poor response to corticotropin, there was no difference in survival in patients treated with hydrocortisone, compared with
Because response to corticotropin was a characteristic that existed before randomization, it had been randomly allocated, so the advantages of a randomized controlled trial were preserved. However, the inefficiency of enrolling and gathering data on patients who would not contribute to the study’s results could not be avoided. Also, the number of patients in the study was reduced, making it more difficult to detect differences in survival if they existed. Nevertheless, this kind of trial has the important advantage of providing information on both the consequences of a decision that a clinician must make before all the relevant information is available as well as effectiveness of treatment in the subset of patients most likely to respond (see the Intention-to-Treat and Explanatory Trials section in this chapter).
Compliance
Compliance is the extent to which patients follow medical advice. The term adherence is preferred by some people because it connotes a less subservient relationship between patient and doctor. Compliance is another characteristic that comes into play after randomization.
Although noncompliance suggests a kind of willful neglect of good advice, other factors also contribute. Patients may misunderstand which drugs and doses are intended, run out of prescription medications, confuse various preparations of the same drug, or have no money or insurance to pay for drugs. Taken together, noncompliance may limit the usefulness of treatments that have been shown to work under favorable conditions.
In general, compliance marks a better prognosis, apart from treatment. Patients in randomized trials who were compliant with placebo had better outcomes than those who were not (8).
Compliance is particularly important in medical care outside the hospital. In hospitals, many factors act to constrain patients’ personal behavior and render
them compliant. Hospitalized patients are generally sicker and more frightened. They are in strange surroundings, dependent upon the skill and attention of the staff for everything, including their life. What is more, doctors, nurses, and pharmacists have developed a well-organized system for ensuring that patients receive what is ordered for them. As a result, clinical experience and medical literature developed on the wards may underestimate the importance of compliance outside the hospital, where most patients and doctors are and where following doctors’ orders is less common.
In clinical trials, patients are typically selected to be compliant. During a run-in period, in which placebo is given and compliance monitored, noncompliant patients can be detected and excluded before randomization.

Cross-over
Patients may move from one randomly allocated treatment to another during follow-up, a phenomenon called cross-over. If exchanges between treatment groups take place on a large scale, it can diminish the observed differences in treatment effect compared to what might have been observed if the original groups had remained intact.

Cointerventions
After randomization, patients may receive a variety of interventions other than the ones being studied. For example, in a study of asthma treatment, they may receive not only the experimental drug but also different doses of their usual drugs and make greater efforts to control allergens in the home. If these occur unequally in the two groups and affect outcomes, they can introduce systematic differences between the groups that were not present when the groups were formed.

Blinding
Participants in a trial may change their behavior or reporting of outcomes in a systematic way (i.e., be biased) if they are aware of which patients are receiving which treatment. One way to minimize this effect is by blinding, an attempt to make the various participants in a study unaware of the treatment group patients have been randomized to so that this knowledge cannot cause them to act differently, and thereby diminish the internal validity of the study. Masking is a more appropriate metaphor, but blinding is the time-honored term.
Blinding can take place in a clinical trial at four levels (Fig. 9.6). First, those responsible for allocating patients to treatment groups should not know which treatment will be assigned next, making it impossible for them to break the randomization plan. Allocation concealment is a term for this form of blinding. Without it, some investigators might be tempted to enter patients in the trial out of order to ensure that individuals get the treatment that seems best for them. Second, patients should be unaware of
[Figure 9.6 (diagram): a sample is randomized to a treated group and a control group, each followed to outcome (yes or no). Locations of blinding: treatment allocation, patients, clinicians, and measurement of outcome.]
Efficacy trials usually precede effectiveness trials. The rationale is that if treatment under the best circumstances is not effective, then effectiveness under ordinary circumstances is impossible. Also, if an effectiveness trial were done first and it showed no effect, the result could have been because the treatment at its best is just not effective or that the treatment really is effective but was not received.

Intention-to-Treat and Explanatory Trials
A related issue is whether the results of a randomized controlled trial should be analyzed and presented according to the treatment to which the patients were randomized or according to the one they actually received (Fig. 9.8).
One question is: Which treatment choice is best at the time the decision must be made? To answer this question, analysis is according to which group the patients were assigned (randomized), regardless of whether these patients actually received the treatment they were supposed to receive. This way of analyzing trial results is called an intention-to-treat analysis. An advantage of this approach is that the question corresponds to the one actually faced by clinicians; they either offer a treatment or not. Also, the groups compared are as originally randomized, so this comparison has the full strength of a randomized trial. The disadvantage is that to the extent that many patients do not receive the treatment to which they were randomized, differences in effectiveness will tend to be obscured, increasing the chances of observing a misleadingly small effect or no statistical effect at all. If the study shows no difference, it will be uncertain whether the problem is the treatment itself or that it was not received.
Another question is whether the experimental treatment itself is better. For this question, the proper analysis is according to the treatment each patient actually received, regardless of the treatment
[Figure 9.8 (diagram): two panels, each showing a sample divided into experimental and control groups. Intention to treat: analysis according to treatment assigned. Explanatory: analysis according to treatment received.]
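The contrast between analysis by treatment assigned and analysis by treatment received can be made concrete with a small calculation. The following is a minimal sketch in Python, not a method taken from any particular trial: the patient counts, death counts, and the names `Patient`, `event_rate`, `risk_difference`, and `make_group` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    assigned: str  # arm allocated at randomization
    received: str  # treatment the patient actually took
    died: bool     # outcome event


def event_rate(patients, group, by):
    """Proportion of patients with the outcome, grouping by 'assigned' or 'received'."""
    members = [p for p in patients if getattr(p, by) == group]
    return sum(p.died for p in members) / len(members)


def risk_difference(patients, by):
    """Control event rate minus experimental event rate."""
    return event_rate(patients, "control", by) - event_rate(patients, "experimental", by)


def make_group(assigned, received, n, deaths):
    """Build n hypothetical patients, the first `deaths` of whom have the outcome."""
    return [Patient(assigned, received, i < deaths) for i in range(n)]


# Hypothetical trial: 100 patients per arm; 20 patients assigned to the
# experimental arm never take the drug and in effect receive control care.
patients = (
    make_group("experimental", "experimental", 80, 8)
    + make_group("experimental", "control", 20, 6)
    + make_group("control", "control", 100, 30)
)

itt = risk_difference(patients, by="assigned")         # intention-to-treat
as_treated = risk_difference(patients, by="received")  # explanatory

print(f"Intention-to-treat risk difference: {itt:.2f}")
print(f"Explanatory risk difference:        {as_treated:.2f}")
```

In this made-up data set the intention-to-treat difference is smaller than the explanatory one, which illustrates how patients who do not receive their assigned treatment tend to obscure differences. The explanatory comparison, however, is no longer a comparison of the groups as originally randomized.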
SUPERIORITY, EQUIVALENCE, AND NON-INFERIORITY
Until now, we have been discussing superiority trials, ones that seek to establish that one treatment is better than another, but sometimes the most important question is whether a treatment is no less effective than another. A typical example is when a new drug is safer, cheaper, or easier to administer than the established one and would, therefore, be preferable if it were as effective. In non-inferiority trials, the purpose is to show that a new treatment is unlikely to be less effective, at least to a clinically important extent, than the currently accepted treatment, which has been shown in other studies to be more effective than placebo. The question is one-directional (whether a new treatment is not worse) without regard to whether it might be better.
It is statistically impossible to establish that a treatment is not at all inferior to another. However, a study can rule out an effect that is less than a predetermined “minimum clinically important difference,” also called an inferiority margin, the smallest difference in effect that is still considered clinically important. The inferiority margin actually takes into account both this clinical difference and the statistical imprecision of the study. The following is an example of a non-inferiority trial.

Non-inferiority trials usually require a larger sample size than comparable superiority trials, especially if the inferiority margin is small or one wants to rule out small differences. Also, any aspect of the trial that tends to minimize differences between comparison groups, such as intention-to-treat analyses in trials where many patients have dropped out or crossed over or when measurements of outcomes are imprecise, artificially increases the likelihood of finding non-inferiority regardless of whether it is truly present; that is, it results in a weak test for non-inferiority.

VARIATIONS ON BASIC RANDOMIZED TRIALS
In cluster randomized trials, naturally occurring groups (“clusters”) of patients (defined by the doctors, hospitals, clinics, or communities that patients are affiliated with) are randomized, and outcome
events are counted in patients according to the treatment their cluster was assigned. Randomization of groups, not the individuals in them, may be preferable for several reasons. It may just be more practical to randomize clusters than individuals. Patients within clusters may be more similar to each other than to patients in other clusters, and this source of variation, apart from the study treatments themselves, should be taken into account in the study. If patients were randomized within clusters, they or their physicians might learn from each other about both treatments and this might affect their behaviors. For example, how successful could a physician be at randomly treating some patients with urinary tract infection one way and other patients another way? Similarly, could a hospital establish a new plan to prevent intravenous catheter infections in some of its intensive care units and not others when physicians see patients in both settings over time? For these reasons, randomizing clusters rather than patients can be the best approach in some circumstances.
There are other variations on the usual (“parallel group”) randomized controlled trials. Cross-over trials expose patients first to one of two randomly allocated treatments and later to the other. If it can be assumed that effects of the first exposure are no longer present by the time of the second exposure, perhaps because treatment is short-lived or there has been a “wash-out” period between exposures, then each patient will have been exposed to each treatment in random order. This controls for differences in responsiveness among patients not related to treatment effects.

TAILORING THE RESULTS OF TRIALS TO INDIVIDUAL PATIENTS
Clinical trials describe what happens on average. They involve pooling the experience of many patients who may be dissimilar, both to one another and to the patients to whom the trial results will be generalized. How can estimates of treatment effect be obtained that more closely match individual patients?

Subgroups
Patients in clinical trials can be sorted into subgroups, each with a specific characteristic (or combination of characteristics) such as age, severity of disease, and comorbidity that might cause a different treatment effect. That is, the data are examined for effect modification. The number of such subgroups is limited only by the number of patients in the subgroups, which has to be large enough to provide reasonably stable estimates. As long as the characteristics used to define the subgroups existed before randomization, patients in each subgroup have been randomly allocated to treatment groups. As a consequence, results in each subgroup represent, in effect, a small trial within a trial. The characteristics of a given patient (e.g., the patient might be elderly and have severe disease but no comorbidity) can be matched more specifically to those of one of the subgroups than it can to the trial as a whole. Treatment effectiveness in the matched subgroup will more closely approximate that of the individual patient and will be limited mainly by statistical risks of false-positive and false-negative conclusions, which are described in Chapter 11.

Effectiveness in Individual Patients
A treatment that is effective on average may not work on an individual patient. Therefore, results of valid clinical research provide a good reason to begin treating a patient, but experience with that patient is a better reason to continue or not continue. When managing an individual patient, it is prudent to ask the following series of questions:
■ Is the treatment known (by randomized controlled trials) to be efficacious for any patients?
■ Is the treatment known to be effective, on average, in patients like mine?
■ Is the treatment working in my patient?
■ Are the benefits worth the discomforts and risks (according to the patient’s values and preferences)?
By asking these questions and not simply following the results of trials alone, one can guard against ill-founded choice of treatment or stubborn persistence in the face of poor results.

Trials of N = 1
Rigorous clinical trials, with proper attention to bias and chance, can be carried out on individual patients, one at a time. The method, called trials of N = 1, is an improvement over the time-honored process of trial and error. A patient is given one treatment or another, such as an active treatment or placebo, in random order, each for a brief period of time. The patient and physician are blinded to which treatment is given. Outcomes, such as a simple preference for a treatment or a symptom score, are assessed after each period. After many repetitions, patterns of responses are analyzed statistically, much as one would for a more usual randomized controlled trial. This method is useful for deciding on the care of individual patients when activity of disease is unpredictable, response to treatment is prompt, and there is no carryover effect from period to period. Examples of diseases for which the method can be used include migraine headaches, asthma, and fibromyalgia. For all their intellectual appeal, however, trials of N = 1 are rarely done and even more rarely published.
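As an illustration of how a trial of N = 1 might be set up and analyzed, the following is a minimal sketch in Python. It assumes paired treatment periods (one active, one placebo, in random order within each pair) and uses an exact two-sided sign test, which is only one of several analyses that could be chosen; the text does not prescribe a specific test. The function names and the symptom scores are hypothetical.

```python
import random
from math import comb


def randomize_periods(n_pairs, seed=None):
    """Return a schedule of treatment pairs, each pair in random order."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        pair = ["active", "placebo"]
        rng.shuffle(pair)  # random order within each pair of periods
        schedule.append(tuple(pair))
    return schedule


def sign_test_p_value(active_scores, placebo_scores):
    """Exact two-sided sign test on paired period scores (tied pairs are dropped)."""
    diffs = [a - p for a, p in zip(active_scores, placebo_scores) if a != p]
    n = len(diffs)
    if n == 0:
        return 1.0
    fewer_symptoms_on_active = sum(d < 0 for d in diffs)
    tail = min(fewer_symptoms_on_active, n - fewer_symptoms_on_active)
    # Probability of a split at least this lopsided if the treatments do not differ
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


# Hypothetical symptom scores (lower is better) from six pairs of periods
schedule = randomize_periods(6, seed=1)
active = [3, 2, 4, 2, 3, 2]
placebo = [6, 5, 5, 6, 4, 5]
p_value = sign_test_p_value(active, placebo)
print(schedule)
print(f"Two-sided sign test p-value: {p_value:.3f}")
```

The sign test ignores the size of each difference and assumes no carryover from one period to the next, which matches the conditions the text gives for when the method is useful.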
ALTERNATIVES TO RANDOMIZED
CONTROLLED TRIALS
Randomized controlled trials are the gold standard
for studies of the effectiveness of interventions.
Only large randomized trials can definitively eliminate
confounding as an alternative explanation for
observed results.
example, patients may be offered a new surgical procedure because they are a good surgical risk or have less aggressive disease and, therefore, seem especially likely to benefit from the procedure. To the extent that the reasons for treatment choice are known, they can be taken into account like any other confounders.

Example
Influenza can cause worsening of bronchospasm in children with asthma. Influenza vaccine is recommended, but many c

database is not part of formal research, there is no accounting for confounding and effect modification, but the predictions do have the advantage of being about real-world patients, not those as highly selected as in most clinical trials.

Randomized versus Observational Studies?
Phase I trials are intended to identify a dose range that is well tolerated and safe (at least for high-frequency, severe side effects) and include very small numbers of patients (perhaps a dozen) without a control group. Phase II trials provide preliminary information on whether the drug is efficacious and the relationship between dose and efficacy. These trials may be controlled but include too few patients in treatment groups to detect any but the largest treatment effects. Phase III trials are randomized trials and can provide definitive evidence of efficacy and rates of common side effects. They include enough patients, sometimes thousands, to detect clinically important treatment effects and are usually published in biomedical journals.
Phase III trials are not large enough to detect differences in the rate, or even the existence, of uncommon side effects (see discussion of statistical power in Chapter 11). Therefore, it is necessary to follow up very large numbers of patients after a drug is in general use, a process called postmarketing surveillance.
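A rough calculation shows why even a trial of thousands can miss an uncommon side effect. The sketch below, in Python, assumes that each patient has the same independent risk of the side effect; the trial size of 3,000 and the rate of 1 in 10,000 are hypothetical figures chosen for illustration, and the function names are not from the text.

```python
import math


def prob_at_least_one_event(n_patients, event_rate):
    """Chance of observing one or more events among n patients at the given risk."""
    return 1.0 - (1.0 - event_rate) ** n_patients


def patients_needed(event_rate, desired_probability=0.95):
    """Smallest number of patients giving the desired chance of seeing at least one event."""
    return math.ceil(math.log(1.0 - desired_probability) / math.log(1.0 - event_rate))


rate = 1 / 10_000  # hypothetical side effect rate
chance_in_trial = prob_at_least_one_event(3_000, rate)
needed = patients_needed(rate, 0.95)

print(f"Chance of seeing even one case among 3,000 patients: {chance_in_trial:.2f}")
print(f"Patients needed for a 95% chance of seeing at least one case: {needed}")
```

Under these assumptions, a trial of 3,000 patients would more often than not observe no cases at all, and roughly 30,000 treated patients would be needed to be reasonably sure of seeing a single case. Establishing that the rate differs between treatment groups would require more still.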
Review Questions
Read the following and select the best response.

9.1. A randomized controlled trial compares two drugs in common use for the treatment of asthma. Three hundred patients were entered into the trial, and eligibility criteria were broad. No effort was made to blind patients to their treatment group after enrollment. Except for the study drugs, care was decided by each individual physician and patient. The outcome measure was a brief questionnaire assessing asthma-related quality of life. Which of the following best describes this trial?
A. Practical clinical trial
B. Large simple trial
C. Efficacy trial
D. Equivalence trial
E. Non-inferiority trial

9.2. A randomized controlled trial compared angioplasty with fibrinolysis for the treatment of acute myocardial infarction. The authors state that “analysis was by intention to treat.” Which of the following is an advantage of this approach?
A. It describes the effects of treatments that patients have actually received.
B. It is unlikely to underestimate treatment effect.
C. It is not affected by patients dropping out of the study.
D. It describes the consequences of offering treatments regardless of whether they are actually taken.
E. It describes whether treatment can work under ideal circumstances.

9.3. In a randomized trial, patients with meningitis who were treated with corticosteroids had lower rates of death, hearing loss, and neurologic sequelae. Which of the following is a randomized comparison?
A. The subset of patients who, at the time of randomization, were severely affected by the disease
B. Patients who experienced other treatments versus those who did not
C. Patients who remained in the trial versus those who dropped out after randomization
D. Patients who responded to the drug versus those who did not
E. Patients who took the drug compared with those who did not

9.4. A patient asks for your advice about whether to begin an exercise program to reduce his risk of sudden death. You look for randomized controlled trials but find only observational studies of this question. Some are cohort studies comparing sudden death rates in exercisers with rates of sudden death in sedentary people; others are case-control studies comparing exercise patterns in people who had experienced sudden death and matched controls. Which of the following is not an advantage of observational studies of treatments like these over randomized controlled trials?
A. Reported effects are for patients who have actually experienced the intervention.
B. It may be possible to carry out these studies by using existing data that was collected for other purposes.
C. The results can be generalized to more ordinary, real world settings.
D. Treatment groups would have had a similar prognosis except for treatment itself.
E. A large sample size is easier to achieve.

9.5. In a randomized controlled trial of a program to reduce lower extremity problems in patients with diabetes mellitus, patients were excluded if they were younger than age 40, were diagnosed before becoming 30 years old, took specific medication for hyperglycemia, had other serious illness or disability, or were not compliant with prescribed treatment during a run-in period. Which of the following is an advantage of this approach?
A. It makes it possible to do an intention-to-treat analysis.
B. It avoids selection bias.
C. It improves the generalizability of the study.
D. It makes an effectiveness trial possible.
E. It improves the internal validity of the study.

9.6. Which of the following is not accomplished by an intention-to-treat analysis?
A. A comparison of the effects of actually taking the experimental treatments
B. A comparison of the effects of offering the experimental treatments
C. A randomized comparison of treatment effects

9.7. You are reading a report of a randomized controlled trial and wonder whether stratified randomization, which the trial used, was likely to improve internal validity. For which of the following is stratified randomization particularly helpful?
A. The study includes many patients.
B. One of the baseline variables is strongly related to prognosis.
C. Assignment to treatment group is not blinded.
D. Many patients are expected to drop out.
E. An intention-to-treat analysis is planned.

9.8. A randomized controlled trial is analyzed according to the treatment each patient actually received. Which of the following best describes this approach to analysis?
A. Superiority
B. Intention-to-treat
C. Explanatory
D. Phase I
E. Open-label

9.9. In a randomized controlled trial, a beta-blocking drug is found to be more effective than placebo for stage fright. Patients taking the beta-blocker tended to have a lower pulse rate and to feel more lethargic, which are known effects of this drug. For which of the following is blinding possible?
A. The patients’ physicians
B. The investigators who assigned patients to treatment groups
C. The patients in the trial
D. The investigators who assess outcome

9.10. Which of the following best describes “equipoise” as the rationale for a randomized trial of two drugs?
A. The drugs are known to be equally effective.
B. One of the drugs is known to be more toxic.
C. Neither drug is known to be more effective than the other.
D. Although one drug is more effective, the other drug is easier to take with fewer side effects.

9.11. Antibiotic A is the established treatment for community-acquired pneumonia, but it is expensive and has many side effects. A new drug, antibiotic B, has just been developed for community-acquired pneumonia and is less expensive and has fewer side effects, but its efficacy, relative to drug A, is not well established. Which of the following would be the best kind of trial for evaluating drug B?
A. Superiority
B. Cross-over
C. Cluster
D. Non-inferiority
E. Equivalence
A.
REFERENCES
1. Action to Control Cardiovascular Risk in Diabetes Study Group, Gerstein HC, Miller ME, et al. Effects of intensive glucose lowering in type 2 diabetes. N Engl J Med 2008;358:2545–2559.
2. Allen C, Glasziou P, Del Mar C. Bed rest: a potentially harmful treatment needing more careful evaluation. Lancet 1999;354:1229–1233.
3. Hawkins G, McMahon AD, Twaddle S, et al. Stepping down inhaled corticosteroids in asthma: randomized controlled trial. BMJ 2003;326:1115–1121.
4. Lamb SE, Marsh JL, Hutton JL, et al. Mechanical supports for acute, severe ankle sprain: a pragmatic, multicentre, randomized controlled trial. Lancet 2009;373:575–581.
5. Dykes PC, Carroll DL, Hurley A, et al. Fall prevention in acute care hospitals. A randomized trial. JAMA 2010;304:1912–1918.
6. Carson JL, Terrin ML, Noveck H, et al. Liberal or restrictive transfusion in high-risk patients after hip surgery. N Engl J Med 2011;365:2453–2462.
7. Sprung CL, Annane D, Keh D, et al. Hydrocortisone therapy for patients with septic shock. N Engl J Med 2008;358:111–124.
8. Avins AL, Pressman A, Ackerson L, et al. Placebo adherence and its association with morbidity and mortality in the studies of left ventricular dysfunction. J Gen Intern Med 2010;25:1275–1281.
9. The AIM-HIGH Investigators. Niacin in patients with low HDL cholesterol levels receiving intensive statin therapy. N Engl J Med 2011;365:2255–2267.
10. Feldman T, Foster E, Glower DG, et al. Percutaneous repair or surgery for mitral regurgitation. N Engl J Med 2011;364:1395–1406.
11. Mitja O, Hayes R, Ipai A, et al. Single-dose azithromycin versus benzathine benzylpenicillin for treatment of yaws in children in Papua New Guinea: an open-label, non-inferiority, randomized trial. Lancet 2012;379:342–347.
12. Chrischilles EA, Pendergast JF, Kahn KL, et al. Adverse events among the elderly receiving chemotherapy for advanced non-small cell lung cancer. J Clin Oncol 2010;28:620–627.
13. Kramarz P, DeStefano F, Gargiullo PM, et al. Does influenza vaccination exacerbate asthma. Analysis of a large cohort of children with asthma. Vaccine Safety Datalink Team. Arch Fam Med 2000;9:617–623.
14. Cates CJ, Jefferson T, Rowe BH. Vaccines for preventing influenza in people with asthma. Available at http://summaries.cochrane.org/CD000364/vaccines-for-preventing-influenza-in-people-with-asthma. Accessed July 26, 2012.
15. Feinstein AR, Horwitz RI. Double standards, scientific methods, and epidemiologic research. N Engl J Med 1982;307:1611–1617.
Chapter 10
Prevention
If a patient asks a medical practitioner for help, the doctor does the best he can. He is
not responsible for defects in medical knowledge. If, however, the practitioner
initiates
screening procedures, he is in a very different situation. He should have conclusive evidence
that screening can alter the natural history of disease in a significant proportion of those
screened.
—Archie Cochrane and Walter Holland
1971
Screening
Screening is the identification of asymptomatic disease or risk factors. Screening tests start in the prenatal period (such as testing for Down syndrome in the fetuses of older pregnant women) and continue throughout life (e.g., when inquiring about hearing in the elderly). The latter half of this chapter discusses scientific principles of screening.
Chemoprevention
Chemoprevention is the use of drugs to prevent disease. It is used to prevent disease early in life (e.g., folate during pregnancy to prevent neural tube defects and ocular antibiotic prophylaxis in all newborns to prevent gonococcal ophthalmia neonatorum) but is also common in adults (e.g., low-dose aspirin prophylaxis for myocardial infarction, and statin treatment for hypercholesterolemia).
LEVELS OF PREVENTION
Merriam-Webster’s dictionary defines prevention as
“the act of preventing or hindering” and “the act
or practice of keeping something from happening”
(2). With these definitions in mind, almost all
Figure 10.1 ■ Levels of prevention. Primary prevention
prevents disease from occurring. Secondary prevention detects
and cures disease in the asymptomatic phase. Tertiary prevention
reduces complications of disease.
Primary Prevention
Primary prevention keeps disease from occurring at all by removing its causes. The most common clinical primary care preventive activities involve immunizations to prevent communicable diseases, drugs, and behavioral counseling. Recently, prophylactic surgery has become more common, with bariatric surgery to prevent complications of obesity, and ovariectomy and mastectomy to prevent ovarian and breast cancer in women with certain genetic mutations.
Primary prevention has eliminated many infectious diseases from childhood. In American men, primary prevention has prevented many deaths from two major killers: lung cancer and cardiovascular disease. Lung cancer mortality in men decreased by 25% from 1991 to 2007, with an estimated 250,000 deaths prevented (3). This decrease followed smoking cessation trends among adults, without organized screening and without much improvement in survival after treatment for lung cancer. Heart disease mortality rates in men have decreased by half over the past several decades (4) not only because medical care has improved, but also because of primary prevention efforts such as smoking cessation and use of antihypertensive and statin medications. Primary prevention is now possible for cervical, hepatocellular, skin and breast cancer, bone fractures, and alcoholism.
A special attribute of primary prevention involving efforts to help patients adopt healthy lifestyles is that a single intervention may prevent multiple diseases. Smoking cessation decreases not only lung cancer but also many other pulmonary diseases, other cancers, and, most of all, cardiovascular disease. Maintaining an appropriate weight prevents diabetes and osteoarthritis, as well as cardiovascular disease and some cancers.
Primary prevention at the community level can also be effective. Examples include immunization requirements for students, no-smoking regulations in public buildings, chlorination and fluoridation of the water supply, and laws mandating seatbelt use in automobiles and helmet use on motorcycles and bicycles. Certain primary prevention activities occur in specific occupational settings (use of earplugs or dust masks), in schools (immunizations), or in specialized health care settings (use of tests to detect hepatitis B and C or HIV in blood banks).
For some problems, such as injuries from automobile accidents, community prevention works best. For others, such as prophylaxis in newborns to prevent gonococcal ophthalmia neonatorum, clinical settings work best. For still others, clinical efforts can complement community-wide activities. In smoking prevention efforts, clinicians help individual patients stop smoking and public education, regulations, and taxes prevent teenagers from starting to smoke.
Secondary Prevention
Secondary prevention detects early disease when it is asymptomatic and when treatment can stop it from progressing. Secondary prevention is a two-step process, involving a screening test and follow-up diagnosis and treatment for those with the condition of interest. Testing asymptomatic patients for HIV and routine Pap smears are examples. Most secondary prevention is done in clinical settings.
As indicated earlier, screening is the identification of an unrecognized disease or risk factor by history taking (e.g., asking if the patient smokes), physical examination (e.g., a blood pressure measurement), laboratory test (e.g., checking for proteinuria in a diabetic), or other procedure (e.g., a bone mineral density examination) that can be applied reasonably rapidly to asymptomatic people. Screening tests sort out apparently well persons (for the condition of interest) who have an increased likelihood of disease or a risk factor for a disease from people who have a low likelihood. Screening tests are part of all secondary and some primary and tertiary preventive activities.
A screening test is usually not intended to be diagnostic. If the clinician and/or patient are not committed to further investigation of abnormal results and treatment, if necessary, the screening test should not be performed at all.

Tertiary Prevention
Tertiary prevention describes clinical activities that prevent deterioration or reduce complications after a disease has declared itself. An example is the use of beta-blocking drugs to decrease the risk of death in patients who have recovered from myocardial infarction. Tertiary prevention is really just another term for treatment, but treatment focused on health effects occurring not so much in hours and days but months and years. For example, in diabetic patients, good treatment requires not just control of blood glucose. Searches for and successful treatment of other cardiovascular risk factors (e.g., hypertension, hypercholesterolemia, obesity, and smoking) help prevent cardiovascular disease in diabetic patients as much as, and even more than, good control of blood glucose. In addition, diabetic patients need regular ophthalmologic examinations for detecting early diabetic retinopathy, routine foot care, and monitoring for urinary protein to guide use of angiotensin-converting enzyme inhibitors to prevent renal failure. All these preventive activities are tertiary in the sense that they prevent and reduce complications of a disease that is already present.

hypertension, hyperlipidemia, obesity, and certain genetic abnormalities. Treating risk factors as disease broadens the definition of secondary prevention into the domain of traditional primary prevention.
In some disciplines, such as cardiology, the term secondary prevention is used when discussing tertiary prevention. “A new era of secondary prevention” was
declared when treating patients with acute understanding of what is being sought or prevented.
coronary syndrome (myocardial infarction or For instance, physicians performing routine check-
unstable angina) with a combination of antiplatelet ups on their patients may order a urinalysis. How-
and anticoagu- lant therapies to prevent ever, a urinalysis might be used to search for any
cardiovascular death (5). Similarly, “secondary number of medical problems, including diabetes,
prevention” of stroke is used to describe asymptomatic urinary tract infections, renal
interventions to prevent stroke in patients with cancer, or renal failure. It is necessary to decide
transient ischemia attacks. which, if any, of these conditions is worth screening
Tests used for primary, secondary, and tertiary for before undertaking the test. One of the most
prevention, as well as for diagnosis, are often identi- important sci- entific advances in clinical
cal, another reason for confusing the levels of pre- prevention has been the development of methods
vention (and confusing prevention with diagnosis). for deciding whether a pro- posed preventive
Colonoscopy may be used to find a cancer in a activity should be undertaken (6). The remainder of
patient with blood in his stool (diagnosis); to find an this chapter describes these methods and concepts.
early asymptomatic colon cancer (secondary preven- Three criteria are important when judging whether
tion); remove an adenomatous polyp, which is a risk a condition should be included in preventive care
factor for colon cancer (primary prevention); or to (Table 10.1):
check for cancer recurrence in a patient treated for 1. The burden of suffering caused by the condition.
colon cancer (a tertiary preventive activity referred to 2. The effectiveness, safety, and cost of the preventive
as surveillance). intervention or treatment.
Regardless of the terms used, an underlying 3. The performance of the screening test.
rea- son to differentiate levels among preventive
activities is that there is a spectrum of probabilities
of disease and adverse health effects from the
condition(s) being sought and treated during Table 10.1
preventive activities, as well as different probabilities
of adverse health effects from interventions that are
used for prevention at the vari- ous levels. The
underlying risk of certain health prob- lems is usually
much higher in diseased than healthy people. For
example, the risk of cardiovascular dis- ease in
diabetics is much greater than in asymptom- atic
non-diabetics. Identical tests perform differently
depending on the level of prevention.
Furthermore, the trade-offs between effectiveness
and harms can be quite different for patients in
different parts of the spectrum. False-positive test
results and overdiagnosis (both discussed later in
this chapter) among people without the disease
being sought are important issues in secondary
prevention, but they are less important in
treatment of patients already known to have the
disease in question. The terms primary, secondary,
and tertiary prevention are ways to consider these
dif- ferences conceptually.
SCIENTIFIC APPROACH TO
CLINICAL PREVENTION
When considering what preventive activities to per-
form, the clinician must first decide with the patient
which medical problems or diseases they should
try to prevent. This statement is so clear and obvi-
ous that it would seem unnecessary to mention,
but the fact is that many preventive procedures,
espe- cially screening tests, are performed without
a clear
15 Clinical Epidemiology: The
Criteria for Deciding Whether a Medical
Condition Should Be Included in
Preventive Care
1. How great is the burden of suffering caused by the
condition in terms of:
Death Discomfort
Disease Dissatisfaction
Disability Destitution
2. How good is the screening test, if one is to be
performed, in terms of:
Sensitivity Safety
Specificity Acceptability
Simplicity
Cost
3. A. For primary and tertiary prevention, how good is the
therapeutic intervention in terms of:
Effectiveness
Safety
Cost-effectiveness
Or
B. For secondary prevention, if the condition is found, how
good is the ensuing treatment in terms of:
Effectiveness
Safety
Early treatment after screening being more effective than
later treatment without screening, when the patient
becomes symptomatic
Cost-effectiveness
EFFECTIVENESS OF TREATMENT
As pointed out in Chapter 9, randomized controlled trials are the strongest scientific evidence for establishing the effectiveness of treatments. It is usual practice to meet this standard for tertiary prevention (treatments). On the other hand, to conduct randomized trials when evaluating primary or secondary prevention requires very large studies on thousands, often tens of thousands, of patients, carried out over many years, sometimes decades, because the outcome of interest is rare and often takes years to occur. The task is daunting;
Randomized Trials
Virtually all recommended immunizations are backed by evidence from randomized trials, sometimes relatively quickly when the outcomes occur within weeks or months, as in childhood infections. Because pharmaceuticals are regulated, primary and secondary preventive activities involving drugs (e.g., treatment of hypertension and hyperlipidemia in adults) also usually have been evaluated by randomized trials.
Randomized trials are less common when the proposed prevention is not regulated, as is true with vitamins, minerals, and food supplements, or when the intervention is behavioral counseling.
Observational Studies
Observational studies can help clarify the effectiveness of primary prevention when randomization is not possible.
Example
determine if decades later it also was effective against cancer. A study was done in Taiwan, where nationwide HBV vaccination was begun in 1984. The rates of hepatocellular cancer were compared among people who were immunized at birth between 1984 and 2004 to those born between 1979 and 1984 when no vaccination program existed. (The comparison was made possible by thorough national health databases on the island.) Hepatocellular cancer incidence decreased almost 70% among young people in the 20 years after the introduction of HBV
As pointed out in Chapter 5, observational studies are vulnerable to bias. The conclusion that HBV vaccine prevents hepatocellular carcinoma is reasonable from a biologic perspective and from the dramatic result. It will be on even firmer ground if studies of other populations who undergo vaccination confirm the results from the Taiwan study. Building the case for causation in the absence of randomized trials is covered in Chapter 12.
Safety
With immunizations, the occurrence of adverse effects may be so rare that randomized trials would be unlikely to uncover them. One way to study this question is to track illnesses in large datasets of millions of patients and to compare the frequency of an adverse effect linked temporally to the vaccination among groups at different time periods.
Example
Guillain-Barré syndrome (GBS) is a rare, serious immune-mediated neurological disorder characterized by ascending paralysis that can temporarily involve respiratory muscles so that intubation and ventilator support are required. A vaccine developed against swine flu in the 1970s was associated with an unexpected sharp rise in the number of cases of GBS and contributed to suspension of the vaccination program. In 2009, a vaccine was developed to protect against a novel influenza A (H1N1) virus of swine origin, and a method was needed to track GBS incidence. One way this was done was to utilize a surveillance system of the electronic health databases of more than 9 million persons. Comparing the incidence of GBS occurring up to 6 weeks after vaccination to that of later (background) GBS occurrence in the same group of vaccinated individuals, the attributable risk of the 2009 vaccine was estimated to be an additional five cases of GBS per million vaccinations. Although GBS incidence increased after vaccination, the effect in 2009 was about half that seen in the 1970s (9). The very low estimated attributable risk in 2009 is reassuring. If rare events, occurring in only a few people per million, are to be detected in near real time, population-based surveillance systems are required. Even so, associations found in surveillance systems are weak evidence for a causal relationship because they are observational in nature and electronic databases often do not have information on
Counseling
U.S. laws do not require rigorous evidence of effectiveness of behavioral counseling methods. Nevertheless, clinicians should require scientific evidence before incorporating routine counseling into preventive care; counseling that does not work wastes time, costs money, and may harm patients. Research has demonstrated that certain counseling methods can help patients change some health behaviors. Smoking cessation efforts have led the way with many randomized trials evaluating different approaches.
Example
questions were addressed by a panel that reviewed all studies done on smoking cessation, focusing on randomized trials (10). They found 43 trials that assessed amounts of counseling contact and found a dose-response: the more contact time, the better the abstinence rate (Fig. 10.2). In addition, the panel found that randomized trials demonstrated that pharmacotherapy with bupropion (a centrally acting drug that decreases craving), varenicline (a nicotine receptor agonist), nicotine gum, nasal spray, or patches was effective. Combining counseling and medication (evaluated in 18 trials) increased the smoking cessation rate still further. On the other hand, there was no effect of anxiolytics, beta-blockers, or
Figure 10.2 ■ Dose response of smoking cessation rates according to the number of counseling sessions clinicians have with patients and use of medication. Horizontal axis: counseling sessions (No.) 0–1, 2–3, 4–8, and >8, each without (–) and with (+) medication. (Data from Fiore MC, Jaén CR, Baker TB, et al. Treating tobacco use and dependence: 2008 Update. Clinical Practice Guideline. Rockville, MD: U.S. Department of Health and Human Services. Public Health Service. May 2008.)
Treatment in Secondary Prevention
Treatments in secondary prevention are generally the same as treatments for curative medicine. Like interventions for symptomatic disease, they should be both efficacious and effective. Unlike usual interventions for disease, however, it typically takes years to establish that a secondary preventive intervention is effective, and it requires large numbers of people to be studied. For example, early treatment after colorectal cancer screening can decrease colorectal cancer deaths by approximately one-third, but to show this effect, a study of 45,000 people with 13 years of follow-up was required (11).
A unique requirement for treatment in secondary prevention is that treatment of early, asymptomatic disease must be superior to treatment of the disease when it would have been diagnosed in the usual course of events, when a patient seeks medical care for symptoms. If outcome in the two situations is the same, screening does not add value.
Example
Lung cancer is the leading cause of cancer-related death in the
3.9 per 1,000 person-years in men not offered screening (12). Ho
Figure 10.3 ■ The decreasing yield of a screening test after the first round of screening. The first round (prevalence screening) detects prevalent cases. The second and third rounds (incidence screenings) detect incident cases. In this figure, it is assumed that the test detects all cases and that all people in the population are screened. When this is not so, cases missed in the first round are available for detection in subsequent rounds, and the yield would be higher. O = onset of disease; Dx = diagnosis time, if screening were not carried out. (Yield shown for rounds of screening 1, 2, and 3: 5, 3, and 3 cases.)
Figure 10.4 ■ How lead time affects survival time after screening; O = onset of disease. Pink-shaded areas indicate length of survival after diagnosis (Dx). Panels: unscreened; screened, early treatment not effective; screened, early treatment is effective (improved survival).
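The point of Figure 10.4 can be made with simple arithmetic. The sketch below is illustrative only: the ages are hypothetical, not data from any study, and the helper function is introduced here for the example. It shows that when early treatment is not effective, survival measured from diagnosis lengthens by exactly the lead time even though the age at death is unchanged.

```python
def survival_after_diagnosis(age_at_diagnosis: float, age_at_death: float) -> float:
    """Years lived after diagnosis (the shaded bar in the figure)."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient: all ages are made up for illustration.
age_at_death = 70.0            # same with or without screening (early treatment not effective)
dx_after_symptoms = 67.0       # diagnosis in the usual course of care
dx_by_screening = 64.0         # earlier diagnosis because of screening

# Lead time: how much earlier screening moves the diagnosis.
lead_time = dx_after_symptoms - dx_by_screening

survival_unscreened = survival_after_diagnosis(dx_after_symptoms, age_at_death)
survival_screened = survival_after_diagnosis(dx_by_screening, age_at_death)

print(f"Lead time: {lead_time} years")
print(f"Survival after diagnosis, unscreened: {survival_unscreened} years")
print(f"Survival after diagnosis, screened:   {survival_screened} years")
print(f"Apparent gain, all of it lead time:   {survival_screened - survival_unscreened} years")
```

Because the apparent gain equals the lead time, comparing survival after diagnosis cannot by itself show that screening helps; mortality in the whole screened and unscreened groups has to be compared.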
Length-Time Bias
Length-time bias occurs because the proportion of slow-growing lesions diagnosed during screening is greater than the proportion of those diagnosed during usual medical care. As a result, length-time bias makes it seem that screening and early treatment are more effective than usual care.
Length-time bias occurs in the following way. Screening works best when a medical condition develops slowly. Most types of cancers, however, demonstrate a wide range of growth rates. Some of them grow slowly, some very fast. Screening tests are likely to find mostly slow-growing tumors because they are present for a longer period of time before they cause symptoms. Fast-growing tumors are more likely to cause symptoms that lead to diagnosis in the interval between screening examinations, as illustrated in Figures 10.5 and 10.6. Screening, therefore, tends to find tumors with inherently better prognoses. As a result, the mortality rates of cancers found through screening may be better than those not found through screening, but screening is not protective in this situation.
Figure 10.5 ■ Length-time bias. Cases that progress rapidly from onset (O) to symptoms and diagnosis (Dx) are less likely to be detected during a screening examination.
Figure 10.6 ■ Relationship between length-time bias and speed of tumor growth. Rapidly growing tumors come to medical attention before screening is performed, whereas more slowly growing tumors allow time for detection. D = diagnosis after symptoms; S = detection after screening.
Compliance Bias
The third major bias that can occur in prevention studies is compliance bias. Compliant patients tend to have better prognoses regardless of preventive activities. The reasons for this are not completely clear, but on average, compliant patients are more interested in their health and are generally healthier than non-compliant ones. For example, a randomized study that invited people for screening found that volunteers from the control group who were not invited but requested screening had better mortality rates than the invited group, which contained both compliant people who wanted screening and those who refused (15). The effect of patient compliance, as distinct from treatment effect, has primarily involved medication adherence in the placebo group, and has been termed placebo adherence.
be made up of similar populations, the control population should not have access to screening, and both populations must have careful follow-up to document all cases of the outcome being studied.
Because randomized controlled trials and prospective population-based studies are difficult to conduct, take a long time, and are expensive, investigators sometimes try to use other kinds of studies, such as historical cohort studies (Chapter 5) or case-control studies (Chapter 6), to investigate preventive maneuvers.
Example
An analysis was done to determine if health outcomes differed in the placebo arm of a randomized trial among patients
35%) to active treatment (enalapril) or placebo. The analysis showed that after 3 years, patients randomized to placebo wh
Example
To test whether periodic screening with sigmoidoscopy reduces
10 years
among patients dying of colorectal cancer and among we
PERFORMANCE OF SCREENING TESTS
Tests used for screening should meet the criteria for diagnostic tests laid out in Chapter 8. The following criteria for a good screening test apply to all types of screening tests, whether they are history, physical examination, or laboratory tests.
High Sensitivity and Specificity
The very nature of searching for a disease in people without symptoms means that prevalence is usually very low, even among high-risk groups who were selected because of age, sex, and other risk characteristics. A good screening test must, therefore, have a high sensitivity so that it does not miss the few cases of disease present. It must also be sensitive early in the disease, when the subsequent course can still be altered. If a screening test is sensitive only for late-stage disease, which has progressed too far for effective treatment, the test would be useless. A screening test should also have a high specificity to reduce the number of people with false-positive results who require diagnostic evaluation.
Sensitivity and specificity are determined for screening tests much as they are for diagnostic tests, with one major difference. As discussed in Chapter 8, the sensitivity and specificity of a diagnostic test are determined by comparing the results to another test (the gold standard). In screening, the gold standard for the presence of disease often is not only another, more accurate, test but also a period of time for follow-up. The gold standard test is routinely applied only to people with positive screening test results, to differentiate between true- and false-positive results. A period of follow-up is applied to all people who have a negative screening test result, in order to differentiate between true- and false-negative test results.
Follow-up is particularly important in cancer screening, where interval cancers, cancers not detected during screening but subsequently discovered over the follow-up period, occur. When interval cancers occur, the calculated test sensitivity is lowered.
Example
In Chapter 8, we presented a study in which prostate-specific antigen (PSA) levels were measured in stored blood sam
cancer (18). Thirteen of 18 men who were diagnosed with prostate cancer within a year after the blood sample had elevated PSA levels (4.0 ng/mL) and would have been diagnosed after an abnormal PSA result; the other five had normal PSA results and developed interval cancers during the first year after a normal PSA test. Thus, sensitivity of PSA was calculated
A key challenge is to choose a correct period of follow-up. If the follow-up period is too short, disease missed by the screening test might not have a chance to make itself obvious, so the test’s sensitivity may be overestimated. On the other hand, if the follow-up period is too long, disease not present at the time of screening might be found, resulting in a falsely low estimation of the test’s sensitivity.
Detection and Incidence Methods for Calculating Sensitivity
Calculating sensitivity by counting cancers detected during screening as true positives and interval cancers as false negatives is sometimes referred to as the detection method (Table 10.3). The method works well for many screening tests, but there are two difficulties with it for some cancer screening tests. First, as already pointed out, it requires that the appropriate amount of follow-up time for interval cancers be known; often, it is not known and must be guessed. The detection method also assumes that the abnormalities detected by the screening test would go on to cause trouble if left alone. This is not necessarily so for several cancers, particularly prostate cancer.
Example
Histologic prostate cancer is common in men, especially older
Table 10.3
Calculating Sensitivity of a Cancer Screening Test According to the Detection Method and the Incidence Method
Theoretical Example
A new screening test is introduced for pancreatic cancer. In a screening group, cancer is detected in 200 people; over the ensuing year, another 50 who had negative screening tests are diagnosed with pancreatic cancer. In a concurrent control group with the same characteristics and the same size, members did not undergo screening; 100 people were diagnosed with pancreatic cancer during the year.
Sensitivity of the Test Using the Detection Method
\[ \text{Sensitivity} = \frac{\text{Number of screen-detected cancers}}{\text{Number of screen-detected cancers} + \text{number of interval cancers}} = \frac{200}{200 + 50} = 0.80 \text{ or } 80\% \]
Sensitivity of the Test Using the Incidence Method
\[ \text{Sensitivity} = 1 - \frac{\text{interval rate in the screening group}}{\text{incidence rate in the control group}} = 1 - \frac{50}{100} = 0.50 \text{ or } 50\% \]
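The two calculations in Table 10.3 can be checked with a few lines of code. This is a minimal sketch using the table's theoretical counts; the function names are illustrative, and the incidence-method function takes counts rather than rates on the assumption, stated in the table, that the screened and control groups have the same size and follow-up period.

```python
def sensitivity_detection_method(screen_detected: int, interval_cancers: int) -> float:
    """Screen-detected cancers are true positives; interval cancers are false negatives."""
    return screen_detected / (screen_detected + interval_cancers)


def sensitivity_incidence_method(interval_cancers: int, control_cancers: int) -> float:
    """1 - (interval rate in screened group / incidence rate in control group).

    Counts stand in for rates here because the two groups are assumed
    to be the same size and followed for the same time.
    """
    return 1 - interval_cancers / control_cancers


# Counts from the theoretical pancreatic cancer example in Table 10.3.
detection = sensitivity_detection_method(screen_detected=200, interval_cancers=50)
incidence = sensitivity_incidence_method(interval_cancers=50, control_cancers=100)

print(f"Detection method: {detection:.0%}")
print(f"Incidence method: {incidence:.0%}")
```

The gap between the two results reflects the fact that 250 cancers appeared in the screened group against 100 in the control group: the detection method counts every screen-detected cancer as a true positive, including any that would never have come to attention without screening.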
breast cancer to those with breast
Example
require an appointment and bowel preparation, are best suited for diagnostic testing in patients with symptoms and clinical indications. Nevertheless, screening colonoscopy has been found to be highly effective in decreasing colorectal mortality, and a negative test does not have to be repeated for several years. Other tests, such as visual field testing for the detection of glaucoma and audiograms for the detection of hearing loss, fall between these two extremes.
The financial “cost” of the test depends not only on the cost of (or charge for) the procedure itself but also on the cost of subsequent evaluations performed on patients with positive test results. Thus sensitivity, specificity, and predictive value affect cost. Cost is also affected by whether the test requires a special visit to the physician. Screening tests performed while the patient is seeing his or her physician for other reasons (as is frequently the case with blood pressure measurements) are much cheaper for patients than tests requiring special visits, extra time off work, and additional transportation. Cost also is determined by how often a screening test must be repeated.
Safety
It is reasonable and ethical to accept a certain risk for diagnostic tests applied to sick patients seeking help for specific complaints. The physician cannot avoid action when the patient is severely ill, and does his or her best. It is quite another matter to subject presumably well people to risks. In such circumstances, the procedure should be especially safe. This is partly because the chances of finding disease in healthy people are so low. Thus, although colonoscopy is hardly thought of as a dangerous procedure when used on patients with gastrointestinal complaints, it can cause bowel perforation. In fact, when colonoscopy, with a rate of two perforations per 1,000 examinations, is used to screen for colorectal cancer in people in their 50s, perforations occur more often than cancers are found.
Concerns have been raised about possible long-term risks with the increasing use of CT scans to screen for coronary artery disease or, in the case of whole-body scans, a variety of abnormalities. The radiation dose of CT scans varies by type, with a CT scan for coronary calcium on average being the equivalent of about 30, and a whole-body scan about 120, chest x-rays. One estimate of risk projected 29,000 excess cancers as a result of 70 million CT scans performed in the United States in 2007 (22). If these concerns are correct, CT scans used to screen for early cancer could themselves cause cancer over subsequent decades.
Acceptable to Patients and Clinicians
If a screening test is associated with discomfort, it usually takes several years to convince large percentages of patients to obtain the test. This has been true for Pap smears, mammograms, sigmoidoscopies, and colonoscopies. By and large, however, the American public supports screening.
The acceptability of the test to clinicians may be overlooked by all but the ones performing it. Clinician acceptance is especially relevant for screening tests that involve clinical skill, such as mammography, sigmoidoscopy, or colonoscopy. In a survey of 53 mammography facilities, 44% indicated shortages of mammographers. The authors speculated that low reimbursement for screening mammograms, high levels of malpractice litigation in breast imaging, and administrative regulations all may be reasons (23).
UNINTENDED CONSEQUENCES OF SCREENING
Adverse effects of screening tests include discomfort during the test procedure (the majority of women undergoing mammography say that the procedure is painful, although usually not so severe that patients refuse the test), long-term radiation effects after exposure to radiographic procedures, false-positive test results (with resulting needless workups and negative labeling effects), overdiagnosis, and incidentalomas. The last three will be discussed in this section.
surgery as part of the diagnostic evaluation of the test result.
Because of false-positive screening tests, five
Table 10.4
Relation between Number of Different
Screening Tests Ordered and Percentage
of Normal People with at Least One
Abnormal Test Result
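The relationship named in the title of Table 10.4 can be sketched numerically. The values below are not the table's own figures, which are not reproduced here; they follow from two assumptions made for this illustration: each test independently labels 5% of normal people as abnormal, and the default `p_abnormal=0.05` and the function name are hypothetical.

```python
def pct_with_at_least_one_abnormal(n_tests: int, p_abnormal: float = 0.05) -> float:
    """Percentage of normal people expected to have >= 1 abnormal result.

    Assumes the tests are independent and that each one labels a fraction
    p_abnormal of normal people as abnormal (both are assumptions).
    """
    p_all_normal = (1 - p_abnormal) ** n_tests
    return 100 * (1 - p_all_normal)


# Percentage of normal people with at least one abnormal result.
for n in (1, 5, 20, 100):
    print(f"{n:>3} tests: {pct_with_at_least_one_abnormal(n):.0f}%")
```

Under these assumptions the percentage climbs quickly as more tests are ordered, which is why panels of screening tests generate so many false-positive workups in people without disease.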
Example
In a clinical trial of lung cancer screening with low-dose spiral CT
Figure 10.8 ■ Mechanism of overdiagnosis in cancer screening. Note that nonprogressive, as well as some very slow-growing, cancers will never cause clinical harm. When these cancers are found on screening, overdiagnosis has occurred. Overdiagnosis is an extreme form of length-time bias. Axes: size versus time, from abnormal cell to death from other causes. (Redrawn with permission from Welch HG. Should I Be Tested for Cancer? Maybe Not and Here’s Why. Berkeley, CA: University of California Press; 2004.)
can help determine the amount of overdiagnosis, it is impossible to identify it in an individual patient.
Incidentalomas
Over the past couple of decades, using CT as a screening test has become more common. CT has been evaluated rigorously as “virtual colonoscopy” for colorectal cancer screening and also for lung cancer screening. It has been advocated as a screening test for coronary heart disease (with calcium scores) and for screening in general with full-body CT scans. Unlike most screening tests, CT often visualizes much more than the targeted area of interest. For example, CT colonography visualizes the abdomen and lower thorax. In the process, abnormalities are sometimes detected outside the colon. Masses or lesions detected incidentally by an imaging examination are called incidentalomas.
Example
A systematic review of 17 studies found that incidentalomas were common in CT colonography; 40% of 3,488 patients h
them in prevention. For example, CT scans and magnetic resonance imaging were developed for diagnostic purposes in patients with serious complaints or known disease, and PSA was developed to determine whether treatment for prostate cancer was successful. All of these tests are now commonly used as screening tests, but most became common in practice without careful evaluation. Only low-dose CT scans for lung cancer screening underwent careful evaluation prior to widespread use. PSA screening became so common in the United States that when it was subjected to a careful randomized trial, more than half the men assigned to the control arm had a PSA test during the course of a trial. When tests are so commonly used, it is difficult to determine rigorously whether they are effective.
Over time, improvements in screening tests, treatments, and vaccinations may change the need for screening. As indicated earlier, effective secondary
Conceptually, the decision should be based on weighing the magnitude of benefits against the magnitude of harms that will occur as a result of the action. This approach has become common when making treatment decisions; reports of randomized trials routinely include harms as well as benefits.
A straightforward approach is to present the benefits and harms for a particular preventive activity in some orderly and understandable way. Whenever possible, these should be presented using absolute, not relative, risks. Figure 10.9 summarizes the estimated key benefits and harms of annual mammography for women in their 40s, 50s, and 60s (7,32,33). Such an approach can help clinicians and patients understand what is involved when making the decision to screen. It can also help clarify why different individuals and expert groups come to different decisions about a preventive activity, even when looking at the same set of information. Different people put different values on benefits and harms (34).
Another approach to weighing benefits and harms is a modeling process that expresses both benefits and harms in a single metric and then subtracts harms from benefits. (The most common metric used is the quality-adjusted life year [QALY].) The advantage of this approach is that different types of prevention (e.g., vaccinations, colorectal cancer screening, and tertiary treatment of diabetes) can all be compared to each other, which is important for policymakers with limited resources. The disadvantage is that for most clinicians and policymakers, it is difficult to understand the process by which benefits and harms are handled.
Regardless of the method used in weighing the benefits and harms of preventive activities, the quality of the evidence for each benefit and harm must be evaluated to prevent the problem of “garbage in, garbage out.” Several groups making recommendations for clinical prevention have developed explicit methods to evaluate the evidence and take into account the strength of evidence when making their recommendations (see Chapter 14).
If the benefits of a preventive activity outweigh the harms, the final step is to determine the economic effect of using it. Some commentators like to claim that “prevention saves money,” but it does so only rarely. (One possible exception is screening for colorectal cancer. Chemotherapy for the disease has become so expensive that some analyses now find screening for this cancer saves money.) Even so, most preventive services recommended by groups who have carefully evaluated the data are as cost-effective as other clinical activities.
Cost-effectiveness analysis is a method for assessing the costs and health benefits of an intervention. All costs related to disease occurrence and treatment should be counted, both with and without the preventive activity, as well as all costs related to the preventive activity itself. The health benefits of the activity are then calculated, and the incremental cost for each unit of benefit is determined.
Example
Cervical cancer is caused by persistent infection of epithelial cell
$58,500 per QALY. (Upper limits of acceptable cost-effectiveness
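The incremental cost for each unit of benefit, the end product of a cost-effectiveness analysis, can be sketched as a simple ratio. The example below is illustrative only: the cost and QALY totals are hypothetical numbers invented for the calculation, and the function name is not taken from any source.

```python
def incremental_cost_per_qaly(cost_with: float, cost_without: float,
                              qalys_with: float, qalys_without: float) -> float:
    """Extra cost per extra QALY gained by adding the preventive activity.

    Costs should include disease occurrence, treatment, and the preventive
    activity itself, counted both with and without the activity.
    """
    extra_cost = cost_with - cost_without
    extra_qalys = qalys_with - qalys_without
    if extra_qalys <= 0:
        raise ValueError("The activity adds no health benefit; the ratio is not meaningful.")
    return extra_cost / extra_qalys


# Hypothetical totals for a population, with and without the activity.
ratio = incremental_cost_per_qaly(
    cost_with=1_200_000.0, cost_without=1_000_000.0,
    qalys_with=10_004.0, qalys_without=10_000.0,
)
print(f"Incremental cost: ${ratio:,.0f} per QALY gained")
```

A negative ratio with a positive health gain would mean the activity both improves health and saves money, which, as noted above, happens only rarely in prevention.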
[Figure panels: A, ≥1 false-positive mammogram; B, development of breast cancer and breast cancer cured by treatment regardless of screening.]
The effort to gather all the information needed to make a decision whether to conduct a preventive activity in clinical practice is not something a single clinician can accomplish, but when reviewing recommendations about prevention, individual clinicians can determine if the benefits and harms of the activity are presented in an understandable way and if the recommendation has taken into account the strength of the evidence. They can also look for estimates of cost-effectiveness. With these facts, they should be able to share with their patients the information they need. Patients can then make an informed decision about preventive activities that takes into account the scientific information and their individual values.
Review Questions
For questions 10.1–10.6, read the related scenarios and select the best answer.

A study was conducted to determine whether a fecal occult blood screening test reduced mortality from colorectal cancer (11). People ages 50 to 60 years were randomized to the screening test or to a control group and followed for 13 years. Over this time, there were 323 cancer cases and 82 colorectal cancer deaths in the 15,570 people randomized to annual screening; there were 356 cancers and 121 colorectal cancer deaths in the 15,394 people randomized to the control group. Investigations of positive tests found that about 30% of the screened group had colon polyps. The sensitivity and specificity of the test for colon cancer were both about 90%.

10.1. What is the relative risk reduction of colorectal mortality in the screened group?
  A. 33%
  B. 39%
  C. 48%

10.2. How many patients would you need to screen over the next 13 years to prevent one death from colorectal cancer?
  A. 43
  B. 194
  C. 385

10.3. The fact that 30% of the screened group had colon polyps suggests all of the following except:
  A. At least 30% of the screened group was investigated for positive fecal occult blood tests.
  B. The positive predictive value of the test was low.
  C. The negative predictive value of the test was low.

In a randomized controlled trial of screening chest x-rays and sputum cytology for lung cancer, approximately 9,000 men were randomized to screening for 6 years or a control group (16). After 20 years, the lung cancer mortality was the same in both groups (4.4/1,000 person-years in the screened group and 3.9/1,000 person-years in the control group). However, the median survival for patients diagnosed with lung cancer was 1.3 years in the screened group and 0.9 years in the control group. Also, screening found more lung cancer: 206 cancers were diagnosed in the screened group and 160 in the control group.

10.4. What is the best conclusion after reading such a study?
  A. Finding a better survival rate but not a change in the mortality rate of lung cancer makes no sense and the study must be flawed.
  B. Because mortality did not change, screening may have resulted in more “disease time” for those diagnosed with lung cancer.
  C. Improved survival demonstrates screening was effective in the study.

10.5. What bias is the best explanation for the improved survival in the face of no improvement in the mortality rate in this study?
  A. Lead-time bias
  B. Survival bias
  C. Compliance bias
  D. Length-time bias
10.6. What is the most likely reason for the fact that 206 lung cancers were found in the screened group and only 160 in the control group?
  A. There were more smokers in the screened group.
  B. Screening found cancers earlier and the number of cancers in the control group will catch up over time.
  C. Screening picked up some cancers that would not have come to medical attention without screening.

For questions 10.7–10.11, choose the best answer.

10.7. When assessing a new vaccine, which of the following is least important:
  A. Efficacy in preventing the disease
  B. Safety of the vaccine
  C. Danger of the disease
  D. Cost of giving the vaccine

10.8. When the same test is used in diagnostic and screening situations, which of the following statements is most likely correct?
  A. The sensitivity and specificity will likely be the same in both situations.
  B. The positive predictive value will be higher in a screening situation.
  C. Disease prevalence will be higher in the diagnostic situation.
  D. Overdiagnosis is equally likely in both situations.

10.9. A study found that volunteers for a new screening test had better health outcomes than people who refused testing. Which of the following statements most likely explains the finding?
  A. People who refuse screening are usually healthier than those who accept.
  B. Volunteers for screening are more likely to need screening than those who refuse.
  C. Volunteers tend to be more interested in their health than those who do not participate in preventive activities.

10.10. All of the following statements are correct except:
  A. The gold standard for a test used for diagnosis may be different than that for the same test when used for screening.
  B. The incidence method cannot be used to calculate sensitivity for cancer screening tests.
  C. When a screening program is begun, more people with disease are found on the first round of screening than on later rounds.

10.11. For a cost-effectiveness analysis of a preventive activity, which kinds of costs should be included?
  A. Medical costs, such as those associated with delivering the preventive intervention
  B. All medical costs, including diagnostic follow up of positive tests and treatment for persons diagnosed with disease, with and without the preventive activity
  C. Indirect costs, such as loss of income due to time off from work, among patients receiving the prevention and those who develop the disease
  D. Indirect costs for both patients and care givers
  E. All of the above

Answers are in Appendix A.

REFERENCES
MD. Available at http://seer.cancer.gov/csr/1975_2008/, based on November 2010 SEER data submission, posted to the SEER Web site, 2011. Accessed January 13, 2012.
8. Chang MH, You SL, Chen CJ, et al. Decreased incidence of hepatocellular carcinoma in hepatitis B vaccinees: a 20 year follow-up study. J Natl Cancer Inst 2009;101:1348–1355.
9. Greene SK, Rett M, Weintraub ES, et al. Risk of confirmed Guillain-Barré Syndrome following receipt of monovalent inactivated influenza A (H1N1) and seasonal influenza vaccines in the Vaccine Safety Datalink Project, 2009-2010. Am J Epidemiol 2012;175:1100–1109.
10. Fiore MC, Jaén CR, Baker TB, et al. Treating tobacco use and dependence: 2008 Update. Clinical Practice Guideline. Rockville, MD: U.S. Department of Health and Human Services. Public Health Service. May 2008.
11. Mandel JS, Bond JH, Church TR, et al. (for the Minnesota Colon Cancer Control Study). Reducing mortality from colorectal cancer by screening for fecal occult blood. N Engl J Med 1993;328:1365–1371.
12. Marcus PM, Bergstralh EJ, Fagerstrom RM, et al. Lung cancer mortality in the Mayo Lung Project: impact of extended follow-up. J Natl Cancer Inst 2000;92:1308–1316.
13. The National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395–409.
14. Gæde P, Lund-Andersen H, Hans-Henrik P, et al. Effect of a multifactorial intervention on mortality in type 2 diabetes. N Engl J Med 2008;358:580–591.
15. Friedman GD, Collen MF, Fireman BH. Multiphasic health checkup evaluation: a 16-year follow-up. J Chron Dis 1986;39:453–463.
16. Avins AL, Pressman A, Ackerson L, et al. Placebo adherence and its association with morbidity and mortality in the studies of left ventricular dysfunction. J Gen Intern Med 2010;25:1275–1281.
17. Selby JV, Friedman GD, Quesenberry CP, et al. A case-control study of screening sigmoidoscopy and mortality from colorectal cancer. N Engl J Med 1992;326:653–657.
18. Gann PH, Hennekens CH, Stampfer MJ. A prospective evaluation of plasma prostate-specific antigen for detection of prostatic cancer. JAMA 1995;273:289–294.
19. Delongchamps NB, Singh A, Haas GP. The role of prevalence in the diagnosis of prostate cancer. Cancer Control 2006;13:158–168.
20. Carney PA, Miglioretti DL, Yankaskas BC, et al. Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Ann Intern Med 2003;138:168–175.
21. Lansdorp-Vogelaar I, Knudsen AB, Brenner H. Cost-effectiveness of colorectal cancer screening. Epi Rev 2011;33:88–100.
22. Berrington de González A, Mahesh M, Kim K-P, et al. Radiation dose associated with common computed tomography examinations and the associated lifetime attributable risk of cancer. Arch Intern Med 2009;169:2071–2077.
23. D’Orsi CD, Shin-Ping Tu, Nakano C. Current realities of delivering mammography services in the community: do challenges with staffing and scheduling exist? Radiology 2005;235:391–395.
24. Buys SS, Partridge E, Black A, et al. Effect of screening on ovarian cancer mortality. The Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening randomized controlled trial. JAMA 2011;305:2295–2303.
25. Meador CK. The last well person. N Engl J Med 1994;330:440–441.
26. Croswell JM, Baker SG, Marcus PM, et al. Cumulative incidence of false-positive test results in lung cancer screening: a randomized trial. Ann Intern Med 2010;152:505–512.
27. Croswell JM, Kramer BS, Kreimer AR. Cumulative incidence of false-positive results in repeated multimodal cancer screening. Ann Fam Med 2009;7:212–222.
28. Fowler FJ, Barry MJ, Walker-Corkery BS. The impact of a suspicious prostate biopsy on patients’ psychological, socio-behavioral, and medical care outcomes. J Gen Intern Med 2006;21:715–721.
29. Schilling FH, Spix C, Berthold F, et al. Neuroblastoma screening at one year of age. N Engl J Med 2002;346:1047–1053.
30. Woods WG, Gao R, Shuster JJ, et al. Screening of infants and mortality due to neuroblastoma. N Engl J Med 2002;346:1041–1046.
31. Xiong T, Richardson M, Woodroffe R, et al. Incidental lesions found on CT colonography: their nature and frequency. Br J Radiol 2005;78:22–29.
32. Hubbard RA, Kerlikowske K, Flowers CI, et al. Cumulative probability of false-positive recall or biopsy recommendation after 10 years of screening mammography. Ann Intern Med 2011;155:481–492.
33. Mandelblatt JS, Cronin KA, Bailey S, et al. Effects of mammography screening under different screening schedules: model estimates of potential benefits and harms. Ann Intern Med 2009;151:738–747.
34. Gillman MW, Daniels SR. Is universal pediatric lipid screening justified? JAMA 2012;307:259–260.
35. Goldie SJ, Kohli M, Grima D. Projected clinical benefits and cost-effectiveness of a human papillomavirus 16/18 vaccine. J Natl Cancer Inst 2004;96:604–615.
Chapter 11
Chance
It is a common practice to judge a result significant, if it is of such a magnitude that it
would have been produced by chance not more frequently than once in twenty trials.
This is an arbitrary, but convenient, level of significance for the practical investigator, but
it does not mean that he allows himself to be deceived once in every twenty
experiments.
—Ronald Fisher
1929 (1)
(called the “null hypothesis”) that there is no difference. This traditional way of assessing the role of chance, associated with the familiar “P value,” has been popular since statistical testing was introduced at the beginning of the 20th century. The hypothesis testing approach leads to dichotomous conclusions: Either an effect is present or there is insufficient evidence to conclude an effect is present.
The other approach, called estimation, uses statistical methods to estimate the range of values that is likely to include the true value of a rate, measure of effect, or test performance. This approach has gained popularity recently and is now favored by most medical journals, at least for reporting main effects, for reasons described below.

HYPOTHESIS TESTING
In the usual situation, the principal conclusions of a trial are expressed in dichotomous terms, such as a new treatment is either better or not better than usual care, corresponding to the results being either statistically significant (unlikely to be purely by chance) or not. There are four ways in which the statistical conclusions might relate to reality (Fig. 11.1).
Two of the four possibilities lead to correct conclusions: (i) The new treatment really is better, and that is the conclusion of the study; and (ii) the treatments really have similar effects, and the study concludes that a difference is unlikely.

False-Positive and False-Negative Statistical Results
There are also two ways of being wrong. The new treatment and usual care may actually have similar effects, but it is concluded that the new treatment is more effective. Error of this kind, resulting in a “false-positive” conclusion that the treatment is effective, is referred to as a type I error or α error, the probability of saying that there is a difference in treatment effects when there is not. On the other hand, the new treatment might be more effective, but the study concludes that it is not. This “false-negative” conclusion is called a type II error or β error, the probability of saying that there is no difference in treatment effects when there is. “No difference” is a simplified way of saying that the true difference is unlikely to be larger than a certain size, which is considered too small to be of practical consequence. It is not possible to establish that there is no difference at all between two treatments.

                                    TRUE DIFFERENCE
                                    Present              Absent
CONCLUSION OF    Significant        Correct              Type I (α) error
STATISTICAL
TEST             Not significant    Type II (β) error    Correct

Figure 11.1 ■ The relationship between the results of a statistical test and the true difference between two treatment groups. (Absent is a simplification. It really means that the true difference is not greater than a specified amount.)

Figure 11.1 is similar to 2 × 2 tables comparing the results of a diagnostic test to the true diagnosis (see Chapter 8). Here, the “test” is the conclusion of a clinical trial based on a statistical test of results from the trial’s sample of patients. The “gold standard” for validity is the true difference in the treatments being compared, if it could be established, for example, by making observations on all patients with the illness or a large number of samples of these patients. Type I error is analogous to a false-positive test result, and type II error is analogous to a false-negative test result. In the absence of bias, random variation is responsible for the uncertainty of a statistical conclusion.
Because random variation plays a part in all observations, it is an oversimplification to ask whether chance is responsible for the results. Rather, it is a question of how likely random variation is to account for the findings under the particular conditions of the study. The probability of error due to random variation is estimated by means of inferential statistics, a quantitative science that, given certain assumptions about the mathematical properties of the data, is the basis for calculations of the probability that the results could have occurred by chance alone.
Statistics is a specialized field with its own jargon (e.g., null hypothesis, variance, regression, power, and modeling) that is unfamiliar to many clinicians. However, leaving aside the genuine complexity of statistical methods, inferential statistics should be regarded by the non-expert as a useful means to an end. Statistical testing is a means by which the effects of random variation are estimated.
The next two sections discuss type I and type II errors and place hypothesis testing, as it is used to estimate the probabilities of these errors, in context.

Concluding That a Treatment Works
Most statistics encountered in the medical literature concern the likelihood of a type I error and are
Example
donepezil and placebo groups. These included entering institutional care and progression of disability (both primary end points) as well as behavioral and psychological symptoms, caregiver psychopathology, formal care costs, unpaid caregiver time, and adverse events or death. The authors concluded that the benefits of donepezil were “below

On the other hand, very unimpressive P values can result from studies with strong treatment effects if there are few patients in the study.

Statistical Tests
Statistical tests are used to estimate the probability of a type I error. The test is applied to the data to obtain a numerical summary for those data called a test statistic. That number is then compared to a sampling distribution to come up with a probability of a type I error (Fig. 11.2). The distribution is under the null hypothesis, the proposition that there is no true difference in outcome between treatment groups. This device is for mathematical reasons, not because “no difference” is the working scientific hypothesis of the investigators conducting the study. One ends up rejecting the null hypothesis (concluding there is a difference) or failing to reject it (concluding that there is insufficient evidence in support of a difference). Note that not finding statistical significance is not the same as there being no difference. Statistical testing is not able to establish that there is no difference at all.
Some commonly used statistical tests are listed in Table 11.1. The validity of many tests depends on certain assumptions about the data; a typical assumption is that the data have a normal distribution. If the data do not satisfy these assumptions, the resulting P value may be misleading. Other statistical tests, called non-parametric tests, do not make assumptions about the underlying distribution of the data. A discussion of how these statistical tests are derived and calculated and of the assumptions on which they rest can be found in any biostatistics textbook.

[Figure 11.2 diagram: Statistical test; Compare to standard distribution]
Figure 11.2 ■ Statistical testing.

Table 11.1
Some Statistical Tests Commonly Used in Clinical Research

Test                        When Used
To Test the Statistical Significance of a Difference
Chi square (χ²)             Between two or more proportions (when there are a large number of observations)
Fisher’s exact              Between two proportions (when there are a small number of observations)
Mann-Whitney U              Between two medians
Student t                   Between two means
F test                      Between two or more means
To Describe the Extent of Association
Regression coefficient      Between an independent (predictor) variable and a dependent (outcome) variable
Pearson’s r                 Between two variables
To Model the Effects of Multiple Variables
Logistic regression         With a dichotomous outcome
Cox proportional hazards    With a time-to-event outcome

The chi-square (χ²) test for nominal data (counts) is more easily understood than most and can be used to illustrate how statistical testing works. The extent to which the observed values depart from what would have been expected if there were no treatment effect is used to calculate a P value.

Example
Cardiac arrest outside the hospital has a poor outcome. Animal st
randomized to cooling (hypothermia) or usual care (4). The primary outcome was survival to hospital discharge with relatively good neurologic function.

Observed Rates
                 Survival with Good Neurological Function
                 Yes     No      Total
Hypothermia      21      22      43
Usual care        9      25      34
Total            30      47      77

expected if there were no treatment effect. Because they a
The χ² statistic for these data is:

\[
\chi^2 = \frac{(21 - 16.75)^2}{16.75} + \frac{(9 - 13.25)^2}{13.25} + \frac{(22 - 26.25)^2}{26.25} + \frac{(25 - 20.75)^2}{20.75} = 4.0
\]

tively obvious that the larger the χ², the less likely chance is to account for the observed differences. The resulting P value for a
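
As a rough check on the arithmetic in the hypothermia example, the sketch below recomputes the expected counts from the row and column totals and then the χ² statistic. The function names are hypothetical, and the P value is an assumption of this sketch: it uses the uncorrected chi-square with 1 degree of freedom, which may not be the exact method used in the original analysis.

```python
import math

def chi_square_2x2(table):
    """Return (chi2, expected) for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count in each cell if there were no treatment effect.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return chi2, expected

def p_value_one_df(chi2):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    # With 1 df, chi2 is a squared standard normal deviate,
    # so the tail area equals the two-sided normal tail.
    return math.erfc(math.sqrt(chi2 / 2.0))

# Rows: hypothermia, usual care. Columns: good outcome yes, no.
observed = [[21, 22], [9, 25]]
chi2, expected = chi_square_2x2(observed)
p = p_value_one_df(chi2)
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")
```

The expected counts come out close to the 16.75, 13.25, 26.25, and 20.75 used in the text, and a larger χ² gives a smaller tail probability.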
trials have misrepresented the truth because these particular studies had the bad luck to turn out in a relatively unlikely way?
Visual presentation of negative results can be convincing. Alternatively, one can examine confidence intervals (see Point Estimates and Confidence Intervals, below) and learn a lot about whether the

Example
One of the examples in Chapter 9 was a randomized controlled trial of the effects on cardiovascular outcomes of adding

[Figure residue: cumulative percent (y-axis) against years (x-axis, 0 to 4) for niacin plus statin and placebo plus statin.]

Number at risk
Years                  0        1        2        3      4
Niacin plus statin     1,718    1,606    1,366    903    428
Placebo plus statin    1,696    1,581    1,381    910    436
and finds no difference. You are aware that random variation can be the reason for whatever differences are or are not observed, and you wonder if the number of patients in this study is large enough to make chance an unlikely explanation for what was found. Alternatively, you may be planning to do such a study and have the same question. Either way, you need to understand how many patients would be needed to make a strong comparison of the effects of the two treatments.

Statistical Power
The probability that a study will find a statistically significant difference when a difference really exists is called the statistical power of the study. Power and Pβ are complementary ways of expressing the same concept.

\[
\text{Statistical power} = 1 - P_\beta
\]

Power is analogous to the sensitivity of a diagnostic test. One speaks of a study being powerful when it has a high probability of detecting differences when treatments really do have different effects.

Estimating Sample Size Requirements
From the point of view of hypothesis testing of nominal data (counts), an adequate sample size depends on four characteristics of the study: the magnitude of the difference in outcome between treatment groups, Pα and Pβ (the probability of the false-positive and false-negative conclusions you are willing to accept), and the underlying outcome rate.
These determinants of adequate sample size should be taken into account when investigators plan a study, to ensure that the study will have enough statistical power to produce meaningful results. To the extent that investigators have not done this well, or some of their assumptions were found to be inaccurate, readers need to consider the same issues when interpreting study results.

Effect Size
Sample size depends on the magnitude of the difference to be detected. One is free to look for differences of any magnitude and of course one hopes to be able to detect even very small differences, but more patients are needed to detect small differences, everything else being equal. Therefore, it is best to ask, “What is a sufficient number of patients to detect the smallest degree of improvement that would be clinically meaningful?” On the other hand, if one is interested in detecting only very large differences between treated and control groups (i.e., strong treatment effects) then fewer patients need to be studied.

Type I Error
Sample size is also related to the risk of a type I error (concluding that treatment is effective when it is not). The acceptable probability for a risk of this kind is a value judgment. If one is prepared to accept the consequences of a large chance of falsely concluding that the treatment is effective, one can reach conclusions with fewer patients. On the other hand, if one wants to take only a small risk of being wrong in this way, a larger number of patients will be required. As discussed earlier, it is customary to set Pα at 0.05 (1 in 20) or sometimes 0.01 (1 in 100).

Type II Error
The chosen risk of a type II error is another determinant of sample size. An acceptable probability of this error is also a judgment that can be freely made and changed to suit individual tastes. Pβ is often set at 0.20, a 20% chance of missing true differences in a particular study. Conventional type II errors are much larger than type I errors, reflecting a higher value placed on being sure an effect is really present when it is said to be.

Characteristics of the Data
The statistical power of a study is also determined by the nature of the data. When the outcome is expressed by counts or proportions of events or time-to-event, its statistical power depends on the rate of events: The larger the number of events, the greater the statistical power for a given number of people at risk. As Peto et al. (6) put it:

  In clinical trials of time to death (or of the time to some other particular “event”: relapse, metastasis, first thrombosis, stroke, recurrence, or time to death from a particular cause), the ability of the trial to distinguish between the merits of two treatments depends on how many patients die (or suffer a relevant event), rather than on the number of patients entered. A study of 100 patients, 50 of whom die, is about as sensitive as a study with 1,000 patients, 50 of whom die.

If the data are continuous, such as blood pressure or serum cholesterol, power is affected by the
Table 11.2
Determinants of Sample Size

                                    Determined by
                                    Investigator                  Data Type: Means      Data Type: Counts
Sample size varies according to:    1 / (Effect size, Pα, Pβ)     Variability       OR  1 / (Outcome rate)
Interrelationships
The relationships among the four variables that together determine an adequate sample size are summarized in Table 11.2. The variables can be traded off against one another. In general, for any given number of patients in the study, there is a trade-off between type I and type II errors. Everything else being equal, the more one is willing to accept one kind of error, the less it will be necessary to risk the other. Neither kind of error is inherently worse than the other. It is, of course, possible to reduce both type I and type II errors if the number of patients is increased, outcome events are more frequent, variability is decreased, or a larger treatment effect is sought.
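
The trade-offs among effect size, the two error probabilities, and the outcome rate can be made concrete with a small calculation. The sketch below uses a common normal-approximation formula for comparing two proportions; it is an illustrative assumption, not necessarily the formula behind the tables and figures in this chapter. The function name, the parameter names `alpha` and `beta` (standing for Pα and Pβ), and the event rates are all hypothetical.

```python
import math
from statistics import NormalDist

def patients_per_group(p_control, p_treated, alpha=0.05, beta=0.20):
    """Approximate number of patients needed in each of two equal groups.

    Normal-approximation formula for two proportions, with a two-sided
    alpha (type I error) and power of 1 - beta (beta = type II error).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical outcome rates, chosen only to show the direction of each trade-off.
n_large_effect = patients_per_group(0.20, 0.10)            # halve a 20% event rate
n_small_effect = patients_per_group(0.20, 0.15)            # smaller effect sought
n_strict_alpha = patients_per_group(0.20, 0.10, alpha=0.01)  # smaller type I risk
n_rare_outcome = patients_per_group(0.02, 0.01)            # same relative effect, rarer events

for label, n in [
    ("large effect", n_large_effect),
    ("small effect", n_small_effect),
    ("alpha = 0.01", n_strict_alpha),
    ("rare outcome", n_rare_outcome),
]:
    print(f"{label}: {n} patients per group")
```

Under these assumptions, seeking a smaller effect, accepting a smaller type I error, or studying a rarer outcome each increases the number of patients required.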
For conventional levels of Pα and Pβ, the relationship between the size of treatment effect and the number of patients needed for a trial is illustrated by the following examples. One represents a situation in which a relatively small number of patients was sufficient, and the other is one in which a very large number of patients was too small.
For most of the therapeutic questions encountered today, a surprisingly large sample size is required. The value of dramatic, powerful treatments, such as antibiotics for pneumonia or thyroid replacement for hypothyroidism, was established by clinical experience or studying a small number of patients, but such treatments come along rarely and many of them are already well established. We are left with diseases, many of which are chronic and have multiple, interacting causes, for which the effects of new treatments are generally small. This makes it especially important to plan clinical studies that are large enough to distinguish real from chance effects.
Figure 11.4 shows the relationship between sample size and treatment difference for several baseline rates of outcome events. Studies involving fewer than 100 patients have a poor chance of detecting statistically significant differences for even large treatment effects. Looked at another way, it is difficult to detect effect sizes of 25%. In practice, statistical power can be estimated by means of readily available formulas, tables, nomograms, computer programs, or Web sites.

[Figure 11.4 plot. x-axis: proportional reduction in event rate (%), 0 to 100.]
Figure 11.4 ■ The number of people required in each of two treatment groups (of equal size), for various rates of outcome events in the untreated group, to have an 80% chance of detecting a difference (P = 0.05) in reduction in outcome event rates in treated relative to untreated patients. (Calculated from formula in Weiss NS. Clinical epidemiology. The study of the outcome of illness. New York: Oxford University Press; 1986.)

POINT ESTIMATES AND CONFIDENCE INTERVALS
The effect size that is observed in a particular study (such as treatment effect in a clinical trial or relative risk in a cohort study) is called the point estimate of the effect. It is the best estimate from the study of the true effect size and is the summary statistic usually given the most emphasis in reports of research.
However, the true effect size is unlikely to be exactly that observed in the study. Because of random variation, any one study is likely to find a result higher or lower than the true value. Therefore, a summary measure is needed for the statistical precision of the point estimate, the range of values likely to encompass the true effect size.
Statistical precision is expressed as a confidence interval, usually the 95% confidence interval, around the point estimate. Confidence intervals are interpreted as follows: If the study is unbiased, there is a 95% chance that the interval includes the true effect size. The more narrow the confidence interval, the more certain one can be about the size of the true effect. The true value is most likely to be close to the point estimate, less likely to be near the outer limits of the interval, and could (5 times out of 100) fall outside these limits altogether. Statistical precision increases with the statistical power of the study.

Example
The Women’s Health Initiative included a randomized controll
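
As a rough illustration of a point estimate and the range of values around it, the sketch below computes a 95% confidence interval for a single proportion with the simple normal (Wald) approximation. The event counts are hypothetical and the function name is an assumption of this sketch; other interval methods are often preferred for small samples or for proportions near 0 or 1.

```python
import math
from statistics import NormalDist

def proportion_ci(events, n, level=0.95):
    """Point estimate and Wald confidence interval for a proportion."""
    point = events / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * math.sqrt(point * (1 - point) / n)
    return point, max(0.0, point - half_width), min(1.0, point + half_width)

# Hypothetical studies with the same observed rate but different sizes.
small_study = proportion_ci(30, 100)
large_study = proportion_ci(300, 1000)

for name, (point, low, high) in [("n = 100", small_study), ("n = 1,000", large_study)]:
    print(f"{name}: point estimate {point:.2f}, 95% CI {low:.3f} to {high:.3f}")
```

Both hypothetical studies give the same point estimate, but the larger one yields a narrower interval, that is, greater statistical precision.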
[Figure residue: probability (y-axis, 0.2 to 1.0) for risks of 1/100, 1/1,000, 1/10,000, and 1/100,000.]
This phenomenon is referred to as the multiple comparisons problem. Because of this problem, the strength of evidence from clinical research depends on how focused its questions were at the outset.
Unfortunately, when the results of research are presented, it is not always possible to know how many comparisons really were made. Sometimes, interesting findings have been selected from a larger number of uninteresting ones that are not mentioned. This process of deciding after the fact what is and is not important about a mass of data can introduce considerable distortion of reality. Table 11.3 summarizes how this misleading situation can arise.
How can the statistical effects of multiple comparisons be taken into account when interpreting research? Although ways of adjusting P values have been proposed, probably the best advice is to be aware of the problem and to be cautious about accepting positive conclusions of studies in which multiple comparisons were made. As put by Armitage (16):

  If you dredge the data sufficiently deeply and sufficiently often, you will find something odd. Many of these bizarre findings will be due to chance. I do not imply that data dredging is not an occupation for honorable persons, but rather that discoveries that were not initially postulated as among the major objectives of the trial should be treated with extreme caution.
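
The size of the multiple comparisons problem can be sketched numerically. The example below assumes that every comparison is independent and that no true differences exist, which is a simplification, and it shows the Bonferroni threshold as one commonly described way of adjusting for the number of comparisons. The function names are hypothetical.

```python
def chance_of_any_false_positive(comparisons, alpha=0.05):
    """Probability that at least one of several independent comparisons
    is statistically significant when no true differences exist."""
    return 1 - (1 - alpha) ** comparisons

def bonferroni_threshold(comparisons, alpha=0.05):
    """Per-comparison P value cutoff that keeps the overall
    false-positive risk at or below alpha."""
    return alpha / comparisons

for k in (1, 5, 10, 20):
    risk = chance_of_any_false_positive(k)
    cutoff = bonferroni_threshold(k)
    print(f"{k:>2} comparisons: risk of any false positive {risk:.2f}, "
          f"adjusted cutoff {cutoff:.4f}")
```

Under these assumptions, the chance of at least one spurious "significant" finding grows quickly as the number of comparisons increases, which is why findings that were not among the stated objectives deserve extra caution.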
Example
Atrial fibrillation is treated with anticoagulants, vitamin K antagonists in high-risk patients, to prevent stroke. Investi
Multiple Outcomes
Another version of multiple looks at the data is to report multiple outcomes: different manifestations of effectiveness, intermediate outcomes, and harms. Usually this is handled by naming one of the outcomes primary and the others secondary and then being more guarded about conclusions for the secondary outcomes. As with subgroups, outcomes
Age
  <65 yr                 1,714    19 (2.0)     7 (0.7)
  65 to <75 yr           1,987    28 (2.7)    24 (2.0)
  ≥75 yr                 1,897    66 (6.1)    20 (2.0)
Sex
  Female                 2,321    64 (4.9)    25 (1.9)
  Male                   3,277    49 (2.7)    26 (1.4)
Estimated GFR
  <50 mL/min             1,198    36 (5.8)    16 (2.5)
  50 to <80 mL/min       2,374    59 (4.5)    22 (1.7)
  ≥80 mL/min             2,021    18 (1.6)    13 (1.1)
CHADS2 score
  0–1                    2,026    18 (1.6)    10 (0.9)
  2                      1,999    40 (3.7)    25 (2.1)
  ≥3                     1,570    55 (6.3)    16 (1.9)
Heart failure
  No                     3,428    66 (3.6)    28 (1.5)
  Yes                    2,171    45 (3.8)    23 (1.8)
tend to be related to each other biologically (and as a consequence statistically), as is the case in the above example where stroke and systemic embolism are different manifestations of the same clinical phenomenon.
Table 11.4
Guidelines for Deciding Whether Apparent Differences in Effects within Subgroups Are Real(a)
From the study itself:
• Is the magnitude of the observed difference clinically important?
• How likely is the effect to have arisen by chance, taking into account:
    the number of subgroups examined?
    the magnitude of the P value?
• Was a hypothesis that the effect would be observed made before its discovery (or was justification for the effect argued for after it was found)?
• Was it one of a small number of hypotheses?
From other information:
• Was the difference suggested by comparisons within rather than between studies?
• Has the effect been observed in other studies?
• Is there direct evidence that supports the existence of the effect?
(a) Adapted from Oxman AD, Guyatt GH. A consumer’s guide to subgroup analysis. Ann Intern Med 1992;116:78–84.
MULTIVARIABLE METHODS
Most clinical phenomena are the result of many variables acting together in complex ways. For example, coronary heart disease is the joint result of lipid abnormalities, hypertension, cigarette smoking, family history, diabetes, diet, exercise, inflammation, coagulation abnormalities, and perhaps personality. It is appropriate to try to understand these relationships by first examining relatively simple arrangements of the data, such as stratified analyses that show whether the effect of one variable is changed by the presence or absence of one or more of the other variables. It is relatively easy to understand the data when they are displayed in this way.
However, as mentioned in Chapter 7, it is usually not possible to account for more than a few variables using this method because there are not enough patients with each combination of characteristics to allow stable estimates of rates. For example, if 120 patients were studied, 60 in each treatment group, and just one additional dichotomous variable was taken into account, there would only be, at most, about 15 patients in each subgroup; if patients were unevenly distributed among subgroups, there would be even fewer in some.
What is needed then, in addition to tables showing multiple subgroups, is a way of examining the effects of several variables together. This is accomplished by multivariable modeling, that is, developing a mathematical expression of the effects of many variables taken together. It is “multivariable” because it examines the effects of multiple variables simultaneously. It is “modeling” because it is a mathematical construct, calculated from the data based on assumptions about characteristics of the data (e.g., that the variables are all normally distributed or all have the same variance).
Mathematical models are used in two general ways in clinical research. One way is to study the independent effect of one variable on outcome while taking into account the effects of other variables that might confound or modify this relationship (discussed under multivariable adjustment in Chapter 5). The second way is to predict a clinical event by calculating the combined effect of several variables acting together (introduced in concept under Clinical Prediction Rules in Chapter 7).
The basic structure of a multivariable model is:
Outcome variable = constant + (β1 × variable1) + (β2 × variable2) + . . .,
where β1, β2, . . . are coefficients determined by the data, and variable1, variable2, . . . are the variables that might be related to outcome. The best estimates of the coefficients are determined mathematically and depend on the powerful calculating ability of modern computers.
Modeling is done in many different ways, but some elements of the process are basic.
1. Identify all the variables that might be related to the outcome of interest either as confounders or effect modifiers. As a practical matter, it may not be possible to actually measure all of them and the missing variables should be mentioned explicitly as a limitation.
2. If there are relatively few outcome events, the number of variables to be considered in the model might need to be reduced to a manageable size, usually no more than several. Often this is done by selecting variables that, when taken one at a time, are most strongly related to outcome. If a statistical criterion is used at this stage, it is usual to err on the side of including variables, for example, by choosing all variables showing an association
with the outcome of interest at a cutoff level of P < 0.10. Evidence for the biologic importance of the variable is also considered in making the selection.
3. Models, like other statistical tests, are based on assumptions about the structure of the data. Investigators need to check whether these assumptions are met in their particular data.
4. As for the actual models, there are many kinds and many strategies that can be followed within models. All variables (exposure, outcome, and covariates) are entered in the model, with the order determined by the research question. For example, if some are to be controlled for in a causal analysis, they are entered in the model first, followed by the variable of primary interest. The model will then identify the independent effect of the variable of primary interest. On the other hand, if the investigator wants to make a prediction based on several variables, the relative strength of their association to the outcome variable is determined by the model.
Some commonly used kinds of models are logistic regression (for dichotomous outcome variables such as those that occur in case-control studies) and Cox proportional hazards models (for time-to-event studies).
Multivariable modeling is an essential strategy for dealing with the joint effects of multiple variables. There is no other way to adjust for or to include many variables at the same time. However, this advantage comes at a price. Models tend to be black boxes, and it is difficult to “get inside” them and understand how they work. Their validity is based on assumptions about the data that may not be met. They are clumsy at recognizing effect modification. An exposure variable may be strongly related to outcome yet not appear in the model because it occurs rarely, and there is little direct information on the statistical power of the model for that variable. Finally, model results are easily affected by quirks in the data, the results of random variation in the characteristics of patients from sample to sample. It has been shown, for example, that a model frequently identified a different set of predictor variables and produced a different ordering of
Example
Gastric cancer is the second leading cause of cancer death in the world. Investigators in Europe analyzed data from a cohor
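As a minimal sketch of the basic structure described in this section (a constant plus each coefficient multiplied by its variable), the following Python fragment evaluates such a model and, as one option, converts the result to a probability in the way a logistic regression model does. The coefficients, the two variables, and the function names are hypothetical, invented for illustration; they are not taken from any study cited here.

```python
import math


def linear_predictor(constant, coefficients, values):
    """Return constant + (b1 * variable1) + (b2 * variable2) + ..."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return constant + sum(b * x for b, x in zip(coefficients, values))


def logistic_probability(constant, coefficients, values):
    """Map the linear predictor onto a 0-1 probability (logistic model)."""
    z = linear_predictor(constant, coefficients, values)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, invented for illustration only.
CONSTANT = -3.0
COEFFICIENTS = [0.8, 0.5]  # variable1 = smoker (0/1), variable2 = hypertension (0/1)

baseline = logistic_probability(CONSTANT, COEFFICIENTS, [0, 0])
both = logistic_probability(CONSTANT, COEFFICIENTS, [1, 1])
print(f"neither factor: {baseline:.3f}, both factors: {both:.3f}")
```

In practice the coefficients would be estimated from the data by statistical software rather than typed in by hand, and the assumptions behind the model would still need to be checked.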
BAYESIAN REASONING
An altogether different approach to the
information contributed by a study is based
on Bayesian inference. We introduced this
approach in Chapter 8 where we applied it to
the specific case of diagnostic testing.
Bayesian inference begins with prior belief
about the answer to a research question,
analogous to pretest probability of a
diagnostic test. Prior belief is based on
everything known about the answer up to
the point when new information is
contributed by a study. Then, Bayesian
inference asks how much the results of the
new study change that belief.
Some aspects of Bayesian inference are compelling. Individual studies do not take place in an information vacuum; rather, they are in the context of all other information available at the time. Starting each study from the null hypothesis (that there is no effect) is unrealistic because something is already known about the answer to the question before the study is even begun. Moreover, results of individual studies change belief in relation to both their scientific strengths and the direction and magnitude of their results. For example, if all preceding studies were negative and the next one, which is of comparable strength, is found to be positive, then an effect is still unlikely. On the other hand, a weak prior belief might be reversed by a single strong study. Finally, with this approach it is not so important whether a small number of hypotheses are identified beforehand and multiple comparisons are not as worrisome. Rather, prior belief depends on the plausibility of the assertion rather than whether the assertion was established before or after the study was begun.
Although Bayesian inference is appealing, so far it has been difficult to apply because of poorly developed ways of assigning numbers to prior belief and to the information contributed by a study. Two exceptions are in cumulative summaries of research evidence (Chapter 13) and in diagnostic testing, in which “belief” is prior probability and the new information is expressed as a likelihood ratio. However, Bayesian inference is the conceptual basis for qualitative thinking about cause (see Chapter 12).
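For the diagnostic testing case, where prior belief is a prior probability and the new information is a likelihood ratio, the update can be sketched in a few lines of Python: convert the prior probability to odds, multiply by the likelihood ratio, and convert back. The function names and the numbers are assumptions chosen for illustration.

```python
def probability_to_odds(probability):
    """Convert a probability strictly between 0 and 1 to odds."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return probability / (1.0 - probability)


def odds_to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def update_belief(prior_probability, likelihood_ratio):
    """Posterior probability = prior odds x likelihood ratio, as a probability."""
    posterior_odds = probability_to_odds(prior_probability) * likelihood_ratio
    return odds_to_probability(posterior_odds)


# Hypothetical numbers: a weak prior belief and fairly strong new evidence.
posterior = update_belief(prior_probability=0.2, likelihood_ratio=4.0)
print(f"posterior probability: {posterior:.2f}")
```

A likelihood ratio of 1 leaves belief unchanged, which mirrors the point that a study changes belief only in proportion to the information it contributes.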
Review Questions
Read the following and select the best response.
11.1. A randomized controlled trial of thrombolytic therapy versus angioplasty for acute myocardial infarction finds no difference in the main outcome, survival to discharge from hospital. The investigators explored whether this was also true for subgroups of patients defined by age, number of vessels affected, ejection fraction, comorbidity, and other patient characteristics. Which of the following is not true about this subgroup analysis?
A. Examining subgroups increases the chance of a false-positive (misleading statistically significant) result in one of the comparisons.
B. Examining subgroups increases the chance of a false-negative finding in one of these subgroups, relative to the main result.
C. Subgroup analyses are bad scientific practice and should not be done.
D. Reporting results in subgroups helps clinicians tailor information in the study to individual patients.
11.2. A new drug for hyperlipidemia was compared with placebo in a randomized controlled trial of 10,000 patients. After 2 years, serum cholesterol (the primary outcome) was 238 mg/dL in the group receiving the new drug and 240 mg/dL in the group receiving the old drug (P < 0.001). Which of the following best describes the meaning of the P value in this study?
A. Bias is unlikely to account for the observed difference.
B. The difference is clinically important.
C. A difference as big or bigger than what was observed could have arisen by chance one time in 1,000.
D. The results are generalizable to other patients with hypertension.
E. The statistical power of this study was inadequate.
11.3. In a well-designed clinical trial of treatment for ovarian cancer, remission rate at 1 year is 30% in patients offered a new drug and 20% in those offered a placebo. The P value is 0.4. Which of the following best describes the interpretation of this result?
A. Both treatments are effective.
B. Neither treatment is effective.
C. The statistical power of this study is 60%.
D. The best estimate of treatment effect size is 0.4.
E. There is insufficient information to decide whether one treatment is better than the other.
REFERENCES
1. Fisher R in Proceedings of the Society for Psychical Research, 1929, quoted in Salsburg D. The Lady Tasting Tea. New York: Henry Holt and Co; 2001.
2. Johnson AF. Beneath the technological fix: outliers and probability statements. J Chronic Dis 1985;38:957–961.
3. Courtney C, Farrell D, Gray R, et al for the AD2000 Collaborative Group. Long-term donepezil treatment in 565 patients with Alzheimer’s disease (AD2000): randomized double-blind trial. Lancet 2004;363:2105–2115.
4. Bernard SA, Gray TW, Buist MD, et al. Treatment of comatose survivors of out-of-hospital cardiac arrest with induced hypothermia. N Engl J Med 2002;346:557–563.
5. The AIM-HIGH Investigators. Niacin in patients with low HDL cholesterol levels receiving intensive statin therapy. N Engl J Med 2011;365:2255–2267.
6. Peto R, Pike MC, Armitage P, et al. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design. Br J Cancer 1976;34:585–612.
7. Lind J. A treatise on scurvy. Edinburgh; Sands, Murray and Cochran, 1753 quoted by Thomas DP. J Royal Society Med 1997;80:50–54.
8. Weinstein SJ, Yu K, Horst RL, et al. Serum 25-hydroxyvitamin D and risks of colon and rectal cancer in Finnish men. Am J Epidemiol 2011;173:499–508.
9. Rossouw JE, Anderson GL, Prentice RL, et al. for the Women’s Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized controlled trial. JAMA 2002;288:321–333.
10. Braitman LE. Confidence intervals assess both clinical significance and statistical significance. Ann Intern Med 1991;114:515–517.
11. Mai PL, Wideroff L, Greene MH, et al. Prevalence of family history of breast, colorectal, prostate, and lung cancer in a population-based study. Public Health Genomics 2010;13:495–503.
12. Venge P, Johnson N, Lindahl B, et al. Normal plasma levels of cardiac troponin I measured by the high-sensitivity cardiac troponin I access prototype assay and the impact on the diagnosis of myocardial ischemia. J Am Coll Cardiol 2009;54:1165–1172.
13. McCormack K, Scott N, Go PMNYH, et al. Laparoscopic techniques versus open techniques for inguinal hernia repair. Cochrane Database Syst Rev 2003;(1):CD001785.
14. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med 1994;121:200–206.
15. Sackett DL, Haynes RB, Gent M, et al. Compliance. In: Inman WHW, ed. Monitoring for Drug Safety. Lancaster, UK: MTP Press; 1980.
16. Armitage P. Importance of prognostic factors in the analysis of data from clinical trials. Control Clin Trials 1981;1:347–353.
17. Hunter DJ, Kraft P. Drinking from the fire hose: statistical issues in genomewide association studies. N Engl J Med 2007;357:436–439.
18. Connolly SJ, Eikelboom J, Joyner C, et al. Apixaban in patients with atrial fibrillation. N Engl J Med 2011;364:806–817.
19. Duell EJ, Travier N, Lujan-Barroso L, et al. Alcohol consumption and gastric cancer risk in European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Am J Clin Nutr 2011;94:1266–1275.
20. Diamond GA. Future imperfect: the limitations of clinical prediction models and the limits of clinical prediction. J Am Coll Cardiol 1989;14:12A–22A.
Chapter 12
Cause
In what circumstances can we pass from observed association to a verdict of
causation? Upon what basis should we proceed to do so?
—Sir Austin Bradford Hill
1965
KEY WORDS
Web of causation
Aggregate risk studies
Ecological studies
Ecological fallacy
Time-series studies
Multiple time-series studies
Decision analysis
Cost-effectiveness analysis
Cost–benefit analysis
This book has been about three kinds of clinically useful information. One is description, a simple statement of how often things occur, summarized by metrics such as incidence and prevalence, as well as (in the case of diagnostic test performance) sensitivity, specificity, predictive value, and likelihood ratio. Another is prediction, evidence that certain outcomes regularly follow exposures without regard to whether the exposures are independent risk factors, let alone causes. The third is either directly or implicitly about cause and effect. Is a risk factor an independent cause of disease? Does treatment cause patients to get better? Does a prognostic factor cause a different outcome, everything else being equal? This chapter considers cause in greater depth.
Another word for the study of the origination of disease is “etiology,” now commonly used as a synonym for cause, as in “What is the etiology of this disease?” To the extent that the cause of disease is not known, the disease is said to be “idiopathic” or of “unknown etiology.”
There is a longstanding tendency to judge the legitimacy of a causal assertion by whether it makes sense according to beliefs at the time, as the following historical example illustrates.
Example
In 1843, Oliver Wendell Holmes (then professor of anatomy and physiology and later dean of Harvard Medical School), published a study linking hand washing habits by obstetricians and childbed (puerperal) fever, an often-fatal disease following childbirth. (Puerperal fever is now known to be caused by a bacterial infection.) Holmes’s observations led him to conclude that “the disease known as puerperal fever is so far contagious, as to be frequently carried from patient to patient by physicians and nurses (1).”
One response to Holmes’s assertion was that the findings made no sense. “I prefer to attribute them [puerperal fever cases] to accident, or Providence, of which I can form a conception, rather than to contagion of which I cannot form any clear idea, at least as to this particular malady,” wrote Charles Meigs, professor of midwifery and the diseases of women and children at Jefferson Medical College. Around that time, a Hungarian physician, Ignaz Semmelweis, showed that disinfecting physicians’ hands reduced rates of childbed fever, and his studies were also dismissed because he had no generally accepted explanation for his findings. Holmes’s and Semmelweis’s assertions were made decades before pioneering work by Louis Pasteur, Robert Koch, and Joseph Lister established the germ theory of disease.
The importance attached to a cause-and-effect relationship “making sense,” usually in terms of a biologic mechanism, is still imbedded in current thinking. For example, in the 1990s, studies showing that eradication of Helicobacter pylori infection prevented peptic ulcer disease were met with skepticism because everyone knew that ulcers of the stomach and duodenum were not an infectious disease. Now, H. pylori infection is recognized as a major cause of this disease.
In this chapter, we review concepts of cause in clinical medicine. We discuss the broader array of evidence, in addition to biologic plausibility, that strengthens or weakens the case that an association represents a cause-and-effect relationship. We also briefly deal with a kind of research design not yet considered in this book: studies in which exposure to a possible cause is known only for groups and not for the individuals in the groups.
BASIC PRINCIPLES
Single Causes
In 1882, 40 years after the Holmes-Meigs confrontation, Koch set forth postulates for determining that an infectious agent is the cause of a disease (Table 12.1). Basic to his approach was the assumption that a particular disease has one cause and that a particular cause results in one disease. This approach helped him to identify for the first time the bacteria causing tuberculosis, diphtheria, typhoid, and other common infectious diseases of his day.
Koch’s postulates contributed greatly to the concept of cause in medicine. Before Koch, it was believed that many different bacteria caused any given disease. The application of his postulates helped bring order out of chaos. They are still useful today. That a unique infectious agent causes a particular infectious disease was the basis for the discovery in 1977 that Legionnaire disease is caused by a gram-negative bacterium; the discovery in the 1980s that a newly identified retrovirus causes AIDS; and the discovery in 2003 that a coronavirus caused an outbreak of severe acute respiratory syndrome (SARS) (2).
Table 12.1
Koch’s Postulates
1. The organism must be present in every case of the disease.
2. The organism must be isolated and grown in pure culture.
3. The organism must cause a specific disease when inoculated into an animal.
4. The organism must then be recovered from the animal and identified.
Multiple Causes
For some diseases, one cause appears to be so dominant that we speak of it as the cause. We say that Mycobacterium tuberculosis causes tuberculosis or that an abnormal gene coding for the metabolism of phenylalanine, an amino acid, causes phenylketonuria. We may skip past the fact that tuberculosis is also caused by host and environmental factors and that the disease phenylketonuria develops because there is phenylalanine in the diet.
More often, however, various causes make a more balanced contribution to the occurrence of disease such that no one stands out. The underlying assumption of Koch’s postulates, one cause–one disease, is too simplified. Smoking causes lung cancer, coronary artery disease, chronic obstructive pulmonary disease, and skin wrinkles. Coronary artery disease has multiple causes, including cigarette smoking, hypertension, hypercholesterolemia, diabetes, inflammation, and heredity. Specific parasites cause malaria, but only if the mosquito vectors can breed, become infected, and bite people, and those people are not taking antimalarial drugs or are unable to control the infection on their own.
When many factors act together, it has been called the “web of causation” (3). A causal web is well understood in chronic degenerative diseases such as cardiovascular disease and cancer, but it is also the basis for infectious diseases, where the presence of a microbe is a necessary but not sufficient cause of disease. AIDS cannot occur without exposure to HIV, but exposure to the virus does not necessarily result in disease. For example, exposure to HIV rarely results in seroconversion after needlesticks (about 3/1,000) because the virus is not nearly as infectious as, say, the hepatitis B virus. Similarly, not everyone exposed to tuberculosis, in Koch’s day or now, becomes infected.
When multiple causes act together, the resulting risk may be greater or less than would be expected by simply combining the effects of the separate causes. That is, they interact; there is effect modification. Figure 12.1 shows the 10-year risk of cardiovascular disease in a 60-year-old man with no prior history of cardiovascular disease according to the presence or absence of several common risk factors. The risk is greater than the sum of the effects of each individual risk factor. The effect of low HDL is more in the presence of elevated total cholesterol, the effect
[Bar chart: 10-year cardiovascular risk (%), vertical axis 0 to 100. Bars from left to right: no risk factors (total cholesterol 160 mg/dL, HDL 60 mg/dL, nonsmoker, systolic blood pressure 120 mm Hg, no diabetes mellitus), then with the successive addition of total cholesterol 280 mg/dL, HDL 35 mg/dL, smoking, systolic blood pressure 160 mm Hg, and diabetes mellitus.]
Figure 12.1 ■ The interaction of multiple risk factors for cardiovascular disease. Ten-year cardiovascular risk (%) for a 60-year-old man with no risk factors (left bar) and with the successive addition of five risk factors (bars to the right). Each risk factor alone adds relatively little (several percent) to risk whereas adding them to each other increases risk almost 10-fold, far more than the sum of the individual risk factors acting independently, which is shown by the shaded area of the right-hand bar. (Data from The Framingham Risk Calculator in UpToDate, Waltham, MA according to formulae in D’Agostino RB Sr, Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care. The Framingham Heart Study. Circulation 2008;117(6):743–753.)
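The comparison described for Figure 12.1, observed joint risk versus the risk expected if each factor simply added its own effect, can be sketched as follows. This is only an illustration: the risks below are hypothetical numbers, not the Framingham values, the function names are invented, and adding excess risks on the absolute scale is just one way of defining what "simply combining" means.

```python
def expected_additive_risk(baseline_risk, single_factor_risks):
    """Risk expected if each factor only added its own excess over baseline."""
    excess = sum(risk - baseline_risk for risk in single_factor_risks)
    return baseline_risk + excess


def interaction_excess(baseline_risk, single_factor_risks, joint_risk):
    """Observed joint risk minus the risk expected under simple addition."""
    return joint_risk - expected_additive_risk(baseline_risk, single_factor_risks)


# Hypothetical 10-year risks, invented for illustration only.
BASELINE = 0.05                         # no risk factors
SINGLE_FACTOR_RISKS = [0.08, 0.07, 0.09]  # each factor present alone
JOINT_RISK = 0.40                       # all factors present together

expected = expected_additive_risk(BASELINE, SINGLE_FACTOR_RISKS)
excess = interaction_excess(BASELINE, SINGLE_FACTOR_RISKS, JOINT_RISK)
print(f"expected if additive: {expected:.2f}, observed: {JOINT_RISK:.2f}, excess: {excess:.2f}")
```

A positive excess corresponds to the situation in the figure, where the joint risk is greater than the sum of the individual effects; a negative value would indicate less than additive risk.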
[Diagram labels: crowding, malnutrition, vaccination, genetics; exposure to Mycobacterium; tissue invasion and reaction; stages SUSCEPTIBLE HOST, INFECTION, TUBERCULOSIS.]
[Line graph: vertical axis 0 to 400; horizontal axis year, 1840 to 2000; annotation “Antibiotics introduced.”]
[Line graph: vertical axis number of cases, 0 to 30,000; horizontal axis year, 1980 to 2010.]
Figure 12.4 ■ Tuberculosis cases in the United States, 1980 through 2010. A longstanding decline was halted in 1985. The number of cases reached a peak in 1992 and then began to decline again. (Adapted from Centers for Disease Control and Prevention. Reported tuberculosis in the United States, 2010. Available at http://www.cdc.gov/Features/dsTB2010Data/. Accessed February 8, 2012.)
who did not follow prescribed drug regimens, favored the development of multidrug resistant strains. People who were more likely to have both AIDS and tuberculosis (the socially disadvantaged, intravenous drug users, and prisoners) were developing multidrug resistant disease and exposing others in the population to a difficult-to-treat strain. The interplay of environment, behavior, and molecular biology combined to reverse a declining trend in tuberculosis. To combat the new epidemic of tuberculosis, the public health infrastructure was rebuilt. Multidrug regimens (biologic efforts) and directly observing therapy to ensure compliance (behavioral efforts) were initiated and the rate of tuberculosis began to decline again.
Examining Individual Studies
One approach to evidence for cause-and-effect has been discussed throughout this book: in-depth analysis of the studies themselves. When an association has been observed, a causal relationship is established to the extent that the association cannot be accounted for by bias and chance. Figure 12.5 summarizes a familiar approach. One first looks for bias and how much it might have changed the result, and then whether the association is unlikely to be by chance. For observational studies, confounding is always a
possibility. Although confounding can be controlled in comprehensive, state-of-the-science ways, it is almost never possible to rule it out entirely; therefore, confounding remains the enduring challenge to causal reasoning based on observational research.
Randomized trials can deal definitively with confounding, but they are not possible for studies of risk (i.e., causes) per se. For example, it is unethical (and would be unsuccessful) to randomize non-smokers to cigarette smoking to study whether smoking causes lung cancer. However, randomized controlled trials can contribute to causal inference in two situations. One is when the trial is to treat a possible cause, such as elevated cholesterol or blood pressure, and the outcome is prevented. Another is when a trial is done for another purpose and the intervention causes unanticipated harms. For example, the fact that there was an excess of cardiovascular events in randomized trials of the cyclooxygenase-2 inhibitor rofecoxib, which had been given for other reasons (e.g., pain relief), is evidence that this drug may be a cause of cardiovascular events.
Hierarchy of Research Designs
The various research designs can be placed in a hierarchy of scientific strength for the purpose of establishing cause (Table 12.2). At the top of the hierarchy are systematic reviews of randomized controlled trials because they can deal definitively with confounding. Randomized trials are followed by observational studies, with little distinction between cohort and case-control studies in an era when case-control analyses are nested in cohorts sampled from defined populations. Lower still are uncontrolled studies, biologic reasoning, and personal experience. Of course, this order is only a rough guide to strength of evidence. The manner in which an individual study is performed can go a long way toward increasing or decreasing its validity, regardless of the type of design used. A bad randomized controlled trial contributes less to our understanding of cause than an exemplary cohort study.
With this hierarchy in mind, the strength of the evidence for cause and effect is sometimes judged according to the best studies of the question. Well-designed and well-executed randomized controlled trials trump observational studies, state-of-the-science observational studies trump case series, and so on. This is a highly simplified approach to evidence but is a useful shortcut.
Table 12.2
Hierarchy of Research Design Strength
Individual Risk Studies
  Systematic reviews: consistent evidence from multiple randomized controlled trials
  Randomized controlled trials
  Observational studies
    Cohort studies
    Case-control, Case-cohort studies
    Cross-sectional studies
  Case series
  Experience, expert opinion
THE BODY OF EVIDENCE FOR AND AGAINST CAUSE
What aspects of the research findings support cause and effect when only observational studies are available? In 1965, the British statistician Sir Austin Bradford Hill proposed a set of observations that taken together help to establish whether a relationship between an environmental factor and disease is causal or just an association (5) (Table 12.3). We review these “Bradford Hill criteria,” mainly using smoking and lung cancer as an example. Smoking is generally believed to cause lung cancer even though there are not randomized controlled trials of smoking or an undisputed biologic mechanism.
Table 12.3
Evidence That an Association Is Cause and Effect
Criteria                Comments
Temporality             Cause precedes effect
Strength                Large relative risk
Dose–response           Larger exposure to cause associated with higher rates of disease
Reversibility           Reduction in exposure is followed by lower rates of disease
Consistency             Repeatedly observed by different persons, in different places, circumstances, and times
Biologic plausibility   Makes sense according to biologic knowledge of the time
Specificity             One cause leads to one effect
Analogy                 Cause-and-effect relationship already established for a similar exposure or disease
Adapted from Bradford Hill AB. The environment and disease: association and causation. Proc R Soc Med 1965;58:295–300.
and the effect are measured at the same points in time. Smoking clearly precedes lung cancer by several decades, but there are other examples where the order of cause and effect can be confused.
Example
“Whiplash,” is the occurrence of neck pain following a forceful flexion/extension injury, typically in an auto accident. Man
[Bar chart: lung cancer deaths/100,000 (vertical axis) by cigarettes smoked, number/day (horizontal axis: 0, 1–14, 15–24, 25+).]
Figure 12.6 ■ Example of a dose–response relationship: lung cancer deaths in male physicians according to dose (number) of cigarettes smoked. (Data from Doll R, Peto R. Mortality in relation to smoking: 20 years’ observations on male British doctors. Br Med J 1976;2:1525–1536.)
Finding that what was thought to be a cause actually follows an effect is powerful evidence against cause, but temporal sequence alone is only minimal evidence for cause.
Strength of the Association
A strong association between a purported cause and an effect, as expressed by a large relative or absolute risk, is better evidence for a causal relationship than a weak association. The reason is that unrecognized bias could account for small relative risks but is unlikely to result in large ones. Thus, the 20 times higher incidence of lung cancer among male smokers compared to non-smokers is much stronger evidence that smoking causes lung cancer than the finding that smoking is related to renal cancer, for which the relative risk is much smaller (about 1.5). Similarly, a 10- to 100-fold increase in risk of hepatocellular carcinoma in patients with hepatitis B infection is strong evidence that the virus is a cause of liver cancer.
Dose–Response Relationships
A dose–response relationship is present if increasing exposure to the purported cause is followed by a larger and larger effect. In the case of cigarette smoking, “dose” might be the number of years of smoking, current packs per day, or “pack-years.” Figure 12.6 shows a clear dose–response curve when lung cancer death rates (responses) are plotted against the number of cigarettes smoked (doses).
Demonstrating a dose–response relationship strengthens the argument for cause and effect, but its absence is relatively weak evidence against causation because not all causal associations exhibit a dose–response relationship within the range observed and because confounding remains possible.
Example
Both the strong association between smoking and lung cancer a
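The relative risks quoted under Strength of the Association are ratios of incidence in exposed to incidence in unexposed people. A minimal Python sketch of that arithmetic follows; the incidence figures are hypothetical, chosen only so that the ratio comes out at 20, and the function names are assumptions made for illustration.

```python
def relative_risk(incidence_exposed, incidence_unexposed):
    """Ratio of incidence in the exposed to incidence in the unexposed."""
    if incidence_unexposed <= 0:
        raise ValueError("incidence in the unexposed must be greater than zero")
    return incidence_exposed / incidence_unexposed


def absolute_risk_difference(incidence_exposed, incidence_unexposed):
    """Excess incidence in the exposed, in the same units as the inputs."""
    return incidence_exposed - incidence_unexposed


# Hypothetical incidences per 100,000 per year, invented for illustration.
SMOKERS = 200.0
NON_SMOKERS = 10.0

rr = relative_risk(SMOKERS, NON_SMOKERS)
difference = absolute_risk_difference(SMOKERS, NON_SMOKERS)
print(f"relative risk: {rr:.1f}, absolute difference: {difference:.0f} per 100,000")
```

Both measures use the same two incidences; the relative risk expresses strength of association as a ratio, and the difference expresses it on the absolute scale.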
Example
The substance Laetrile, extracted from apricot pits, was toute
[Bar chart: ratio of mortality rate of ex-smokers to never smokers (vertical axis) by years since stopped smoking (horizontal axis). Values shown: 0 yr, 15.8; <5 yr, 16.0; 5–9 yr, 5.9; 10–14 yr, 5.3; 15+ yr, 2.0.]
Figure 12.7 ■ Reversible association: declining mortality from lung cancer in ex-cigarette smokers. The data exclude people who stopped smoking after getting cancer. (Data from Doll R, Peto R. Mortality in relation to smoking: 20 years’ observations on male British doctors. Br Med J 1976;2:1525–1536.)
Biologic plausibility, when present, strengthens the case for causation, but the absence of biologic plausibility may just reflect the limitations of understanding of the biology of disease rather than the lack of a causal association.

Specificity

Specificity—one cause–one effect—is more often found for acute infectious diseases (e.g., poliomyelitis and tetanus) and for genetic diseases (e.g., familial adenomatous polyposis or ochronosis), although genetic effects are sometimes modified by gene–gene and gene–environment interactions. As mentioned earlier in this chapter, chronic, degenerative diseases often have many causes for the same effect and many effects from the same cause. Lung cancer is caused by cigarette smoking, asbestos, and radiation. Cigarette smoking not only causes lung cancer but also bronchitis, cardiovascular disease, periodontal disease, and wrinkled skin, to name a few. Thus, specificity is strong evidence for cause and effect, but the absence of specificity is weak evidence against it.

Analogy

The argument for a cause-and-effect relationship is strengthened when examples exist of well-established causes that are analogous to the one in question. Thus, the case that smoking causes lung cancer is strengthened by observations that other environmental toxins such as asbestos, arsenic, and uranium also cause lung cancer.

In a sense, applying the Bradford Hill criteria to cause is an example of Bayesian reasoning. For example, belief in causality based on strength of association and dose–response is modified (built up or diminished) by evidence concerning biologic plausibility or specificity, with each of the criteria contributing to a greater or lesser extent to the overall belief that an association is causal. The main difference from the Bayesian approach to diagnostic testing is that the various lines of evidence for cause (dose–response, reversibility, consistency, etc.) are being assembled concurrently in various scientific disciplines rather than in series by clinical research.

group to which individuals belong. Another term is ecological studies, because people are classified by the general level of exposure in their environment, which may or may not correspond to their individual exposure. Examples are epidemiologic studies relating countries’ wine consumption to rates of cardiovascular mortality and studies of regional cancer or birth defect rates in relation to regional exposures such as chemical spills.

The main problem with studies that simply correlate average exposure with average disease rates in groups is the potential for an ecological fallacy, in which affected individuals in a generally exposed group may not have been the ones actually exposed to the risk factor. Also, exposure may not be the only characteristic that distinguishes people in the exposed group from those in the non-exposed group; that is, there may be confounding factors. Thus, aggregate risk studies like these are most useful in raising hypotheses, which should then be tested by more rigorous research.

Evidence from aggregate risk studies can be strengthened when observations are made over a period of time bracketing the exposure and even further strengthened if observations are in more than one place and calendar time.

In a time-series study, disease rates are measured at several points in time, both before and after the purported cause has been introduced. It is then possible to see whether a trend in disease rate over time changes in relation to the time of exposure. If changes in the purported cause are directly followed by changes in the purported effect, and not at some other time, the association is less likely to be spurious. An advantage of time-series analyses is that they can distinguish between changes already occurring over time (secular trends) and the effects of the intervention itself.

Example
Health care–associated infections with methicillin-resistant Staphy
[Figure plot: time series of health care–associated MRSA infections/1,000 (vertical axis, scale 0.2–1.8) by year (horizontal axis).]
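The logic of a time-series study, separating a secular trend from a change that follows the intervention, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the straight-line projection of the pre-intervention trend, and the yearly rates are assumptions made for the example and are not taken from any study cited here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def excess_change(years, rates, intervention_year):
    """Observed minus projected rates for years at or after the intervention.

    The projection simply extends the pre-intervention (secular) trend forward.
    """
    pre = [(y, r) for y, r in zip(years, rates) if y < intervention_year]
    post = [(y, r) for y, r in zip(years, rates) if y >= intervention_year]
    slope, intercept = fit_line([y for y, _ in pre], [r for _, r in pre])
    return [(y, r - (intercept + slope * y)) for y, r in post]


# Hypothetical infection rates per 1,000; NOT data from any study in the text.
years = [2003, 2004, 2005, 2006, 2007, 2008, 2009]
rates = [1.60, 1.55, 1.50, 1.45, 1.10, 0.90, 0.70]

for year, difference in excess_change(years, rates, intervention_year=2007):
    print(year, round(difference, 2))
```

In this made-up series the rate was already falling slowly before the intervention, so only the part of the later decline that goes beyond the projected trend is a candidate for the intervention's effect. A real analysis would also need to account for random variation and for other events occurring at the same time.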
[Figure 12.9 plot. Vertical axis: cervical cancer mortality (scale 0–14). Horizontal axis: year (1955–1985). Targeted coverage for national population (%): Denmark 40, Norway 5, Sweden 100, Finland 100, Iceland 100.]
Figure 12.9 ■ A multiple time-series study. Change in cervical cancer mortality rates according to the year organized Pap smear screening programs were implemented and the targeted coverage. Arrows mark the year coverage was achieved for each country. (Redrawn with permission from Läärä E, Day NE, Hakama M. Trends in mortality from cervical cancer in the Nordic countries: association with organized screening programmes. Lancet 1987;1(8544):1247–1249.)
[Figure 12.10 plot. Vertical axis: colorectal cancer mortality (scale 0–35). Horizontal axis: year of death (1975–2005). Labels in the plot: mortality, risk factors, screening, treatment.]
Figure 12.10 ■ Causes for the decline in colorectal cancer deaths, 1975–2000. (Redrawn with permission from Edwards BK, Ward E, Kohler BA, et al. Annual report to the nation on the status of cancer, 1975–2006, featuring colorectal cancer trends and impact of interventions (risk factors, screening, and treatment) to reduce future rates. Cancer 2010;116:544–573.)
[Figure 12.11 diagram: study designs arranged by strength of evidence, shown once for and once against a causal effect. Designs named: systematic review, randomized controlled trial, multiple time series, non-randomized trial, cohort, case-control, time series, cross-sectional, case series, case report.]
Figure 12.11 ■ Relative strength of evidence for and against a causal effect. Note that
with study designs, the strength of evidence for a causal relationship is a mirror image of
that against. With findings, evidence for a causal effect does not mirror evidence against an
effect.
Figure 12.11 summarizes the different types of evidence for and against cause, depending on the research design, and results that strengthen or weaken the evidence for cause. The figure roughly indicates relative strengths in helping to establish or discard a causal hypothesis. Thus, a carefully done cohort study showing a strong association and a dose–response relationship that is reversible is strong evidence for cause, whereas a cross-sectional study finding no effect is weak evidence against cause.

Belief in a cause-and-effect relationship is a judgment based on both the scientific strength and results of all research bearing on the question. As a practical matter, at issue is whether the weight of the evidence is convincing enough for us to behave as if something were a cause, not whether it is established beyond all reasonable doubt.
Review Questions

Read the following statements and select the best response.

12.1. One of your patients read that cell phones cause brain cancer, and she wants to know your opinion. You discover that the incidence of malignant brain tumors is increasing in the United States. Results of several observational studies of cell phone use and brain cancer have not agreed with each other. A randomized controlled trial might resolve the question. What is the main reason why a randomized controlled trial (RCT) would be unlikely for this question?

A. It would cost too much.
B. People would not agree to be randomized to cell phone use.

12.10. You discover that case-control studies have been done to determine whether cell phone use is associated with the development of brain cancer. In one study, patients with brain cancer and matched controls without brain cancer were asked about cell phone use. The estimated relative risk for at least 100 hours of use compared to no use was 1.0 for all types of brain cancers combined (95% confidence interval 0.6–1.5). This finding is consistent with all of the following except:

A. Use of cell phones increases the incidence of brain cancers by 50%.
B. Use of cell phones protects against brain cancers.
C. Specific types of cancers might be associated with cell phone use.
D. The study has adequate statistical power to answer the research question.

Answers are in Appendix A.
REFERENCES
1. Holmes OW. On the contagiousness of puerperal fever. Med Classics 1936;1:207–268. [Originally published, 1843.]
2. Fouchier RA, Kuiken T, Schutten M, et al. Aetiology: Koch’s postulates fulfilled for SARS virus. Nature 2003;423:240.
3. MacMahon B, Pugh TF. Epidemiology: Principles and Methods. Boston: Little, Brown & Co.; 1970.
4. Burzynski J, Schluger NW. The epidemiology of tuberculosis in the United States. Sem Respir Crit Care Med 2008;29:492–498.
5. Bradford Hill A. The environment and disease: association or causation? Proc R Soc Med 1965;58:295–300.
6. Mykletun A, Glozier N, Wenzel HG, et al. Reverse causality in the association between whiplash and symptoms of anxiety and depression. The HUNT Study. Spine 2011;36:1380–1386.
7. Moertel CC, Fleming TR, Rubin J, et al. A clinical trial of amygdalin (Laetrile) in the treatment of human cancer. N Engl J Med 1982;306:201–206.
8. Jain R, Kralovic SM, Evans ME, et al. Veterans Affairs initiative to prevent methicillin-resistant Staphylococcus aureus infections. N Engl J Med 2011;364:1419–1430.
9. Läärä E, Day NE, Hakama M. Trends in mortality from cervical cancer in the Nordic countries: association with organized screening programmes. Lancet 1987;1(8544):1247–1249.
10. Edwards BK, Ward E, Kohler BA, et al. Annual report to the nation on the status of cancer, 1975–2006, featuring colorectal cancer trends and impact of interventions (risk factors, screening, and treatment) to reduce future rates. Cancer 2010;116:544–573.
Chapter 13
Summarizing the Evidence

When the research community synthesizes existing evidence thoroughly, it is certain that a substantial proportion of current notions about the effects of health care will be changed. Forms of care currently believed to be ineffective will be shown to be effective; forms of care thought to be useful will be exposed as either useless or harmful; and the justification for uncertainty about the effects of many other forms of health care will be made explicit.
—Ian Chalmers and Brian Haynes
1994
study design (e.g., randomized trial or cohort) to make PICOTS. Elements of targeted questions for other kinds of studies (e.g., studies of diagnostic test accuracy or observational studies of risk or prognostic factors) are less well defined but include many of the same features.

Finding All Relevant Studies

The first step in a systematic review is to find all the studies that bear on the question at hand. The review should include a complete sample of the best studies of the question, not just a biased sample of studies that happen to have come to attention. Clinicians who review topics less formally—for colleagues in rounds, morning report, and journal clubs—face a similar challenge and should use similar methods, although the process cannot be as exhaustive.

How can a reviewer be reasonably sure that he or she has found all the best studies, considering that the medical literature is vast and widely dispersed? No one method of searching is sufficient for this task, so multiple complementary approaches are used (Table 13.2).

Most reviews start by searching online databases of published research, among them MEDLINE (the National Library of Medicine’s electronic database of published articles), EMBASE, and the Cochrane Database of Systematic Reviews. There are many others that can be identified with a librarian’s help. Some, such as MEDLINE, can be searched both for a content area (such as treatment of atrial fibrillation) and for a quality marker (e.g., randomized controlled trial). However, even in the best hands the sensitivity of MEDLINE searches (even for articles that are in MEDLINE) is far from perfect. Also, the contents of the various databases tend to complement each other. Therefore, database searching is useful but not sufficient.

Other ways of finding the right articles make up for what database searches might have missed. Recent reviews and textbooks (particularly electronic textbooks that are continually updated) are a source. Experts in the content area (e.g., rheumatic heart disease or Salmonella infection) may recommend studies that were not turned up by the other approaches. References cited in articles already found are another possibility. There are a growing number of registries of clinical trials and funded research that can be used to find unpublished results.

The goal of consulting all these sources is to avoid missing any important article, even at the expense of inefficiency. In diagnostic test terms, the reviewer uses multiple parallel tests to increase the sensitivity of the search, even at the expense of many false-positive results (i.e., unwanted or redundant citations), which need to be weeded out by examining the studies themselves.

In addition to exercising due diligence in finding articles, authors of systematic reviews explicitly describe the search strategy for their review, including search terms. This allows readers to see the extent to which the reviewer took into account all the studies that were available at the time.

Table 13.2
Approaches to Finding All the Studies Bearing on a Question
• Search online databases such as MEDLINE, EMBASE, and the Cochrane Database of Systematic Reviews.
• Read recent reviews and textbooks.
• Seek the advice of experts in the content area.
• Consider articles cited in the articles already found by other approaches.
• Review registries of clinical trials and funded research.

Limit Reviews to Scientifically Strong, Clinically Relevant Studies

To be included in a systematic review, studies must meet a threshold for scientific strength. The assumption is that only the relatively strong studies should count. How is that threshold established? Various expert groups have proposed criteria for adequate scientific strength, and their advantages and limitations are discussed later in this chapter.

Usually only a small proportion of studies are selected from a vast number of potential articles on the topic. Many articles describe the biology of disease and are not ready for clinical application. Others communicate opinions or summaries of existing evidence, not original clinical research. Many studies are not scientifically strong, and the information they contain is eclipsed by stronger studies. Relatively few articles report evidence bearing directly on the clinical question and are both scientifically strong and clinically relevant. Table 13.3 shows how articles were selected for a systematic review of statin drugs for the prevention of infections; only 11 of 632 publications identified were included in the review.

Are Published Studies a Biased Sample of All Completed Research?

The articles cited in systematic reviews should include all scientifically strong studies of the question, regardless of whether they have been published. Publication bias is the tendency for published studies to be
Example
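[Figure 13.2 plot: effect sizes with confidence intervals grouped by markers of trial quality; horizontal axis ticks 1.0, 0.8, 0.6, 0.4, 0.2, 0.0. Surviving rows under "Duration of follow-up": >6 months, 11, 2,430; ≤6 months, 9, 1,416.]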
Figure 13.2 ■ Quality of 20 trials in a systematic review of the effectiveness of chondroitin on pain in patients with osteoarthritis of the knee or hip. (Data from Reichenbach S, Sterchi R, Scherer M, et al. Meta-analysis: chondroitin for osteoarthritis of the knee and hip. Ann Intern Med 2007;146:580–590.)
Example
Figure 13.2 shows the relationship between several markers of quality and effect size with confidence intervals for trials
–0.13 to 0.07), where an effect size of –0.30 was considered minimally clinically relevant. That is, there was no clinically

Summarizing Results

The results of a systematic review are typically displayed as a forest plot showing the point estimate of effectiveness and confidence interval for each study in the review. Figure 13.3 illustrates a summary of studies comparing quinine to placebo for muscle cramps (6). The measure of effectiveness in this example is change in the number of leg cramps in a 2-week period. In other systematic reviews, it might be relative risk, attributable risk, or any other measure of effect. Point estimates are represented by boxes with their size proportional to the size of the study. A vertical line marks where neither quinine nor placebo was more effective.

The origin of the name “forest plot” is uncertain, but it is variously attributed to a researcher’s name or the appearance resembling a “forest of lines” (7). We believe they help readers “see the forest and the trees.”
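[Figure 13.3: forest plot of studies comparing quinine to placebo for muscle cramps. Only fragments of the study rows survive: Warburton 1987; Sidorov 1993; values 2.18, 0.7; weights 9.5%, 8.5%.]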
Forest plots summarize a tremendous amount of information that would otherwise require a great deal of effort to find.

1. Number of studies. The rows show the number of studies meeting stringent criteria for quality, in this case, 13.
2. What studies and when. The first column identifies names and year of publication for the component studies so that readers can see how old the studies are and where they can be found. (Full references to studies are not shown in the figure but are included in the article).
3. Pattern of effect sizes. The 13 point estimates, taken as a whole, show what the various studies reported for effect sizes. In the example, all of the 13 studies favored quinine, but the size of the effects varies.
4. Precision of estimates. Many studies (6 of 13) were “negative” (their confidence intervals included no effect). This would give the impression, in a simple accounting of the number of “positive” and “negative” studies, that treatment is not effective or at least that effectiveness is questionable. The forest plot gives a different impression: All point estimates favor quinine, and the negative studies tend to be imprecise yet consistent with effectiveness.
5. The effects for the big studies. The large, statistically precise studies (seen by both narrow confidence intervals and large boxes representing point estimates) deserve more weight than small ones. In the example, the confidence intervals for the two largest studies do not include “no change in muscle cramp rate” (although one touches it).

In these ways, a single picture conveys in a glance a lot of basic information about the very best studies of a question.

patients, interventions, doses, follow-ups, and outcomes. Treating “apples and oranges” as if they are all just fruits disregards useful information.

Investigators use two general approaches to decide whether it is appropriate to pool study results. One is to make an informed judgment about whether the research questions addressed by the trials are similar enough to constitute studies of the same question (or a set of reasonably similar questions).

Example
Do antioxidant supplements prevent gastrointestinal cancers? In
COMBINING STUDIES IN
META-ANALYSES
Meta-analysis is the practice of combining (“pooling”) the results of individual studies, if they are similar enough to justify a quantitative summary effect size. When appropriate, meta-analyses provide more precise estimates of effect sizes than are available in any of the individual studies.
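To make the gain in precision concrete, the sketch below pools three study results with inverse-variance weights, one common way of computing a fixed effect summary. The text does not specify a pooling formula, so the choice of method, the function name, and the effect sizes (hypothetical log relative risks with their standard errors) are all assumptions made for illustration.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted summary estimate with a 95% confidence interval."""
    # Each study is weighted by the inverse of its variance (1 / SE^2),
    # so precise studies count for more than imprecise ones.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    summary = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    summary_se = math.sqrt(1.0 / total_weight)
    interval = (summary - 1.96 * summary_se, summary + 1.96 * summary_se)
    return summary, summary_se, interval


# Hypothetical log relative risks and standard errors; not from any cited study.
log_rr = [-0.20, -0.35, -0.10]
se = [0.20, 0.25, 0.10]

summary, summary_se, (low, high) = pool_fixed_effect(log_rr, se)
print("pooled relative risk:", round(math.exp(summary), 2))
print("95% CI:", round(math.exp(low), 2), "to", round(math.exp(high), 2))
print("pooled SE:", round(summary_se, 3), "smallest single-study SE:", min(se))
```

The pooled standard error is smaller than that of even the most precise component study, which is the sense in which a meta-analysis is more precise than any of its parts. That precision is only meaningful if the studies are similar enough to be combined in the first place.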
Example
Inhaled nitric oxide is effective in full-term infants with pulmonary hypertension and hypoxic respiratory failure. How
[Figure 13.4 diagram: two panels, each with a horizontal axis labeled "Treatment effects". Surviving label text: "Results of multiple clinical trials randomly distributed around the true treatment eff"; "B Random effects model".]
Figure 13.4 ■ Models for combining studies in a meta-analysis. A. Fixed effect model. B. Random effects model. (Redrawn with permission from UpToDate, Waltham, MA.)
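The difference between the two models in Figure 13.4 can be illustrated numerically. The sketch below computes Cochran's Q and the I-squared statistic as measures of heterogeneity, then a random effects summary using the DerSimonian-Laird estimate of between-study variance. The text does not name a specific estimator, so this choice, the function name, and the input values (hypothetical log relative risks) are assumptions for illustration only.

```python
import math


def random_effects_summary(estimates, standard_errors):
    """Cochran's Q, I-squared, and a DerSimonian-Laird random effects summary."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    sum_w = sum(weights)
    fixed = sum(w * e for w, e in zip(weights, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed effect summary.
    q = sum(w * (e - fixed) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1

    # I-squared: share of total variation attributed to heterogeneity.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird estimate of between-study variance (tau-squared).
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau_squared = max(0.0, (q - df) / c)

    # Random effects weights add tau-squared to each study's own variance.
    re_weights = [1.0 / (se ** 2 + tau_squared) for se in standard_errors]
    sum_re = sum(re_weights)
    random = sum(w * e for w, e in zip(re_weights, estimates)) / sum_re

    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum_w),
        "q": q,
        "i_squared": i_squared,
        "tau_squared": tau_squared,
        "random": random,
        "random_se": math.sqrt(1.0 / sum_re),
    }


# Hypothetical log relative risks and standard errors; not from any cited study.
log_rr = [-0.60, -0.10, 0.20, -0.45]
se = [0.15, 0.12, 0.20, 0.18]

for name, value in random_effects_summary(log_rr, se).items():
    print(name, round(value, 3))
```

When the studies disagree more than chance would predict, the between-study variance is greater than zero and the random effects standard error is larger than the fixed effect one, so the summary confidence interval is wider. When the studies agree closely, the two models give essentially the same answer.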
component studies as a diamond representing the summary point estimate and confidence interval (see Fig. 13.3). The summary effect is a more precise and formalized presentation of what might have been concluded from the pattern of results available in the forest plot.

Identifying Reasons for Heterogeneity

Random effects models are a way of taking heterogeneity into account when calculating a summary effect size, but a separate need is to identify characteristics of patients or treatments that are responsible for the variation in effects.

The most straightforward way to identify reasons for heterogeneity is to do subgroup analyses. This is possible in patient-level meta-analyses, as described in the example about nitric oxide treatment of preterm infants. However, if trials, not patients, are pooled, one must rely on less direct methods.

Another approach to understanding the reasons for heterogeneity is to do a sensitivity analysis, as discussed in Chapter 7. Summary effects are examined with and without trials that seem, either for clinical or statistical reasons, to be different from the others. For example, investigators might look at summary effects (or statistical tests for heterogeneity) after removing relatively weak trials or those in which the dose of drug was relatively small to see if study strength or drug dose accounts for differences in results across studies.

A modeling approach called meta-regression, similar to multivariable analysis (discussed in Chapter 11), can be used to explore reasons for heterogeneity when trials, not patients, are pooled. The independent variables are those reported in aggregate in each individual trial (e.g., the average age or proportion of men and women in those trials) and the outcomes are the reported treatment effect for each of those trials. The number of observations is the number of trials in the meta-analysis. This approach is limited by the availability of data on the covariates of interest in the individual trials, the compatibility of the data across trials, and the stability of models based on just a few observations (the number of trials in the meta-analysis). Another limitation, as with any aggregate-risk study, is the possibility of an “ecological fallacy,” as discussed in Chapter 12.

The various studies in a systematic review sometimes address the effectiveness of a set of interrelated interventions, not just a single comparison between an intervention and control group. For example, there are several techniques for bariatric surgery such as jejunoileal bypass, sleeve gastrectomy, adjustable gastric banding, among others. Randomized trials have compared various combinations of these to each other and to usual care, but no study has compared each technique to all the others. Network meta-analysis is a mathematical way of estimating the comparative effectiveness of interventions that are not directly compared in actual studies but can be indirectly compared by use of modeling. Using a network meta-analysis, investigators were able to estimate the respective effects of each bariatric surgery method compared to usual care, showing that each was effective and identifying the hierarchy of effectiveness (10).

CUMULATIVE META-ANALYSES

Usually the studies in a forest plot are represented separately in alphabetical order by first author or in chronological order. Another way to look at the same information is to present a cumulative meta-analysis. Component studies are put in chronological order, from oldest to most recent, and a new summary effect size and confidence interval are calculated each time the results of a new study became available. In this way, the figure represents a running summary of all the studies up to the time of each new trial. This is a Bayesian approach, as described in Chapter 11, where each new trial modifies prior belief in comparative effectiveness, established by the trials that went before.

The following example illustrates the kind of insights a cumulative meta-analysis can provide and also shows how meta-analyses in general and cumulative meta-analyses in particular are useful for establishing harmful effects. Individual trials, which are powered to detect effectiveness, are usually underpowered to detect harms because harms occur at a substantially lower rate. Pooling data may accumulate enough events to detect harmful effects.

Example
Rofecoxib is a non-steroidal anti-inflammatory drug (NSAID) th
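[Cumulative meta-analysis figure: summary estimates plotted on a logarithmic horizontal axis (0.1, 1, 10). Surviving data rows, one per cumulative step; column headings were not preserved:
1,399 5 0.828
2,208 6 0.996
2,983 8 0.649
3,324 9 0.866
5,059 13 0.881
13,269 40 0.070
14,247 44 0.034
15,156 46 0.025
20,742 52 0.010
20,742 63 0.007
21,432 64 0.007]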
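The running summary that defines a cumulative meta-analysis can be sketched as follows. The code re-pools the evidence each time a trial is added in chronological order, here with simple inverse-variance (fixed effect) weights. The pooling method, the function name, and the trial data are assumptions made for illustration; they are not the rofecoxib data.

```python
import math


def cumulative_meta_analysis(trials):
    """Running fixed effect summaries as trials accrue in chronological order.

    `trials` is a list of (year, estimate, standard_error) tuples. Returns one
    (year, summary, lower_95, upper_95) tuple per trial added.
    """
    ordered = sorted(trials, key=lambda trial: trial[0])
    sum_weights = 0.0
    sum_weighted_estimates = 0.0
    running = []
    for year, estimate, se in ordered:
        weight = 1.0 / se ** 2
        sum_weights += weight
        sum_weighted_estimates += weight * estimate
        summary = sum_weighted_estimates / sum_weights
        summary_se = math.sqrt(1.0 / sum_weights)
        running.append(
            (year, summary, summary - 1.96 * summary_se, summary + 1.96 * summary_se)
        )
    return running


# Hypothetical trials: (year, log relative risk, standard error).
trials = [
    (1999, 0.90, 0.80),
    (1997, 0.10, 0.90),
    (2000, 0.70, 0.35),
    (2001, 0.85, 0.40),
    (1998, 0.60, 0.70),
]

for year, summary, low, high in cumulative_meta_analysis(trials):
    print(year, round(math.exp(summary), 2),
          round(math.exp(low), 2), round(math.exp(high), 2))
```

Each added trial narrows the confidence interval, so a reader can see the point in time at which the accumulated evidence first excluded "no effect", even if no single trial did so on its own.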
SYSTEMATIC REVIEWS OF OBSERVATIONAL AND DIAGNOSTIC STUDIES

We have discussed systematic reviews and meta-analyses, using randomized controlled trials as examples. However, systematic reviews are also useful for other kinds of studies, as illustrated by the following summary of observational studies.
Example
Patients with venous thromboembolism may have recurrences after anticoagulation is stopped. Investigators obtained p
[Figure residue: table of component studies whose counts were fused during extraction. Surviving study labels: Demircan 2002; Knutsson 1961; Kosteljanetz 1984; Kosteljanetz 1988.]
Review Questions

Read the following and select the best response.

13.1. A systematic review of observational studies of antioxidant vitamins to prevent cardiovascular disease combined the results of 12 studies to obtain a summary effect size and confidence interval. Which of the following would be the strongest rationale for combining study results?

A. You wish to obtain a more generalizable conclusion.
B. A statistical test shows that the studies are heterogeneous.
C. Most of the component studies are statistically significant.
D. The component studies have different, and to some extent complementary, biases.
E. You can obtain a more precise estimate of effect size.

13.2. You are asked to critique a review of the literature on whether alcohol is a risk factor for breast cancer. The reviewer has searched MEDLINE and found several observational studies of this question but has not searched elsewhere. All of the following are limitations of this search strategy except:

A. Studies with negative results tend to not be published.
B. MEDLINE searches typically miss some articles, even those included in the database.
C. MEDLINE does not include all of the world’s journals.
D. MEDLINE can be simultaneously searched for both content area and methods.

13.3. A systematic review of antiplatelet drugs for cardiovascular disease prevention combined individual patients, not trials. Which of the following is an advantage of this approach?

A. It is more efficient for the investigator.
B. Subgroup analyses are possible.
C. It is not necessary to choose between fixed and random effects models when combining data.
D. Publication bias is less likely.

13.4. Which of the following is not generally used to define a specific clinical question studied by randomized controlled trials?

A. Covariates that were taken into account
B. Interventions (e.g., exposure or experimental treatment)
C. Comparison group (e.g., patients taking placebo in a randomized trial)
D. Outcomes
E. Patients in the trials

13.5. Which of the following best describes ways of measuring study quality?

A. The validity of summary measures of study quality is well established.
B. A description of study quality is a useful part of systematic reviews.
C. In a summary measure of quality, strengths of the study can make up for weaknesses.
D. Individual measures of quality such as randomization and blinding are not associated with study results.

13.6. Which of the following kinds of studies is least likely to be published?

A. Small positive studies
B. Large negative studies
C. Large positive studies
D. Small negative studies

13.7. Which of the following is a comparative advantage of traditional (“narrative”) reviews over systematic reviews?

A. Readers can confirm that evidence cited is selected without bias.
B. Narrative reviews can review a broad range of questions bearing on the care of a condition.
C. They rely on the experience and judgment of an expert in the field.
D. They provide a quantitative summary of effects.
E. The scientific strength of the studies cited is explicitly evaluated.

13.8. Which of the following is not always part of a typical forest plot?

A. The number of studies that meet high standards for quality
B. A summary or pooled effect size with confidence interval
C. Point estimates of effect size for each study
D. Confidence intervals for each study
E. The size or weight contributed by each study

13.9. Which of the following cannot be used for identifying severity of illness as a reason for heterogeneity in a systematic review with meta-analysis?

A. Controlling for severity of illness in a study-level meta-analysis
B. A mathematical model relating average severity of illness to outcome across the component studies
C. A comparison of summary effect sizes in studies stratified by mean severity of illness
D. Subgroup analyses of patient-level data

13.10. Which of the following is the best justification for combining the results of five studies into a single summary effect?

A. The patients, interventions, and outcomes are relatively similar.
B. All studies are of high quality.
C. A statistical test does not detect heterogeneity.
D. Publication bias has been ruled out by a funnel plot.
E. The random effects model will be used to calculate the summary effect size.

13.11. Which of the following is an advantage of the random effects model over the fixed effect model?

A. It can be used for studies with time-to-event analyses.
B. It describes the summary effect size for a single, narrowly defined question.
C. It is better suited for meta-analyses of diagnostic test performance.
D. It gives more realistic results when there is heterogeneity among studies.
E. Confidence intervals tend to be narrower.
REFERENCES
Chapter 14
Knowledge Management

Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
—T.S. Eliot
1934
Table 14.1
Grading Recommendations for Treatment According to the Quality of Evidence
(Confidence in Estimate of Effect, A–C) and Strength of Recommendation (1–2) with
Implications. Based on GRADE Guidelines
LOOKING UP ANSWERS TO CLINICAL QUESTIONS

Clinicians need to be able to look up answers to questions that arise during the care of their patients. They need this for things they do not know but also to check facts they think they know but might not, because the information base for patient care is always changing.

It is best to get answers to questions just at the time and place where they arise during the care of patients. This has been called the point of care and the associated learning just-in-time learning. Answers can then be used to guide clinical decision making for the patient at hand. Also, what is learned is more likely to be retained than information encountered out of context in a classroom, lecture hall, book, or journal, apart from the need to know for a specific patient. In any case, postponing the answering of questions to a later time too often means they do not get answered at all.

For just-in-time learning to happen, several conditions must be in place (Table 14.2). Most patient care settings are time-pressured, so the answer must come quickly. As an office pediatrician pointed out, “If I added just 1 to 2 extra minutes to each patient visit, I would get home an hour later at the end of the day!” What clinicians need is not an answer but the best available answer, given the state of knowledge at the time. They need information that corresponds as closely as possible to the specific clinical situation their patient is in; if the patient is elderly and has several diseases, the research information should be about elderly patients with comorbidities. Clinicians need information sources that move with them as they travel from their office to home (where they take night and weekend call) and to hospitals and nursing homes. When all this happens, and it certainly can, the results are extraordinarily powerful.
Table 14.2
Conditions in Which Information Is Available at the Point of Care

Condition: Rationale
Rapid access: The information must be available within minutes for it to fit into the busy workflow of most patient care settings.
Current: Because the best information base for clinical decisions is continually changing, the information usually needs to be electronic (as a practical matter, on the Internet).
Tailored to the specific question: Clinicians need information that matches as closely as possible the actual situation of their individual patient.
Sorted by scientific strength: There is a vast amount of information for almost any clinical question but only a small proportion of it is scientifically strong and clinically relevant.
Available in clinical situations: Clinicians cannot leave their place of work to look up answers; they must find it right where they work.

Example
A patient sees you because he will be traveling to Ghana and wants advice on malaria prophylaxis. You are aware that the susceptibility to antimalarial drugs varies across the globe and that it is continually changing. The Centers for Disease Control and Prevention have a Web site (http://www.cdc.gov) with current information for travelers to all parts of the world. Using the computer in your clinic, you quickly find out which prophylactic drug this patient should take and for how long before, during, and after the trip. You are also reminded that he should have a booster dose of polio vaccine and be vaccinated for hepatitis A and B, typhoid, and yellow fever. The site lists clinics where these vaccines are available. The site also shows that northern Ghana, where your patient will be visiting, is in the “meningitis belt,” so he should also be vaccinated against meningococcal disease. The information you are relying on is an up-to-

Solutions
Clinical Colleagues
A network of colleagues with various and complementary expertise is a time-honored way of getting point of care information. Many clinicians have identified local opinion leaders for this purpose. Of course, those opinion leaders must have their own sources of information, presumably more than just other colleagues.

Electronic Textbooks
Textbooks, even libraries, are on the Internet and made available to clinicians by their medical schools, health systems, and professional societies. For example, UpToDate (http://www.uptodate.com) is an electronic information resource for clinicians, the product of thousands of physician–authors and editors covering 9,000 topics in the equivalent of 90,000 printed pages (if it were ever printed).† Information is continually updated, peer reviewed, searchable and linked to abstracts of the original research, and recommendations are graded. UpToDate is available at the point of care throughout the world wherever the Internet can be accessed by computers or mobile platforms.

Example
One author was seeing patients in Boston during the anthrax scare in 2001. Around that time, biologic terrorists spread

Other textbooks, such as ACP Medicine, Harrison’s Online, and many subspecialty textbooks, are also available in electronic form.

Clinical Practice Guidelines
Clinical practice guidelines are advice to clinicians about the care of patients with specific conditions. In addition to giving recommendations, good guidelines also make explicit the evidence base and rationale for those recommendations. Like evidence-based medicine, guidelines are meant to be a starting place for decision making about individual patients, to be modified by clinical judgment; that is, they are guidelines, not rules. High-quality guidelines represent the wise application of research evidence to the realities of clinical care, but guidelines vary in quality (Table 14.3).

Table 14.3
Standards for Trustworthy Clinical Practice Guidelines
Standard: Explanation
Transparency: How the guideline was developed and funded has been made explicit and is publicly accessible.
Conflict of Interest: Group members’ conflicts of interest related to financial, intellectual, institutional, and patient/public activities bearing on the guideline are disclosed.
Group Composition: Group membership was multidisciplinary and balanced, comprising a variety of methodological experts and clinicians, and populations expected to be affected by the guideline.
Systematic Review: Recommendations are based on systematic reviews that met high standards for quality.
Evidence and Strength of Recommendation: Each recommendation is accompanied by an explanation for its underlying reasoning, the level of confidence in the evidence, and the strength of the recommendation.
Description of Recommendations: The guideline states precisely what the recommended action is and under what circumstances it should be performed.
External Review: The guideline has been reviewed by the full spectrum of relevant stakeholders (e.g., scientific and clinical experts, organizations, and patients).
Updating: The guideline reports the date of publication and evidence review and plans for updating when there is new evidence that would substantially change the guideline.
Modified from Institute of Medicine. Clinical Practice Guidelines We Can Trust. Washington, DC: National Academies Press; 2011. The standards were for developing guidelines and have been modified to guide users in recognizing guidelines they can trust.

† Robert and Suzanne Fletcher are among the hundreds of editors of UpToDate.
Figure 14.1 ■ How many journals would you have to read to keep up with the literature in your field? The proportion of scientifically strong, clinically relevant articles in internal medicine according to the number of journals, in descending order of yield. (Data from ACP Journal Club, 2011.)
[Graph: percent of articles (vertical axis, 0 to 100) by number of journals (horizontal axis, 5 to 40); plotted values rise from 17.3 to 100.0.]
review by editors guided by peer review, comments by experts in the article’s content area and methods who provide advice on whether to publish and how a manuscript (the term for the article before it is published) could be improved. The reviewers are advisors to the editor (or editorial team), not the ones who directly decide the fate of the manuscript. Peer-review and editing practices, along with the evidence base and rationale for them, are summarized on the official Web site of the World Association of Medical Editors (http://www.wame.org) and the International Committee of Medical Journal Editors (http://www.icmje.org). Peer review and editing improve manuscripts, but published articles are far from perfect (5); therefore, readers should be grateful for the journals’ efforts to make articles better but also maintain a healthy skepticism about the quality of the end result.
Working groups have defined the information that should be in a complete research article according to the type of study (randomized controlled trial, diagnostic test evaluation, systematic review, etc.) (Table 14.4). Readers can use these checklists to see if all the necessary information is included in an article, just as investigators use them to assure that their articles are complete.
Journals themselves are not particularly helpful for some elements of knowledge management. Reading individual journals is not a reliable way of keeping up with new scientific developments in a field or for looking up the answers to clinical questions.
But journals do add another dimension: exposing readers to the full breadth of their profession. Opinions, stories, untested hypotheses, commentary on published articles, expressions of professional values, as well as descriptions of the historical, social, and political context of current-day medicine and much more, reflect the full nature of the profession (Table 14.5). The richness of this information completes the clinical picture for many readers. For example, when Annals of Internal Medicine began publishing stories about being a doctor (6), many readers remarked that while reports
Table 14.4
Guidelines for Reporting Research Studies
[Figure labels: STOP; How strong were the methods?; Methods; Context; Discussion]
Figure 14.2 ■ Reading a journal article in layers. Individual readers can progress deeper into an article or stop and go on to another, according to its scientific strength and clinical importance to them.
may communicate the “bottom line” efficiently. A few articles are so important, in relation to one’s particular needs, that they are worth reading word for word, perhaps for participation in a journal club.
Structured abstracts are organized according to the kinds of information that critical readers depend on when deciding whether to believe a study’s results. Table 14.6 shows headings of abstracts in structured form, along with the kind of information associated with them. (Traditional abstracts, with headings for Introduction, Methods, Results, and Discussion, are a shortened version.) These headings make it easier for readers to find the information they need and also force authors to include this information, some of which might otherwise have been left out if the abstract were less structured.
Unfortunately, many clinicians set goals for journal reading that are higher than they can achieve. They believe they must look at each article in detail, which requires a lot of time with each journal issue. Too often, this results in postponing reading and perhaps never getting to it at all, and it can generate a lot

Table 14.7
Criteria Patients Can Use to Evaluate Health Information on the Web
1. Sponsorship
• Can you easily identify the site sponsor? Are advisory board members and consultants listed?
• What is the Web address (gov = government, edu = educational institution, org = professional organization, com = commercial)?
2. Currency
• The site should have been updated recently, and the date of the latest revision posted.
3. Factual Information
• The information should be about facts, not opinions, and can be verified from primary sources such as professional articles.
• When opinions are stated, the source (a qualified professional or organization) should be identified.
4. Audience
• The Web site should clearly state whether the information is for consumers or health professionals. (Some sites have separate areas for consumers and health professionals.)
Modified from Medical Library Association. A User’s Guide to Finding and Evaluating Health Information on the Web. Available at http://mlanet.org/resources/userguide.html. Accessed August 1, 2012.
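The sponsorship criterion in Table 14.7 asks what a Web address suffix says about a site's sponsor. As a rough illustration only, the table's suffix meanings can be written as a lookup; the function name `sponsor_type` and the "unknown" fallback are assumptions made for this sketch, and the suffix is just one of the four criteria, not a verdict on quality.

```python
from urllib.parse import urlparse

# Suffix meanings exactly as listed in Table 14.7.
SPONSOR_BY_SUFFIX = {
    "gov": "government",
    "edu": "educational institution",
    "org": "professional organization",
    "com": "commercial",
}


def sponsor_type(url: str) -> str:
    """Return the sponsor category implied by a Web address, or 'unknown'.

    Hypothetical helper for illustration; it looks only at the last
    label of the host name and ignores everything else about the site.
    """
    host = urlparse(url).hostname or ""
    suffix = host.rsplit(".", 1)[-1].lower()
    return SPONSOR_BY_SUFFIX.get(suffix, "unknown")


for address in ("http://www.cdc.gov", "http://mlanet.org/resources/userguide.html"):
    print(address, "->", sponsor_type(address))
```

A suffix only hints at sponsorship; currency, factual sourcing, and intended audience still have to be checked by reading the site.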
disease, the clinical presentations of illness, the difference between isolated observations and consistent patterns of evidence, and much more. All of this is a valuable complement to what patients bring to the encounter: intense interest in a specific clinical question and the willingness to spend lots of time searching for answers.

PUTTING KNOWLEDGE MANAGEMENT INTO PRACTICE
Clinical epidemiology, as described in this book, is intended to make clinicians’ professional lives easier and more satisfying. Armed with a sound grounding in the principles by which the validity and generalizability of clinical information are judged, clinicians can more quickly and accurately detect whether the scientific basis for assertions is sound. For example, they can see when confidence intervals are consistent with clinically important benefit or harm or that a study of the effects of an intervention includes neither randomization nor other efforts to deal with confounding. They are better prepared to participate in discussions with colleagues outside their specialty about patient care decisions. They have a better basis for deciding how to delegate some aspects of their information needs. They can gain more confidence and experience greater satisfaction with the intellectual aspects of their work.
Beyond that, every clinician should have a plan for knowledge management, one that fits his or her particular needs and resources. The Internet must be an important part of the plan because no other medium is so comprehensive, up-to-date, and flexible. Much of the information needed to guide patient care decisions should be available at the point of care so that it can be brought to bear on the patient at hand. There is no reason why the information you use cannot be the best available in the world at the time, as long as you have access to the Internet.
A workable approach to knowledge management must be active. Clinicians should set aside time periodically to revisit their plan, to learn about new opportunities as they arise, and to acquire new skills as they are needed. There has never been a time when the evidence base for clinical medicine was so strong and accessible. Why not make the most of it?
Review Questions
Read the following statements and select the best answer.

14.1. You are finishing residency and will begin practice. You want to establish a plan for keeping up with new developments in your field even though there are few professional colleagues in your community. All of the following might be useful, but which will be most useful to you?
A. Subscribe to a few good journals.
B. Buy new editions of printed textbooks.
C. Subscribe to a service that reviews the literature in your field.
D. Search MEDLINE at regular intervals.
E. Keep up contacts with colleagues in your training program by e-mail and telephone.

14.2. You can rely on the best general medical journals in your field to:
A. Provide answers to clinical questions.
B. Assure that you have kept up with the medical literature.
C. Guarantee that the information they contain is beyond reproach.
D. Expose you to the many dimensions of your profession.

14.3. Many children in your practice have attacks of otitis media. You want to base your management on the best available evidence. Which of the following is the least credible source of information on this question?
A. A clinical practice guideline by a major medical society
B. A systematic review published in a major journal
C. The Cochrane Database of Systematic Reviews
D. The most recent research article on this question

14.4. A search of MEDLINE is especially useful for which of the following?
A. Finding all of the best articles bearing on a clinical question
B. An efficient strategy for finding the good articles
C. Looking for reports of rare events
D. Keeping up with the medical literature
E. Being familiar with the medical profession as a whole

14.5. Which of the following is accomplished by peer review of research manuscripts before they are published?
A. Exclude articles by authors with a conflict of interest.
B. Make the published article accurate and trustworthy.
C. Relieve readers of the need to be skeptical about the study.
D. Decide for the editors about whether they should publish the manuscript.

14.6. An author of an article showing that screening colonoscopy is more effective in preventing colorectal cancer than other forms of screening would have conflicts of interest if he or she had any of the following except:
A. Clinical income from performing colonoscopies
B. Investment in a company that makes colonoscopes
C. Investment in medical products in general
D. Publications of articles that have consistently advocated colonoscopy as the best screening test
E. Rivalry with other scholars who advocate another screening test

14.7. Which of the following is the least useful way of looking up answers to clinical questions at the point of care?
A. Subscribing to several journals and keeping them available where you see patients
B. Guidelines on http://www.guidelines.gov
C. The Cochrane Library on the Internet
D. A continually updated electronic textbook

14.8. Which of the following should be least reassuring to a patient about the quality of a Web site providing information about HIV?
A. The site is sponsored by a governmental agency and names its advisory board members.
B. The site provides facts, not opinions.
C. The primary source of information is stated.
D. The author is a well-known expert in the field.
E. The date of the last revision is posted and recent.

14.9. Which of the following is not part of grading clinical recommendations using the GRADE system?
A. Deciding whether to use a diagnostic test
B. Takes into account the balance of benefits and harms
C. Rates the quality of scientific evidence separately
D. Suggests how commonly and how forcefully a treatment should be recommended
E. Rates the strength of the evidence and of recommendations separately

14.10. A comprehensive approach to managing knowledge in your field would include which of the following?
A. Subscribing to some journals and browsing them
B. Establishing a plan for looking up information at the point of care
C. Finding a publication that helps you keep up with new developments in your field
D. Identifying Web sites you can recommend to your patients
E. All of the above

Answers are in Appendix A.
REFERENCES
1. Laine C, Horton R, DeAngelis CD, et al. Clinical trial registration: looking back and moving ahead. Lancet 2007;369:1909–1911.
2. Fletcher RH, Black B. “Spin” in scientific writing: scientific mischief and legal jeopardy. Med Law 2007;26(3):511–525.
3. Shariff SZ, Sontrop JM, Haynes RB, et al. Impact of PubMed search filters on the retrieval of evidence for physicians. CMAJ 2012;184:303.
4. Losanoff JE, Sauter ER, Rider KD. Cat scratch disease presenting with abdominal pain and retroperitoneal lymphadenopathy. J Clin Gastroenterol 2004;38:300–301.
5. Goodman SN, Berlin J, Fletcher SW, et al. Manuscript quality before and after peer review and editing at Annals of Internal Medicine. Ann Intern Med 1994;121:11–21.
6. Lacombe MA, ed. On Being a Doctor. Philadelphia: American College of Physicians; 1995.
Answers to Review Questions
CHAPTER 1 INTRODUCTION
23 Appendix A: Answers to Review
CHAPTER 2 FREQUENCY
CHAPTER 3 ABNORMALITY
3.1 D. Ordinal
3.2 B. Dichotomous
3.3 A. Interval (continuous)
3.4 E. Interval (discrete)
3.5 C. Nominal
3.6 C. This approach, called construct validity, is one of the ways of establishing the validity of a measurement. Note that answers B and D relate to reliability, not validity.
3.7 D. All except D are reasons for variation in measurements on a single patient, whereas D is about variation among patients.
3.8 D. Because clinical distributions do not necessarily follow a normal distribution, abnormality should not be defined by whether or not they do.
3.9 A. Naturally occurring distributions may or may not resemble the normal curve.
3.10 C. Although ultimately you and the patient may decide on a trial of statin therapy, this is not an emergency situation. The cholesterol test should be repeated; elevated values are often lower on repeat testing. A trial of exercise and weight loss can also lower cholesterol. Patients who are otherwise healthy and with a 10% 10-year risk of cardiovascular diseases are usually not immediately prescribed medication to lower cholesterol.
3.11 B. The figure shows one mode (hump). The median and mean are similar to each other, and are both below 4,000 g.
3.12 C. Two standard deviations encompass 95% of the values. The distribution is not skewed. Range is sensitive to extreme values.
3.13 B. One standard deviation encompasses about 2/3 of values around the mean. Looking at the figure, 2/3 of the values around the mean would be approximately 3,000 to 4,000 g.
3.14 D. Panel A of Figure 3.13 shows approximately even dispersion of measurements above and below the true value, suggesting chance variation. The effect of different observers making the measurements (interobserver variability) could also be at play.
3.15 G. Panel B shows skewed measurements to the right of the true value. In other words, the hospital staff tended to overestimate the electronic monitor results and record normal fetal heart rates when they were abnormal on electronic monitoring, thus demonstrating biased measurements. Chance and interobserver variability may also be involved because not all the measurements are the same.
3.16 G. Panel C shows similar results to Panel B except that the bias is in the other direction, that is, the hospital staff tended to underestimate the electronic monitor results. In both Panels B and C, the hospital staff measurements tended to “normalize” what were abnormal electronic monitor measurements.
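Answers 3.12 and 3.13 rest on the familiar coverage figures for a normal distribution: about two-thirds of values within one standard deviation of the mean and about 95% within two. The short sketch below checks those figures with Python's standard library. It assumes an exactly normal distribution, which, as answers 3.8 and 3.9 caution, clinical measurements may or may not follow; the helper name `coverage` is illustrative.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_one_sd = coverage(1)
within_two_sd = coverage(2)

print(f"within 1 SD: {within_one_sd:.3f}")  # roughly two-thirds
print(f"within 2 SD: {within_two_sd:.3f}")  # roughly 95%
```

For a skewed distribution these fractions can differ noticeably, which is one reason the answers check the shape of the distribution before applying them.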
want to be sure that the patient does not have other indications of increased risk of thrombosis such as age, smoking, or a family or personal history of clotting problems. This kind of decision requires careful judgment from both patient and clinician. However, using absolute or attributable risk will clarify the risk for the patient better than using relative risk when discussing clinical consequences.
5.12 B. The degree of illness may have confounded the results in this study so that sicker patients are more likely to take aspirin. One way to examine this possibility and adjust for confounding if it is present is to stratify the users and non-users into groups with similar indications for using aspirin and compare death rates in the subgroups.
CHAPTER 7 PROGNOSIS
CHAPTER 8 DIAGNOSIS
CHAPTER 9 TREATMENT
9.1 A. Every effort was made to make this trial as true to life as possible by comparing drugs in common use, having broad eligibility criteria, not blinding participants, allowing care to proceed as usual, and relying on a patient-centered outcome rather than a laboratory measurement. Although the trial is for efficacy, it is better described as practical and it is certainly not large.
9.2 D. Intention-to-treat analysis, counting outcomes according to the treatment group that patients were randomized to, tests the effects of offering treatment, regardless of whether patients actually take it. It is, therefore, about effectiveness in ordinary circumstances and the measure of effect, like usual care, is affected by drop-outs, reducing the observed treatment effect over what it would have been if everyone took the treatment they were assigned to.
9.3 A. All characteristics at the time of randomization, such as severity of disease, are randomly allocated. Characteristics arising after randomization, such as retention, response to treatment, and compliance, are not.
9.4 D. The greatest advantage of randomized trials over observational studies is prevention of confounding. Randomization creates comparison groups that would have the same outcome rates, on average, were it not for intervention effects.
9.5 E. The study had extensive inclusion and exclusion criteria. This would increase the extent to which patients in the trial were similar to each other, making it easier to detect treatment differences if they exist, but at the expense of generalizability, the ability to extrapolate from study results to ordinary patient care.
9.6 A. Intention-to-treat analyses describe the effects of being offered treatments, not necessarily taking them. To describe the effects of actually receiving the intervention, one would have to treat the data as if they were from a cohort study and use a variety of methods to control for confounding.
9.7 B. Stratified randomization is one approach to control of confounding, especially useful when a characteristic is strongly related to outcome, and also when the study is small enough that one worries that randomization might not create groups with similar prognosis.
9.8 C. In explanatory analyses, outcomes are attributed to the treatment patients actually receive, not the treatment group they were randomized to, which is an intention-to-treat analysis.
9.9 B. Side effects of the drug, both symptoms and signs, would alert patients and doctors to who is taking the active drug but could not affect random allocation, which was done before drug was begun.
9.10 C. For a randomized controlled trial to be ethical, there should not be conclusive evidence that one of the experimental treatments is better or worse than the other; that is, the scientific community should be in a state of “equipoise” on that issue. There may be opinions as to which is better, but no consensus. There may be some evidence bearing on the comparison, but the evidence should not be conclusive.
9.11 D. Because the new drug has advantages over the old one, but comparative effectiveness is unknown, the appropriate randomized trial comparing the two would be a non-inferiority trial to establish whether the new drug is no less efficacious.
9.12 A. Making the primary outcome of a trial a composite of clinically important and related outcomes increases the number of outcome events and, therefore, the ability of the trial to detect an effect if it is present. The disadvantage is that the intervention may affect the component outcomes differently, and reliance on the composite outcome alone might mask this effect.
9.13 C. Both bad luck in randomization and breakdown in allocation concealment (by bad methods or cheating) would show up as differences in baseline characteristics of patients in a trial. Small differences are expected and the challenge is to decide how large the differences must be for them to raise concern.
9.14 C. The usual drug trials reported in the clinical literature are “Phase III” trials, intended to establish efficacy or effectiveness. Further study, with postmarketing surveillance, is needed to detect uncommon side effects. Responses A and D are about what Phase I and Phase II trials are meant to establish.
9.15 B. Prevention of confounding is the main advantage of randomized controlled trials, which is why they are valued despite being ethically complicated, slower, and more expensive. Whether they resemble usual care depends on how the trial is designed.
CHAPTER 10 PREVENTION
10.3 C. If 30% of the screened group were found to have polyps, at least 4,671 (0.3 × 15,570) had a false-positive test for colon cancer. If the sensitivity of the test was 90% and the number of cancers was 82, about 74 people (0.9 × 82) had a true-positive result. The positive predictive value of the test, therefore, would be about 74/(74 + 4,671), or 1.6%. (Using exact numbers, the authors calculated a positive predictive value of 2.2%.) Negative predictive value was not calculated, but it would be high because the incidence of cancer over 13 years was low (323/15,570 or 21 per thousand) and about 90% of tests were negative.
10.4 B. Lead time (the period of time between the detection of a disease on screening and when it would ordinarily be diagnosed because the patient seeks medical care due to symptoms) can be associated with what appears to be an improvement in survival. The fact that mortality did not improve after screening in this randomized trial raises the possibility that lead-time bias is responsible for the result. Another possible cause of the finding is overdiagnosis.
10.5 A. See answer for 10.4.
10.6 C. Overdiagnosis, the detection of lesions that would not have caused clinical symptoms or morbidity, is likely because, even after 20 years, the number of cancers in the control group remained fewer than that in the screened group. Increased numbers of cancers in the screened group could occur if there were more smokers in the screened group, but randomization should have made that possibility unlikely.
10.7 D. Several important questions are to be considered when assessing a new vaccine, including: (i) Has the vaccine been shown to be efficacious (did it work under ideal circumstances when everyone in the intervention group received the vaccine and no one in a comparable control group did) and was it effective (under everyday circumstances, in which some people in the intervention group did not receive the vaccine)? (ii) Is the vaccine safe? (iii) Is the burden of suffering of the condition the vaccine protects against important enough to consider a preventive measure? and (iv) Is it cost-effective? Cost of the vaccine is only one component of an analysis of cost-effectiveness.
10.8 C. Disease prevalence is lower in presumably well people (screening) than in symptomatic patients (diagnosis). As a result, positive predictive value will be lower in screening. Because screening is aimed at picking up early disease, the sensitivity of most tests is lower in screening than in patients with more advanced disease. Overdiagnosis is less likely in diagnostic situations when symptomatic patients have more late-stage disease.
10.9 C. Volunteers for preventive care are more compliant with advice about medical care and usually have better health outcomes than those who reject preventive care. This effect is so strong that it is seen even when volunteers are taking placebo medications.
10.10 B. The incidence method is particularly useful when calculating sensitivity for screening tests because it takes into account the possibility of overdiagnosis. The gold standard for screening tests almost always involves a follow-up interval. The first round of screening picks up prevalent as well as incident cases, inflating the number compared to later screening rounds.
10.11 E. Cost-effectiveness should be estimated in a way that captures all costs of the preventive activity and all the costs associated with diagnosis and treatment, wherever they occur, to determine the costs of the preventive activity from a societal perspective. All costs that occur when prevention is not done are subtracted from all costs when prevention is done to estimate the cost for a given health effect.
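The arithmetic in answer 10.3 can be retraced step by step. The sketch below uses only the figures quoted in that answer (15,570 people screened, 30% with polyps, 82 cancers, 90% sensitivity); the variable and function names are illustrative, and rounding the true positives to 74 follows the answer's own approximation rather than the authors' exact calculation.

```python
def positive_predictive_value(true_positives: float, false_positives: float) -> float:
    """Share of positive tests that are true positives."""
    return true_positives / (true_positives + false_positives)


screened = 15_570        # people in the screened group
polyp_fraction = 0.30    # found to have polyps, counted as false positives for cancer
cancers = 82             # cancers in the screened group
sensitivity = 0.90       # assumed sensitivity of the test

false_positives = polyp_fraction * screened   # about 4,671
true_positives = sensitivity * cancers        # 73.8, rounded to 74 in the answer

ppv = positive_predictive_value(round(true_positives), false_positives)

print(f"false positives: {false_positives:.0f}")
print(f"true positives:  {true_positives:.1f}")
print(f"PPV:             {ppv:.1%}")
```

Because false positives outnumber true positives by more than 60 to 1 here, the predictive value stays below 2% even with a sensitive test, which is the point answer 10.8 makes about low prevalence in screening.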
CHAPTER 11 CHANCE
11.1 C. Although subgroup analyses risk false- to do with bias and statistical power or the
positive and false-negative conclusions, they
provide information that can help clinicians as
long as their limitations are kept in mind
11.2 C. The P value describes the risk of a false-
positive conclusion and has nothing directly
25 Appendix A: Answers to Review
generalizability or clinical importance of the
finding
11.3 E. The P value is not small enough to
establish that a treatment effect exists and
provides no information on whether a
clinically important effect could have been
missed because of inad- equate statistical
power.
Appendix A: Answers to Review Questions 251
11.4 D. The result was statistically significant, if one wanted to think of the role of chance in that way, because it excluded a relative risk of 1.0. Response D described the information the confidence interval contributes.
11.5 B. Calling P ≤ 0.05 “statistically significant” is a useful convention but otherwise has no particular mathematical or clinical meaning.
11.6 A. Models do depend on assumptions about the data, are not done in a standard way, and are meant to complement stratified analyses, not replace them. Although they might be used in large randomized trials, they are not particularly useful in that situation because randomization of a large number of patients has already made confounding very unlikely.
11.7 C. Of the 12,000 people in the trial, about 6,000 would be in the chemoprevention arm of the trial. Applying the rule of thumb mentioned in this chapter (but solving for event rate, not sample size), 6,000 divided by 3 is 2,000, so a 1/2,000 event rate could be detected with 6,000 people under observation.
11.8 E. Statistical power depends on the joint effects of all of the factors mentioned in A–D. It may vary a bit with the statistical test used to calculate it, but this is not the main determinant of sample size.
11.9 B. Bayesian reasoning is about how new information affects prior belief and has nothing to do with inferential statistics or the ethical rationale for randomized trials.
11.10 D. The results are consistent with a 1% up to a 46% higher death rate, with the best estimate being 22% higher, and exclude a hazard ratio of 1.0 (no effect), so the result is “statistically significant.” Confidence intervals provide more information than a P value for the same data because they include the point estimate and the range of values that is likely to include the true effect.
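The arithmetic in answer 11.7 can be checked with a short calculation. The sketch below assumes the chapter's rule of thumb is the familiar "rule of three": to have a good chance of seeing at least one event that occurs at a rate of 1/x, about 3x people must be observed. The function names are hypothetical and chosen only for illustration.

```python
def detectable_event_rate(n_observed: int) -> float:
    """Rule of thumb solved for event rate: about 3 / n (assumed rule of three)."""
    if n_observed <= 0:
        raise ValueError("n_observed must be positive")
    return 3 / n_observed


def probability_of_at_least_one_event(rate: float, n_observed: int) -> float:
    """Exact chance of seeing one or more events among n independent people."""
    return 1 - (1 - rate) ** n_observed


# Answer 11.7: 12,000 people in the trial, about half in the chemoprevention arm.
arm_size = 12_000 // 2
rate = detectable_event_rate(arm_size)

print(f"Detectable event rate: about 1 in {round(1 / rate):,}")
print(f"Chance of observing at least one such event: "
      f"{probability_of_at_least_one_event(rate, arm_size):.3f}")
```

The second function is not part of the rule of thumb itself. It is included only to show why dividing by 3 is a reasonable shortcut: at a rate of 1/2,000, the chance of seeing at least one event among 6,000 people comes out close to 95%.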
CHAPTER 12 CAUSE
CHAPTER 13 SUMMARIZING THE EVIDENCE
14.4 C. PubMed searches are indispensable for finding whether a rare event has been reported. They are one important part of an effort to find all published articles on a specific clinical question and are useful for researchers, but they are too inefficient for most questions at the point of care.
14.5 B. Although peer review and editing make articles better (more readable, accurate, and complete), articles are far from perfect when the process is over and they are published.
14.6 C. Conflict of interest is in relation to a specific activity and does not exist in the general case of investing in medical products not specifically related to colonoscopy.
14.7 A. B–E are all valuable resources for looking up the answers to clinical questions at the point of care (if computers and the relevant programs are available). Medical journals have great value for other reasons, but they are not useful for this purpose.
14.8 D. Patients and clinicians alike might well respect famous experts but should look for more solid footing (the organization that sponsors them, facts and the source of those facts) when deciding whether to believe them.
14.9 A. GRADE has been developed for treatment recommendations, but not other clinical questions.
14.10 E. All are basic elements of a comprehensive knowledge management plan, as described in this chapter.
Additional Readings
1. INTRODUCTION

Clinical Epidemiology
Feinstein AR. Why clinical epidemiology? Clin Res 1972;20:821–825.
Feinstein AR. Clinical Epidemiology. The Architecture of Clinical Research. Philadelphia: WB Saunders; 1985.
Feinstein AR. Clinimetrics. New Haven, CT: Yale University Press; 1987.
Hulley SB, Cummings SR. Designing Clinical Research. An Epidemiologic Approach, 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2007.
Riegelman RK. Studying a Study and Testing a Test, 5th ed. Philadelphia: Lippincott Williams & Wilkins; 2005.
Sackett DL. Clinical epidemiology. Am J Epidemiol 1969;89:125–128.
Sackett DL, Haynes RB, Guyatt GH, et al. Clinical Epidemiology: A Basic Science for Clinical Medicine, 2nd ed. Boston: Little, Brown and Company; 1991.
Weiss NS. Clinical Epidemiology: The Study of the Outcomes of Illness, 3rd ed. New York: Oxford University Press; 2006.

Evidence-Based Medicine
Guyatt G, Rennie D, Meade M, et al. User’s Guide to the Medical Literature: Essentials of Evidence-Based Clinical Practice, 2nd ed. Chicago: American Medical Association Press; 2008.
Hill J, Bullock I, Alderson P. A summary of the methods that the National Clinical Guideline Centre uses to produce clinical guidelines for the National Institute for Health and Clinical Excellence. Ann Intern Med 2011;154:752–757.
Jenicek M, Hitchcock D. Evidence-Based Practice: Logic and Critical Thinking in Medicine. Chicago: American Medical Association Press; 2005.
Straus SE, Glasziou P, Richardson WS, et al. Evidence-Based Medicine: How to Practice and Teach It, 4th ed. New York: Elsevier; 2011.

Epidemiology
Friedman GD. Primer of Epidemiology, 5th ed. New York: Appleton and Lange; 2004.
Gordis L. Epidemiology, 4th ed. Philadelphia: Elsevier/Saunders; 2009.
Greenberg RS, Daniels SR, Flanders W, et al. Medical Epidemiology, 4th ed. New York: Lange Medical Books/McGraw Hill; 2005.
Hennekens CH, Buring JE. Epidemiology in Medicine. Boston: Little, Brown and Company; 1987.
Jekel JF, Elmore JG, Katz DL. Epidemiology, Biostatistics and Preventive Medicine, 3rd ed. Philadelphia: Elsevier/Saunders; 2007.
Rothman KJ. Epidemiology: An Introduction. New York: Oxford University Press; 2002.

Related Fields
Brandt AM, Gardner M. Antagonism and accommodation: interpreting the relationship between public health and medicine in the United States during the 20th century. Am J Public Health 2000;90:707–715.
Kassirer JP, Kopelman RI. Learning Clinical Reasoning. Baltimore: Williams & Wilkins; 1991.
Sox HC, Blatt MA, Higgins MC, et al. Medical Decision Making. Philadelphia: American College of Physicians; 2006.
White KL. Healing the Schism: Epidemiology, Medicine, and the Public’s Health. New York: Springer-Verlag; 1991.

2. FREQUENCY
Morgenstern H, Kleinbaum DG, Kupper LL. Measures of disease incidence used in epidemiologic research. Int J Epidemiol 1980;9:97–104.

3. ABNORMALITY
Feinstein AR. Clinical Judgment. Baltimore: Williams & Wilkins; 1967.
Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use, 3rd ed. New York: Oxford University Press; 2003.
Yudkin PL, Stratton IM. How to deal with regression to the mean in intervention studies. Lancet 1996;347:241–243.

4. RISK: BASIC PRINCIPLES
Diamond GA. What price perfection? Calibration and discrimination of clinical prediction models. J Clin Epidemiol 1992;45:85–89.
Steiner JF. Talking about treatment: the language of populations and the language of individuals. Ann Intern Med 1999;130:618–622.

5. RISK: EXPOSURE TO DISEASE
Samet JM, Munoz A. Evolution of the cohort study. Epidemiol Rev 1998;20:1–14.

6. RISK: FROM DISEASE TO EXPOSURE
Grimes DA, Schulz KF. Compared to what? Finding controls for case-control studies. Lancet 2005;365:1429–1433.

7. PROGNOSIS
Dekkers OM, Egger M, Altman DG, et al. Distinguishing case series from cohort studies. Ann Intern Med 2012;156:37–40.
Jenicek M. Clinical Case Reporting in Evidence-Based Medicine, 2nd ed. New York: Oxford University Press; 2001.
Laupacis A, Sekar N, Stiell IG. Clinical prediction rules: a review and suggested modifications of methodologic standards. JAMA 1997;277:488–494.
Vandenbroucke JP. In defense of case reports. Ann Intern Med 2001;134:330–334.

8. DIAGNOSIS
McGee S. Evidence-Based Physical Diagnosis. New York: Elsevier; 2007.
Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926–930.
Whiting P, Rutjes AWS, Reitsma JB, et al. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 2004;140:189–202.

9. TREATMENT
Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials, 3rd ed. New York: Springer-Verlag; 1998.
Kaul S, Diamond GA. Good enough: a primer on the analysis and interpretation of noninferiority trials. Ann Intern Med 2006;145:62–69.
Pocock SJ. Clinical Trials: A Practical Approach. Chichester: Wiley; 1983.
Sackett DL, Gent M. Controversy in counting and attributing events in clinical trials. N Engl J Med 1979;301:1410–1412.
The James Lind Library. http://www.jameslindlibrary.org
Tunis SR, Stryer DB, Clancy CM. Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy. JAMA 2004;291:1624–1632.
Yusuf S, Collins R, Peto R. Why do we need some large, simple randomized trials? Stat Med 1984;3:409–420.

10. PREVENTION
Goodman SN. Probability at the bedside: the knowing of chances or the chances of knowing? Ann Intern Med 1999;130:604–606.
Harris R, Sawaya GF, Moyer VA, et al. Reconsidering the criteria for evaluating proposed screening programs: reflections from 4 current and former members of the U.S. Preventive Services Task Force. Epidemiol Rev 2011;33:20–25.
Rose G. Sick individuals and sick populations. Int J Epidemiol 30:427–432.
Wald NJ, Hackshawe C, Frost CD. When can a risk factor be used as a worthwhile screening test? BMJ 1999;319:1562–1565.

11. CHANCE
Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Ann Intern Med 1993;118:201–210.
Goodman SN. Toward evidence-based statistics. 1: the P value fallacy. Ann Intern Med 1999;130:995–1004.
Goodman SN. Toward evidence-based statistics. 2: the Bayes factor. Ann Intern Med 1999;130:1005–1013.
Rothman KJ. A show of confidence. N Engl J Med 1978;299:1362–1363.

12. CAUSE
Buck C. Popper’s philosophy for epidemiologists. Int J Epidemiol 1975;4:159–168.
Chalmers AF. What Is This Thing Called Science?, 2nd ed. New York: University of Queensland Press; 1982.
Morgenstern H. Ecologic studies in epidemiology: concepts, principles, and methods. Ann Rev Public Health 1995;16:61–81.

13. SUMMARIZING THE EVIDENCE
Goodman S, Dickersin K. Metabias: a challenge for comparative effectiveness research. Ann Intern Med 2011;155:61–62.
Lau J, Ioannidis JPA, Schmid CH. Summing up the evidence: one answer is not always enough. Lancet 1998;351:123–127.
Leeflang MMG, Deeks JJ, Gatsonis C, et al. Systematic reviews of diagnostic test accuracy. Ann Intern Med 2008;149:889–897.
Norris SL, Atkins D. Challenges in using nonrandomized studies in systematic reviews of treatment interventions. Ann Intern Med 2005;142:1112–1119.
Riley RD, Lambert PC, Abo-Zaid G. Meta-analysis of individual participant data: rationale, conduct, and reporting. BMJ 2010;340:c221.

14. KNOWLEDGE MANAGEMENT
Cook DA, Dupras DM. A practical guide to developing effective Web-based learning. J Gen Intern Med 2004;19:698–707.
Shiffman RN, Shekelle P, Overhage JM, et al. Standardized reporting of clinical practice guidelines: a proposal from the conference on guideline standardization. Ann Intern Med 2003;139:493–498.