A short introduction to epidemiology
Chapter 6: Validity
Neil Pearce
Centre for Public Health Research
Massey University
Wellington, New Zealand
Chapter 7
Validity
• Systematic error
• Selection bias
• Information bias
• Confounding
Systematic Error
• Systematic error (bias) occurs if there is a
systematic difference between what the study
is estimating and what it is intended to
estimate
• It also occurs in clinical trials, but some types
of systematic error may be more likely in
epidemiologic studies because of the lack of
randomization of exposure
Types of Bias
• Suppose that we are studying the association
between an exposure and the risk of disease
in a population (the source population)
• We compare the risk of disease in the
exposed and non-exposed within the
source population over the risk period
Types of Bias
• Selection bias is any bias arising from the
way that study participants are selected (or
select themselves) from the source population
• Information bias is any bias arising from
errors in classification of the exposure or
disease status of the study participants
• Confounding occurs if (because of the lack
of randomization) the underlying risk of
disease is different in the exposed and non-
exposed groups
Selection Bias
• In an incidence (or prevalence) study,
selection bias will not occur if there is a
100% response rate
• However, if, for example, cases of disease
are more likely to participate than non-
cases, and this is related to exposure, then
selection bias will occur
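A minimal sketch of this point follows. The population counts and response probabilities are hypothetical, chosen only for illustration, and the odds ratio is used as the effect measure as an assumption of this sketch. It compares participation that depends on disease alone with participation that depends on disease and exposure together.

```python
# Hypothetical source population (illustrative counts, not from the chapter).
population = {
    ("exposed", "case"): 200,
    ("exposed", "non-case"): 800,
    ("non-exposed", "case"): 100,
    ("non-exposed", "non-case"): 900,
}


def odds_ratio(counts):
    """Odds ratio for disease, exposed versus non-exposed."""
    a = counts[("exposed", "case")]
    b = counts[("exposed", "non-case")]
    c = counts[("non-exposed", "case")]
    d = counts[("non-exposed", "non-case")]
    return (a * d) / (b * c)


def participants(counts, response):
    """Expected participants per cell, given per-cell response probabilities."""
    return {cell: n * response[cell] for cell, n in counts.items()}


# Hypothetical: cases respond more often than non-cases, equally in both exposure groups.
disease_only = {
    ("exposed", "case"): 0.9,
    ("exposed", "non-case"): 0.6,
    ("non-exposed", "case"): 0.9,
    ("non-exposed", "non-case"): 0.6,
}

# Hypothetical: the extra participation of cases is confined to the exposed.
disease_and_exposure = dict(disease_only)
disease_and_exposure[("non-exposed", "case")] = 0.6

true_or = odds_ratio(population)
or_disease_only = odds_ratio(participants(population, disease_only))
or_joint = odds_ratio(participants(population, disease_and_exposure))

print(f"Source population:                  {true_or:.2f}")
print(f"Response depends on disease only:   {or_disease_only:.2f}")
print(f"Response depends on disease and exposure: {or_joint:.2f}")
```

Under these assumed numbers the odds ratio among participants matches the source population when response depends on disease alone, and is inflated when the higher response of cases is related to exposure.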
Selection Bias in Case-Control Studies
• In a case-control study, the controls are a
sample of the source population
• Selection bias can occur if the sample is
non-random, and the selection of controls
is related to exposure status
• In other words, selection bias can occur if
the exposure distribution of the controls is
not representative of that of the source
population
Selection Bias: Solutions
• Achieve a response rate of 100% (and in a
case-control study, ensure that controls are a
random sample of the source population)
• Control for the determinants of selection bias
as confounders in the analysis (e.g. if
response rates vary by social class then
control for social class as a confounder)
• Assess the likely size of the selection bias
Selection Bias in Case-Control Studies:
Solutions
• Selection bias can occur if the selection of
controls is related to exposure status
• In the analysis, we can control for the
determinants of control selection (e.g. social
class)
• An exception is when we have chosen
“other disease” controls and the other diseases
are directly caused by the main exposure of
interest: this selection bias cannot be removed
Information Bias
• May occur when there is misclassification of
exposure or disease
• If misclassification of exposure (or disease) is
unrelated to disease (or exposure) then the
misclassification is non-differential
• If misclassification of exposure (or disease) is
related to disease (or exposure) then the
misclassification is differential
Actual Rates and Rate Ratio
                                Exposed    Non-exposed
Deaths                          100        10
Person-years                    100,000    100,000
Incidence rate (per 100,000)    100        10
Rate ratio                      10.0
Example of Non-Differential
Misclassification
• The true disease rates are 100 per 100,000 in
the exposed and 10 per 100,000 in the non-
exposed
• 15% of exposed workers are misclassified as
non-exposed
• 10% of non-exposed workers are misclassified
as exposed
• Misclassification is not related to disease
status
Observed Rates and Rate Ratio
                                Exposed                     Non-exposed
Deaths                          85 + 1 = 86                 9 + 15 = 24
Person-years                    85,000 + 10,000 = 95,000    90,000 + 15,000 = 105,000
Incidence rate (per 100,000)    91                          23
Rate ratio                      4.0
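The arithmetic behind the observed table can be reproduced with a short sketch. The function and variable names are illustrative, not from the chapter. Because the same misclassification proportions are applied to deaths and to person-years, the misclassification is non-differential.

```python
def observed_counts(exposed, non_exposed, p_exp_to_non, p_non_to_exp):
    """Counts recorded in each exposure group after misclassification."""
    obs_exposed = exposed * (1 - p_exp_to_non) + non_exposed * p_non_to_exp
    obs_non_exposed = non_exposed * (1 - p_non_to_exp) + exposed * p_exp_to_non
    return obs_exposed, obs_non_exposed


def rate_per_100k(deaths, person_years):
    """Incidence rate per 100,000 person-years."""
    return deaths / person_years * 100_000


P_EXP_TO_NON = 0.15  # exposed recorded as non-exposed
P_NON_TO_EXP = 0.10  # non-exposed recorded as exposed

# The same proportions apply to deaths and person-years (non-differential).
deaths = observed_counts(100, 10, P_EXP_TO_NON, P_NON_TO_EXP)
person_years = observed_counts(100_000, 100_000, P_EXP_TO_NON, P_NON_TO_EXP)

rate_exposed = rate_per_100k(deaths[0], person_years[0])
rate_non_exposed = rate_per_100k(deaths[1], person_years[1])
rate_ratio = rate_exposed / rate_non_exposed

print(f"Observed deaths:       {deaths[0]:.0f} and {deaths[1]:.0f}")
print(f"Observed person-years: {person_years[0]:,.0f} and {person_years[1]:,.0f}")
print(f"Observed rate ratio:   {rate_ratio:.1f} (true value 10.0)")
```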
Non-Differential Misclassification
• Bias is usually (but not always) towards the
null, i.e. the relative risk is biased towards
1.0
• If we find an association between exposure
and disease, and there is non-differential
misclassification, then the true relative risk
is usually (but not always) stronger than the
observed relative risk
Non-Differential Misclassification:
Exceptions to “Bias Towards the Null”
• If the specificity of case ascertainment is
100% but the sensitivity is less than 100%
then the risk difference will be biased towards
the null, but the relative risk will not be
• E.g. suppose that we only identify 50% of
the deaths
Actual Rates and Rate Ratio
                                Exposed    Non-exposed
Deaths                          100        10
Person-years                    100,000    100,000
Incidence rate (per 100,000)    100        10
Rate ratio                      10.0
Observed Rates and Rate Ratio
                                Exposed    Non-exposed
Deaths                          50         5
Person-years                    100,000    100,000
Incidence rate (per 100,000)    50         5
Rate ratio                      10.0
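A brief sketch of why the ratio survives while the difference does not: if only a fraction of deaths is identified, and that fraction is the same in both groups, it cancels in the ratio but scales the difference. The function name and the `sensitivity` parameter are illustrative.

```python
def ratio_and_difference(deaths_exp, py_exp, deaths_non, py_non, sensitivity=1.0):
    """Rate ratio and rate difference (per 100,000 person-years).

    `sensitivity` is the proportion of deaths identified, assumed equal
    in the exposed and non-exposed groups (non-differential).
    """
    rate_exp = sensitivity * deaths_exp / py_exp * 100_000
    rate_non = sensitivity * deaths_non / py_non * 100_000
    return rate_exp / rate_non, rate_exp - rate_non


actual = ratio_and_difference(100, 100_000, 10, 100_000)
observed = ratio_and_difference(100, 100_000, 10, 100_000, sensitivity=0.5)

print(f"Actual:   rate ratio {actual[0]:.1f}, rate difference {actual[1]:.0f}")
print(f"Observed: rate ratio {observed[0]:.1f}, rate difference {observed[1]:.0f}")
```

With 50% of deaths identified, the rate difference falls from 90 to 45 per 100,000 while the rate ratio stays at 10.0.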
Non-Differential Misclassification:
Exceptions to “Bias Towards the Null”
• If there are multiple exposure categories then
bias may be away from the null for some
categories
Actual Rates and Rate Ratio
                                High       Low        Non-exposed
Deaths                          100        10         5
Person-years                    100,000    100,000    100,000
Incidence rate (per 100,000)    100        10         5
Rate ratio                      20.0       2.0        1.0
Example of Non-Differential
Misclassification
• 15% of highly exposed workers are
misclassified as low-exposed
• 10% of low-exposed workers are
misclassified as highly exposed
• There is no misclassification of the non-
exposed
• Misclassification is not related to
disease status
Observed Rates and Rate Ratio
                                High      Low        Non-exposed
Deaths                          86        24         5
Person-years                    95,000    105,000    100,000
Incidence rate (per 100,000)    91        23         5
Rate ratio                      18.1      4.6        1.0
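The three-category example can be checked with a small sketch that writes the misclassification as a matrix of proportions. The matrix layout and names are assumptions of this sketch. The same matrix is applied to deaths and person-years, so the misclassification is unrelated to disease status.

```python
CATEGORIES = ["high", "low", "non-exposed"]

# RECORDED_AS[i][j]: proportion of true category i recorded as category j.
RECORDED_AS = [
    [0.85, 0.15, 0.00],  # high: 15% recorded as low
    [0.10, 0.90, 0.00],  # low: 10% recorded as high
    [0.00, 0.00, 1.00],  # non-exposed: no misclassification
]


def reallocate(true_values):
    """Redistribute true counts across the recorded categories."""
    n = len(true_values)
    return [
        sum(true_values[i] * RECORDED_AS[i][j] for i in range(n))
        for j in range(n)
    ]


deaths = reallocate([100, 10, 5])
person_years = reallocate([100_000, 100_000, 100_000])

rates = [d / py * 100_000 for d, py in zip(deaths, person_years)]
rate_ratios = [r / rates[-1] for r in rates]  # non-exposed is the reference

for name, rr in zip(CATEGORIES, rate_ratios):
    print(f"{name:12s} observed rate ratio {rr:.1f}")
```

The observed rate ratio for the low category exceeds its true value of 2.0, which is the bias away from the null that the example illustrates, while the high category is biased towards the null.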
Non-Differential Misclassification:
Exceptions to “Bias Towards the Null”
• If there is misclassification of a positive
confounder then control for that confounder
will only partially remove the confounding;
hence there will be bias away from the null
(compared with what would have been
obtained with full control of confounding)
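A sketch of this partial control follows, using the counts from the chapter's smoking example (given under "Example of Confounding"), where the true stratum-specific risk ratio is 1.0 and the crude risk ratio is 1.25. The 20% error rate in recorded smoking status is hypothetical, and it is assumed to be unrelated to exposure and disease.

```python
# (cases, total) by true smoking status and exposure,
# taken from the chapter's confounding example.
true_counts = {
    "smokers": {"exposed": (800, 2000), "non-exposed": (400, 1000)},
    "non-smokers": {"exposed": (200, 1000), "non-exposed": (400, 2000)},
}

ERROR = 0.20  # hypothetical: 20% of each smoking group recorded in the other group


def recorded_strata(counts, error):
    """Expected (cases, total) per recorded smoking stratum and exposure group."""
    other = {"smokers": "non-smokers", "non-smokers": "smokers"}
    recorded = {}
    for stratum in counts:
        recorded[stratum] = {}
        for group in ("exposed", "non-exposed"):
            cases_own, total_own = counts[stratum][group]
            cases_other, total_other = counts[other[stratum]][group]
            recorded[stratum][group] = (
                (1 - error) * cases_own + error * cases_other,
                (1 - error) * total_own + error * total_other,
            )
    return recorded


def risk_ratio(stratum):
    """Risk ratio, exposed versus non-exposed, within one stratum."""
    (cases_exp, total_exp) = stratum["exposed"]
    (cases_non, total_non) = stratum["non-exposed"]
    return (cases_exp / total_exp) / (cases_non / total_non)


recorded = recorded_strata(true_counts, ERROR)
for name, stratum in recorded.items():
    print(f"Recorded {name}: risk ratio {risk_ratio(stratum):.2f}")
```

Under these assumptions the risk ratios within the recorded strata fall between the fully adjusted value of 1.0 and the crude value of 1.25, so stratifying on the misclassified confounder removes only part of the confounding.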
Differential Misclassification
• Bias can be in any direction, i.e. towards or
away from the null value
• It is therefore usually desirable to ensure that
misclassification is non-differential, even if
this means not using some information on
one group; i.e. it is important to collect the
information in a similar manner in the groups
being compared
Confounding
• Occurs when the exposed and non-exposed
groups in the source population are not
comparable, because of inherent differences
in background disease risk
• Confounding can also be introduced into a
study through selection factors (response
bias) or misclassification of exposure or
disease
Example of Confounding
              Exposed    Non-exposed
Cases         1,000      800
Non-cases     2,000      2,200
Total         3,000      3,000
Risk/100      33.3       26.7
Risk ratio    1.25
Example of Confounding
              Smokers                   Non-smokers               Total
              Exposed    Non-exposed    Exposed    Non-exposed    Exposed    Non-exposed
Cases         800        400            200        400            1,000      800
Non-cases     1,200      600            800        1,600          2,000      2,200
Total         2,000      1,000          1,000      2,000          3,000      3,000
Risk/100      40         40             20         20             33.3       26.7
Risk ratio    1.0                       1.0                       1.25
Criteria for Confounding
• Smoking is associated with disease (in the absence
of exposure) - 40% of smokers have the disease
under study compared with 20% of non-smokers
• It is associated with exposure (in the absence of
disease) - 2/3 of smokers are exposed compared
with 1/3 of non-smokers
• We also assume that smoking is not an
intermediate factor in the causal pathway leading
from exposure to disease
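The first two criteria can be checked directly against the stratified table. The sketch below is illustrative; the helper names are not from the chapter.

```python
# (cases, non-cases) by smoking status and exposure, from the stratified table.
table = {
    ("smokers", "exposed"): (800, 1200),
    ("smokers", "non-exposed"): (400, 600),
    ("non-smokers", "exposed"): (200, 800),
    ("non-smokers", "non-exposed"): (400, 1600),
}


def risk(smoking, exposure):
    """Risk of disease in one smoking-by-exposure cell."""
    cases, non_cases = table[(smoking, exposure)]
    return cases / (cases + non_cases)


def proportion_exposed_among_non_cases(smoking):
    """Proportion of non-cases who are exposed, within one smoking group."""
    exposed = table[(smoking, "exposed")][1]
    non_exposed = table[(smoking, "non-exposed")][1]
    return exposed / (exposed + non_exposed)


# Criterion 1: smoking is associated with disease in the absence of exposure.
print(f"Risk, non-exposed smokers:     {risk('smokers', 'non-exposed'):.0%}")
print(f"Risk, non-exposed non-smokers: {risk('non-smokers', 'non-exposed'):.0%}")

# Criterion 2: smoking is associated with exposure in the absence of disease.
for group in ("smokers", "non-smokers"):
    share = proportion_exposed_among_non_cases(group)
    print(f"Exposed among disease-free {group}: {share:.2f}")
```

The third criterion, that smoking is not on the causal pathway from exposure to disease, is a matter of subject knowledge and cannot be read off the table.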
Control of Confounding
• Randomization
• Restriction (e.g. restricting a study to white
females aged 20-24 years)
• Matching (e.g. matching for ethnicity, gender
and age) - but matching is not always
advisable in case-control studies
• Control in the analysis
Control of Confounding in the Analysis
• Stratify the data into subgroups
• Calculate the effect estimate within each
subgroup
• Calculate a summary effect estimate
across strata
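These three steps can be sketched with the smoking example from this chapter. The slides do not say which summary estimator is intended; the Mantel-Haenszel risk ratio is used here as one common choice, and that choice is an assumption of this sketch.

```python
# Each stratum: (exposed cases, exposed total, non-exposed cases, non-exposed total),
# taken from the chapter's smoking example.
strata = {
    "smokers": (800, 2000, 400, 1000),
    "non-smokers": (200, 1000, 400, 2000),
}


def risk_ratio(a, n1, b, n0):
    """Risk ratio: a cases among n1 exposed versus b cases among n0 non-exposed."""
    return (a / n1) / (b / n0)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel summary risk ratio across strata."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, b, n0 in strata.values())
    denominator = sum(b * n1 / (n1 + n0) for a, n1, b, n0 in strata.values())
    return numerator / denominator


# Crude estimate: collapse the strata by summing each column.
crude = risk_ratio(*[sum(column) for column in zip(*strata.values())])

# Steps 1 and 2: stratify and estimate the effect within each subgroup.
stratum_specific = {name: risk_ratio(*counts) for name, counts in strata.items()}

# Step 3: summary estimate across strata.
summary = mantel_haenszel_risk_ratio(strata)

print(f"Crude risk ratio:   {crude:.2f}")
for name, rr in stratum_specific.items():
    print(f"  {name}: {rr:.2f}")
print(f"Summary risk ratio: {summary:.2f}")
```

Here the crude risk ratio of 1.25 disappears once smoking is controlled: both stratum-specific estimates and the summary estimate equal 1.0.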
Assessment of Confounding
• Control for surrogates (e.g. social class)
of potential confounders
• Obtain confounder information for a subgroup
and assess confounding in this subgroup
• Assess how strong the confounding is likely
to be
Summary of Validity Issues
• Reduce random error by making the study as
large as possible and through appropriate study
design
• Minimize selection bias by having a good
response rate (and selecting controls appropriately
in a case-control study)
• Ensure that information bias is non-differential
and keep it as small as possible
• Minimize confounding in the study design and
control for it in the analysis
Summary of Validity Issues
• Study design always involves a compromise
between these issues, e.g. obtaining better
exposure information may reduce information bias
but may increase random error if the study size is
thereby reduced
• Confounding is often weaker than is “expected”
• Non-differential information bias cannot usually
cause “false positive” findings