Case Control Study Design

A case-control study compares exposure to potential risk factors between individuals who have a disease (cases) and individuals who don't (controls). It collects data on exposures retrospectively. Case-control studies are prone to bias but are relatively easy to conduct compared to other study designs. The document discusses issues in designing case-control studies such as defining cases and controls, sources of bias, and analyzing results using odds ratios to estimate disease risk among exposed groups.

Uploaded by

georgeloto12
Copyright
© All Rights Reserved

Introduction to study designs - case-control studies

Introduction

Learning objectives: You will learn the basics of case-control studies,
their analysis and the interpretation of their results. Case-control studies
are among the most frequently used study designs because they are relatively
easy to conduct compared with other study designs. This section introduces
the basic concepts, applications and strengths of the case-control study.
This section also covers:
1. Issues in the design of case-control studies
2. Common sources of bias in a case-control study
3. Analysis of case-control studies
4. Strengths and weaknesses of case-control studies
5. Nested case-control studies
Read the resource text below.

Resource text
Case-control studies start with the identification of a group of cases
(individuals with a particular health outcome) in a given population and a
group of controls (individuals without the health outcome) to be included in
the study.

In a case-control study the prevalence of exposure to one or more potential
risk factors is compared between cases and controls. If exposure is more
common among cases than among controls, it may be a risk factor for the
outcome under investigation. A major characteristic of case-control studies
is that data on potential risk factors are collected retrospectively, which
may give rise to bias. This is a particular problem for case-control studies
and therefore needs to be carefully considered during the design and conduct
of the study.

1. Issues in the design of case-control studies

Formulation of a clearly defined hypothesis
As with all epidemiological investigations, a case-control study should
begin with the formulation of a clearly defined hypothesis.
Case definition
It is essential that the case definition is clearly established at the
outset of the investigation to ensure that all cases included in the study
are based on the same diagnostic criteria. The source of cases also needs to
be clearly defined.
Selection of cases
Case-control studies may use incident or prevalent cases.
Incident cases
comprise cases newly diagnosed during a defined time period. The use of
incident cases is generally preferred, as the recall of past exposure(s)
may be more accurate among newly diagnosed cases. In addition, the
temporal sequence of exposure and disease is easier to assess among
incident cases.
Prevalent cases
comprise individuals who have had the outcome under investigation for
some time. The use of prevalent cases may give rise to recall bias as
prevalent cases may be less likely to accurately report past exposure(s). As
a result, the interpretation of results based on prevalent cases may prove
more problematic, as it may be more difficult to ensure that reported events
relate to a time before the development of disease rather than to the
consequence of the disease process itself. For example, individuals may
modify their exposure following the onset of disease. In addition, unless the
effect of exposure on duration of illness is known, it will not be possible to
determine the extent to which a particular characteristic is related to the
prognosis of the disease once it develops rather than to its cause.

Source of cases
Cases may be recruited from a number of sources; for example, they may be
recruited from a hospital, clinic or GP register, or may be population based.
Population-based case-control studies are generally more expensive and more
difficult to conduct.
Selection of controls
A particular problem inherent in case-control studies is the selection of a
comparable control group. Controls are used to estimate the prevalence of
exposure in the population which gave rise to the cases. Therefore, the ideal
control group would comprise a random sample from the general population
that gave rise to the cases. However, this is not always possible in practice.
The goal is to select individuals in whom the distribution of exposure status
would be the same as that of the cases in the absence of an
exposure-disease association. That is, if there is no true association between exposure
and disease, the cases and controls should have the same distribution of
exposure. The source of controls is dependent on the source of cases. In
order to minimize bias, controls should be selected to be a representative
sample of the population which produced the cases. For example, if cases
are selected from a defined population such as a GP register, then controls
should comprise a sample from the same GP register.

In case-control studies where cases are hospital based, it is common to
recruit controls from the hospital population. However, controls chosen
from a hospital setting should not include individuals with an outcome
related to the exposure(s) being studied. For example, in a case-control
study of the association between smoking and lung cancer, the inclusion of
controls being treated for a condition related to smoking (e.g. chronic
bronchitis) may result in an underestimate of the strength of the association
between exposure (smoking) and outcome. Recruiting more than one control
per case may improve the statistical power of the study, though including
more than 4 controls per case is generally considered to add little further
efficiency.
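One common way to see why returns diminish beyond about four controls per case is the rule of thumb that, with k controls per case, a study achieves roughly k/(k+1) of the statistical efficiency of a study with unlimited controls. That rule is not stated in the text above; it is an assumption brought in for illustration, and it holds only approximately (fixed number of cases, association close to the null). A minimal Python sketch, with a hypothetical function name:

```python
def relative_efficiency(controls_per_case: int) -> float:
    """Approximate efficiency relative to an unlimited number of controls.

    Uses the k / (k + 1) rule of thumb, which assumes a fixed number of
    cases and an association close to the null (an assumption, not a
    result taken from the text).
    """
    if controls_per_case < 1:
        raise ValueError("need at least one control per case")
    return controls_per_case / (controls_per_case + 1)


for k in range(1, 7):
    print(f"{k} control(s) per case: {relative_efficiency(k):.0%} efficiency")
```

Under this approximation, moving from one to four controls per case raises efficiency from 50% to 80%, while a fifth control adds only a few percentage points.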
Measuring exposure status
Exposure status is measured to assess the presence or level of exposure for
each individual for the period of time prior to the onset of the disease or
condition under investigation when the exposure would have acted as a
causal factor. Note that in case-control studies the measurement of exposure
is established after the development of disease and as a result is prone to
both recall and observer bias. Various methods can be used to ascertain
exposure status. These include:

Standardized questionnaires
Biological samples
Interviews with the subject
Interviews with spouse or other family members
Medical records
Employment records
Pharmacy records
The procedures used for the collection of exposure data should be the same
for cases and controls.

2. Common sources of bias in case-control studies

Due to the retrospective nature of case-control studies, they are particularly
susceptible to the effects of bias, which may be introduced as a result of a
poor study design or during the collection of exposure and outcome data.
Because the disease and exposure have already occurred at the outset of a
case control study, there may be differential reporting of exposure
information between cases and controls based on their disease status. For
example, cases and controls may recall past exposure differently (recall
bias). Similarly, the recording of exposure information may vary depending
on the investigator's knowledge of an individual's disease status
(interviewer/observer bias). Therefore, the design and conduct of the study
must be carefully considered, as there are limited options for the control of
bias during the analysis.
Selection bias in case-control studies
Selection bias is a particular problem inherent in case-control studies,
where it gives rise to non-comparability between cases and controls.
Selection bias in case-control studies may occur when: 'cases (or controls)
are included in (or excluded from) a study because of some characteristic
they exhibit which is related to exposure to the risk factor under
evaluation' [1]. The aim of a case-control study is to select study controls
who are representative of the population which produced the cases. Controls
are used to provide an estimate of the exposure rate in the population.
Therefore, selection bias may occur when those individuals selected as
controls are unrepresentative of the population that produced the cases.

The potential for selection bias in case control studies is a particular problem
when cases and controls are recruited exclusively from hospital or clinics.
Hospital patients tend to have different characteristics from the general
population; for example, they may have higher levels of alcohol consumption
or cigarette smoking. If these characteristics are related to the exposures
under investigation, then estimates of the exposure among controls may be
different from that in the reference population, which may result in a biased
estimate of the association between exposure and disease. Berksonian bias
(Berkson's bias) is a bias introduced in hospital-based case-control studies
due to varying rates of hospital admission. As the potential for selection bias is likely to be
less of a problem in population based case-control studies, neighbourhood
controls may be a preferable choice when using cases from a hospital or
clinic setting. Alternatively, the potential for selection bias may be minimized
by selecting controls from more than one source, such as by using both
hospital and neighbourhood controls. Selection bias may also be introduced
in case-control studies when exposed cases are more likely to be selected
than unexposed cases.

3. Analysis of case-control studies

The odds ratio (OR) is used in case-control studies to estimate the strength
of the association between exposure and outcome. Note that it is not
possible to estimate the incidence of disease from a case-control study
unless the study is population based and all cases in a defined population
are obtained.
The results of a case-control study can be presented in a 2x2 table as follows:

                 Cases    Controls
Exposed            a         b
Unexposed          c         d

The odds ratio is a measure of the odds of disease in the exposed compared
to the odds of disease in the unexposed and is calculated as:

OR = (a x d) / (b x c)

Example: Calculation of the OR from a hypothetical case-control study of
smoking and cancer of the pancreas among 100 cases and 400 controls.
Table 1. Hypothetical case-control study of smoking and cancer of the
pancreas.

                 Cases    Controls
Smokers           60        100
Non-smokers       40        300
Total            100        400

OR = (60 x 300) / (100 x 40) = 4.5

The OR calculated from the hypothetical data in Table 1 estimates that the
odds of cancer of the pancreas among smokers are 4.5 times those among
non-smokers. NB: The odds ratio of smoking and cancer of the pancreas has
been calculated without adjusting for potential confounders. Further analysis
of the data would involve stratifying by levels of potential confounders such
as age. The 2x2 table can then be extended to allow stratum-specific odds
ratios to be calculated for each level of the confounding variable(s) and,
where appropriate, an overall summary measure, adjusted for the effects of
confounding, and a statistical test of significance can also be calculated.
In addition, confidence intervals for the odds ratio would also be presented.
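As a rough illustration of the calculation and of the confidence interval mentioned above, the sketch below computes the crude odds ratio for the hypothetical smoking data (60 exposed cases, 100 exposed controls, 40 unexposed cases, 300 unexposed controls, as implied by the OR calculation). The text does not say which interval method to use; Woolf's log-based approximation is assumed here, and the function name is hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and approximate 95% CI for a 2x2 table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    The interval uses Woolf's log-based method (an assumption; the
    text does not specify a method).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    or_ = (a * d) / (b * c)
    # Standard error of ln(OR) under Woolf's approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


or_, lower, upper = odds_ratio_with_ci(60, 100, 40, 300)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

For these counts the interval runs from roughly 2.8 to 7.1. It is unadjusted, so it does not account for confounders such as age.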

4. Strengths and weaknesses of case-control studies

Strengths

Cost effective relative to other analytical studies such as cohort studies.
Case-control studies are retrospective, and cases are identified at the
beginning of the study; therefore there is no long follow-up period (as
compared to cohort studies).
Efficient for the study of diseases with long latency periods.
Efficient for the study of rare diseases.
Good for examining multiple exposures.

Weaknesses

Particularly prone to bias, especially selection, recall and observer bias.
Case-control studies are limited to examining one outcome.
Unable to estimate incidence rates of disease (unless the study is
population based).
Poor choice for the study of rare exposures.
The temporal sequence between exposure and disease may be difficult
to determine.
References
1. Hennekens CH, Buring JE. Epidemiology in Medicine. Lippincott Williams & Wilkins, 1987.

Case-Control Studies
Cohort studies have an intuitive logic to them, but they can be very problematic when:
1. The outcomes being investigated are rare;
2. There is a long time period between the exposure of interest and the development of the
disease; or
3. It is expensive or very difficult to obtain exposure information from a cohort.
In the first case, the rarity of the disease requires enrollment of very large numbers of people. In
the second case, the long period of follow-up requires efforts to keep contact with and collect
outcome information from individuals. In all three situations, cost and feasibility become an
important concern.
A case-control design offers an alternative that is much more efficient. The goal of a case-control
study is the same as that of cohort studies, i.e. to estimate the magnitude of association
between an exposure and an outcome. However, case-control studies employ a different
sampling strategy that gives them greater efficiency. As with a cohort study, a case-control
study attempts to identify all people who have developed the disease of interest in the defined
population. This is not because they are inherently more important to estimating an association,
but because they are almost always rarer than non-diseased individuals, and one of the
requirements of accurate estimation of the association is that there are reasonable numbers of
people in both the numerators (cases) and denominators (people or person-time) in the
measures of disease frequency for both exposed and reference groups. However, because
most of the denominator is made up of people who do not develop disease, the case-control
design avoids the need to collect information on the entire population by selecting a sample of
the underlying population.
Rothman describes the case-control strategy as follows:
"Case-control studies are best understood by considering as the starting point a source
population, which represents a hypothetical study population in which a cohort study might have
been conducted. The source population is the population that gives rise to the cases included in
the study. If a cohort study were undertaken, we would define the exposed and unexposed
cohorts (or several cohorts) and from these populations obtain denominators for the incidence
rates or risks that would be calculated for each cohort. We would then identify the number of
cases occurring in each cohort and calculate the risk or incidence rate for each. In a case-control
study the same cases are identified and classified as to whether they belong to the exposed or
unexposed cohort. Instead of obtaining the denominators for the rates or risks, however, a control
group is sampled from the entire source population that gives rise to the cases. Individuals in the
control group are then classified into exposed and unexposed categories. The purpose of the
control group is to determine the relative size of the exposed and unexposed components of the
source population."
To illustrate this consider the following hypothetical scenario in which the source population is
the state of Massachusetts. Diseased individuals are red, and non-diseased individuals are
blue. Exposed individuals are indicated by a whitish midsection. Note the following aspects of
the depicted scenario:
1. The outcome being investigated is rare.
2. There is a fairly large number of exposed individuals in the state, but most of these are
not diseased.
3. The proportion of exposed individuals among the disease cases (7/13) is higher than the
proportion of exposure among the controls.

If I somehow had exposure and outcome information on all of the subjects in the source
population and looked at the association using a cohort design, it might look like this:
                 Diseased    Non-diseased    Total
Exposed               7           1,000        1,007
Non-exposed           6           5,634        5,640


Therefore, the incidence in the exposed individuals would be 7/1,007 = 0.70%, and the
incidence in the non-exposed individuals would be 6/5,640 = 0.11%. Consequently, the risk ratio
(computed from the unrounded incidences) would be about 6.5, suggesting that those who had the
risk factor (exposure) had about 6.5 times the risk of getting the disease compared to those
without the risk factor. This is a strong association.
In this hypothetical example, I had data on all 6,647 people in the source population, and I could
compute the probability of disease (i.e., the risk or incidence) in both the exposed group and the
non-exposed group, because I had the denominators for both the exposed and non-exposed
groups.
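The cohort-style calculation can be sketched in a few lines of Python. The counts are those of the hypothetical source population described above; the function and variable names are illustrative only.

```python
def risk(cases: int, total: int) -> float:
    """Cumulative incidence (risk): cases divided by everyone at risk."""
    return cases / total


# Counts from the hypothetical source population described above.
exposed_cases, exposed_total = 7, 1_007
unexposed_cases, unexposed_total = 6, 5_640

risk_exposed = risk(exposed_cases, exposed_total)
risk_unexposed = risk(unexposed_cases, unexposed_total)
# Risk ratio computed from the unrounded risks.
risk_ratio = risk_exposed / risk_unexposed

print(f"risk in exposed:     {risk_exposed:.2%}")
print(f"risk in non-exposed: {risk_unexposed:.2%}")
print(f"risk ratio:          {risk_ratio:.2f}")
```

Dividing the rounded percentages instead of the unrounded risks gives a slightly different figure, so it is safer to round only at the end.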

The problem, of course, is that I usually don't have the resources to get the data on all
subjects in the population. If I took a random sample of even 5-10% of the population, I might
not have any diseased people in my sample.
An alternative approach would be to use surveillance databases or administrative databases to
find most or all 13 of the cases in the source population and determine their exposure status.
However, instead of enrolling all of the other 6,634 residents, suppose I were to just take a
sample of the non-diseased population. In fact, suppose I only took a sample of 1% of the
non-diseased people and I then determined their exposure status. The data might look something
like this:
                 Diseased    Non-diseased    Total
Exposed               7             10         unknown
Non-exposed           6             56         unknown

With this sampling approach I can no longer compute the probability of disease in each
exposure group, because I no longer have the denominators in the last column. In other words, I
don't know the exposure distribution for the entire source population. However, the small control
sample of non-diseased subjects gives me a way to estimate the exposure distribution in the
source population. So, I can't compute the probability of disease in each exposure group, but I
can compute the odds of disease in each group.

The Odds Ratio


The odds of disease in the exposed group are 7/10, and the odds of disease in the non-exposed
group are 6/56. If I compute the odds ratio, I get (7/10) / (6/56), or about 6.5, very close to
the risk ratio that I computed from data for the entire population. We will consider odds ratios
and case-control studies in much greater depth in a later module. However, for the time being
the key things to remember are that:
1. The sampling strategy for a case-control study is very different from that of cohort
studies, despite the fact that both have the goal of estimating the magnitude of
association between the exposure and the outcome.

2. In a case-control study there is no "follow-up" period. One starts by identifying diseased
subjects and determines their exposure distribution; one then takes a sample of the
source population that produced those cases in order to estimate the exposure
distribution in the overall source population. [In cohort studies
none of the subjects have the outcome at the beginning of the follow-up period.]
3. In a case-control study, you cannot measure incidence, because you start with diseased
people and non-diseased people, so you cannot calculate relative risk directly.
4. The case-control design is very efficient. In the example above the case-control study of
only 79 subjects produced an odds ratio (about 6.5) that was a very close approximation to
the risk ratio that was obtained from the data in the entire population.
5. Case-control studies are particularly useful when the outcome is rare, i.e., uncommon in
both exposed and non-exposed people.
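A short sketch of why such a small control sample can work: as long as controls are sampled without regard to exposure, the sampling fraction cancels out of the odds ratio. The code below uses the counts from the hypothetical Massachusetts example and, as a simplifying assumption, works with expected control counts rather than a random draw; a real sample would vary by chance. The function name is hypothetical.

```python
def odds_ratio(exp_cases, exp_controls, unexp_cases, unexp_controls):
    """Odds of disease in the exposed divided by odds in the non-exposed."""
    return (exp_cases / exp_controls) / (unexp_cases / unexp_controls)


# All 13 cases from the hypothetical source population.
exp_cases, unexp_cases = 7, 6
# Non-diseased people in the full population, by exposure status.
exp_nondiseased, unexp_nondiseased = 1_000, 5_634

full_population_or = odds_ratio(exp_cases, exp_nondiseased,
                                unexp_cases, unexp_nondiseased)

for fraction in (1.0, 0.10, 0.01):
    # Expected (not randomly drawn) control counts at this sampling fraction.
    exp_controls = exp_nondiseased * fraction
    unexp_controls = unexp_nondiseased * fraction
    sampled_or = odds_ratio(exp_cases, exp_controls,
                            unexp_cases, unexp_controls)
    print(f"sampling {fraction:.0%} of non-diseased: OR = {sampled_or:.2f}")

# The whole-number control counts used in the text (10 and 56).
text_or = odds_ratio(7, 10, 6, 56)
print(f"OR with 10 and 56 controls: {text_or:.2f}")
```

The odds ratio is the same at every sampling fraction. The whole-number counts used in the text give a slightly different value only because 1% of 5,634 has to be rounded to 56 people.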

The Difference Between "Probability" and "Odds"

The probability that an event will occur is the fraction of times you expect to see that event in many trials.
Probabilities always range between 0 and 1.

The odds are defined as the probability that the event will occur divided by the probability that the event
will not occur.

If the probability of an event occurring is Y, then the probability of the event not occurring is 1-Y. (Example: If the
probability of an event is 0.80 (80%), then the probability that the event will not occur is 1-0.80 = 0.20, or 20%.)

The odds of an event represent the ratio of the (probability that the event will occur) / (probability that the event
will not occur). This could be expressed as follows:
Odds of event = Y / (1-Y)

So, in this example, if the probability of the event occurring = 0.80, then the odds are 0.80 / (1-0.80) = 0.80/0.20 = 4
(i.e., 4 to 1).

If a race horse runs 100 races and wins 25 times and loses the other 75 times, the probability of winning is
25/100 = 0.25 or 25%, but the odds of the horse winning are 25/75 = 0.333 or 1 win to 3 losses.

If the horse runs 100 races and wins 5 and loses the other 95 times, the probability of winning is 0.05 or 5%,
and the odds of the horse winning are 5/95 = 0.0526.

If the horse runs 100 races and wins 50, the probability of winning is 50/100 = 0.50 or 50%, and the odds of
winning are 50/50 = 1 (even odds).

If the horse runs 100 races and wins 80, the probability of winning is 80/100 = 0.80 or 80%, and the odds of
winning are 80/20 = 4 to 1.

NOTE that when the probability is low, the odds and the probability are very similar.
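The conversion between probability and odds described above can be written as two small Python functions. The function names are illustrative; the formulas are the ones given in the box, Odds = Y / (1-Y) and its inverse.

```python
def probability_to_odds(p: float) -> float:
    """Odds = p / (1 - p); undefined when p equals 1."""
    if not 0 <= p < 1:
        raise ValueError("probability must be in [0, 1)")
    return p / (1 - p)


def odds_to_probability(odds: float) -> float:
    """Probability = odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds cannot be negative")
    return odds / (1 + odds)


# Win probabilities from the race horse examples above.
for p in (0.05, 0.25, 0.50, 0.80):
    print(f"probability {p:.2f} -> odds {probability_to_odds(p):.4f}")
```

The output shows the pattern noted above: at a probability of 0.05 the odds are almost the same number, while at 0.80 they are very different.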

On Sept. 8, 2011 the New York Times ran an article on the economy in which the writer
began by saying "If history is a guide, the odds that the American economy is falling into a
double-dip recession have risen sharply in recent weeks and may even have reached 50
percent."
Further down in the article the author quoted the economist who had been interviewed for the
story. What the economist had actually said was, "Whether we reach the technical definition [of
a double-dip recession] I think is probably close to 50-50."
Question: was the author correct in saying that the "odds" of a double-dip recession may have
reached 50 percent?

Hepatitis Outbreak in Marshfield, MA

In 2004 there was an outbreak of hepatitis A on the South Shore of Massachusetts. Over a
period of a few weeks there were 20 cases of hepatitis A that were reported to the
Massachusetts Department of Public Health (MDPH), and most of the infected persons were
residents of Marshfield, MA. Marshfield's health department requested help from MDPH in
identifying the source. The investigators quickly performed descriptive epidemiology. The epidemic curve
indicated a point source epidemic, and most of the cases lived in the Marshfield area, although
some lived as far away as Boston. They conducted hypothesis-generating interviews, and taken
together, the descriptive epidemiology suggested that the source was one of five or six food
establishments in the Marshfield area, but it wasn't clear which one. Consequently, the
investigators wanted to conduct an analytic study to determine which restaurant was the source.

They invited all 20 cases of hepatitis A to answer questions from a questionnaire designed for
this study, and 19 of the cases agreed to complete the survey.

Summary

Note that the lower three study designs (retrospective and prospective cohort studies and
clinical trials) are similar in that an initially disease free cohort is divided into groups based on
their "exposure" status, i.e., whether or not they have a particular "risk factor," and for all three,
the investigator measures and compares the incidence of disease. In contrast, case-control
studies identify diseased and non-diseased subjects and then measure and compare their
likelihood of having had certain prior exposures.
