Probability Cheat Sheet
Thursday 22 April 2021: Building blocks for probabilistic reasoning in clinical diagnosis
Medical tests often serve only as an indirect tool for determining the presence or absence of a health
condition of medical importance. Rather than detecting the condition directly, many medical tests indicate other traits about the
patient that are associated with the health condition of interest. For example, if blood is drawn
for a serologic test¹ and the results come back “seropositive,” this means that specific antibodies
have been detected in the patient’s blood. In some cases, the kind of antibody that has been
detected may be caused by different antigens (for example, infection by different bacteria,
viruses, or allergens), so a seropositive test does not provide decisive proof of infection by any
one of these antigens. Consequently, the approach that clinicians take when diagnosing the
health status of a patient following their test results needs to be more nuanced. In problem set 3,
you will apply several rules of probability to arrive at useful insights about what medical test
results imply about the health status of a tested patient. In this exercise, you will build your
intuitions about why we can rely on these rules.
¹ Serologic tests examine blood serum to detect either antibodies present in the blood or introduced substances.
Complementary events have complementary probabilities: pairs of probabilities that sum to 1. Here 𝑇
denotes a positive test and 𝐶 denotes having the condition, while 𝑇ᶜ and 𝐶ᶜ denote a negative test and
not having the condition. In the case of medical tests there are two pairs of complementary
probabilities we need to recognize:
• True positives and false negatives complement each other: 𝑃(𝑇|𝐶) + 𝑃(𝑇ᶜ|𝐶) = 1. Out
of everyone who has the condition, the proportions of those yielding positive tests and
those yielding negative tests will add up to 1.
• False positives and true negatives complement each other: 𝑃(𝑇|𝐶ᶜ) + 𝑃(𝑇ᶜ|𝐶ᶜ) = 1.
Out of everyone who does not have the condition, the proportions of those yielding
positive tests and those yielding negative tests will add up to 1.
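The two complement rules above can be sketched in a few lines of Python. This is only an illustration: the helper name `complement` and the sensitivity and specificity values are hypothetical and are not taken from the exercises.

```python
def complement(p: float) -> float:
    """Return 1 - p, the probability of the complementary outcome."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return 1.0 - p


# Hypothetical values, chosen for illustration only.
sensitivity = 0.90  # P(T|C): probability of a true positive
specificity = 0.80  # P(T^c|C^c): probability of a true negative

false_negative_rate = complement(sensitivity)  # P(T^c|C)
false_positive_rate = complement(specificity)  # P(T|C^c)

print(f"P(false negative) = {false_negative_rate:.2f}")
print(f"P(false positive) = {false_positive_rate:.2f}")
```

Each pair is conditioned on the same group of people (those with the condition, or those without), which is why the two probabilities in a pair must sum to 1.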
We desire a medical test that yields positive results if and only if a patient has a medical
condition of interest, and equivalently we desire a test that yields negative results if and only if a
patient does not have that condition. While no test is perfect, a good one will have both of the
following properties:
• It will be very sensitive to a particular medical condition, so that most individuals who
have that condition will yield positive results when given the test. In statistical terms,
good tests will have a high probability of a true positive.
• It will be very specific to a particular medical condition. In other words, other medical
conditions will not lead the test to yield a positive result. In statistical terms, good tests
will have a high probability of a true negative.
What will happen if a medical test is strong in one of these respects but not the other?
• A test with high sensitivity but low specificity will yield too many false positives and too
few true negatives.
• A test with low sensitivity but high specificity will yield too many false negatives and too
few true positives.
As a very general rule, we might reject any test with either a sensitivity or a specificity lower
than 0.5, but the higher the better in both cases. If both of these two values exceed 0.5, does this
make it more probable than not that a positive test implies that a patient has the condition of
interest, and that a negative test implies that the patient does not have the condition?
Unfortunately, no. Instead, successful clinical diagnosis requires us to calculate the balance of
true and false positives in the case of a positive test, or the balance of true and false negatives in
the case of a negative test. This requires us to look beyond the sensitivity and specificity of the
test, to also consider epidemiological facts: how many people have the condition overall? The
exercises below will help you to improve your intuitions about how the
number of people with the condition interacts with the sensitivity and specificity of the test.
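As a rough sketch of what "calculating the balance of true and false positives" involves, the following Python snippet counts expected positive tests in a made-up population. The function name `expected_positive_counts` and all of the numbers (1000 people, 50 with the condition, sensitivity 0.90, specificity 0.80) are assumptions chosen for illustration; they are deliberately different from the exercise values.

```python
def expected_positive_counts(n_condition, n_no_condition, sensitivity, specificity):
    """Return the expected numbers of (true positives, false positives)."""
    true_positives = n_condition * sensitivity               # N(C and T)
    false_positives = n_no_condition * (1.0 - specificity)   # N(C^c and T)
    return true_positives, false_positives


# Hypothetical population: 1000 people, 50 of whom have the condition.
tp, fp = expected_positive_counts(
    n_condition=50, n_no_condition=950, sensitivity=0.90, specificity=0.80
)
total_positives = tp + fp                    # N(T)
share_with_condition = tp / total_positives  # fraction of positives who have the condition

print(f"true positives:  {tp:.0f}")
print(f"false positives: {fp:.0f}")
print(f"share of positives with the condition: {share_with_condition:.2f}")
```

With these hypothetical numbers, both the sensitivity and the specificity are well above 0.5, yet about 45 true positives are outnumbered by about 190 false positives, so fewer than one in five positive tests belongs to someone with the condition.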
1. Imagine a population of 10000 individuals. 100 of these individuals have the condition, while
the remainder do not: 𝑁(𝐶) = 100 and 𝑁(𝐶ᶜ) = 9900. The sensitivity of a test for this
condition is very good, 0.99, but the specificity is not as good, 0.60. A person drawn at random
from this population has yielded a positive test.
a. How many people with the condition are expected to yield a positive test?
𝑁(𝐶) × 𝑃(𝑇|𝐶) = 𝑁(𝐶 and 𝑇) =
b. If the specificity is 𝑃(𝑇ᶜ|𝐶ᶜ), what is the complementary probability of a false positive?
𝑃(𝑇|𝐶ᶜ) = 1 − 𝑃(𝑇ᶜ|𝐶ᶜ) =
c. How many people who do not have the condition are expected to yield a positive test?
𝑁(𝐶ᶜ) × 𝑃(𝑇|𝐶ᶜ) = 𝑁(𝐶ᶜ and 𝑇) =
d. How many positive tests should we expect overall?
𝑁(𝐶 and 𝑇) + 𝑁(𝐶ᶜ and 𝑇) = 𝑁(𝑇)
e. Who receives more positive tests: those with the condition or those without? Why?
f. Given your response to the previous question, what should a person conclude if they have
received a positive test?
2. Now imagine the same population but a much better test: the sensitivity is the
same as before (0.99), but the specificity is much higher, at 0.99. A person drawn at random from this
population has yielded a positive test.
a. How many people with the condition are expected to yield a positive test?
b. What is the probability of a false positive?
c. How many people who do not have the condition are expected to yield a positive test?
d. How many positive tests should we expect overall?
e. Who receives more positive tests: those with the condition or those without? Why?
f. Given your response to the previous question, what should a person conclude if they have
received a positive test?
3. Let’s return to the original test (sensitivity: 0.99, specificity: 0.60). This time, however, imagine a
very common condition: only 200 individuals out of a population of 10000 are free of the
condition (i.e., 9800 do have it). Suppose that an individual has received a negative test.
Does this give them sufficient reason to believe they do not have the condition?
a. How many people who do not have the condition are expected to yield a negative test?
𝑁(𝐶ᶜ) × 𝑃(𝑇ᶜ|𝐶ᶜ) = 𝑁(𝐶ᶜ and 𝑇ᶜ) =
b. If the sensitivity is 𝑃(𝑇|𝐶), what is the complementary probability of a false negative?
𝑃(𝑇ᶜ|𝐶) = 1 − 𝑃(𝑇|𝐶) =
c. How many people who have the condition are expected to yield a negative test?
𝑁(𝐶) × 𝑃(𝑇ᶜ|𝐶) = 𝑁(𝐶 and 𝑇ᶜ) =
d. How many negative tests should we expect overall?
𝑁(𝐶ᶜ and 𝑇ᶜ) + 𝑁(𝐶 and 𝑇ᶜ) = 𝑁(𝑇ᶜ)
e. Who receives more negative tests: those with the condition or those without? Why?
f. Given your response to the previous question, what should a person conclude if they have
received a negative test?
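The same counting approach can be mirrored for negative tests. The sketch below is one possible way to check your reasoning, not a worked answer: the function name `expected_negative_counts` and the numbers (1000 people, 900 with the condition, sensitivity 0.90, specificity 0.80) are hypothetical and differ from the values in exercise 3.

```python
def expected_negative_counts(n_condition, n_no_condition, sensitivity, specificity):
    """Return the expected numbers of (true negatives, false negatives)."""
    true_negatives = n_no_condition * specificity          # N(C^c and T^c)
    false_negatives = n_condition * (1.0 - sensitivity)    # N(C and T^c)
    return true_negatives, false_negatives


# Hypothetical population: 1000 people, 900 of whom have the condition.
tn, fn = expected_negative_counts(
    n_condition=900, n_no_condition=100, sensitivity=0.90, specificity=0.80
)
total_negatives = tn + fn                       # N(T^c)
share_without_condition = tn / total_negatives  # fraction of negatives who are condition-free

print(f"true negatives:  {tn:.0f}")
print(f"false negatives: {fn:.0f}")
print(f"share of negatives without the condition: {share_without_condition:.2f}")
```

In this made-up population the condition is so common that about 90 false negatives outnumber about 80 true negatives, so a negative result alone leaves it slightly more likely than not that the person has the condition.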