
Lecture 3-5 - Analyzing Contingency Tables: Azadeh Alimadad. DANA 4820 Jan 17 - 24, 2022

This document summarizes key concepts from lectures 3-5 on analyzing contingency tables: 1. It discusses types of observational and experimental studies, contingency tables, and measures of association like sensitivity, specificity, relative risk, and odds ratio. 2. Examples are provided to illustrate concepts like calculating probabilities from contingency tables and determining if a study is prospective or retrospective. 3. Methods for testing independence in contingency tables are covered, including the chi-squared statistic and calculating odds ratios to measure strength of association between variables.

Azadeh Alimadad.

DANA 4820 Jan 17 – 24, 2022

Lecture 3-5 - Analyzing contingency tables

1. What are the types of studies?
2. Understanding the timing of studies
3. Contingency tables
4. Probability structure
5. Sensitivity, Specificity, Positive predictive value
6. Measure of association for 2x2 contingency tables
7. Relative Risk, Risk difference
8. Odds Ratio
9. Testing Independence in contingency table
10. Pearson chi-squared statistic
11. Likelihood-Ratio Statistic
12. Residuals for Cells in a contingency table
13. Partitioning Chi-squared
14. Simpson’s paradox

Types of studies we will look at:

Observational (No Randomization)
• Cohort studies
• Case-control studies
• Cross-sectional studies

Experimental (Randomization)
• Clinical trials

Question: What is the benefit of randomization?


Randomization distributes unmeasured determinants of the outcome more equally among exposure groups. These are determinants that:
• we are unaware of and so do not measure, or
• we are aware of but are unable to measure for one reason or another.

Example 1: The Vanguard Project was a study of HIV incidence and risk
behaviours among young MSM (men who have sex with men). Briefly,
HIV-seronegative self-identified gay and bisexual men between 18 and
30 years of age who lived in Vancouver were recruited at community
events, in community clinics and through advertisements in local
newspapers. At baseline and annually thereafter, participants
underwent HIV serologic testing with pre- and post-test counselling and
completed a confidential self-administered questionnaire. They studied
the associations between questionnaire answers (E) and the later risk of
becoming HIV-seropositive (D).
1. This is an example of an ________ study.
A. Experimental
B. Observational

Observational Studies:
To identify cohort, case-control and cross-sectional studies, think about
their E (exposure, determinant, or predictor) and D (disease or
outcome) variables:

Example 2: (Recall Example 1)

1. This is an example of a ________ study.

A. Cohort
B. Case-control
C. Cross-sectional

The TIMING of a study concerns when the outcome (D) occurs relative to when the study begins. If the outcome (D), and therefore every event being measured, has already occurred when the study begins, then the study is RETROSPECTIVE. If the outcome has not yet occurred when the study begins, then it is PROSPECTIVE.

Example 3: (Recall Example 1)

This study was ___________.

A. Prospective
B. Retrospective

Contingency Table (Two-way table)

                Belief in Afterlife
            Yes     No/undecided    Total
Females     1230    357             1587
Males       859     413             1272
Total       2089    770             2859

Probability Structure (Joint, Marginal, and Conditional Probabilities)

Let r denote the number of rows and c the number of columns. This table has
cells that display the ………… possible combinations of outcomes.

1. Joint probabilities:

What is the probability that a respondent is male and believes in an afterlife?

2. Marginal probabilities:

What is the probability of belief in an afterlife?



3. Conditional probabilities:
What is the probability of belief in an afterlife among males, that is, given that the respondent is male?
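
As a worked sketch using the counts in the belief-in-afterlife table (n = 2859), the three kinds of probability can be estimated by sample proportions. The notation $\hat{P}$ for an estimated probability is an assumption of this sketch rather than notation from the lecture, and values are rounded to three decimals.

```latex
\begin{align*}
\text{Joint:}\quad & \hat{P}(\text{male and believes}) = \frac{859}{2859} \approx 0.300 \\
\text{Marginal:}\quad & \hat{P}(\text{believes}) = \frac{2089}{2859} \approx 0.731 \\
\text{Conditional:}\quad & \hat{P}(\text{believes} \mid \text{male}) = \frac{859}{1272} \approx 0.675
\end{align*}
```

The joint and marginal proportions divide by the grand total, while the conditional proportion divides by the row total for males.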

Sensitivity and Specificity:


Many screening tests are designed to produce positive or negative results.

A positive test: Implies that the disease is likely to be present and follow-up diagnostic tests are therefore advisable.

A negative test: Implies that the disease is unlikely to be present and follow-up diagnostic tests are therefore not indicated.

                            True Disease Status
                            Positive    Negative
Screening test  Positive    TP          FP          TP+FP
                Negative    FN          TN          FN+TN
                            TP+FN       FP+TN       Total

Sensitivity: A measure of the ability of a screening test to identify correctly those with the disease.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100$$

Sensitivity: $P(\text{test is positive} \mid \text{person has } D)$

Specificity: A measure of the ability of a screening test to identify correctly those without the disease.

$$\text{Specificity} = \frac{TN}{FP + TN} \times 100$$

Specificity: $P(\text{test is negative} \mid \text{person does not have } D)$

If you get a positive test result, the more relevant conditional probability is P(person has D | test is +), which is called the positive predictive value (PPV).

Note: For a rare disease (the probability of having the disease is low), even a test with high sensitivity and specificity can have a low PPV.

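Of 100 people:
  Disease: Yes = 1, No = 99
  Among the 1 with the disease:      test + = 1,  test - = 0
  Among the 99 without the disease:  test + = 12, test - = 87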

Sensitivity of the test = 0.86

Specificity of the test = 0.88

We can calculate the PPV from the diagram above: of the 1 + 12 = 13 people who test positive, 1 has the disease.

PPV = 1/13 ≈ 0.08
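
The same PPV can be sketched with Bayes' rule, using the stated sensitivity (0.86), specificity (0.88), and the prevalence of 1 in 100 from the diagram. The symbol $p$ for prevalence is an assumption of this sketch. The result, about 0.07, differs slightly from 1/13 only because the diagram rounds to whole people.

```latex
\begin{align*}
\text{PPV} &= \frac{\text{sensitivity} \times p}
                   {\text{sensitivity} \times p + (1 - \text{specificity}) \times (1 - p)} \\
           &= \frac{0.86 \times 0.01}{0.86 \times 0.01 + 0.12 \times 0.99}
            = \frac{0.0086}{0.1274} \approx 0.07
\end{align*}
```

Even with sensitivity and specificity both near 0.9, most positive results are false positives because the 99 disease-free people greatly outnumber the 1 person with the disease.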

                Myocardial Infarction
Group       Yes     No        Total
Placebo     189     10,845    11,034
Aspirin     104     10,933    11,037

What are the sample proportions of physicians who suffered an MI during the study?



Placebo:

Aspirin:

Difference:

Note: How big is a difference relative to the proportions being compared?

0.010 - 0.001 = 0.009

0.410 - 0.401 = 0.009

The two differences are equal, but 0.010/0.001 = 10 whereas 0.410/0.401 = 1.02.

The ratio of probabilities is often called the relative risk (RR).

$$\text{Relative risk} = \frac{\pi_1}{\pi_2}$$

When RR = 1, it means $\pi_1 = \pi_2$.

RR in MI example is = ……………..

Using the difference of proportions alone to compare two groups can be misleading when the proportions are both close to zero.
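
For the aspirin study table, the sample proportions, their difference, and the relative risk can be worked out as follows. This is a sketch: taking placebo as group 1 and aspirin as group 2 is an assumption about the intended ordering, and values are rounded.

```latex
\begin{align*}
\hat{\pi}_1 &= \frac{189}{11{,}034} \approx 0.0171 \quad (\text{placebo}) \\
\hat{\pi}_2 &= \frac{104}{11{,}037} \approx 0.0094 \quad (\text{aspirin}) \\
\hat{\pi}_1 - \hat{\pi}_2 &\approx 0.0077 \\
\text{RR} = \frac{\hat{\pi}_1}{\hat{\pi}_2} &\approx 1.82
\end{align*}
```

The difference looks small, yet the sample proportion of MI cases was about 82% higher in the placebo group than in the aspirin group.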

Odds Ratio: Another measure of association for 2x2 contingency tables.

$$\text{Odds} = \frac{\pi}{1 - \pi}$$

$$\text{Odds} = \frac{P(\text{something happening})}{P(\text{something not happening})}$$

Example: For the probability of success $\pi = 0.75$, what are the odds of success?

What if 𝜋=.5?

What if 𝜋=.25?
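
A short worked sketch of these three cases, applying the definition odds = π/(1 − π):

```latex
\begin{align*}
\pi = 0.75: &\quad \text{odds} = \frac{0.75}{0.25} = 3 \\
\pi = 0.5:  &\quad \text{odds} = \frac{0.5}{0.5} = 1 \\
\pi = 0.25: &\quad \text{odds} = \frac{0.25}{0.75} = \frac{1}{3} \approx 0.33
\end{align*}
```

Odds of 3 mean a success is three times as likely as a failure, and the odds for π = 0.25 are the reciprocal of the odds for π = 0.75.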

Properties of the odds and odds ratio:

Odds are non-negative

If odds >1, then P(success) > P(failure)

The success probability is a function of the odds:

$$\pi = \frac{\text{odds}}{\text{odds} + 1}$$

$$\text{Odds ratio } \theta = \frac{\text{odds}_1}{\text{odds}_2} = \frac{\pi_1 / (1 - \pi_1)}{\pi_2 / (1 - \pi_2)}$$

which can be any non-negative number.

When X and Y are independent, the odds ratio is $\theta = \dfrac{\text{odds}_1}{\text{odds}_2} = 1$.

Example (2.4 in the book). The opening odds against being the winning team at the 2018 World Cup, as specified by espn.com, were 9/2 for Germany, 5/1 for Brazil, 11/2 for France, 20/1 for England, and 7/1 for Spain. Find the corresponding prior probabilities of winning for these five teams.
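
One way to approach this, sketched here, is to read odds against winning of a/b as odds of winning of b/a, and then apply π = odds/(odds + 1), which gives π = b/(a + b). The letters a and b are introduced only for this illustration.

```latex
\begin{align*}
\text{Germany } (9/2):  &\quad \pi = \frac{2}{9 + 2} = \frac{2}{11} \approx 0.182 \\
\text{Brazil } (5/1):   &\quad \pi = \frac{1}{5 + 1} = \frac{1}{6} \approx 0.167 \\
\text{France } (11/2):  &\quad \pi = \frac{2}{11 + 2} = \frac{2}{13} \approx 0.154 \\
\text{England } (20/1): &\quad \pi = \frac{1}{20 + 1} = \frac{1}{21} \approx 0.048 \\
\text{Spain } (7/1):    &\quad \pi = \frac{1}{7 + 1} = \frac{1}{8} = 0.125
\end{align*}
```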

Example (Case-Control study)

An epidemiologist wants to test the hypothesis that drinking and driving is associated with fatal automobile injuries. They select 80 cases and 112 controls from police records.

A. Calculate the odds of drinking among those with fatal injuries and the odds of drinking among those with non-fatal injuries. Then calculate the odds ratio.

                Fatal   Non-fatal
Drinkers        58      70
Non-drinkers    22      42

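The odds ratio does not change when the table orientation is reversed, that is, when the rows become the columns and the columns become the rows.

The odds ratio is also called the cross-product ratio because it equals the ratio of the products

$$\hat{\theta} = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}}$$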
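
For the drinking and fatal-injury table, a worked sketch of part A (values rounded to two decimals):

```latex
\begin{align*}
\text{Odds of drinking (fatal)} &= \frac{58}{22} \approx 2.64 \\
\text{Odds of drinking (non-fatal)} &= \frac{70}{42} \approx 1.67 \\
\hat{\theta} &= \frac{58/22}{70/42} = \frac{58 \times 42}{70 \times 22}
              = \frac{2436}{1540} \approx 1.58
\end{align*}
```

The last line is the cross-product ratio, so the same value results whichever variable is placed in the rows.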

Inference:

Odds ratio and log odds ratio:

The sampling distribution of the sample odds ratio is highly skewed (it is not normal), so we usually use its natural logarithm, $\ln(\hat{\theta})$, for statistical inference ($\ln(\hat{\theta})$ has an approximately normal distribution).

Since $\theta = 1$ implies independence, $\ln(\theta) = 0$ implies independence.

95% Wald Confidence Interval for OR:

$$\ln(\hat{\theta}) \pm z_{\alpha/2}\,(SE)$$

$$SE = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$$

This gives an interval (a, b) for $\ln(\theta)$. To find the 95% CI for $\theta$ we apply the exponential function: $(e^{a}, e^{b})$.



Example 2.3.5: Retrospective case-control study:

Association between lung cancer and smoking. 709 cases and 709 controls were selected, and the outcome measured is whether the subject was ever a smoker.

            Lung Cancer
Smoker      Cases   Controls
Yes         688     650
No          21      59
Total       709     709

What is the odds ratio?

Estimate the confidence interval.


Wald CI for odds ratio

# Install once if the package is not already available
install.packages("epitools")
library(epitools)

# Counts are entered row by row: smokers (cases, controls), then non-smokers
oddsratio(c(688, 650, 21, 59), method = "wald", conf.level = 0.95, correction = FALSE)
$data
Outcome
Predictor Disease1 Disease2 Total
Exposed1 688 650 1338
Exposed2 21 59 80
Total 709 709 1418

$measure
odds ratio with 95% C.I.
Predictor estimate lower upper
Exposed1 1.000000 NA NA
Exposed2 2.973773 1.786737 4.949427
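
As a cross-check on the software output, the Wald interval can be reproduced by hand in base R from the formulas for $\ln(\hat{\theta})$ and SE. This is a minimal sketch; the variable names (`or_hat`, `se_log_or`, `ci_or`) are chosen for illustration and are not part of epitools.

```r
# Counts from the lung cancer table (rows: smoker yes/no; columns: cases/controls)
n11 <- 688; n12 <- 650
n21 <- 21;  n22 <- 59

# Sample odds ratio as the cross-product ratio
or_hat <- (n11 * n22) / (n12 * n21)

# Standard error of the log odds ratio
se_log_or <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

# 95% Wald interval on the log scale, then back-transformed with exp()
z <- qnorm(0.975)
ci_log <- log(or_hat) + c(-1, 1) * z * se_log_or
ci_or <- exp(ci_log)

print(round(c(estimate = or_hat, lower = ci_or[1], upper = ci_or[2]), 3))
```

The estimate and limits agree with the `Exposed2` row of the output above (about 2.97, with limits about 1.79 and 4.95).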

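For your information: Score CI

Computations for the score confidence interval are beyond our scope, but it is available in software.

The score CI can be used for small samples and even when some $n_{ij}$ is 0 (where the Wald CI fails).

library(PropCIs)
# Smokers among cases (688 of 709), then smokers among controls (650 of 709)
orscoreci(688, 709, 650, 709, conf.level = 0.95)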

data:
95 percent confidence interval:
1.794500 4.928018

Relationship between Odds ratio and Relative risk:

When $\hat{\pi}_1$ and $\hat{\pi}_2$ are both close to zero, the odds ratio and relative risk take similar values.
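
The reason can be sketched algebraically: the odds ratio equals the relative risk times a factor that is close to 1 when both proportions are small. The aspirin study counts from earlier in these notes illustrate this (values rounded).

```latex
\begin{align*}
\hat{\theta} = \frac{\hat{\pi}_1 / (1 - \hat{\pi}_1)}{\hat{\pi}_2 / (1 - \hat{\pi}_2)}
  &= \text{RR} \times \frac{1 - \hat{\pi}_2}{1 - \hat{\pi}_1} \\
\text{MI example:}\quad \hat{\theta} = \frac{189 \times 10{,}933}{104 \times 10{,}845} &\approx 1.83,
  \qquad \text{RR} \approx 1.82
\end{align*}
```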

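Note: We should not rely on RR in case-control studies. There is no risk in case-control studies: the numbers of cases and controls are fixed by the design, so the proportion with the disease cannot be estimated.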

Comparing Proportions in 2x2 tables:

We learned in an introductory statistics course



As we discussed before, we are usually more interested in the relative risk than in the risk difference.

Therefore, we build the confidence interval on the log scale, for ln(RR), and then use the exponential function to obtain the CI for the relative risk.
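
A sketch of that calculation for the aspirin study (189 MI cases among 11,034 on placebo, 104 among 11,037 on aspirin). The standard error below is the usual large-sample textbook formula for the log relative risk and is assumed here, since it is not reproduced in these notes; $y_1$, $y_2$ denote the numbers of MI cases and $n_1$, $n_2$ the group totals, names chosen only for this illustration.

```latex
\begin{align*}
SE\big(\ln \widehat{RR}\big)
  &= \sqrt{\frac{1}{y_1} - \frac{1}{n_1} + \frac{1}{y_2} - \frac{1}{n_2}}
   = \sqrt{\frac{1}{189} - \frac{1}{11{,}034} + \frac{1}{104} - \frac{1}{11{,}037}} \approx 0.121 \\
\ln \widehat{RR} \pm 1.96 \times SE &\approx 0.598 \pm 0.238 = (0.360,\ 0.835) \\
\text{95\% CI for RR} &\approx \big(e^{0.360},\ e^{0.835}\big) \approx (1.4,\ 2.3)
\end{align*}
```

Because the interval excludes 1, the data suggest a higher MI risk on placebo than on aspirin.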

Example: A research study estimated that under a certain condition, the probability
a subject would be referred for heart catheterization was 0.906 for whites and
0.847 for blacks.

a. A press release about the study stated that the odds of referral for cardiac catheterization for blacks are 60% of the odds for whites. Explain how they obtained 60% (more accurately, 57%).

b. An Associated Press story that described the study stated “Doctors were only 60% as likely to order cardiac catheterization for blacks as for whites”. What is wrong with this interpretation? Give the correct percentage for this interpretation.
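
A worked sketch of both parts, using the stated referral probabilities of 0.906 for whites and 0.847 for blacks (values rounded):

```latex
\begin{align*}
\text{odds}_{\text{whites}} &= \frac{0.906}{0.094} \approx 9.64, \qquad
\text{odds}_{\text{blacks}} = \frac{0.847}{0.153} \approx 5.54 \\
\text{Odds ratio} &= \frac{5.54}{9.64} \approx 0.57 \\
\text{Relative risk} &= \frac{0.847}{0.906} \approx 0.93
\end{align*}
```

The 57% figure is a ratio of odds. The phrase "60% as likely" describes a ratio of probabilities, which is about 93%.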

Example: Consider the following two studies reported in the New York Times:

a. A British study reported that, of smokers who get lung cancer, “women were 1.7 times more vulnerable than men to get small-cell lung cancer”. Is 1.7 an odds ratio or a relative risk?

b. A National Cancer Institute study about tamoxifen and breast cancer reported
that the women taking the drug were 45% less likely to experience invasive breast
cancer compared to the women taking placebo. Find the relative risk for

i) Those taking the drug compared to those taking placebo

ii) Those taking placebo compared to those taking the drug.
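
For part b, a sketch of the arithmetic, reading “45% less likely” as a statement about the ratio of probabilities:

```latex
\begin{align*}
\text{i) drug vs. placebo:}\quad & \text{RR} = 1 - 0.45 = 0.55 \\
\text{ii) placebo vs. drug:}\quad & \text{RR} = \frac{1}{0.55} \approx 1.82
\end{align*}
```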



Example. Estimate and find 95% CI for the population OR.

The OR describes the relationship between the response and explanatory variables. Next we need to check whether this relationship is statistically significant.

Independence: Variables X and Y are statistically independent if the true conditional distribution of Y is the same at each level of X.

Lemma: X and Y are independent if and only if

$$\pi_{ij} = \pi_{i+}\,\pi_{+j} \quad \text{for all } i, j$$



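Testing Independence:

$$H_0: \pi_{ij} = \pi_{i+}\,\pi_{+j} \quad \text{for all } i \text{ and } j$$

We assume $H_0$ is true

$$\mu_{ij} = n\pi_{ij} = n\pi_{i+}\,\pi_{+j}$$

1. Pearson chi-squared statistic for testing $H_0$ is

$$X^2 = \sum \frac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}}$$

2. Likelihood-Ratio Statistic

$$G^2 = 2 \sum n_{ij} \log\left(\frac{n_{ij}}{\mu_{ij}}\right)$$

$$G^2 = 2\left[495 \ln\left(\frac{495}{456.9}\right) + \dots + 498 \ln\left(\frac{498}{485.4}\right)\right] = 12.6$$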
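
Both statistics can be sketched in base R for the belief-in-afterlife table from earlier in these notes. This is an illustration under stated assumptions: the expected counts are estimated as (row total × column total) / n, the degrees of freedom use the standard (r − 1)(c − 1) rule, and the dimension label `Gender` and all variable names are chosen here for readability.

```r
# Belief-in-afterlife counts from the two-way table in these notes
counts <- matrix(c(1230, 357,
                   859,  413),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Gender = c("Females", "Males"),
                                 Belief = c("Yes", "No/undecided")))

n <- sum(counts)
# Estimated expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(counts), colSums(counts)) / n

# Pearson chi-squared and likelihood-ratio statistics
X2 <- sum((counts - expected)^2 / expected)
G2 <- 2 * sum(counts * log(counts / expected))

# Degrees of freedom (r - 1)(c - 1) and upper-tail p-values
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_X2 <- pchisq(X2, df, lower.tail = FALSE)
p_G2 <- pchisq(G2, df, lower.tail = FALSE)

print(round(c(X2 = X2, G2 = G2, df = df), 2))
```

The two statistics come out close to each other here, and both are far above what a chi-squared distribution with 1 degree of freedom would typically produce, so independence of gender and belief is not supported by these data. The Pearson value can also be compared with `chisq.test(counts, correct = FALSE)`.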

Three-Way Contingency Tables:

Simpson’s paradox:
