Biostat - Group 3

This document provides information on categorical data analysis techniques used in biostatistics and epidemiology. It discusses categorical variables and how they are summarized using probability tables. It describes different types of categorical data like nominal and ordinal data. Contingency tables are presented as a way to understand relationships between categorical variables. The Cochran-Mantel-Haenszel test, kappa statistics, and goodness of fit tests are introduced as statistical methods for analyzing categorical data while accounting for potential confounding variables.

Uploaded by

Jasmin Jimenez

BIOSTATISTICS AND EPIDEMIOLOGY
Presented by GROUP 3
Categorical Data Analysis
Categorical data is statistical data made up of categorical variables, or of data
that have been converted into categories.

One example is grouped data. More specifically, categorical data can be derived
from counting qualitative observations or from grouping quantitative data
within given intervals. These data are summarized in the form of a probability table.
Categorical Data Analysis
Categorical data consist of categorical variables, which represent
characteristics such as a person’s gender or hometown.
Categorical data can sometimes take numerical values, but those
numbers have no mathematical meaning. Some examples of
categorical data are:
Birthdate
Favourite sport
School postcode
Travel method to school
Categorical Data Analysis
Types of Categorical Data
Nominal data is a type of data used to label variables
without providing any numerical value. It is also known as the nominal
scale.

Ordinal data is a type of data that follows a natural order. A
notable feature of ordinal data is that the difference between data
values cannot be determined.
Contingency Table
It displays frequencies for combinations of two
categorical variables. It is also known as “Cross
tabulation” and “two-way tables.”
Classify the outcomes of one variable in rows and those of the other in
columns.
Use contingency tables to understand the relationship
between categorical variables.
Contingency Table
Example:
Contingency Table
Relative contingency table:
FORMULA:
Relative frequency (%) = (count in cell / total number surveyed) x 100

Cell 1: (72/203) x 100 = 35.47% (male smoker)
Cell 2: (44/203) x 100 = 21.67% (male non-smoker)
Cell 3: (34/203) x 100 = 16.75% (female smoker)
Cell 4: (53/203) x 100 = 26.11% (female non-smoker)
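
The four cell calculations above can be reproduced with a short Python sketch. The counts come from the worked example; the dictionary layout and the function name `relative_table` are illustrative assumptions, not part of the original slides.

```python
# Cell counts from the worked example above (203 people surveyed in total).
counts = {
    ("male", "smoker"): 72,
    ("male", "non-smoker"): 44,
    ("female", "smoker"): 34,
    ("female", "non-smoker"): 53,
}


def relative_table(cell_counts):
    """Express each cell as a percentage of the grand total (2 decimals)."""
    total = sum(cell_counts.values())
    return {cell: round(100 * n / total, 2) for cell, n in cell_counts.items()}


relative = relative_table(counts)
for (sex, status), pct in relative.items():
    print(f"{sex} {status}: {pct}%")
```

Because every cell is divided by the same grand total, the four percentages add up to about 100%, apart from rounding.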
Contingency Table
JOINT DISTRIBUTION

MARGINAL DISTRIBUTION

CONDITIONAL DISTRIBUTION
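
As a minimal sketch, the three distributions can be computed from the smoker counts used in the relative contingency table (72, 44, 34, 53). Treating sex as the row variable and smoking status as the column variable is an assumption made for illustration, as are all variable names.

```python
# Counts reused from the relative contingency table example.
counts = {
    ("male", "smoker"): 72,
    ("male", "non-smoker"): 44,
    ("female", "smoker"): 34,
    ("female", "non-smoker"): 53,
}
total = sum(counts.values())

# Joint distribution: proportion of all respondents in each cell.
joint = {cell: n / total for cell, n in counts.items()}

# Row and column totals are needed for the other two distributions.
row_totals, col_totals = {}, {}
for (sex, status), n in counts.items():
    row_totals[sex] = row_totals.get(sex, 0) + n
    col_totals[status] = col_totals.get(status, 0) + n

# Marginal distributions: one variable at a time, ignoring the other.
marginal_sex = {sex: n / total for sex, n in row_totals.items()}
marginal_status = {status: n / total for status, n in col_totals.items()}

# Conditional distribution of smoking status given sex:
# each cell divided by its own row total.
conditional = {
    (sex, status): n / row_totals[sex] for (sex, status), n in counts.items()
}

print(f"P(smoker | male)   = {conditional[('male', 'smoker')]:.3f}")
print(f"P(smoker | female) = {conditional[('female', 'smoker')]:.3f}")
```

The joint distribution divides by the grand total, while the conditional distribution divides by a row total, so each row of the conditional table sums to 1.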
Cochran-Mantel-Haenszel Test
It is a test used to examine matched or stratified categorical data. It enables an
investigator to examine, while accounting for stratification, the relationship
between a binary predictor or treatment and a binary outcome such as case or
control status.
It is often used in observational studies where random assignment of subjects to
different treatments cannot be controlled, but confounding covariates can be
measured.
The Cochran-Mantel-Haenszel test is also known as the Mantel-Haenszel
test or the Mantel-Haenszel chi-square test. Researchers use this statistical
procedure to evaluate the association between two categorical
variables while controlling for the effects of a third categorical
variable.
Cochran-Mantel-Haenszel Test
The Cochran-Mantel-Haenszel test also produces an estimate of the common
odds ratio, a way of summarizing how big the effect is when pooled across the
different repeats of the experiment. This requires assuming that the odds ratio is
the same in the different repeats.
The ability to account for the effects of confounding variables is one of the
main advantages of the Cochran-Mantel-Haenszel test. Confounding can happen
when a third variable distorts the relationship between the two variables
of interest. By stratifying the data by the confounding variable, the test
can take its effects into account and give a more accurate estimate of the
association between the two variables of interest.
Cochran-Mantel-Haenszel Test
We consider a binary outcome variable such as case status and
a binary predictor such as treatment status. The observations
are grouped in strata. The stratified data are summarized in a
series of 2 × 2 contingency tables, one for each stratum.
Cochran-Mantel-Haenszel Test
Using the notation in this table, estimates for a risk ratio or an odds
ratio would be computed as follows:
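
As a hedged reconstruction, assume the conventional two-by-two layout in which a and b count the exposed (or treated) participants with and without the outcome, and c and d count the unexposed participants with and without the outcome. This layout is an assumption. Under it, the usual estimates are:

```latex
\[
RR = \frac{a/(a+b)}{c/(c+d)},
\qquad
OR = \frac{a/b}{c/d} = \frac{ad}{bc}
\]
```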

To explore and adjust for confounding, we can use a stratified
analysis in which we set up a series of two-by-two tables, one for
each stratum (category) of the confounding variable. Having done
that, we can compute a weighted average of the estimates of the
risk ratios or odds ratios across the strata. The weighted average
provides a measure of association that is adjusted for
confounding. The weighted averages for risk ratios and odds ratios
are computed as follows:
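
The standard Mantel-Haenszel weighted averages can be written as below. This assumes the usual layout in which a_i and b_i are the exposed participants with and without the outcome, and c_i and d_i are the unexposed participants with and without the outcome, in stratum i. The cell layout is an assumption; the formulas are the textbook Mantel-Haenszel estimators for that layout.

```latex
\[
RR_{MH} = \frac{\sum_i a_i (c_i + d_i) / n_i}{\sum_i c_i (a_i + b_i) / n_i},
\qquad
OR_{MH} = \frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i},
\qquad
n_i = a_i + b_i + c_i + d_i
\]
```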
Cochran-Mantel-Haenszel Test
Where ai, bi, ci and di are the numbers of participants in
the cells of the two-by-two table in the ith stratum of
the confounding variable, and ni represents the total number of
participants in the ith stratum.
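
A minimal Python sketch of the stratified calculation follows. The two strata are hypothetical numbers invented for illustration. The cell layout (a, b = exposed with and without the outcome; c, d = unexposed with and without the outcome) is an assumption, and the function names are not from the original slides.

```python
# Each stratum is a hypothetical 2x2 table given as (a, b, c, d):
#   a = exposed with outcome,   b = exposed without outcome
#   c = unexposed with outcome, d = unexposed without outcome
strata = [
    (10, 90, 5, 95),   # hypothetical stratum 1 of the confounder
    (40, 60, 20, 80),  # hypothetical stratum 2 of the confounder
]


def mantel_haenszel_or(tables):
    """Pooled odds ratio: sum(a*d/n) divided by sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


def mantel_haenszel_rr(tables):
    """Pooled risk ratio: sum(a*(c+d)/n) divided by sum(c*(a+b)/n)."""
    numerator = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


or_mh = mantel_haenszel_or(strata)
rr_mh = mantel_haenszel_rr(strata)
print(f"Mantel-Haenszel OR = {or_mh:.3f}")
print(f"Mantel-Haenszel RR = {rr_mh:.3f}")
```

Each stratum contributes in proportion to its size through the division by n, which is what makes the pooled value a weighted average rather than a simple mean of the stratum-specific ratios.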
Cochran-Mantel-Haenszel Test
The Cochran-Mantel-Haenszel test is a helpful statistical technique for
assessing the relationship between two categorical variables while
accounting for the impact of a third categorical variable. Because it does
not rely on an assumption of normality, it is frequently used
in many domains. It can also handle multi-way contingency tables
well. Prior to using the test, it is crucial to confirm that the
observations are independent.
KAPPA STATISTICS
Cohen’s kappa statistic measures interrater reliability
(interobserver agreement).
Interrater reliability, or precision, happens when your data
raters (or collectors) give the same score to the same data
item.

This statistic should only be calculated when:

Two raters each rate one trial on each sample, or


One rater rates two trials on each sample.
The kappa statistic typically varies from 0 to 1 (negative values are
possible and indicate agreement worse than chance), where:

0 = agreement equivalent to chance.
0.01 – 0.20 = slight agreement.
0.21 – 0.40 = fair agreement.
0.41 – 0.60 = moderate agreement.
0.61 – 0.80 = substantial agreement.
0.81 – 0.99 = near perfect agreement.
1 = perfect agreement.
Formula
k = (Po – Pe) / (1 – Pe)

Where:
Po = the relative observed agreement among raters.
Pe = the hypothetical probability of chance agreement.
Example
The following hypothetical data come from a medical test
in which two radiographers rated 50 images for needing further
study. The raters (A and B) either said Yes (further
study needed) or No (no further study needed).

20 images were rated Yes by both.
15 images were rated No by both.
Overall, Rater A said Yes to 25 images and No to 25. On the other
hand, Rater B said Yes to 30 images and No to 20.
Example
Step 1: Calculate Po (the observed proportional agreement):
Po = number in agreement / total = (20 + 15) / 50 = 0.70

Step 2: Find the probability that both raters would say Yes by chance.
Rater A said Yes to 25/50 images, or 50% (0.5).
Rater B said Yes to 30/50 images, or 60% (0.6).
The probability of the raters both saying Yes randomly is: 0.5 * 0.6 = 0.30.

Step 3: Find the probability that both raters would say No by chance.
Rater A said No to 25/50 images, or 50% (0.5).
Rater B said No to 20/50 images, or 40% (0.4).
The probability of the raters both saying No randomly is: 0.5 * 0.4 = 0.20.

Step 4: Calculate Pe. Pe = 0.30 + 0.20 = 0.50.

Step 5: Calculate kappa:
k = (Po – Pe) / (1 – Pe) = (0.70 – 0.50) / (1 – 0.50) = 0.40

k = 0.40, which indicates fair agreement.
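
The five steps can be checked with a small Python sketch. The counts are those of the radiographer example; the function name `cohen_kappa` and its parameter names are illustrative assumptions.

```python
def cohen_kappa(both_yes, both_no, a_yes, b_yes, total):
    """Cohen's kappa for two raters and a Yes/No rating."""
    # Step 1: observed proportional agreement.
    po = (both_yes + both_no) / total
    # Steps 2 and 3: chance probabilities of agreeing on Yes and on No.
    p_yes = (a_yes / total) * (b_yes / total)
    p_no = ((total - a_yes) / total) * ((total - b_yes) / total)
    # Step 4: overall probability of chance agreement.
    pe = p_yes + p_no
    # Step 5: kappa.
    return (po - pe) / (1 - pe)


kappa = cohen_kappa(both_yes=20, both_no=15, a_yes=25, b_yes=30, total=50)
print(f"kappa = {kappa:.2f}")
```

With perfect agreement (Po = 1) the same function returns 1, and when Po equals Pe it returns 0, which matches the endpoints of the interpretation scale.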
Goodness of fit test
The goodness of fit test tells you whether your sample data represent the
data you would expect to find in the actual population. More
specifically, it is used to test whether sample data fit a distribution
from a certain population (e.g. a population with a normal
distribution or one with a Weibull distribution).
Principles
It is used to find out whether the observed values
of a given phenomenon differ significantly
from the expected values.

The statistical models that are analyzed by
chi-square goodness of fit tests are
distributions.

The chi-square goodness of fit test is a hypothesis
test. It allows you to draw conclusions about the
distribution of a population based on a sample.
Function of test
1. It is a statistical hypothesis test used to see how
closely observed data mirror expected data.

2. It can help determine if a sample follows a normal
distribution, if categorical variables are related, or if
random samples are from the same distribution.
Type of variable

A chi-square (Χ²) goodness of fit test is a
goodness of fit test for a categorical
variable.
Level of measurement
Categorical variables that
have discrete categories or
levels such as nominal,
dichotomous, or ordinal.
Type of study design

Qualitative study designs


Type of objective
A chi-square goodness-of-fit test can be conducted when
there is one categorical variable with more than two levels.
If there are exactly two categories, then a one proportion z
test may be conducted. The levels of that categorical
variable must be mutually exclusive. In other words, each
case must fit into one and only one category.
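Formula
Χ² = Σ (O – E)² / E
where O is the observed count and E is the expected count in each category,
and the sum runs over all categories.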
Example
We collect a random sample of ten bags of candies. Each bag
has 100 pieces of candy and five flavors. Our hypothesis is that
the proportions of the five flavors in each bag are the same.
Example
Let’s start by answering: Is the Chi-square goodness of fit test an appropriate method to
evaluate the distribution of flavors in bags of candy?
We have a simple random sample of 10 bags of candy. We meet this requirement.
Our categorical variable is the flavors of candy. We have the count of each flavor in
10 bags of candy. We meet this requirement.
Each bag has 100 pieces of candy. Each bag has five flavors of candy. We expect to
have equal numbers for each flavor. This means we expect 100 / 5 = 20 pieces of
candy in each flavor from each bag. For 10 bags in our sample, we expect 10 x 20 =
200 pieces of candy in each flavor. This is more than the required minimum expected
count of five in each category.
Flavour | Number of Pieces of Candy (10 bags) | Expected Number of Pieces of Candy
Apple   | 180 | 200
Lime    | 250 | 200
Cherry  | 120 | 200
Grape   | 225 | 200
Orange  | 225 | 200


Flavour | Number of Pieces of Candy (10 bags) | Expected Number of Pieces of Candy | Observed - Expected | Squared difference | Squared difference / expected number
Apple   | 180 | 200 | 180 - 200 = -20 | 400  | 400 / 200 = 2
Lime    | 250 | 200 | 250 - 200 = 50  | 2500 | 2500 / 200 = 12.5
Cherry  | 120 | 200 | 120 - 200 = -80 | 6400 | 6400 / 200 = 32
Grape   | 225 | 200 | 225 - 200 = 25  | 625  | 625 / 200 = 3.125
Orange  | 225 | 200 | 225 - 200 = 25  | 625  | 625 / 200 = 3.125


Finally, we add the numbers in the final column to
calculate our test statistic:
2+12.5+32+3.125+3.125=52.75
To draw a conclusion, we compare the test statistic to a critical value from the Chi-Square
distribution. With five flavors there are 5 – 1 = 4 degrees of freedom, and at a significance
level of 0.05 the critical Chi-square value is 9.488.

We compare the value of our test statistic (52.75) to this critical value. Since 52.75 > 9.488,
we reject the null hypothesis that the proportions of flavors of candy are equal.
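
The candy calculation can be reproduced with a short Python sketch. The observed counts and the critical value 9.488 come from the example; the function name and the assumption of equal expected counts (total divided by the number of flavors) mirror the worked example and do not describe a general-purpose routine.

```python
# Observed counts per flavor across the 10 bags (from the example table).
observed = {"Apple": 180, "Lime": 250, "Cherry": 120, "Grape": 225, "Orange": 225}


def chi_square_equal_proportions(observed_counts):
    """Chi-square statistic when every category is expected to be equally common."""
    total = sum(observed_counts.values())
    expected = total / len(observed_counts)  # 1000 / 5 = 200 in the example
    return sum((o - expected) ** 2 / expected for o in observed_counts.values())


statistic = chi_square_equal_proportions(observed)

# Critical value quoted in the text (4 degrees of freedom, alpha = 0.05).
CRITICAL_VALUE = 9.488
reject_null = statistic > CRITICAL_VALUE

print(f"test statistic = {statistic}")
print(f"reject null hypothesis: {reject_null}")
```

The statistic is the sum of the final column of the table, so a large value means the observed counts sit far from the expected 200 per flavor.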
McNemar Test
Principles
It is used to analyze pretest-posttest study designs, as well as being
commonly employed in analyzing matched pairs and case-control studies.
McNemar's test has three assumptions that must be met:
1. You have one categorical dependent variable with two
categories (i.e. a dichotomous variable) and one categorical independent
variable with two related groups.
2. The two groups of your dependent
variable must be mutually exclusive.
3. The cases (e.g. participants) are a
random sample from the population of interest.
McNemar Test
Function of test
It is used to determine if there are differences on a dichotomous
dependent variable between two related groups.
The McNemar test is a non-parametric test for paired nominal data. It is
used when you are interested in finding a change in proportion for the
paired data.
Types of variable
Nominal variable with two categories
One independent variable with two connected groups
McNemar Test
Level of measurement
It is used when one or both conditions being
studied are measured on a nominal scale.

Type of study design
Retrospective study designs, such as matched case-control studies. It is also
used to analyze pretest-posttest study designs.
McNemar Test
Type of objective
The McNemar test is used to determine if there are differences on a
dichotomous dependent variable between two related groups.
Example

Evaluate the effect of a connective tissue graft (CTG) in comparison to a guided tissue
regeneration (GTR) procedure in the treatment of gingival recession. This clinical trial
was formed by matched pairs in which one member of each matched pair is randomly
assigned to CTG and the other member to GTR. The patients are matched on age, sex,
oral hygiene standards, gingival health, probing depth, and other prognostic attributes.
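
As a hedged illustration of how such matched-pair data might be analyzed, the sketch below computes the standard McNemar chi-square statistic, (b - c)² / (b + c), from the discordant pairs. All counts are hypothetical because the example gives no data. The labels "success" and "failure" are assumptions, and no continuity correction is applied.

```python
def mcnemar_statistic(b, c):
    """McNemar chi-square statistic (1 degree of freedom, no continuity correction).

    b and c are the two discordant pair counts; concordant pairs do not enter.
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    return (b - c) ** 2 / (b + c)


# Hypothetical discordant pairs, invented for illustration only:
#   b = pairs where the CTG member succeeded and the GTR member failed
#   c = pairs where the CTG member failed and the GTR member succeeded
b_pairs, c_pairs = 15, 5

statistic = mcnemar_statistic(b_pairs, c_pairs)

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE = 3.841
print(f"McNemar statistic = {statistic}")
print(f"significant at 0.05: {statistic > CRITICAL_VALUE}")
```

Only the pairs in which the two treatments gave different results carry information about a difference between them, which is why the concordant pairs are left out of the statistic.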
THANK YOU!
MEMBERS:
Bautista, Timhry Erin Dae B.
Estrada, Christalle Claire
Jimenez, Jasmin M.
Landrito, Riva Ysabella
Pascual, Maria Victoria P.
Pepito, Angeljoy Y.
Yatco, Berlyne
