Analysis of Categorical Data

This document discusses the analysis of categorical data. It defines categorical data as data that classifies observations into categories. Some common methods for analyzing categorical data discussed include goodness-of-fit tests, contingency tables, and odds ratios. The chi-square test and Fisher's exact test are presented as methods for analyzing contingency tables. Examples are provided to demonstrate how to perform chi-square tests and calculate odds ratios.

Uploaded by

Malik Shabbir Ali

STAT-7213
BIO-STATISTICS

M.PHIL STATISTICS
YEAR-1 (SEMESTER-II)

Submitted to:

Dr. Jamal Abdul Nasir

PRESENTATION
PRESENTED BY

MUBEEN ASGHAR (0557)

LAIBA SUBHANI (0559)
NOOR-E-AMNA (0561)
RABIA SAIF (0563)

TOPIC

ANALYSIS OF CATEGORICAL DATA

CATEGORICAL DATA

Categorical data are data that classify an observation as belonging
to one or more categories. For example, an item might be judged as
good or bad, or a response to a survey might include categories
such as agree, disagree, or no opinion.

ANALYSIS OF CATEGORICAL DATA

Categorical data analysis is the analysis of data where the
response variable has been grouped into a set of mutually exclusive
ordered (such as age group) or unordered (such as eye color)
categories.

BASIC ANALYSIS

• The Goodness-of-Fit Test
• The 2 by 2 Contingency Table
• The r by c Contingency Table
• Multiple 2 by 2 Contingency Tables

GOODNESS-OF-FIT TEST

The goodness-of-fit test is a statistical hypothesis test of how
well sample data fit a hypothesized distribution, for example a
normal distribution.

USES OF GOF TEST

• Goodness-of-fit tests are statistical methods often used to make
inferences about observed values.
• These tests determine how closely observed values match the values
predicted by a model, and when used in decision-making, goodness-of-fit
tests can help predict future trends and patterns.
• Goodness-of-fit tests are commonly used to test for the normality of
residuals or to determine whether two samples are drawn from
identical distributions.

METHODS OF GOODNESS-OF-FIT TEST

There are multiple methods for determining goodness-of-fit. Some
of the most popular methods used in statistics include:
• The chi-square test
• The Kolmogorov-Smirnov test
• The Anderson-Darling test
• The Shapiro-Wilk test

KEY POINTS

• Goodness-of-fit tests are statistical tests that aim to determine whether a set of
observed values matches those expected under the applicable model.
• There are multiple types of goodness-of-fit tests, but the most common is the
chi-square test.
• The chi-square test of independence determines whether a relationship exists
between two categorical variables.
• The Kolmogorov-Smirnov test, often used for large samples, determines whether a
sample comes from a specific population distribution.
• Goodness-of-fit tests can show whether your sample data fit the values expected
from a population with a given distribution, such as the normal distribution.

CHI-SQUARE TEST

The chi-square test of independence is a procedure for
testing whether two categorical variables are related in some
population.

TEST STATISTIC

$\chi^2 = \sum \frac{(O - E)^2}{E}$

where O is an observed frequency and E is an estimated expected frequency.

DEGREES OF FREEDOM

The degrees of freedom is a number that determines the exact
shape of the chi-square distribution. The figure
illustrates this point.
For a table with r rows and c columns, the degrees of freedom (df)
are calculated as
df = (r-1)*(c-1)

PROCEDURE
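1. State the null and alternative hypotheses.
2. Choose the level of significance, $\alpha$.
3. Test statistic:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
4. Computation.
5. Critical region: reject $H_0$ if the computed $\chi^2$ is greater than or equal to the
tabulated value $\chi^2_{\alpha}$ at the appropriate degrees of freedom.
6. Conclusion: if the computed value falls in the critical region, reject $H_0$;
otherwise, fail to reject $H_0$.
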
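The six steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the enrolment counts are hypothetical (they are not the data from the slides), the helper names `chi2_sf` and `goodness_of_fit` are made up for this sketch, and the degrees of freedom are taken as k - 1 because no parameters are estimated from the data. The decision uses the p-value, which is equivalent to comparing the statistic with the tabulated critical value.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Computed from the series expansion of the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= t / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-t + a * math.log(t) - math.lgamma(a))
    return 1.0 - lower


def goodness_of_fit(observed, expected):
    """Return (chi-square statistic, degrees of freedom, p-value)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # assumption: no parameters estimated from the data
    return chi2, df, chi2_sf(chi2, df)


# Hypothetical enrolment counts for four professors (NOT the data from the slides).
observed = [30, 20, 25, 25]
# H0: students enroll at random, so every professor expects the same count.
expected = [sum(observed) / len(observed)] * len(observed)

alpha = 0.05
chi2, df, p_value = goodness_of_fit(observed, expected)
reject_h0 = p_value < alpha

print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With these made-up counts the statistic is small relative to its degrees of freedom, so the sketch fails to reject the hypothesis of random enrolment.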
GOVERNMENT COLLEGE UNIVERSITY
EXAMPLE

This example concerns the popularity of psychology professors, measured by the number of
students who enrolled in their courses. At the 0.05 significance level, test whether
students enroll at random.

SOLUTION

• State the null and alternative hypotheses.
$H_0$: students enroll at random.
$H_1$: students do not enroll at random.
• Level of significance: $\alpha = 0.05$.
• Test statistic:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
• Computation.
• Critical region.
• Conclusion.
Since the computed $\chi^2$ falls in the critical region, we reject $H_0$ and conclude that students do not enroll at random.

CONTINGENCY TABLE

A contingency table (also known as a cross tabulation or crosstab)
is a type of table in a matrix format that displays the
(multivariate) frequency distribution of the variables.

TYPES OF CONTINGENCY TABLE

• 2×2 Contingency table
• r × c Contingency table
• Multiple 2×2 Contingency table

2×2 CONTINGENCY TABLE

The two by two or fourfold contingency
table represents two classifications of a set of counts or frequencies.
The rows represent two classifications of one variable (e.g.,
outcome positive/outcome negative) and the columns
represent two classifications of another variable (e.g.,
intervention/no intervention).

TEST STATISTIC

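$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$

where, for r rows and c columns of n observations, O is an observed frequency and E
is an estimated expected frequency:

$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}$
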
FISHER EXACT TEST

Fisher's exact test is a statistical significance test used in the
analysis of contingency tables.
Although it is valid for all sample sizes, in practice it is employed when sample sizes are small.

CRITERIA FOR FISHER EXACT TEST

• Both variables are dichotomous and qualitative (a 2 × 2 table).
• The overall total of the table (sample size) is small (about 30 or fewer).
• Any expected cell value is less than 5.

ASSUMPTIONS

• The data consist of A sample observations from
population 1 and B sample observations from population 2.
• The samples are random and independent.
• Each observation can be categorized as one of two mutually
exclusive types.

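As a rough illustration of how the exact test works for a 2 × 2 table, the sketch below enumerates every table with the same margins and sums the hypergeometric probabilities of those that are no more likely than the observed one. The function name `fisher_exact_two_sided` and the example counts are assumptions made for this sketch; they do not come from the slides, and the "no more likely than observed" rule is only one common definition of the two-sided p-value.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k: int) -> float:
        # Hypergeometric probability that the (1,1) cell equals k, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)   # smallest feasible value of the (1,1) cell
    highest = min(row1, col1)      # largest feasible value of the (1,1) cell
    # Sum the probabilities of all tables that are no more likely than the observed one.
    return sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical small-sample table (NOT from the slides): rows are groups, columns are outcomes.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p-value = {p_value:.4f}")
```

Because the margins are treated as fixed, only one cell is free to vary, which is why a single loop over the (1,1) cell is enough.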
ODDS RATIO

Difference Between ODDS AND ODDS RATIO

ODDS
DEF: The odds for success are the ratio of the probability of success to the
probability of failure.
The odds of being a case (having the disease) to being a control (not having the
disease) among subjects with the risk factor are [a/(a+b)]/[b/(a+b)] = a/b.
The odds of being a case (having the disease) to being a control (not having the
disease) among subjects without the risk factor are [c/(c+d)]/[d/(c+d)] = c/d.

ODDS RATIO
The odds ratio is the ratio of these two odds, and it may be computed from the
data of a retrospective study:
OR = (a/b)/(c/d) = ad/bc
We use the symbol OR to indicate that the measure is computed from sample data and
used as an estimate of the population odds ratio.

PROPERTIES

• The odds ratio can equal any non-negative number.
• When OR > 1, the odds of success are higher in row 1 than in row 2.
• When one cell has zero probability, OR equals 0 or ∞.

INTERPRETATION

A value of 1 indicates no association between the risk factor and disease status.
A value less than 1 indicates reduced odds of the disease among subjects with
the risk factor.
A value greater than 1 indicates increased odds of having the disease among
subjects in whom the risk factor is present.

EXAMPLE

Compute the odds of receiving the death penalty for each group.

The odds of a death sentence if the defendant was black = 28/45 = 0.6222
The odds of a death sentence if the defendant was non-black = 22/52 = 0.4231

The impact of being black on receiving the death penalty is measured by the odds ratio:
OR = 0.6222/0.4231 = 1.47

INTERPRET
The odds of a death sentence are 47% higher for black defendants than for
non-black defendants.

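The arithmetic in this example can be checked with a short Python sketch. The counts 28, 45, 22, and 52 are the ones quoted in the slide; the function names `odds` and `odds_ratio` are made up for this illustration.

```python
def odds(events: int, non_events: int) -> float:
    """Odds of the event: count with the event divided by count without it."""
    return events / non_events


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio (a/b)/(c/d) = ad/bc for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Counts quoted in the slide: death sentence vs. no death sentence in each group.
odds_black = odds(28, 45)
odds_non_black = odds(22, 52)
or_hat = odds_ratio(28, 45, 22, 52)
percent_higher = (or_hat - 1) * 100

print(f"odds (black) = {odds_black:.4f}")
print(f"odds (non-black) = {odds_non_black:.4f}")
print(f"odds ratio = {or_hat:.2f}, i.e. about {percent_higher:.0f}% higher odds")
```

Computing the ratio as ad/bc rather than dividing the two rounded odds avoids carrying rounding error into the final figure.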
YATES' METHOD

Cochran suggests that the chi-square test should not be used if n is small and
an expected frequency is less than 5.
Yates (1934) proposed a continuity correction for the case of a 2×2 table, that is,

$X^2_{YC} = \sum \frac{(|O - E| - 0.5)^2}{E}$

CRITERIA FOR YATES' CORRECTION

• Both variables are dichotomous and qualitative (a 2 × 2 table).
• The overall total of the table (sample size) is small (about 30 or fewer).
• Any expected cell value is less than 5.

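A minimal Python sketch of the Yates-corrected statistic follows. It uses the counts from the two passive-smoke tables that appear later in this deck, so the results can be compared with the values reported there. The function name `chi_square_2x2` is an assumption of this sketch, and the p-value relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = True):
    """Chi-square statistic and p-value (df = 1) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction: shrink |O - E| by 0.5, never below zero.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # For df = 1, P(chi-square >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts from the deck's passive-smoke tables: (high/some, high/none, low/some, low/none).
stat_smoke, p_smoke = chi_square_2x2(100, 20, 124, 40)
stat_no_smoke, p_no_smoke = chi_square_2x2(128, 62, 166, 119)
stat_uncorrected, _ = chi_square_2x2(100, 20, 124, 40, yates=False)

print(f"smokers' homes: X2_YC = {stat_smoke:.3f}, p = {p_smoke:.4f}")
print(f"non-smokers' homes: X2_YC = {stat_no_smoke:.3f}, p = {p_no_smoke:.4f}")
print(f"smokers' homes without correction: X2 = {stat_uncorrected:.3f}")
```

The corrected statistic is always a little smaller than the uncorrected one, which makes the test more conservative.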
MATCHED-PAIR STUDIES

A matched pairs design is an experimental design that is used when
an experiment has only two treatment conditions. The subjects in
the experiment are grouped together into pairs based on some
variable they “match” on, such as age or gender. Then, within each
pair, subjects are randomly assigned to different treatments.

EXAMPLE

Pairs with the same exposure status for both case and control (the diagonal cells)
are called concordant pairs (c1 and c2), and pairs with different exposures (the
off-diagonal cells) are called discordant pairs (d1 and d2).

EXAMPLE

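Let $\pi$ be the probability that a discordant pair has an exposed case. Then, from the
preceding table, $\pi$ can be estimated by the proportion of discordant pairs in which
the case is exposed. Taking d1 as the number of such pairs,

$p = \frac{d_1}{d_1 + d_2}$
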
HYPOTHESIS

Under the null hypothesis of no association between the risk factor and the
disease, each discordant pair is just as likely to have a case exposed as to have a
control exposed. Thus, writing $\pi$ for the probability that a discordant pair has an
exposed case, the null hypothesis can be written as

$H_0: \pi = 0.5$

APPROXIMATION

For large samples, we can use the normal approximation.

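One way to sketch the large-sample test for matched pairs in Python is shown below. The discordant-pair counts are hypothetical, the function name `matched_pairs_z_test` is made up, and d1 is assumed to count the discordant pairs in which the case is exposed. No continuity correction is applied here, so this is the plain normal approximation to the binomial and not necessarily the exact form used on the original slide.

```python
import math


def matched_pairs_z_test(d1: int, d2: int):
    """Normal-approximation test of H0: pi = 0.5 using only the discordant pairs.

    d1: discordant pairs with the case exposed (assumed labelling)
    d2: discordant pairs with the control exposed
    """
    n = d1 + d2
    p_hat = d1 / n
    # Under H0 the standard error of p_hat is sqrt(0.5 * 0.5 / n).
    z = (p_hat - 0.5) / math.sqrt(0.25 / n)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_hat, z, p_value


# Hypothetical discordant-pair counts (NOT from the slides).
p_hat, z, p_value = matched_pairs_z_test(25, 10)
print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

The square of this z statistic equals (d1 - d2)^2 / (d1 + d2), which is the uncorrected McNemar chi-square statistic with one degree of freedom.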
R × C CONTINGENCY TABLE

We now consider the more general situation where two
classification variables have more than two categories. First, we
consider the situation where both variables are nominal, followed by
the situation where one of the variables is ordinal.

R × C CONTINGENCY TABLE

Testing Hypothesis of No Association

The same ideas used in the 2 by 2 table still apply to the r by c
contingency table. If there is no association between a row variable
and a column variable, the ratio of the expected cell frequency in
the ith row and jth column, mij, to the ith row total, ni⋅, should
equal the ratio of the jth column total, n⋅j, to the overall total, n.
That is, $m_{ij} / n_{i\cdot} = n_{\cdot j} / n$, so $m_{ij} = n_{i\cdot} n_{\cdot j} / n$.

R × C CONTINGENCY TABLE

There are (r - 1)(c - 1) degrees of freedom for the r by c table because once we
know the frequencies of any (r - 1)(c - 1) cells, we can find the values of the
other frequencies by subtraction from the row and column totals. The hypothesis
of no association between the row and column variables is tested using the
chi-square goodness-of-fit statistic. Most statisticians perform no adjustment to the
test statistic when it is used with tables other than the 2 by 2 table. If the test statistic
is greater than the critical value of the chi-square distribution with (r - 1)(c - 1)
degrees of freedom, we reject the hypothesis of no association in favor
of the alternative that the row and column variables are related. If the test statistic
is less than the critical value, we fail to reject the null hypothesis.

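The expected frequencies, the test statistic, and the degrees of freedom for an r by c table can be sketched as follows. The 3 by 3 table is hypothetical, the function name `chi_square_rxc` is an assumption of this sketch, and the critical value 9.488 is the standard tabulated value for 4 degrees of freedom at the 0.05 level.

```python
def chi_square_rxc(table):
    """Return (expected frequencies, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under no association, the expected count is row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df


# Hypothetical 3 x 3 table of counts (NOT from the slides).
table = [
    [20, 30, 50],
    [30, 30, 40],
    [50, 40, 10],
]
expected, stat, df = chi_square_rxc(table)

critical_value = 9.488  # tabulated chi-square value for df = 4, alpha = 0.05
reject_h0 = stat > critical_value
print(f"chi-square = {stat:.3f}, df = {df}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

No continuity adjustment is applied, in line with the remark above that tables larger than 2 by 2 are usually analyzed without one.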
MULTIPLE 2×2 CONTINGENCY TABLE

Here, we focus on the relationship between 2 factors in the
presence of a third factor. So far, we have examined the relationship between 2
categorical variables (factors).

EXAMPLE

For example, we might be interested in the relationship between
smoking and lung cancer, and how this relationship may change
with gender (a third factor). We observe that the apparent
(combined) relationship between 2 factors may switch or change
its direction and magnitude depending on the third factor.

EXAMINE THE RELATIONSHIP

We will test for such a dependency, and, if we don’t
seem to find one, we will analyze the aggregated data; if we do find
such a dependency, then it is appropriate to examine the relationship
of the 2 factors of interest separately for each of the levels of the
third factor (don’t aggregate).
We will focus on 2 factors each with 2 levels, including a third
factor with possibly several (g) levels; thus, we will be working
with multiple 2×2 contingency tables.

EXAMPLE

Consider a study to determine whether there is any association between the occurrence of upper respiratory infections (URI) in young children and outdoor
air pollution. Several variables (e.g., dust, traffic, and smoke) could affect the relationship between the occurrence of infections and outdoor air pollution.
Hypothetical data for this situation, based on an article by Jaakkola et al. (1991), are shown in the table.

SOLUTION

One way of taking the passive smoke variable into account is to analyze each 2 by 2 table
separately. Then we have two tables, i.e., one for homes in which someone smoked and one for homes in which no one smoked.

Table 1.

Passive smoke    City        URI     URI     Total
in the home      pollution   some    none
yes              high        100     20      120
yes              low         124     40      164
total                        224     60      284

SOLUTION

Calculations:
Using the chi-square and odds ratio formulas, the Yates-corrected chi-square statistic ($X^2_{YC}$) is 2.039
and its p-value is 0.1533 for homes in which someone smoked. The odds ratio for these data
is 1.613. The 95 percent confidence interval for the odds ratio is from 0.887 to 2.933.
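The odds ratio and its confidence interval for Table 1 can be reproduced with the sketch below. The slides do not say which interval method was used; this sketch assumes the common log-based (Woolf) interval, in which the standard error of ln(OR) is the square root of the sum of the reciprocal cell counts. The function name `odds_ratio_ci` is made up for illustration.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio ad/bc with a log-based (Woolf) confidence interval.

    z = 1.96 corresponds to a 95 percent interval.
    """
    or_hat = (a * d) / (b * c)
    # Standard error of ln(OR): square root of the sum of reciprocal cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Table 1 (someone smoked in the home): high/some, high/none, low/some, low/none.
or_hat, lower, upper = odds_ratio_ci(100, 20, 124, 40)
print(f"OR = {or_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Under this assumption the sketch agrees with the figures quoted above for Table 1 to three decimal places; the interval is built on the log scale because ln(OR) is closer to normally distributed than OR itself.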

Table 2.

Passive smoke    City        URI     URI     Total
in the home      pollution   some    none
no               high        128     62      190
no               low         166     119     285
total                        294     181     475

SOLUTION
Calculations
The Yates-corrected chi-square value ($X^2_{YC}$) is 3.645, and its p-value is 0.0562 for those without passive smoke
in the home. The odds ratio for these data is 1.480. The 95 percent confidence interval for
the odds ratio is from 1.007 to 2.171.
Interpretation
The first confidence interval, a much wider interval than the second, includes the
value of one, which suggests that there is no relation between the two variables. The second
interval barely misses including one. The second interval’s smaller size reflects the larger
sample size associated with the homes in which there was no passive smoke. Neither of these
tables has a statistically
significant association between the outdoor air pollution and the occurrence of URI at the
0.05 level based on the test statistics. The conclusion from the analyses of the separate
tables is different from that of the combined table.
A problem with the use of the separate tables is that the analyses are based on the smaller
sample sizes associated with each sub-table, not on the sample size of the combined table.
This makes it difficult to find the presence of small but consistent trends across tables.

COCHRAN-MANTEL-HAENSZEL TEST

Two biostatisticians, Nathan Mantel and William Haenszel, developed a method in 1959
for examining the relation between two categorical variables while controlling for another
categorical variable (Mantel and Haenszel 1959).
This method, like a method published by William Cochran in 1954, uses all the data in the
combined table and produces one overall test statistic. The test is designed to detect the
consistent effect of the independent variable on the dependent variable across the levels of
the extraneous variable.
Thus, this method should only be used when the estimated odds ratios in the sub-tables
are similar to one another. One very attractive feature of this test is that it can be used with
extremely small sample sizes.

PROPERTIES

Here θ(AB(k)) denotes the conditional odds ratio between the two variables A and B in the kth sub-table.
• For large samples, when H0 is true, CMH has a chi-squared distribution with df = 1.
• If all θ(AB(k)) = 1, then CMH is close to zero.
• If some or all θ(AB(k)) > 1, then CMH is large.
• If some or all θ(AB(k)) < 1, then CMH is large.
• If some θ(AB(k)) < 1 and others θ(AB(k)) > 1, then CMH is NOT an appropriate test;
that is, the test works well if the conditional odds ratios are in the same direction and
comparable in size.
This test has also been generalized for application to three-way tables of size other than 2
by 2 by k (Landis, Heyman, and Koch 1978).

WHEN TO USE

Use the Cochran–Mantel–Haenszel test (which is sometimes called the
Mantel–Haenszel test) for repeated tests of independence. The most common situation is
that you have multiple 2×2 tables of independence; you are analyzing the kind of
experiment that you would analyze with a test of independence, and you have done
the experiment multiple times or at multiple locations. There are three nominal
variables: the two variables of the 2×2 test of independence, and the third
nominal variable that identifies the repeats (such as different times, different
locations, or different studies).

CMH

We have one Z* test statistic, but because we are dealing with discrete variables, we should use
the continuity correction with Z*. However, instead of using the continuity-corrected
Z* statistic, we would prefer to use a chi-square statistic, since all the other tests
associated with contingency tables use a chi-square statistic. This poses no problem, since
the square of a standard normal variable follows a chi-square distribution with one degree
of freedom. Thus, the statistic to be used to test the hypothesis of no association between
air pollution and the occurrence of upper respiratory problems is the
Cochran-Mantel-Haenszel chi-square statistic.

CMH
Also called the Mantel-Haenszel statistic, it is defined by

$X^2_{CMH} = \frac{(|O - E| - 0.5)^2}{V}$

where Oi and Ei are the observed and expected values in the (1,1) cell in the ith
sub-table.
In terms of the entries in the ith table, with cells ai and bi in the first row, ci and di
in the second row, and total ni, Ei is defined as

$E_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$

VARIANCE

Vi, the variance of Oi minus Ei, can be written as

$V_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

In XCMH-square, O, E, and V are defined as the sums of the Oi, the Ei, and the Vi
over the k sub-tables. If XCMH-square is greater than the chi-square table value, we
reject the hypothesis of no association between air
pollution and the occurrence of upper respiratory infections. Otherwise, we fail to
reject the null hypothesis.

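A short Python sketch of the Cochran-Mantel-Haenszel statistic is given below, applied to the two passive-smoke sub-tables shown earlier in this deck. It assumes the standard continuity-corrected form, in which 0.5 is subtracted from |O - E| before squaring, and the standard hypergeometric variance for the (1,1) cell of each sub-table. The function name `cmh_statistic` is made up for this illustration.

```python
import math


def cmh_statistic(tables, continuity: bool = True):
    """Cochran-Mantel-Haenszel chi-square (df = 1) for a list of 2x2 tables.

    Each table is a tuple (a, b, c, d) read row by row.
    """
    o_sum = e_sum = v_sum = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        o_sum += a                                   # observed (1,1) cell
        e_sum += (a + b) * (a + c) / n               # expected (1,1) cell
        v_sum += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    diff = abs(o_sum - e_sum)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    stat = diff ** 2 / v_sum
    # For df = 1, P(chi-square >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# The two sub-tables from the deck: passive smoke in the home, then no passive smoke.
tables = [(100, 20, 124, 40), (128, 62, 166, 119)]
stat, p_value = cmh_statistic(tables)
print(f"X2_CMH = {stat:.3f}, p-value = {p_value:.4f}")
```

Because the statistic pools information from both sub-tables, it can detect a consistent effect that neither separate analysis finds significant at the 0.05 level.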
MANTEL-HAENSZEL COMMON ODDS RATIO

Mantel and Haenszel also showed how to combine the data from the separate sub-tables to
form a common odds ratio for the data. Again, this should only be done when the
estimated odds ratios in the sub-tables are similar. If the estimated odds ratios for the
sub-tables are not similar (for example, some are less than one and some are greater than one),
the common odds ratio would not be very useful. The relation between the independent
and dependent variable would depend on the level of the extraneous variable, and the use
of a common odds ratio would mask this. With cells ai and bi in the first row and ci and di
in the second row of the ith sub-table, and total ni, the Mantel-Haenszel
estimator of the common odds ratio, θ, is

$\hat{\theta}_{MH} = \frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i}$

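As a rough illustration, the common odds ratio can be computed for the two passive-smoke sub-tables from earlier in the deck. The function name `mantel_haenszel_odds_ratio` is an assumption of this sketch, and each table is read row by row as (a, b, c, d).

```python
def mantel_haenszel_odds_ratio(tables) -> float:
    """Mantel-Haenszel common odds ratio for a list of 2x2 tables (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# The two sub-tables from the deck: passive smoke in the home, then no passive smoke.
tables = [(100, 20, 124, 40), (128, 62, 166, 119)]

# Check first that the stratum-specific odds ratios are similar before pooling them.
stratum_ors = [(a * d) / (b * c) for a, b, c, d in tables]
common_or = mantel_haenszel_odds_ratio(tables)

print("stratum odds ratios:", [round(x, 3) for x in stratum_ors])
print(f"Mantel-Haenszel common odds ratio = {common_or:.3f}")
```

The pooled estimate is a weighted average of the stratum-specific odds ratios, so it always falls between the smallest and the largest of them.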
DISADVANTAGES

• There is a limit to the kind of statistical analysis that can be performed on
categorical data.
• The options in categorical data do not have a standardized interval scale.
Therefore, respondents are not able to effectively gauge their options before
responding.
• Arithmetic operations cannot be performed on the category labels themselves, so
numerical summaries such as the mean are not meaningful for categorical data.
