Logistic regression_2021 ch-8
haileabfekadu@gmail.com
July 2021
Objectives:
Explore relationship between two categorical variables
Statistical Methods for Categorical Variables
Chi-squared test
Measures of association
Logistic Regression
Recap: Important terms
Statistical significance
Introduction
Categorical Response Variables:
Variables measured on a nominal or an ordinal scale
Example of an ordinal response: pain, with levels none, moderate, severe
Testing and measuring Association
Testing hypotheses about association between variables
Hypothesis
1. State the hypothesis to be tested:
H0: There is no association between the variables
H1: There is an association between the variables
2. Compute the calculated value of the test statistic and look up its tabulated (critical) value
3. Compare the calculated value with the tabulated value
4. Decision: reject H0 if the calculated value exceeds the tabulated value
Testing association: chi-square
• The chi-square test statistic is used to test the association between two categorical variables
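The four-step procedure above can be sketched in Python for a 2x2 table. This is a minimal illustration, not the STATA workflow used in these notes. It assumes the four cell counts that appear in the odds ratio calculation later in these notes (112, 176, 88, 224) form the table [[a, b], [c, d]]; that layout and the function name `chi_square_2x2` are assumptions made for the example.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under H0 (no association): row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Assumed layout of the counts used in the odds ratio slide
stat, p_value = chi_square_2x2(112, 176, 88, 224)
critical_value = 3.841  # tabulated chi-square value for 1 df at the 5% level

print(f"calculated chi2 = {stat:.2f}, tabulated = {critical_value}, p = {p_value:.4f}")
print("reject H0" if stat > critical_value else "do not reject H0")
```

Under these assumed counts the calculated value exceeds the tabulated value, so H0 of no association would be rejected at the 5% level.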
Solution
• Hypothesis:
– H0: There is no association between helmet and head injury
– HA: There is association between helmet and head injury
Chi-squared cont’d…
• Limitations
– Discussion
Measures of Association
• How do we determine whether a certain disease is associated
with a certain exposure?
Odds Ratio:
• Ratio of two odds
• The odds in favor of an event happening (such as getting a
disease) is the probability of the event happening relative to
the probability of the event not happening.
$$OR = \frac{ad}{bc}$$
$$OR = \frac{112 \times 224}{176 \times 88} = 1.62$$
• Interpret ???
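As a rough aid to interpretation, the calculation above can be reproduced in Python. The 95% confidence interval uses the usual log-based (Woolf) approximation, which is an addition for illustration and does not appear on the slide. The function name `odds_ratio` and the assumption that a, b, c, d are laid out as exposed cases, exposed non-cases, unexposed cases, unexposed non-cases are hypothetical.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with an approximate 95% CI computed on the log scale."""
    or_hat = (a * d) / (b * c)
    # Standard error of log(OR): square root of the sum of reciprocal cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper

or_hat, lower, upper = odds_ratio(112, 176, 88, 224)
print(f"OR = {or_hat:.2f}, approximate 95% CI ({lower:.2f}, {upper:.2f})")
```

Under the assumed layout, an OR of 1.62 means the odds of the outcome in the first group are about 1.62 times the odds in the second group. An interval that excludes 1 points to an association.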
Interpretation
Examples using STATA
Consider the Jimma Infant data
• STATA Code:
– tabulate sex bwt
– tabulate sex bwt,chi2
STATA output
Logistic regression
In linear regression,
Dependent variable continuous
Independent variable/s categorical or numeric
Logistic regression for binary outcome
• Suppose Y = 1 (yes/success) with probability π = P(Y = 1), and
  Y = 0 (no/failure) with probability 1 − π = P(Y = 0)
• Sample proportion of successes:
$$\bar{y} = \frac{\sum_i y_i}{n} = \frac{\text{number of 1's}}{\text{number of trials}}$$
Goal of logistic regression: Predict the “true” proportion of success,
π, at any value of the predictor.
Logistic function (inverse of the logit)
[Figure: S-shaped (sigmoid) curve of y against x, with x from -10 to 12 and y from 0 to 1]
$$y = \frac{\exp(b_0 + b_1 x)}{1 + \exp(b_0 + b_1 x)}$$
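A short Python sketch of this function shows the S-shape numerically. The default coefficients b0 = 0 and b1 = 1 are assumptions chosen only to reproduce a curve centred at x = 0. They are not estimates from any data set in these notes.

```python
import math

def logistic(x, b0=0.0, b1=1.0):
    """y = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)), written exactly as in the formula."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

# Evaluate the curve over the range shown on the plot's x-axis
xs = [-10, -4, -2, 0, 2, 4, 10]
ys = [logistic(x) for x in xs]
for x, y in zip(xs, ys):
    print(f"x = {x:>3}  y = {y:.4f}")
```

The values stay between 0 and 1, change slowly at both extremes, and change fastest around x = 0, which is what gives the curve its S-shape.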
Logistic regression cont’d…
An S-shaped (sigmoid) curve like this is difficult to describe with a linear equation for two reasons.
First, even though it seems linear at the center of the curve, the extremes do not follow a linear trend.
Second, the errors are neither normally distributed nor constant across the entire range of the data.
Answer:
First: find a function that fits (can be linked with) this S-shaped graph.
Second: find another function that transforms the S-shaped graph into a linear function.
(I) Finding a function that best fits the S-shaped graph of probability
[Figure: S-shaped curve of P (0.0 to 1.0) against age x (20 to 100)]
P = P(y|x) = P(success given that x occurred) = P(a person is CHD-positive given that his or her age is x)
The logistic function always gives an S-shaped curve within the range 0 to 1 for any x.
That is why we link it with p (probability), which follows the same S-shape over the same range of 0 to 1.
$$p = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}}$$
$$\log\left(\frac{p}{1 - p}\right) = b_0 + b_1 X$$
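The second step, turning the S-shaped curve into a straight line, can be checked numerically. In the sketch below the coefficients b0 = -5 and b1 = 0.1 and the ages are hypothetical values picked for illustration. The point is only that the log-odds of p increase by the same amount for every equal step in X.

```python
import math

def inv_logit(z):
    """Probability p corresponding to a linear predictor z = b0 + b1*X."""
    return 1 / (1 + math.exp(-z))

def logit(p):
    """Log-odds of p: log(p / (1 - p))."""
    return math.log(p / (1 - p))

b0, b1 = -5.0, 0.1            # hypothetical coefficients
ages = [20, 40, 60, 80, 100]  # hypothetical values of X

probs = [inv_logit(b0 + b1 * x) for x in ages]
log_odds = [logit(p) for p in probs]

# On the probability scale the steps are unequal (S-shape);
# on the log-odds scale every 20-year step adds the same amount, 20 * b1.
prob_steps = [probs[i + 1] - probs[i] for i in range(len(ages) - 1)]
log_odds_steps = [log_odds[i + 1] - log_odds[i] for i in range(len(ages) - 1)]
print([round(s, 3) for s in prob_steps])
print([round(s, 3) for s in log_odds_steps])
```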
Data exploration:
Explore relationship between variables
• We want to model the probability of developing a myocardial infarction (MI) given aspirin intake.
Modeling binary data cont’d…
• Determine the probability of success
• The probability of success: P(Z=1).
• This is the probability of having cardiovascular disease.
$$P_j = \frac{e^{\beta_0 + \beta_1 X_j}}{1 + e^{\beta_0 + \beta_1 X_j}}$$
The parameter β1 is the treatment effect.
• We want to see whether aspirin intake has an effect on the probability of having a myocardial infarction.
Model formulation: Binary logistic Regression
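• Model the probability of having a myocardial infarction given aspirin intake
$$\operatorname{logit}(\pi(x)) = \beta_0 + \beta_1 X_j$$
Remark: π(x) = π = P
$$\pi_j = \frac{e^{\beta_0 + \beta_1 X_j}}{1 + e^{\beta_0 + \beta_1 X_j}}$$
• The logit model applies the logit transformation to the probability:
$$\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \log(\text{odds})$$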
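With a single binary predictor such as aspirin (X = 1) versus placebo (X = 0), the fitted logistic model reproduces the two observed group proportions, so β0 and β1 can be written down directly from a 2x2 table. The sketch below illustrates this with hypothetical counts. They are not the data from the aspirin study referred to in these notes.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical counts, for illustration only
mi_placebo, n_placebo = 200, 10000   # X = 0
mi_aspirin, n_aspirin = 100, 10000   # X = 1

p_placebo = mi_placebo / n_placebo
p_aspirin = mi_aspirin / n_aspirin

beta0 = logit(p_placebo)                     # log-odds of MI when X = 0
beta1 = logit(p_aspirin) - logit(p_placebo)  # treatment effect = log odds ratio

print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}, OR = {math.exp(beta1):.3f}")
print(f"P(MI | placebo) = {inv_logit(beta0):.4f}")
print(f"P(MI | aspirin) = {inv_logit(beta0 + beta1):.4f}")
```

A negative β1 (odds ratio below 1) would indicate lower odds of MI in the aspirin group under these assumed counts.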
Example-1: cont’d…
Multiple Logistic Regression Model
• Simple logistic regression
Multivariable logistic regression:
• In its most basic form, it tests the hypothesis that all the coefficients in the model are equal to 0:
H0: ß1 = ß2 = . . . = ßk = 0
H1: at least one ßj ≠ 0 (j = 1, ..., k)
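This overall hypothesis corresponds to the "LR chi2" line of the STATA output. As a sketch, the p-value for the value LR chi2(4) = 290.71 reported in the Jimma infant output in these notes can be computed in Python. The helper `chi2_sf_even_df` is a hypothetical name, and it uses a closed-form expression that is valid only for an even number of degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi2_df > x) via the closed form for even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

lr_chi2, df = 290.71, 4   # values from the STATA output in these notes
p_value = chi2_sf_even_df(lr_chi2, df)
print(f"LR chi2({df}) = {lr_chi2}, p = {p_value:.3g}")
```

The result is far below any conventional significance level, which matches the "Prob > chi2 = 0.0000" line: H0 that all coefficients equal 0 is rejected.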
Example : Jimma infant Data cont’d…
STATA CODE:
logit bwt i.sex, or
Example : Jimma infant Data cont’d…
         bwt   Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
sex
  male           .6468934    .0539523   -5.22    0.000    .5493393    .7617715
place
  semi-urban     1.935937    .3516486    3.64    0.000    1.356053    2.763794
  rural          5.436301     .835046   11.02    0.000    4.023039    7.346031
STATA CODE:
logit bwt i.sex i.place, or
Example : Jimma infant Data cont’d…
Multiple logistic Regression …
Add a third variable: mother’s age (momage) measured in years
Logistic regression                  Number of obs = 7,873
                                     LR chi2(4)    = 290.71
                                     Prob > chi2   = 0.0000
Log likelihood = -2136.6035          Pseudo R2     = 0.0637

         bwt        Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
sex
  male          -.4384726    .0835816   -5.25    0.000   -.6022895   -.2746556
site
  2              .4978627    .1891165    2.63    0.008    .1272011    .8685243
  3              1.733488    .1537344   11.28    0.000    1.432174    2.034802
STATA CODE:
logit bwt i.sex i.place age
Example : Jimma infant Data cont’d…
Multiple Logistic Regression …
Odds Ratio estimates
Logistic regression                  Number of obs = 7,873
                                     LR chi2(4)    = 290.71
                                     Prob > chi2   = 0.0000
Log likelihood = -2136.6035          Pseudo R2     = 0.0637

         bwt   Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
sex
  male           .6450209    .0539119   -5.25    0.000    .5475566    .7598337
site
  2              1.645201    .3111347    2.63    0.008    1.135645    2.383391
  3              5.660361    .8701922   11.28    0.000    4.187793    7.650734
STATA CODE:
logit bwt i.sex i.place age, or
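The coefficient output and the odds ratio output describe the same fitted model on two scales. The Python sketch below takes the coefficients, standard errors and confidence limits reported in the coefficient table of these notes and exponentiates them. The standard error of the odds ratio is obtained as OR times the coefficient's standard error (the usual delta-method approximation); the dictionary layout is just a convenience for the example.

```python
import math

# (coefficient, std. err., lower 95% limit, upper 95% limit) from the coefficient table
coef_rows = {
    "male":   (-0.4384726, 0.0835816, -0.6022895, -0.2746556),
    "site 2": (0.4978627, 0.1891165, 0.1272011, 0.8685243),
    "site 3": (1.733488, 0.1537344, 1.432174, 2.034802),
}

or_rows = {}
for name, (coef, se, lower, upper) in coef_rows.items():
    odds_ratio = math.exp(coef)   # OR = exp(coefficient)
    or_se = odds_ratio * se       # delta-method standard error of the OR
    or_rows[name] = (odds_ratio, or_se, math.exp(lower), math.exp(upper))

for name, (odds_ratio, or_se, lower, upper) in or_rows.items():
    print(f"{name:<7} OR = {odds_ratio:.6f}  SE = {or_se:.6f}  95% CI ({lower:.6f}, {upper:.6f})")
```

The z statistics and p-values are identical in both tables because they are computed on the coefficient (log-odds) scale.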
Model Comparison
• Compare the model fit statistics, for example the -2 log-likelihood, of the three models considered so far:
• Simple logistic regression: only 'sex' as a factor,
• Multiple logistic regression: 'sex' and 'place' as two factor variables, and
• Multiple logistic regression: 'sex', 'place' and 'maternal age'.
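The fit statistics mentioned here are linked by simple arithmetic. As a sketch, the Python lines below start from the log likelihood and LR chi2 reported for the three-predictor model in these notes and recover the -2 log-likelihood, the log likelihood of the intercept-only model, and the pseudo R2. Log likelihoods for the two smaller models are not shown in these notes, so the helper `lr_statistic` (a hypothetical name) is demonstrated only on the intercept-only comparison.

```python
log_lik_full = -2136.6035   # "Log likelihood" from the STATA output
lr_chi2 = 290.71            # "LR chi2(4)" from the STATA output

def lr_statistic(log_lik_reduced, log_lik_larger):
    """Likelihood ratio statistic for two nested models:
    the difference in -2 log-likelihood between the reduced and the larger model."""
    return (-2 * log_lik_reduced) - (-2 * log_lik_larger)

minus_2ll_full = -2 * log_lik_full
# LR chi2 = 2 * (log_lik_full - log_lik_null), so the null log likelihood is:
log_lik_null = log_lik_full - lr_chi2 / 2
# McFadden's pseudo R2, the quantity STATA reports as "Pseudo R2"
pseudo_r2 = 1 - log_lik_full / log_lik_null

print(f"-2LL (full model)       = {minus_2ll_full:.3f}")
print(f"log likelihood (null)   = {log_lik_null:.4f}")
print(f"LR statistic vs. null   = {lr_statistic(log_lik_null, log_lik_full):.2f}")
print(f"pseudo R2               = {pseudo_r2:.4f}")
```

A smaller -2 log-likelihood indicates a better fit; for nested models the drop in -2 log-likelihood is compared with a chi-square distribution whose degrees of freedom equal the number of added parameters.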
Hosmer and Lemeshow Test
• The Hosmer-Lemeshow goodness-of-fit statistic is used to assess whether the fitted multiple logistic regression model is adequate: it compares observed and expected numbers of events across groups of subjects ordered by predicted probability.
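The idea behind the statistic can be sketched in Python. This is a simplified illustration, not STATA's implementation: subjects are sorted by predicted probability, split into equal-sized groups, and observed and expected event counts are compared, with the statistic referred to a chi-square distribution with (groups - 2) degrees of freedom. The data are hypothetical, and the p-value helper works only for an even number of degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi2_df > x); closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(outcomes, probs, groups=10):
    """Hosmer-Lemeshow statistic and p-value using equal-sized risk groups."""
    pairs = sorted(zip(probs, outcomes))   # order subjects by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected events = sum of predictions
        observed = sum(y for _, y in chunk)   # observed events
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)

# Hypothetical data: 10 risk groups of 20 subjects each
events_per_group = [2, 3, 4, 8, 9, 10, 14, 15, 16, 19]
probs, outcomes = [], []
for g, events in enumerate(events_per_group):
    probs += [(g + 0.5) / 10] * 20
    outcomes += [1] * events + [0] * (20 - events)

hl_stat, hl_p = hosmer_lemeshow(outcomes, probs)
print(f"Hosmer-Lemeshow chi2(8) = {hl_stat:.2f}, p = {hl_p:.3f}")
```

A large p-value, as with these hypothetical data, gives no evidence of poor fit; a small p-value would suggest that predicted and observed risks disagree.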