Logistic Regression Analysis
Logistic Regression
In many studies the outcome variable of
interest is the presence or absence of some
condition, such as:
- survival status (alive or dead)
- responding or not responding to a treatment
- having a myocardial infarction (MI) or not
- birth weight status (normal or low)
[Figure: plot of the outcome against mother's age. y-axis ticks: 0.2 to 0.8; x-axis: mother's age (10 to 50).]
Cont…
We can see that this plot is less informative
about the relationship between the outcome
and the explanatory variables than in the
case when the outcome variable is
continuous.
p=P(Y=1) = P(“success”)
Sample data

                  Low birth weight
Age of mother     No      Yes     Total    % LBW
< 20              506      68       574     11.8
20-24            1207     123      1330      9.2
25-29            1163     101      1264      8.0
30-34             838      78       916      8.5
35-39             613      55       668      8.2
40-44             113      15       128     11.7
> 44               28       4        32     12.5
Total            4468     444      4912      9.0
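The percentage column can be reproduced directly from the counts. The following is a minimal sketch in Python; the names `counts` and `lbw_percent` are introduced here for illustration and are not part of the original slides.

```python
# Counts from the sample data table: (not low birth weight, low birth weight)
# for each maternal age group.
counts = {
    "< 20": (506, 68),
    "20-24": (1207, 123),
    "25-29": (1163, 101),
    "30-34": (838, 78),
    "35-39": (613, 55),
    "40-44": (113, 15),
    "> 44": (28, 4),
}


def lbw_percent(no, yes):
    """Percentage of births in a group that are low birth weight."""
    return 100.0 * yes / (no + yes)


# Percentage of low birth weight in each age group, rounded as in the table.
rates = {age: round(lbw_percent(no, yes), 1) for age, (no, yes) in counts.items()}

# Overall percentage across all age groups.
total_no = sum(no for no, _ in counts.values())
total_yes = sum(yes for _, yes in counts.values())
overall = round(lbw_percent(total_no, total_yes), 1)

for age, rate in rates.items():
    print(f"{age:>6}: {rate:.1f}%")
print(f" Total: {overall:.1f}%")
```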
[Figure: Cumulative low birth weight rate by age of the mother. y-axis: cumulative percent (0 to 9); x-axis: age group (< 15, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, > 44).]
                                                          95% CI for Exp(B)
            B        SE      Wald    df   Sig.    Exp(B)   Lower    Upper
Age        -0.036   0.014    6.47    1    0.011   0.965    0.938    0.992
Constant   -1.302   0.366   12.64    1    0.000   0.272
$$\ln\left(\frac{\hat{p}}{1-\hat{p}}\right) = -1.302 - 0.036\,X$$
From the model, the coefficient of age implies
that for a one-year increase in the age of the
mother, the log odds that the newborn will be
low birth weight decrease by 0.036. Equivalently,
the odds are multiplied by exp(-0.036) = 0.965.
When the log odds decrease, the probability p
also decreases.
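To make the interpretation concrete, the fitted coefficients (constant -1.302, age coefficient -0.036) can be turned into log odds, an odds ratio, and estimated probabilities. This is a minimal sketch; the function names `log_odds` and `prob_lbw` and the chosen ages are illustrative assumptions, not part of the original analysis.

```python
import math

B0 = -1.302  # constant from the fitted model
B1 = -0.036  # coefficient of mother's age from the fitted model


def log_odds(age):
    """Estimated log odds of low birth weight for a mother of the given age."""
    return B0 + B1 * age


def prob_lbw(age):
    """Estimated probability of low birth weight (inverse of the logit)."""
    return 1.0 / (1.0 + math.exp(-log_odds(age)))


# Multiplicative change in the odds for each additional year of age.
odds_ratio = math.exp(B1)
print(f"odds ratio per year of age: {odds_ratio:.3f}")

# Illustrative ages only: the estimated probability falls as age rises.
for age in (20, 30, 40):
    print(f"age {age}: log odds = {log_odds(age):.3f}, p = {prob_lbw(age):.3f}")
```

Because the age coefficient is negative, each extra year lowers the log odds by the same amount, while the change in probability depends on the starting age.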
[Figure: Plot of estimated probability of LBW by mother's age. y-axis: estimated probability of LBW (0 to 0.16); x-axis: age of mother (10 to 50).]
Inference on coefficients
Example
Consider the systolic blood pressure data, with
< 140 mmHg and ≥ 140 mmHg as the categories
defining normal and elevated systolic blood
pressure. Treating the ≥ 140 mmHg group as
elevated systolic BP, we want to identify factors
that contribute to elevated systolic blood
pressure.
Estimating Probabilities
In order to estimate the probability that a
person with a particular age will have
elevated systolic blood pressure, we simply
substitute the appropriate value of x into the
preceding equation.
When 50 percent of the people are 1s (P = 0.5), the variance
P(1 - P) is 0.25, its maximum value. As P moves toward more
extreme values, the variance decreases. When P = 0.10, the
variance is 0.10 × 0.90 = 0.09, and as P approaches 1 or 0, the
variance approaches zero.
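The behaviour of the variance of a binary (0/1) variable can be checked numerically. A minimal sketch, where `bernoulli_variance` is a name chosen here for illustration:

```python
def bernoulli_variance(p):
    """Variance of a 0/1 variable whose probability of a 1 is p."""
    return p * (1.0 - p)


# The variance peaks at p = 0.5 and shrinks toward zero at the extremes.
for p in (0.5, 0.25, 0.10, 0.01):
    print(f"P = {p:.2f}: variance = {bernoulli_variance(p):.4f}")
```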
The success of the logistic regression can be
assessed by looking at the classification table,
which shows the correct and incorrect classifications
of the dichotomous, ordinal, or polychotomous
dependent variable.
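A classification table for the dichotomous case can be sketched as follows. The observed outcomes and predicted probabilities below are hypothetical values made up for illustration, and the 0.5 cutoff is an assumption rather than something stated in the slides.

```python
def classification_table(observed, predicted_prob, cutoff=0.5):
    """Cross-tabulate observed 0/1 outcomes against predicted classes.

    Keys are (observed, predicted) pairs. The cutoff is an assumed
    classification threshold on the predicted probability.
    """
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for y, p in zip(observed, predicted_prob):
        y_hat = 1 if p >= cutoff else 0
        table[(y, y_hat)] += 1
    return table


def percent_correct(table):
    """Overall percentage of cases classified correctly."""
    correct = table[(0, 0)] + table[(1, 1)]
    return 100.0 * correct / sum(table.values())


# Hypothetical data, for illustration only.
observed = [0, 0, 0, 1, 1, 0, 1, 0]
predicted_prob = [0.10, 0.35, 0.62, 0.80, 0.45, 0.20, 0.71, 0.05]

table = classification_table(observed, predicted_prob)
print("observed 0:", table[(0, 0)], "correct,", table[(0, 1)], "incorrect")
print("observed 1:", table[(1, 1)], "correct,", table[(1, 0)], "incorrect")
print(f"overall percent correct: {percent_correct(table):.1f}")
```

The diagonal cells, (0, 0) and (1, 1), hold the correct classifications; the off-diagonal cells hold the errors.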
$$\text{Wald } \chi^2 = \left(\frac{b}{\operatorname{se}(b)}\right)^2$$
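The Wald statistic is the squared ratio of a coefficient to its standard error, referred to a chi-square distribution with 1 degree of freedom. The sketch below applies it to the rounded age coefficient from the low birth weight model (b = -0.036, se = 0.014); the function names are illustrative. Because the inputs are rounded to three decimals, the result is close to, but not identical to, the reported Wald value of 6.47 and confidence limits of 0.938 and 0.992.

```python
import math


def wald_test(b, se):
    """Wald chi-square statistic and its p-value on 1 degree of freedom."""
    wald = (b / se) ** 2
    # For 1 df, P(chi-square > w) equals erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


def odds_ratio_ci(b, se, z_crit=1.96):
    """Odds ratio exp(b) with an approximate 95% confidence interval."""
    return (
        math.exp(b),
        math.exp(b - z_crit * se),
        math.exp(b + z_crit * se),
    )


wald, p_value = wald_test(-0.036, 0.014)
odds_ratio, lower, upper = odds_ratio_ci(-0.036, 0.014)
print(f"Wald = {wald:.2f}, p = {p_value:.3f}")
print(f"OR = {odds_ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```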
Logistic regression
Model: log odds = β0 + β1x
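Test statistic:
$$-2\,\left(L_{\text{excluding } x} - L_{\text{including } x}\right)$$
where L_excluding x is the log likelihood for the model fitted without x
and L_including x is the log likelihood for the model fitted with x.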
Likelihood-Ratio Test
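The likelihood-ratio test compares the
maximized value of the likelihood function for
the simpler model (L0) with the maximized value
of the likelihood function for the full
model (L1). Writing LL0 and LL1 for the
corresponding log likelihoods, the test statistic is
$$-2\ln\left(\frac{L_0}{L_1}\right) = -2\,(LL_0 - LL_1)$$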
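The statistic can be computed from the two maximized log likelihoods and, in large samples, referred to a chi-square distribution with degrees of freedom equal to the number of parameters dropped from the full model. The sketch below assumes the models differ by a single term x (1 degree of freedom); the log likelihood values are hypothetical and serve only to show the arithmetic.

```python
import math


def likelihood_ratio_test(ll0, ll1):
    """Likelihood-ratio statistic and p-value for models differing by one term.

    ll0: maximized log likelihood of the simpler model (without x)
    ll1: maximized log likelihood of the full model (with x)
    Assumes 1 degree of freedom, i.e. a single parameter was dropped.
    """
    statistic = -2.0 * (ll0 - ll1)
    # For 1 df, P(chi-square > s) equals erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log likelihoods, for illustration only.
ll_without_x = -1490.0
ll_with_x = -1486.8

statistic, p_value = likelihood_ratio_test(ll_without_x, ll_with_x)
print(f"LR statistic = {statistic:.2f}, p = {p_value:.4f}")
```

The full model can never have a lower maximized log likelihood than the simpler model nested within it, so the statistic is non-negative.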