Binary Logistic Regression
By Bayes' theorem, the conditional probability of $Y = 0$ given $X$ can be written as

$$
\begin{aligned}
P(Y = 0 \mid X) &= \frac{P(Y = 0)\,P(X \mid Y = 0)}{P(Y = 0)\,P(X \mid Y = 0) + P(Y = 1)\,P(X \mid Y = 1)} \\[1ex]
&= \frac{1}{1 + \dfrac{P(Y = 1)\,P(X \mid Y = 1)}{P(Y = 0)\,P(X \mid Y = 0)}} \\[1ex]
&= \frac{1}{1 + \exp\left( \log \dfrac{P(Y = 1)\,P(X \mid Y = 1)}{P(Y = 0)\,P(X \mid Y = 0)} \right)} \\[1ex]
&= \frac{1}{1 + \exp\left( \log \dfrac{P(Y = 1)}{P(Y = 0)} + \log \dfrac{P(X \mid Y = 1)}{P(X \mid Y = 0)} \right)} \\[1ex]
&= \frac{1}{1 + \exp(\beta_1 + \beta_2 X)}, \qquad (1)
\end{aligned}
$$

where

$$
\beta_1 = \log \frac{P(Y = 1)}{P(Y = 0)}
\quad\text{and}\quad
\beta_2 X = \log \frac{P(X \mid Y = 1)}{P(X \mid Y = 0)},
$$

if $X$ is discrete. Also, from (1),

$$
P(Y = 1 \mid X) = \frac{\exp(\beta_1 + \beta_2 X)}{1 + \exp(\beta_1 + \beta_2 X)}.
$$
If $X$ is continuous, (1) holds with the density $f(\cdot)$ in place of $P$. In other words,

$$
P(Y_i = 1 \mid X_{i1}) = F(\beta_1 + \beta_2 X_{i1}),
$$

where

$$
F(x) = \frac{\exp(x)}{1 + \exp(x)}.
$$

Here $f(y \mid X_{i1}) = P(Y_i = y \mid X_{i1})$ denotes the conditional probability mass function of $Y_i$; this notation is used in the likelihood below.

The term Logistic Regression derives from the fact that the function $F(x) = \exp(x)/(1 + \exp(x))$ is known as the Logistic Function.
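For concreteness, here is a minimal Python sketch of the logistic function and of model (1) (the function name `logistic` and the coefficient values are illustrative choices of ours, not from the notes):

```python
import math

def logistic(x: float) -> float:
    """Logistic function F(x) = exp(x) / (1 + exp(x))."""
    return math.exp(x) / (1.0 + math.exp(x))

# Model (1) with illustrative coefficients beta1 = -1.0 and beta2 = 0.5:
beta1, beta2 = -1.0, 0.5
for x in (0.0, 2.0, 4.0):
    p1 = logistic(beta1 + beta2 * x)          # P(Y = 1 | X = x)
    print(f"x = {x}: P(Y=1|X) = {p1:.3f}, P(Y=0|X) = {1 - p1:.3f}")
```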
ASSUMPTIONS

▪ The data $Y_1, Y_2, \ldots, Y_n$ are independently distributed, i.e., cases are independent.
▪ The binary logistic regression model assumes a Bernoulli distribution of the response.
▪ The model does NOT assume a linear relationship between the dependent variable and the independent variables, but it does assume a linear relationship between the logit of the response and the explanatory variables: $\mathrm{logit}_i = \beta_1 + \beta_2 X_{i1}$ (see the simulation sketch after this list).
▪ The independent (explanatory) variables can even be power terms or other nonlinear transformations of the original independent variables.
▪ Homogeneity of variance does NOT need to be satisfied; in fact, it is not even possible in many cases, given the model structure.
▪ Errors need to be independent but NOT normally distributed.
▪ The model uses maximum likelihood estimation (MLE) rather than ordinary least squares (OLS) to estimate the parameters, and thus relies on large-sample approximations.
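To make the Bernoulli and linear-logit assumptions concrete, here is a small simulation sketch (the coefficients, sample size, and names are illustrative assumptions):

```python
import math
import random

random.seed(0)
beta1, beta2 = -1.0, 0.5      # illustrative "true" parameters

# Independent cases: each Y_i is Bernoulli with success probability
# F(beta1 + beta2 * X_i1), so the logit of that probability is linear in X_i1.
data = []
for _ in range(1000):
    x = random.uniform(-3.0, 3.0)
    p = math.exp(beta1 + beta2 * x) / (1.0 + math.exp(beta1 + beta2 * x))
    y = 1 if random.random() < p else 0
    data.append((x, y))
```

Note that $\mathrm{Var}(Y_i \mid X_{i1}) = \pi_i(1 - \pi_i)$ varies with $X_{i1}$, which is why homogeneity of variance cannot hold here.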
The odds of the event given $X_{i1}$ are

$$
\mathrm{Odd}(X) = \frac{P(Y_i = 1 \mid X_{i1})}{P(Y_i = 0 \mid X_{i1})} = \exp(\beta_1 + \beta_2 X_{i1}).
$$

The odds ratio is the ratio of two odds at different values of $X_{i1}$, say $X_{i1} = x$ and $X_{i1} = x + \Delta x$. The limiting relative change in the odds per unit change in $X_{i1}$ is
$$
\begin{aligned}
\lim_{\Delta x \to 0} \frac{1}{\Delta x}\,\frac{\mathrm{Odd}(x + \Delta x) - \mathrm{Odd}(x)}{\mathrm{Odd}(x)}
&= \lim_{\Delta x \to 0} \frac{\exp(\beta_2 \Delta x) - 1}{\Delta x} \\[1ex]
&= \beta_2 \lim_{\Delta x \to 0} \frac{\exp(\beta_2 \Delta x) - 1}{\beta_2 \Delta x} \\[1ex]
&= \beta_2 \left. \frac{d \exp(u)}{du} \right|_{u = 0} \\[1ex]
&= \beta_2 \exp(0) \\[1ex]
&= \beta_2.
\end{aligned}
$$

Thus, $\beta_2$ may be interpreted as the relative change in the odds due to a small change $\Delta x$ in $X_{i1}$.
If $X_{i1}$ is a binary variable itself, $X_{i1} = 0$ or $X_{i1} = 1$, then the only reasonable choices for $x + \Delta x$ and $x$ are 1 and 0, respectively, so that

$$
\frac{\mathrm{Odd}(1)}{\mathrm{Odd}(0)} - 1 = \frac{\mathrm{Odd}(1) - \mathrm{Odd}(0)}{\mathrm{Odd}(0)} = \exp(\beta_2) - 1.
$$

Only if $\beta_2$ is small may we use the approximation $\exp(\beta_2) - 1 \approx \beta_2$. If not, one has to interpret $\beta_2$ in terms of the log of the odds ratio involved:

$$
\log \frac{\mathrm{Odd}(1)}{\mathrm{Odd}(0)} = \beta_2.
$$
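A quick numerical check of the approximation $\exp(\beta_2) - 1 \approx \beta_2$ (the coefficient values below are illustrative):

```python
import math

# Relative change in the odds, exp(beta2) - 1, versus the small-beta2
# approximation beta2 itself:
for beta2 in (0.05, 0.5, 1.5):
    exact = math.exp(beta2) - 1.0
    print(f"beta2 = {beta2}: exp(beta2) - 1 = {exact:.4f}  vs  beta2 = {beta2}")
# beta2 = 0.05 gives 0.0513 (close); beta2 = 1.5 gives 3.4817 (approximation fails)
```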
GENERALIZATION

If $k \ge 2$ and the $X_{ij}$ are independent, setting

$$
\beta_j X_{ij} = \log \frac{P(X_{ij} \mid Y_i = 1)}{P(X_{ij} \mid Y_i = 0)},
$$

one can extend the model and obtain the general logistic regression model

$$
Y_i \mid X_{ij} \sim \mathrm{Bernoulli}\!\left( \frac{\exp\!\left(\beta_1 + \sum_{j=2}^{k} \beta_j X_{ij}\right)}{1 + \exp\!\left(\beta_1 + \sum_{j=2}^{k} \beta_j X_{ij}\right)} \right).
$$
Regardless of whether the $X$s are dichotomous, polychotomous, or continuous, logistic regression is a way to identify the distribution of $Y$ as a function of $X$ and of a parameter $\beta$, just as linear regression is a way to identify the distribution of $Y$ as a function of $X$ and of a (different) parameter $\beta$.
The interpretation of the coefficients $\beta_j$, $j = 2, 3, \ldots, k$, in the logistic model is given by

$$
\frac{\mathrm{Odd}\bigl(X_{i2}, \ldots, X_{i,j-1},\, X_{ij} + \Delta X_{ij},\, X_{i,j+1}, \ldots, X_{ik}\bigr)}{\mathrm{Odd}\bigl(X_{i2}, \ldots, X_{i,j-1},\, X_{ij},\, X_{i,j+1}, \ldots, X_{ik}\bigr)} - 1 \approx \beta_j\,\Delta X_{ij},
$$

if $\Delta X_{ij}$ is small. That is, $\beta_j\,\Delta X_{ij}$ approximates the relative (percentage) change in the odds $\mathrm{Odd}(X_{i2}, \ldots, X_{ik})$ due to a small change $\Delta X_{ij}$ in $X_{ij}$.
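The following sketch checks this interpretation numerically for the general model (coefficients, values, and names are illustrative assumptions):

```python
import math

def prob_y1(beta, x):
    """P(Y = 1 | x) under the general logistic model.

    beta = [beta1, beta2, ..., betak]; x = [x_2, ..., x_k] pairs with beta[1:].
    """
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return math.exp(eta) / (1.0 + math.exp(eta))

def odd(p):
    return p / (1.0 - p)

beta = [-0.5, 0.8, -0.3]          # beta1, beta2, beta3 (illustrative)
x = [1.2, 0.7]                    # X_i2, X_i3
dx = 0.01                         # small change in X_i2

p_base = prob_y1(beta, x)
p_new = prob_y1(beta, [x[0] + dx, x[1]])
print(odd(p_new) / odd(p_base) - 1.0, beta[1] * dx)  # both approx. 0.008
```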
ESTIMATION OF PARAMETERS

Let $k = 2$. The parameters $\beta_1$ and $\beta_2$ are estimated using the method of maximum likelihood. The log-likelihood is

$$
\begin{aligned}
\log L(\beta_1, \beta_2) &= \sum_{i=1}^{n} \log f(y_i \mid X_{i1}, \beta_1, \beta_2) \\[1ex]
&= \sum_{i=1}^{n} y_i \log F(\beta_1 + \beta_2 X_{i1}) + \sum_{i=1}^{n} (1 - y_i) \log\bigl(1 - F(\beta_1 + \beta_2 X_{i1})\bigr) \\[1ex]
&= \sum_{i=1}^{n} y_i \log \frac{F(\beta_1 + \beta_2 X_{i1})}{1 - F(\beta_1 + \beta_2 X_{i1})} + \sum_{i=1}^{n} \log\bigl(1 - F(\beta_1 + \beta_2 X_{i1})\bigr) \\[1ex]
&= \sum_{i=1}^{n} y_i (\beta_1 + \beta_2 X_{i1}) - \sum_{i=1}^{n} \log\bigl(1 + \exp(\beta_1 + \beta_2 X_{i1})\bigr).
\end{aligned}
$$

The score equations are

$$
\frac{\partial \log L(\beta_1, \beta_2)}{\partial \beta_1} = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \frac{\exp(\beta_1 + \beta_2 X_{i1})}{1 + \exp(\beta_1 + \beta_2 X_{i1})} = \sum_{i=1}^{n} (y_i - \pi_i)
$$

and

$$
\frac{\partial \log L(\beta_1, \beta_2)}{\partial \beta_2} = \sum_{i=1}^{n} y_i X_{i1} - \sum_{i=1}^{n} \frac{X_{i1} \exp(\beta_1 + \beta_2 X_{i1})}{1 + \exp(\beta_1 + \beta_2 X_{i1})} = \sum_{i=1}^{n} (y_i - \pi_i) X_{i1},
$$

where $\pi_i = F(\beta_1 + \beta_2 X_{i1})$.
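A minimal Python sketch of this log-likelihood and its score (all names are ours):

```python
import math

def log_likelihood(b1, b2, xs, ys):
    # log L = sum_i [ y_i (b1 + b2 x_i) - log(1 + exp(b1 + b2 x_i)) ]
    return sum(y * (b1 + b2 * x) - math.log(1.0 + math.exp(b1 + b2 * x))
               for x, y in zip(xs, ys))

def score(b1, b2, xs, ys):
    """Score vector ( sum(y_i - pi_i), sum((y_i - pi_i) x_i) )."""
    g1 = g2 = 0.0
    for x, y in zip(xs, ys):
        pi = math.exp(b1 + b2 * x) / (1.0 + math.exp(b1 + b2 * x))
        g1 += y - pi
        g2 += (y - pi) * x
    return g1, g2
```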
Since these are transcendental equations, it is not possible to obtain a closed-form solution for $\hat\beta_1$ and $\hat\beta_2$. The Newton-Raphson method can be used to obtain them:

• Guess an initial value of $\hat{\boldsymbol\beta} = (\hat\beta_1, \hat\beta_2)'$, say $\hat{\boldsymbol\beta}^{(0)} = \bigl(\hat\beta_1^{(0)}, \hat\beta_2^{(0)}\bigr)'$.

• Iterate

$$
\hat{\boldsymbol\beta}^{(t+1)} = \hat{\boldsymbol\beta}^{(t)} + (-H)^{-1}
\begin{pmatrix}
\dfrac{\partial \log L(\beta_1, \beta_2)}{\partial \beta_1} \\[2ex]
\dfrac{\partial \log L(\beta_1, \beta_2)}{\partial \beta_2}
\end{pmatrix}
\Bigg|_{\boldsymbol\beta = \hat{\boldsymbol\beta}^{(t)}},
$$

where $H$ is the Hessian matrix given as

$$
H = \begin{pmatrix}
\dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_1^2} & \dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_1 \partial \beta_2} \\[2ex]
\dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_1 \partial \beta_2} & \dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_2^2}
\end{pmatrix}.
$$
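A runnable Newton-Raphson sketch for this two-parameter case (a minimal implementation of the iteration above; starting values, tolerance, and names are our choices, and the 2×2 system is solved by hand rather than with a linear-algebra library):

```python
import math

def newton_raphson_logit(xs, ys, b1=0.0, b2=0.0, tol=1e-8, max_iter=50):
    """Maximize log L(b1, b2) for the simple binary logistic model."""
    for _ in range(max_iter):
        g1 = g2 = 0.0            # score vector
        h11 = h12 = h22 = 0.0    # Hessian entries (h21 = h12 by symmetry)
        for x, y in zip(xs, ys):
            pi = math.exp(b1 + b2 * x) / (1.0 + math.exp(b1 + b2 * x))
            w = pi * (1.0 - pi)
            g1 += y - pi
            g2 += (y - pi) * x
            h11 -= w
            h12 -= w * x
            h22 -= w * x * x
        # Update step: delta = (-H)^{-1} * score, solved explicitly for 2x2 H.
        det = h11 * h22 - h12 * h12          # det(-H) equals det(H) for 2x2
        d1 = (-h22 * g1 + h12 * g2) / det
        d2 = (h12 * g1 - h11 * g2) / det
        b1, b2 = b1 + d1, b2 + d2
        if abs(d1) + abs(d2) < tol:
            break
    return b1, b2
```

Applied to data simulated as in the sketch under ASSUMPTIONS, the routine should recover estimates near the true $(\beta_1, \beta_2)$.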
I. Testing the significance of individual predictors

The ML estimators are asymptotically normal:

$$
\frac{\sqrt{n}\,\bigl(\hat\beta_j - \beta_{j0}\bigr)}{\hat s_{\hat\beta_j}} \sim N(0, 1), \qquad j = 1, 2, \ldots, k.
$$

These results can be used to test whether the coefficient $\beta_j$ is zero or not, $j = 2, \ldots, k$. The null hypothesis $H_0: \beta_j = 0$, $j = 2, \ldots, k$, is of interest since this hypothesis implies that

$$
\frac{\sqrt{n}\,\hat\beta_j}{\hat s_{\hat\beta_j}} \sim N(0, 1), \qquad j = 2, \ldots, k.
$$

This statistic is called a pseudo $t$-value, as it is used in the same way as the $t$-value in linear regression, and $\hat s_{\hat\beta_j}$ is called the standard error of $\hat\beta_j$. The test statistic is also called Wald's statistic and the corresponding test Wald's test.
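A sketch of Wald statistics computed at the ML estimate, with standard errors taken from the inverse of the negative Hessian (a standard large-sample choice we are assuming here; names are ours):

```python
import math

def wald_z(xs, ys, b1, b2):
    """Wald z-statistics for (b1, b2) evaluated at the ML estimate."""
    h11 = h12 = h22 = 0.0
    for x, y in zip(xs, ys):
        pi = math.exp(b1 + b2 * x) / (1.0 + math.exp(b1 + b2 * x))
        w = pi * (1.0 - pi)
        h11 -= w
        h12 -= w * x
        h22 -= w * x * x
    det = h11 * h22 - h12 * h12
    se1 = math.sqrt(-h22 / det)   # sqrt of diagonal of (-H)^{-1}
    se2 = math.sqrt(-h11 / det)
    return b1 / se1, b2 / se2     # compare with N(0, 1) quantiles
```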
II. Testing the joint significance of all predictors

We are interested in testing $H_0: \beta_2 = \beta_3 = \cdots = \beta_m = 0$ ($m \le k$) against the alternative hypothesis that at least one of $\beta_2, \beta_3, \ldots, \beta_m$ is not equal to zero. For this we proceed as follows.

Re-estimate the logit model under the restriction, using

$$
\log L\bigl(\hat\beta_1, 0, \ldots, 0, \hat\beta_{m+1}, \hat\beta_{m+2}, \ldots, \hat\beta_k\bigr)
= \max_{\beta_1,\, \beta_{m+1},\, \beta_{m+2},\, \ldots,\, \beta_k} \log L\bigl(\beta_1, 0, \ldots, 0, \beta_{m+1}, \beta_{m+2}, \ldots, \beta_k\bigr).
$$

Then, under $H_0$,

$$
LR_m = -2 \log \frac{L\bigl(\hat\beta_1, 0, \ldots, 0, \hat\beta_{m+1}, \hat\beta_{m+2}, \ldots, \hat\beta_k\bigr)}{L\bigl(\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k\bigr)} \sim \chi^2_{m-1}.
$$
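A sketch of the LR statistic, assuming the two maximized log-likelihoods have already been computed (e.g., from two runs of the Newton-Raphson routine above; the numbers below are illustrative placeholders):

```python
# Maximized log-likelihoods under H0 (beta_2 = ... = beta_m = 0) and without
# the restriction:
logL_restricted = -520.4      # illustrative value
logL_unrestricted = -512.1    # illustrative value
m = 3                         # beta_2, beta_3 restricted => df = m - 1 = 2

LR = -2.0 * (logL_restricted - logL_unrestricted)
print(f"LR = {LR:.2f}")       # compare with the chi-square(m - 1) critical
                              # value, e.g. 5.99 at the 5% level for df = 2
```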
PREDICTION

Suppose the ML estimates $\hat\beta_1$ and $\hat\beta_2$ have been obtained. These estimators are consistent and asymptotically normally distributed. The objective is to estimate the conditional probability of the event $Y_{n+1} = 1$ given $X_{n+1,1}$. This is given as

$$
\hat P\bigl(Y_{n+1} = 1 \mid X_{n+1,1}\bigr) = F\bigl(\hat\beta_1 + \hat\beta_2 X_{n+1,1}\bigr).
$$

If the above probability is greater than one half, one is led to predict that $Y_{n+1} = 1$; otherwise, $Y_{n+1} = 0$.
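A one-line prediction rule implementing this section (a sketch; `b1_hat` and `b2_hat` are assumed to come from a fit such as the Newton-Raphson routine above):

```python
import math

def predict(b1_hat, b2_hat, x_new, threshold=0.5):
    """Predict Y_{n+1} = 1 if the estimated P(Y_{n+1} = 1 | X_{n+1,1}) > 1/2."""
    eta = b1_hat + b2_hat * x_new
    p = math.exp(eta) / (1.0 + math.exp(eta))
    return 1 if p > threshold else 0
```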