
BINARY LOGISTIC REGRESSION

Linear Regression is defined by the statement:

\[
Y_i \sim N(\beta_1 + \beta_2 X_{i2} + \cdots + \beta_k X_{ik},\ \sigma^2)
\]

or

\[
Y_i \sim N\Big(\sum_{j=1}^{k} \beta_j X_{ij},\ \sigma^2\Big), \qquad i = 1, 2, \ldots, n, \qquad X_{i1} = 1 \ \forall\, i.
\]

In BINARY LOGISTIC REGRESSION, Y assumes the values 0 and 1, and so is a Bernoulli random variable; the explanatory variables can be discrete or continuous but are treated as fixed.
The basic form of logistic regression can be derived using Bayes' rule. Assume that k = 2, so that there is one non-trivial explanatory variable X and a constant term. Then

\[
\begin{aligned}
P(Y = 0 \mid X) &= \frac{P(Y=0)\, P(X \mid Y=0)}{P(Y=0)\, P(X \mid Y=0) + P(Y=1)\, P(X \mid Y=1)} \\
&= \frac{1}{1 + \dfrac{P(Y=1)\, P(X \mid Y=1)}{P(Y=0)\, P(X \mid Y=0)}} \\
&= \frac{1}{1 + \exp\!\left( \log \dfrac{P(Y=1)\, P(X \mid Y=1)}{P(Y=0)\, P(X \mid Y=0)} \right)} \\
&= \frac{1}{1 + \exp\!\left( \log \dfrac{P(Y=1)}{P(Y=0)} + \log \dfrac{P(X \mid Y=1)}{P(X \mid Y=0)} \right)} \\
&= \frac{1}{1 + \exp(\beta_1 + \beta_2 X)}, \qquad (1)
\end{aligned}
\]

where

\[
\beta_1 = \log \frac{P(Y=1)}{P(Y=0)}
\qquad \text{and} \qquad
\beta_2 X = \log \frac{P(X \mid Y=1)}{P(X \mid Y=0)},
\]

if X is discrete. Also, (1) implies

\[
P(Y = 1 \mid X) = \frac{\exp(\beta_1 + \beta_2 X)}{1 + \exp(\beta_1 + \beta_2 X)}.
\]

If X is continuous, (1) holds with the density f(·) in place of P. In other words,

\[
P(Y_i = 1 \mid X_{i1}) = F(\beta_1 + \beta_2 X_{i1}),
\]

where

\[
F(x) = \frac{\exp(x)}{1 + \exp(x)}.
\]

The conditional probability function is:

\[
\begin{aligned}
f(y \mid X_{i1}) = P(Y_i = y \mid X_{i1})
&= \big( F(\beta_1 + \beta_2 X_{i1}) \big)^{y} \big( 1 - F(\beta_1 + \beta_2 X_{i1}) \big)^{1-y} \\
&= \begin{cases} F(\beta_1 + \beta_2 X_{i1}) & \text{if } y = 1 \\ 1 - F(\beta_1 + \beta_2 X_{i1}) & \text{if } y = 0. \end{cases}
\end{aligned}
\]
Thus, the logistic regression model is:

\[
Y_i \mid X_{i1} \sim \text{Bernoulli}\!\left( \frac{\exp(\beta_1 + \beta_2 X_{i1})}{1 + \exp(\beta_1 + \beta_2 X_{i1})} \right)
\]

or

\[
\pi_i = P(Y_i = 1 \mid X_{i1}) = \frac{\exp(\beta_1 + \beta_2 X_{i1})}{1 + \exp(\beta_1 + \beta_2 X_{i1})}
\]

or

\[
\log \frac{\pi_i}{1 - \pi_i} = \beta_1 + \beta_2 X_{i1}
\]

or

\[
\operatorname{logit}(\pi_i) = \beta_1 + \beta_2 X_{i1}.
\]

The term Logistic Regression derives from the fact that the function F(x) = exp(x)/(1 + exp(x)) is known as the Logistic Function.
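
As a quick illustration, the logistic function F and the implied conditional probability P(Y = 1 | X) can be computed directly. The following is a minimal Python sketch; the values of β1 and β2 are hypothetical, chosen only for illustration.

```python
import numpy as np

def logistic(x):
    """Logistic function F(x) = exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))  # algebraically equal, numerically stabler

# Hypothetical parameter values, for illustration only.
beta1, beta2 = -1.0, 0.5

X = np.array([0.0, 1.0, 2.0, 4.0])
p = logistic(beta1 + beta2 * X)   # P(Y = 1 | X) under the model
print(p)                          # probabilities increase with X since beta2 > 0
```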

ASSUMPTIONS
▪ The data Y1, Y2, ..., Yn are independently distributed, i.e., cases are independent.
▪ The binary logistic regression model assumes a Bernoulli distribution for the response (see the simulation sketch after this list).
▪ It does NOT assume a linear relationship between the dependent variable and the independent
variables, but it does assume a linear relationship between the logit of the response and the
explanatory variables: logit(π_i) = β_1 + β_2 X_{i1}.

▪ The explanatory variables may even be power terms or other nonlinear transformations of the
original independent variables.
▪ The homogeneity of variance does NOT need to be satisfied. In fact, it is not even possible in
many cases given the model structure.
▪ Errors need to be independent but NOT normally distributed.
▪ It uses maximum likelihood estimation (MLE) rather than ordinary least squares (OLS) to
estimate the parameters, and thus relies on large-sample approximations.
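
To make the Bernoulli assumption concrete, here is a minimal simulation sketch (assuming NumPy, with arbitrary illustrative parameter values): each Y_i is drawn independently as Bernoulli with success probability F(β_1 + β_2 X_{i1}), and homogeneity of variance cannot hold, since Var(Y_i) = π_i(1 − π_i) changes with X_{i1}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters and a fixed explanatory variable.
beta1, beta2 = -1.0, 0.5
n = 1000
x = rng.uniform(-2, 2, size=n)

pi = 1.0 / (1.0 + np.exp(-(beta1 + beta2 * x)))  # pi_i = F(beta1 + beta2*x_i)
y = rng.binomial(1, pi)                          # independent Bernoulli draws

# Var(Y_i) = pi_i*(1 - pi_i) varies with x_i, so there is no homoscedasticity.
print(pi[:3] * (1 - pi[:3]))
```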

For modelling, logistic regression is often used to estimate probabilities as a function of the explanatory variables X and the parameters β. These probabilities are then used to find odds, odds ratios and relative risks.

ODDS AND ODDS RATIOS


The odds is the ratio of the probability that something is true to the probability that it is not true. Thus,

\[
\operatorname{Odd}(X) = \frac{P(Y_i = 1 \mid X_{i1})}{P(Y_i = 0 \mid X_{i1})} = \exp(\beta_1 + \beta_2 X_{i1}).
\]
The odds ratio is the ratio of two odds for different values of X_{i1}, say X_{i1} = x and X_{i1} = x + Δx:

\[
\frac{\operatorname{Odd}(x + \Delta x)}{\operatorname{Odd}(x)} = \frac{\exp(\beta_1 + \beta_2 (x + \Delta x))}{\exp(\beta_1 + \beta_2 x)} = \exp(\beta_2 \,\Delta x),
\]

where Δx is a small change in x.


Then,

\[
\begin{aligned}
\lim_{\Delta x \to 0} \frac{1}{\Delta x} \left( \frac{\operatorname{Odd}(x + \Delta x) - \operatorname{Odd}(x)}{\operatorname{Odd}(x)} \right)
&= \lim_{\Delta x \to 0} \frac{\exp(\beta_2 \,\Delta x) - 1}{\Delta x} \\
&= \beta_2 \lim_{\Delta x \to 0} \frac{\exp(\beta_2 \,\Delta x) - 1}{\beta_2 \,\Delta x} \\
&= \beta_2 \left. \frac{d \exp(u)}{du} \right|_{u = 0} \\
&= \beta_2 \exp(0) \\
&= \beta_2 .
\end{aligned}
\]

Thus, β_2 may be interpreted as the relative change in the odds due to a small change Δx in X_{i1}:

\[
\frac{\operatorname{Odd}(x + \Delta x) - \operatorname{Odd}(x)}{\operatorname{Odd}(x)} = \frac{\operatorname{Odd}(x + \Delta x)}{\operatorname{Odd}(x)} - 1 \approx \beta_2 \,\Delta x .
\]

If X_{i1} is a binary variable itself, X_{i1} = 0 or X_{i1} = 1, then the only reasonable choices for x + Δx and x are 1 and 0, respectively, so that

\[
\frac{\operatorname{Odd}(1)}{\operatorname{Odd}(0)} - 1 = \frac{\operatorname{Odd}(1) - \operatorname{Odd}(0)}{\operatorname{Odd}(0)} = \exp(\beta_2) - 1 .
\]

Only if β_2 is small may we use the approximation exp(β_2) − 1 ≈ β_2. If not, one has to interpret β_2 in terms of the log of the odds ratio involved:

\[
\log \frac{\operatorname{Odd}(1)}{\operatorname{Odd}(0)} = \beta_2 .
\]
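
For a concrete feel of these formulas, the sketch below (with a hypothetical value of β_2) compares the exact relative change in the odds, exp(β_2 Δx) − 1, with the linear approximation β_2 Δx.

```python
import numpy as np

beta2 = 0.08            # hypothetical slope coefficient
for dx in (0.1, 0.5, 1.0, 5.0):
    exact = np.exp(beta2 * dx) - 1.0   # exact relative change in the odds
    approx = beta2 * dx                # first-order approximation
    print(f"dx={dx:4}: exact={exact:.4f}  approx={approx:.4f}")
# The approximation is adequate only while beta2*dx is small.
```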

GENERALIZATION
If k > 2 and the X_{ij} are independent,

\[
\log \frac{P(X \mid Y = 1)}{P(X \mid Y = 0)} = \sum_{j=2}^{k} \log \frac{P(X_{ij} \mid Y_i = 1)}{P(X_{ij} \mid Y_i = 0)} .
\]

Setting

\[
\beta_j X_{ij} = \log \frac{P(X_{ij} \mid Y_i = 1)}{P(X_{ij} \mid Y_i = 0)} ,
\]

one can extend the model and obtain the general logistic regression model

\[
Y_i \mid X_{ij} \sim \text{Bernoulli}\!\left( \frac{\exp\!\big(\beta_1 + \sum_{j=2}^{k} \beta_j X_{ij}\big)}{1 + \exp\!\big(\beta_1 + \sum_{j=2}^{k} \beta_j X_{ij}\big)} \right).
\]
Regardless of whether the X's are dichotomous, polychotomous or continuous, logistic regression is a way to identify the distribution of Y as a function of X and of the parameter β, just as linear regression is a way to identify the distribution of Y as a function of X and of a (different) parameter β.

The interpretation of the coefficients β_j, j = 2, 3, ..., k, in the logistic model is given as:

\[
\frac{\operatorname{Odd}(X_{i2}, \ldots, X_{i,j-1},\, X_{ij} + \Delta X_{ij},\, X_{i,j+1}, \ldots, X_{ik})}{\operatorname{Odd}(X_{i2}, \ldots, X_{i,j-1},\, X_{ij},\, X_{i,j+1}, \ldots, X_{ik})} - 1 \approx \beta_j \,\Delta X_{ij} ,
\]

if ΔX_{ij} is small. For example, β_j may be interpreted as the approximate relative change in the odds due to a small change ΔX_{ij} in X_{ij}, holding the other covariates fixed.
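
A small numeric sketch of this interpretation (with hypothetical coefficients): perturbing one covariate while holding the others fixed multiplies the odds by exp(β_j ΔX_{ij}), a relative change of roughly β_j ΔX_{ij} when the change is small.

```python
import numpy as np

beta = np.array([-0.5, 0.3, 0.02])   # hypothetical (beta_1, beta_2, beta_3)
x = np.array([1.0, 2.0, 10.0])       # x[0] = 1 is the constant term

odds = np.exp(beta @ x)              # Odd at the original covariate values

x_pert = x.copy()
x_pert[2] += 1.0                     # small change in the third covariate
odds_pert = np.exp(beta @ x_pert)

print(odds_pert / odds - 1.0)        # exact relative change: exp(0.02) - 1
print(beta[2] * 1.0)                 # approximation beta_j * dX = 0.02
```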

ESTIMATION OF PARAMETERS
Let k = 2. The parameters β_1 and β_2 are estimated using the method of maximum likelihood.

The log of the likelihood function L(β_1, β_2) is given as:

\[
\begin{aligned}
\log L(\beta_1, \beta_2) &= \sum_{i=1}^{n} \log f(y_i \mid X_{i1}, \beta_1, \beta_2) \\
&= \sum_{i=1}^{n} y_i \log F(\beta_1 + \beta_2 X_{i1}) + \sum_{i=1}^{n} (1 - y_i) \log\big(1 - F(\beta_1 + \beta_2 X_{i1})\big) \\
&= \sum_{i=1}^{n} y_i \log \frac{F(\beta_1 + \beta_2 X_{i1})}{1 - F(\beta_1 + \beta_2 X_{i1})} + \sum_{i=1}^{n} \log\big(1 - F(\beta_1 + \beta_2 X_{i1})\big) \\
&= \sum_{i=1}^{n} y_i (\beta_1 + \beta_2 X_{i1}) - \sum_{i=1}^{n} \log\big(1 + \exp(\beta_1 + \beta_2 X_{i1})\big).
\end{aligned}
\]

Hence

\[
\frac{\partial \log L(\beta_1, \beta_2)}{\partial \beta_1} = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \frac{\exp(\beta_1 + \beta_2 X_{i1})}{1 + \exp(\beta_1 + \beta_2 X_{i1})} = \sum_{i=1}^{n} (y_i - \pi_i)
\]

and

\[
\frac{\partial \log L(\beta_1, \beta_2)}{\partial \beta_2} = \sum_{i=1}^{n} y_i X_{i1} - \sum_{i=1}^{n} \frac{X_{i1} \exp(\beta_1 + \beta_2 X_{i1})}{1 + \exp(\beta_1 + \beta_2 X_{i1})} = \sum_{i=1}^{n} (y_i - \pi_i) X_{i1} .
\]

Setting these derivatives to zero gives transcendental equations, so it is not possible to obtain a closed-form solution for β̂_1 and β̂_2. The Newton-Raphson method can be used to obtain them:

• Guess an initial value of β̂ = (β̂_1, β̂_2)', say β̂^(0) = (β̂_1^(0), β̂_2^(0))'.
• Use

\[
\hat\beta^{(t+1)} = \hat\beta^{(t)} + \big({-H}\big)^{-1}
\begin{pmatrix}
\dfrac{\partial \log L(\beta_1, \beta_2)}{\partial \beta_1} \\[2ex]
\dfrac{\partial \log L(\beta_1, \beta_2)}{\partial \beta_2}
\end{pmatrix},
\]

where H is the Hessian matrix

\[
H = \begin{pmatrix}
\dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_1^2} & \dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_1 \,\partial \beta_2} \\[2ex]
\dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_1 \,\partial \beta_2} & \dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_2^2}
\end{pmatrix},
\]

with the gradient and Hessian evaluated at β̂^(t), iterating till two consecutive values of β̂ are approximately equal.


The estimated variance-covariance matrix of β̂ is (−H)^{-1}, evaluated at β̂. The square roots of its diagonal elements give the estimated standard errors of β̂_1 and β̂_2.

For k > 2, the results can be generalized.
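
The following is a minimal NumPy sketch of this Newton-Raphson scheme for k = 2. It is a sketch, not production code (no step-halving or other safeguards), and the data are simulated under hypothetical parameter values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate data under hypothetical true parameters.
n = 500
true_beta = np.array([-1.0, 0.8])
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # first column: X_i1 = 1
pi_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = rng.binomial(1, pi_true)

beta = np.zeros(2)                       # initial guess beta_hat^(0)
for t in range(25):
    pi = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - pi)                # score vector: sum (y_i - pi_i) x_i
    W = pi * (1.0 - pi)
    H = -(X * W[:, None]).T @ X          # Hessian of log L
    step = np.linalg.solve(-H, grad)     # (-H)^{-1} times the gradient
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:     # consecutive values approximately equal
        break

cov = np.linalg.inv(-H)                  # estimated var-cov matrix (-H)^{-1}
se = np.sqrt(np.diag(cov))               # estimated standard errors
print("beta_hat:", beta, "SE:", se)
```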


TESTING OF HYPOTHESES
I. Testing the significance of a single regression coefficient
If the sample size is large, then under H_0: β_j = β_{j0},

\[
\frac{\sqrt{n}\,(\hat\beta_j - \beta_{j0})}{s_{\hat\beta_j}} \sim N(0, 1), \qquad j = 1, 2, \ldots, k .
\]

This result can be used to test whether the coefficient β_j is zero or not, j = 2, 3, …, k. The null hypothesis H_0: β_j = 0, j = 2, …, k, is of interest since this hypothesis implies that the conditional probability P(Y_i = 1 | X_{ij}) does not depend on X_{ij}, j = 2, 3, …, k. Under H_0: β_j = 0,

\[
\frac{\sqrt{n}\,\hat\beta_j}{s_{\hat\beta_j}} \sim N(0, 1), \qquad j = 2, \ldots, k .
\]

This statistic is called a pseudo t-value, as it is used in the same way as the t-value in linear regression, and s_{β̂_j} is called the standard error of β̂_j. The test statistic is also called Wald's statistic, and the corresponding test Wald's test.
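
A minimal sketch of the Wald test, assuming β̂_j and its standard error have already been computed (for example, by the Newton-Raphson sketch above); the numbers below are purely illustrative. SciPy is assumed for the p-value.

```python
from scipy.stats import norm

# Purely illustrative values, as if obtained from a fitted model.
# se_j is the standard error of beta_hat_j itself, so the 1/sqrt(n)
# factor is already absorbed into it.
beta_hat_j = 0.8
se_j = 0.12

z = beta_hat_j / se_j            # Wald (pseudo t) statistic for H0: beta_j = 0
p_value = 2 * norm.sf(abs(z))    # two-sided p-value from N(0, 1)
print(z, p_value)
```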

II. Testing the joint significance of all predictors
We are interested in testing H_0: β_2 = β_3 = ... = β_m = 0 (m ≤ k) against the alternative hypothesis that at least one of β_2, β_3, ..., β_m is not equal to zero. For this we proceed as follows.

Re-estimate the logit model under the restriction:

\[
\log L(0, 0, \ldots, 0, \hat\beta_{m+1}, \hat\beta_{m+2}, \ldots, \hat\beta_k) = \max_{\beta_{m+1}, \beta_{m+2}, \ldots, \beta_k} \log L(0, 0, \ldots, 0, \beta_{m+1}, \beta_{m+2}, \ldots, \beta_k) .
\]

Then, under H_0,

\[
LR_m = -2 \log \frac{L(0, 0, \ldots, 0, \hat\beta_{m+1}, \hat\beta_{m+2}, \ldots, \hat\beta_k)}{L(\hat\beta_2, \hat\beta_3, \ldots, \hat\beta_k)} \sim \chi^2_{m-1} .
\]

This is the LIKELIHOOD RATIO test, which is right-sided.
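
A minimal sketch of the likelihood ratio test, assuming the restricted and unrestricted maximized log-likelihoods have already been computed; the values below are purely illustrative. SciPy is assumed for the chi-square tail probability.

```python
from scipy.stats import chi2

# Purely illustrative maximized log-likelihoods.
loglik_restricted = -312.4    # model with beta_2 = ... = beta_m = 0
loglik_full = -305.1          # unrestricted model

m = 4                                           # coefficients beta_2..beta_4 restricted
LR = -2.0 * (loglik_restricted - loglik_full)   # likelihood ratio statistic
p_value = chi2.sf(LR, df=m - 1)                 # right-sided chi-square test
print(LR, p_value)
```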


PREDICTION WITH LOGISTIC REGRESSION
From a prediction point of view, logistic regression can be used for classification, with zero and one taken as class labels.

Suppose data of the form (Y_i, X_{i1}), i = 1, 2, …, n, are available and estimates of the parameters have been obtained. These estimators are consistent and asymptotically normally distributed. The objective is to estimate the conditional probability of an event such as Y_{n+1} = 1 given X_{n+1,1}. This is given as:

\[
\widehat{P}(Y_{n+1} = 1 \mid X_{n+1,1}) = \frac{\exp(\hat\beta_1 + \hat\beta_2 X_{n+1,1})}{1 + \exp(\hat\beta_1 + \hat\beta_2 X_{n+1,1})} .
\]

If the above probability is greater than one half, one is led to predict Y_{n+1} = 1; otherwise Y_{n+1} = 0, for the given X_{n+1,1}.
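
A minimal sketch of this classification rule, assuming fitted coefficients β̂_1 and β̂_2 are available (the values below are illustrative, as if estimated from data):

```python
import numpy as np

# Illustrative fitted coefficients.
beta1_hat, beta2_hat = -1.0, 0.5

def predict(x_new, threshold=0.5):
    """Estimated P(Y = 1 | x_new) and the implied 0/1 prediction."""
    p = 1.0 / (1.0 + np.exp(-(beta1_hat + beta2_hat * x_new)))
    return p, int(p > threshold)

print(predict(4.0))   # probability above 1/2 -> predict Y = 1
print(predict(0.0))   # probability below 1/2 -> predict Y = 0
```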

