Binary Logit Regression
• Unlike multiple linear regression or linear discriminant analysis, logistic regression fits
an S-shaped curve to the data.
• This curved relationship ensures two things: (1) the predicted values always lie between
0 and 1, and (2) the predicted values can be interpreted as the probability of Y being 1 or 0 on the
basis of a cut-off score.
• Binary logistic regression is a type of regression analysis in which the dependent variable is
a dummy variable: coded 0 (did not vote) or 1 (did vote); No = 0 and Yes = 1; will not buy = 0
and buy = 1; and so on.
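The bounded S-curve comes from the logistic (sigmoid) function itself. A minimal sketch, where the 0.5 cut-off score is an illustrative choice rather than a fixed rule:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# No matter how extreme z is, the output stays strictly between 0 and 1.
for z in (-10, -1, 0, 1, 10):
    p = sigmoid(z)
    assert 0.0 < p < 1.0
    # Classify with an (illustrative) cut-off score of 0.5.
    label = 1 if p >= 0.5 else 0
    print(f"z = {z:>3}: p = {p:.4f} -> predicted class {label}")
```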
The Logit Model
Algorithm:
While linear models use OLS (the ordinary least squares method), logistic regression
uses the MLE (maximum likelihood estimation) technique.
The logit model estimates the odds that the dependent variable can be predicted
from the values of the independent variables.
This is done by starting with a random set of coefficients and then improving them
iteratively based on the log-likelihood measure.
The iterative process stops when the further improvement in log-likelihood is insignificant.
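The iterative MLE procedure can be sketched for a single predictor. This is a hand-rolled Newton-Raphson loop, not the exact algorithm SPSS uses, and the toy data below are invented purely for illustration:

```python
import math

def fit_logit(xs, ys, tol=1e-6, max_iter=100):
    """Fit p(y=1) = 1/(1+exp(-(b0 + b1*x))) by maximum likelihood.

    Starts from arbitrary coefficients and improves them iteratively
    (Newton-Raphson), stopping once the log-likelihood gain is tiny.
    """
    def loglik(b0, b1):
        ll = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            ll += y * math.log(p) + (1 - y) * math.log(1 - p)
        return ll

    b0, b1 = 0.0, 0.0            # arbitrary starting coefficients
    prev = loglik(b0, b1)
    for _ in range(max_iter):
        # Gradient of the log-likelihood and the 2x2 information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1 - p)
            h00 += w; h01 += w * x; h11 += w * x * x
        # Newton step: solve the 2x2 system by hand.
        det = h00 * h11 - h01 * h01
        b0 += ( h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
        cur = loglik(b0, b1)
        if abs(cur - prev) < tol:  # improvement insignificant: stop iterating
            break
        prev = cur
    return b0, b1

# Toy data: higher x makes y = 1 more likely.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logit(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # b1 comes out positive for this data
```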
Ex.: Data relate to 30 respondents, 15 of whom are brand loyal (coded 1) and
15 of whom are not loyal (coded 0).
We are trying to ascertain consumers’ loyalty towards a retail store that deals
in apparel of a specific brand. Consumers’ loyalty is described as a function
of ‘attitude toward the brand’, ‘attitude toward the product category’, and ‘attitude
toward shopping’, each measured on a scale of 1 (highly unfavourable) to 7 (highly favourable).
The data follow in the Excel sheet.
First, we run an OLS regression to illustrate the limitations of this procedure for
analysing binary data. The estimated equation is given below.
Coefficients(a)

Model             B       Std. Error   Beta      t        Sig.
1  (Constant)   -.684     .321                  -2.129    .043
   VAR00002      .183     .050         .587      3.675    .001
   VAR00003      .020     .046         .065       .441    .663
   VAR00004      .074     .068         .174      1.088    .286
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient)
a. Dependent Variable: VAR00001
Limitation of OLS – When Dependent is Binary
Interpretations from the OLS regression results.
• For example, the estimated value of p exceeds 1 when brand = 7, product = 7,
and shopping = 4. This is intuitively and conceptually wrong, because p is a
probability and must lie between 0 and 1.
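Plugging the OLS estimates from the table above into the fitted equation confirms the problem directly:

```python
# Fitted OLS equation from the coefficients table above:
# p_hat = -0.684 + 0.183*brand + 0.020*product + 0.074*shopping
def ols_p_hat(brand, product, shopping):
    return -0.684 + 0.183 * brand + 0.020 * product + 0.074 * shopping

p = ols_p_hat(brand=7, product=7, shopping=4)
print(f"p_hat = {p:.3f}")  # 1.033 -- an impossible 'probability' above 1
```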
The Wald Statistic
Wald = (ai / SEai)^2
where
ai = logistic coefficient for that predictor variable
SEai = standard error of the logistic coefficient
The Wald statistic is chi-square distributed, with 1 degree of freedom if the
variable is metric, and with degrees of freedom equal to the number of
categories minus 1 if the variable is nonmetric.
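As a numerical illustration, the Wald statistic is just the squared ratio of a coefficient to its standard error. The coefficient and standard error below are made-up values, not estimates from this data set:

```python
def wald_statistic(a_i, se_a_i):
    """Wald = (ai / SEai)^2; chi-square with 1 df for a metric predictor."""
    return (a_i / se_a_i) ** 2

# Hypothetical logistic coefficient and standard error, for illustration only.
print(round(wald_statistic(a_i=1.2, se_a_i=0.5), 2))  # (1.2/0.5)^2 = 5.76
```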
Interpretation of Coefficients
• If Xi is increased by one unit, the log odds will change by ai units, holding the
effect of the other independent variables constant. Thus, ai is the size of
the change in the log odds of the dependent variable per unit change in Xi.
• The sign of ai determines whether the log odds, and hence the probability,
increase (if the sign is positive) or decrease (if the sign is negative).
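A quick way to read a coefficient is through its exponent, exp(ai), which is the factor by which the odds are multiplied per unit increase in Xi. The coefficient below is a hypothetical value, purely for illustration:

```python
import math

a_i = 0.5  # hypothetical logistic coefficient for predictor Xi

# A one-unit increase in Xi adds a_i to the log odds...
# ...which multiplies the odds themselves by exp(a_i).
odds_multiplier = math.exp(a_i)
print(f"odds multiplied by {odds_multiplier:.3f} per unit of Xi")  # ~1.649

# A positive a_i means the odds (and hence the probability) go up;
# a negative a_i would give a multiplier below 1, shrinking the odds.
```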
No. Loyalty Brand Product Shopping No. Loyalty Brand Product Shopping
1 1 4 3 5 16 0 3 1 3
2 1 6 4 4 17 0 4 6 2
3 1 5 2 4 18 0 2 5 2
4 1 7 5 5 19 0 5 2 4
5 1 6 3 4 20 0 4 1 3
6 1 3 4 5 21 0 3 3 4
7 1 5 5 5 22 0 3 4 5
8 1 5 4 2 23 0 3 6 3
9 1 7 5 4 24 0 4 4 2
10 1 7 6 4 25 0 6 3 6
11 1 6 7 2 26 0 3 6 3
12 1 5 6 4 27 0 4 3 2
13 1 7 3 3 28 0 3 5 2
14 1 5 1 4 29 0 5 5 3
15 1 7 5 5 30 0 1 3 2
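Once the logistic coefficients have been estimated, any respondent's predicted loyalty probability comes from the logistic curve. The coefficients below are hypothetical placeholders (the estimated logistic coefficient table is not reproduced in these notes); the attitude scores are respondent 1's actual values from the data above:

```python
import math

# Hypothetical logistic coefficients: (b0, b_brand, b_product, b_shopping).
b = (-8.0, 1.2, 0.3, 0.5)

def predict_loyalty(brand, product, shopping):
    """p(loyal) = 1 / (1 + exp(-(b0 + b1*brand + b2*product + b3*shopping)))."""
    z = b[0] + b[1] * brand + b[2] * product + b[3] * shopping
    return 1.0 / (1.0 + math.exp(-z))

# Respondent 1 (observed loyal): brand = 4, product = 3, shopping = 5.
p = predict_loyalty(brand=4, product=3, shopping=5)
print(f"p(loyal) = {p:.3f} -> classified as {'loyal' if p >= 0.5 else 'not loyal'}")
```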
Results of Logistic Regression
Dependent Variable Encoding
Model Summary
Step −2 Log Likelihood Cox & Snell R Square Nagelkerke R Square
1 23.471a .453 .604
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
A McFadden R-square value between 0.2 and 0.4 is considered a very good model fit.
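The two pseudo R-squares in the Model Summary can be recovered from the reported −2 log-likelihood. With 15 loyal and 15 non-loyal respondents, the null (intercept-only) model predicts p = 0.5 for everyone, so its −2LL is −2·30·ln(0.5) ≈ 41.589; combining that with the fitted −2LL of 23.471 reproduces the table:

```python
import math

n = 30
neg2ll_null = -2 * n * math.log(0.5)   # null model: p = 0.5 for all 30 cases
neg2ll_model = 23.471                  # -2LL from the Model Summary

# Cox & Snell R^2 = 1 - exp(-LR/n), where LR = (-2LL_null) - (-2LL_model).
cox_snell = 1 - math.exp(-(neg2ll_null - neg2ll_model) / n)

# Nagelkerke rescales Cox & Snell by its maximum attainable value.
max_cox_snell = 1 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / max_cox_snell

print(f"Cox & Snell R^2 = {cox_snell:.3f}")   # .453, as in the table
print(f"Nagelkerke R^2  = {nagelkerke:.3f}")  # .604, as in the table
```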
Classification Table(a)

                                            Predicted: Loyalty to the Brand
         Observed                           Not Loyal   Loyal   Percentage Correct
Step 1   Loyalty to the brand   Not loyal      12         3           80.0
                                Loyal           3        12           80.0
         Overall percentage                                           80.0
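The percentages in the classification table follow directly from the cell counts:

```python
# Confusion matrix cells from the classification table above.
tn, fp = 12, 3   # observed not loyal: 12 classified correctly, 3 misclassified
fn, tp = 3, 12   # observed loyal:      3 misclassified, 12 classified correctly

pct_not_loyal = 100 * tn / (tn + fp)
pct_loyal     = 100 * tp / (fn + tp)
overall       = 100 * (tn + tp) / (tn + fp + fn + tp)
print(pct_not_loyal, pct_loyal, overall)  # 80.0 80.0 80.0
```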