Logistic Regression
School of Business
Department of Management
Master of Business Administration
Marketing Analytics (23BAT737)
Parveen Abrol, Assistant Professor
Chandigarh University
Learning Objectives
• Introduction to marketing analytics
• How to measure success?
Logistic Regression vs TGDA
• Two-Group Discriminant Analysis
– Implicitly assumes that the Xs are Multivariate Normally
(MVN) Distributed
– This assumption is violated if Xs are categorical variables
• Logistic Regression does not impose any restriction
on the distribution of the Xs
• Logistic Regression is the recommended approach if
at least some of the Xs are categorical variables
Data
(Success = 1 for a favored / preferred stock; Size = 1 for a large company)

Favored Stock        Less Favored Stock
Success  Size        Success  Size
   1      1             0      1
   1      1             0      0
   1      1             0      0
   1      1             0      0
   1      1             0      0
   1      1             0      0
   1      1             0      0
   1      1             0      0
   1      1             0      0
   1      1             0      0
   1      0             0      0
   1      0             0      0
Contingency Table

                Large   Small   Total
Preferred         10       2      12
Not Preferred      1      11      12
Total             11      13      24
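A minimal sketch of how this contingency table can be rebuilt in Python from the raw data above; the pandas usage and the Success/Size coding (1 = preferred, 1 = large) follow the Data slide, and the variable names are illustrative.

```python
import pandas as pd

# Recreate the 24-stock data from the Data slide (assumed coding:
# Success = 1 for a favored / preferred stock, Size = 1 for a large company).
success = [1] * 12 + [0] * 12
size = [1] * 10 + [0] * 2 + [1] * 1 + [0] * 11

df = pd.DataFrame({"Preferred": success, "Large": size})

# Cross-tabulate preference against company size, with row and column totals.
print(pd.crosstab(df["Preferred"], df["Large"], margins=True))
```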
Basic Concepts
• Probability
– Probability of being a preferred stock = 12/24 =
0.5
– Probability that a company’s stock is preferred
given that the company is large = 10/11 = 0.909
– Probability that a company’s stock is preferred
given that the company is small = 2/13 = 0.154
Concepts … contd.
• Odds
– Odds of a preferred stock = 12/12 = 1
– Odds of a preferred stock given that the company
is large = 10/1 = 10
– Odds of a preferred stock given that the company
is small = 2/11 = 0.182
Odds and Probability
• Odds(Event) = Prob(Event)/(1-Prob(Event))
• Prob(Event) = Odds(Event)/(1+Odds(Event))
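As a quick check of these two identities, the short sketch below converts between the probabilities and odds quoted on the previous slides; the function names are illustrative.

```python
def odds_from_prob(p):
    # Odds(Event) = Prob(Event) / (1 - Prob(Event))
    return p / (1 - p)

def prob_from_odds(odds):
    # Prob(Event) = Odds(Event) / (1 + Odds(Event))
    return odds / (1 + odds)

print(odds_from_prob(10 / 11))  # odds of preferred given large  -> 10.0
print(odds_from_prob(2 / 13))   # odds of preferred given small  -> ~0.182
print(prob_from_odds(1))        # overall probability of preferred -> 0.5
```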
Logistic Regression
• Take Natural Log of the odds:
– ln(odds(Preferred|Large)) = ln(10) = 2.303
– ln(odds(Preferred|Small)) = ln(0.182) = -1.704
• Recall:
– Odds = p/(1-p)
• p = e^(β0 + β1X1) / (1 + e^(β0 + β1X1))
• p = 1 / (1 + e^-(β0 + β1X1))
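The two expressions for p are algebraically equivalent. The sketch below, using the log-odds values 2.303 and -1.704 from this slide, confirms that both forms return the same probabilities.

```python
import math

# Both forms of p, evaluated at the log-odds from this slide.
for log_odds in (2.303, -1.704):  # ln(odds | large), ln(odds | small)
    p1 = math.exp(log_odds) / (1 + math.exp(log_odds))
    p2 = 1 / (1 + math.exp(-log_odds))
    print(round(p1, 3), round(p2, 3))  # 0.909 then 0.154, identical either way
```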
Logistic Function
[Figure: S-shaped plot of p against X, rising from 0 toward 1, for X ranging from -50 to 50]
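A minimal sketch, assuming NumPy and matplotlib are available, of how the S-shaped curve on this slide can be reproduced; the X-axis range of -50 to 50 follows the original figure.

```python
import numpy as np
import matplotlib.pyplot as plt

# Reproduce the S-shaped logistic curve p = 1 / (1 + exp(-x)).
x = np.linspace(-50, 50, 500)
p = 1 / (1 + np.exp(-x))

plt.plot(x, p)
plt.xlabel("X")
plt.ylabel("p")
plt.title("Logistic Function")
plt.show()
```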
Estimation
• In ordinary linear regression, the coefficients are estimated by minimizing the sum of squared errors
• Since p is non-linear in the parameters, logistic regression needs a non-linear estimation technique:
– Maximum-Likelihood approach
– Non-linear least squares
Maximum Likelihood Approach
• Conditional on the parameters β, write out the probability of observing the data
• Write this probability out for each observation
• Multiply the probabilities of the individual observations together to get the joint probability of observing the data, conditional on β
• Find the β that maximizes this conditional probability of realizing the data (a numerical sketch follows below)
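The sketch below illustrates the maximum-likelihood idea numerically, assuming SciPy is available; it recreates the 24-stock data from the Data slide (assumed coding: 1 = preferred / large) and minimizes the negative log-likelihood. The recovered coefficients should be close to the values reported later (-1.705 and 4.007).

```python
import numpy as np
from scipy.optimize import minimize

# 24-stock data from the Data slide (assumed coding: 1 = preferred / large).
y = np.array([1] * 12 + [0] * 12, dtype=float)
size = np.array([1] * 10 + [0] * 2 + [1] + [0] * 11, dtype=float)

def neg_log_likelihood(beta):
    b0, b1 = beta
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * size)))
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0) during the search
    # Joint probability of the data is the product over observations of
    # p^y * (1 - p)^(1 - y); take logs, sum, and negate for the minimizer.
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print(result.x)  # roughly [-1.70, 4.01]
```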
Logistic Regression
• Logistic Regression with one categorical
explanatory variable reduces to an analysis of
the contingency table
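A quick way to see this equivalence: with a single binary predictor, the fitted intercept and slope are simply log-odds taken straight from the contingency table, as the illustrative calculation below shows.

```python
import numpy as np

# With one binary predictor, the fitted coefficients are contingency-table log-odds:
intercept = np.log(2 / 11)           # ln(odds of preferred | small)  ~ -1.705
slope = np.log(10) - np.log(2 / 11)  # difference in log-odds         ~  4.007
print(intercept, slope)
```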
Interpretation of Results
Look at the –2 Log L statistic
• Intercept only: 33.271
• Intercept and Covariates: 17.864
• Difference: 15.407 with 1 DF (p=0.0001)
• This large, significant drop indicates that the Size variable explains much of the variation in stock preference (the p-value is reproduced below)
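The p-value quoted above can be reproduced from the chi-square distribution; a minimal sketch assuming SciPy:

```python
from scipy.stats import chi2

# Likelihood-ratio test from the -2 Log L values on this slide.
lr_stat = 33.271 - 17.864        # 15.407, with 1 degree of freedom
p_value = chi2.sf(lr_stat, df=1)
print(lr_stat, p_value)          # ~15.407, p ~ 0.0001
```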
Do the Variables Have a Significant
Impact?
• This is analogous to testing whether the coefficients in a linear regression model are different from zero
• Look at the output from the Analysis of Maximum Likelihood Estimates
– Loosely, the Pr > Chi-Square column gives the probability of obtaining an estimate as large as the one in the Parameter Estimate column if the true coefficient were zero; if this value is < 0.05, the estimate is considered significant (see the illustrative refit below)
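For comparison, the sketch below refits the 24-stock data with statsmodels (an assumed stand-in for the SAS output described on the slide); the p-values it reports play the role of the Pr > Chi-Square column.

```python
import numpy as np
import statsmodels.api as sm

# 24-stock data again (assumed coding: 1 = preferred / large).
y = np.array([1] * 12 + [0] * 12, dtype=float)
size = np.array([1] * 10 + [0] * 2 + [1] + [0] * 11, dtype=float)

fit = sm.Logit(y, sm.add_constant(size)).fit()
print(fit.params)   # approx [-1.705, 4.007]
print(fit.pvalues)  # analogue of Pr > Chi-Square; < 0.05 -> significant
```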
Other Things to Look For
• Akaike's Information Criterion (AIC) and Schwarz's Criterion (SC) – these work like Adjusted R², imposing a penalty for additional covariates
• The larger the difference between the Intercept Only and Intercept and Covariates columns, the better the model fit (an illustrative calculation follows)
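Assuming the standard definitions AIC = -2 Log L + 2k and SC = -2 Log L + k·ln(n), the sketch below computes both criteria from the -2 Log L values quoted earlier; the exact figures on the original output are not shown on the slide, so treat these as illustrative.

```python
import math

# AIC = -2 Log L + 2k and SC = -2 Log L + k * ln(n), with n = 24 observations
# and k = number of estimated parameters.
n = 24
for label, neg2logL, k in [("intercept only", 33.271, 1),
                           ("intercept + Size", 17.864, 2)]:
    aic = neg2logL + 2 * k
    sc = neg2logL + k * math.log(n)
    print(label, round(aic, 3), round(sc, 3))  # smaller values indicate better fit
```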
Interpretation of the Parameter
Estimates
• ln(p/(1-p)) = -1.705 + 4.007*Size
• p = e^(β0 + β1·Size + β2·FP) / (1 + e^(β0 + β1·Size + β2·FP))
• p = 1 / (1 + e^-(β0 + β1·Size + β2·FP))
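Plugging Size = 0 and Size = 1 into the fitted one-variable equation recovers the conditional probabilities computed earlier from the contingency table; an illustrative check:

```python
import math

# Predicted probabilities from ln(p / (1 - p)) = -1.705 + 4.007 * Size.
for size in (0, 1):
    log_odds = -1.705 + 4.007 * size
    p = 1 / (1 + math.exp(-log_odds))
    print(size, round(p, 3))  # Size = 0 -> ~0.154, Size = 1 -> ~0.909
```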
Estimation & Interpretation of the
Results