
Chapter 10 – Logistic Regression

Data Mining for Business Intelligence


Shmueli, Patel & Bruce

© Galit Shmueli and Peter Bruce 2010


Logistic Regression
• Powerful model-based classification tool
• Extends the idea of linear regression to situations where the outcome variable is categorical
• The model relates the predictors to the outcome
  Example: Y denotes a recommendation on holding/selling/buying a stock – a categorical variable with 3 categories
• We focus on binary classification, i.e. Y=0 or Y=1, but predictors can be categorical or continuous
• Widely used, particularly where a structured model is useful
The Logit
Goal: Find a function of the predictor variables that relates them to a 0/1 outcome
• Instead of using Y as the outcome variable (as in linear regression), we use a function of Prob(Y=1) called the logit
• The logit can be modeled as a linear function of the predictors
• The logit can be mapped back to a probability, which, in turn, can be mapped to a class
  Using a cut-off value on the probability of belonging to class 1, P(Y=1)
Step 1: Logistic Response Function
• Let p = probability of belonging to class 1
• Logistic regression relates p to the predictors with a function that guarantees 0 ≤ p ≤ 1
• The standard linear function (below) does not:

  p = β0 + β1x1 + β2x2 + ... + βqxq,  where q = number of predictors

• The fix: use the logistic response function

  p = 1 / (1 + e^−(β0 + β1x1 + β2x2 + ... + βqxq))   (eq. 10.2 in textbook)
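The bound on p is easy to check numerically. A minimal sketch (the coefficients here are hypothetical, not the textbook's fit) showing that the logistic response function keeps its output strictly between 0 and 1 even for extreme inputs:

```python
import math

def logistic(x, betas):
    """Logistic response function (eq. 10.2): maps any linear
    combination of predictors into a probability in (0, 1)."""
    # linear part: beta0 + beta1*x1 + ... + betaq*xq
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Even extreme predictor values stay inside (0, 1),
# unlike a plain linear function of the predictors.
for income in (0, 50, 1000, -1000):
    p = logistic([income], [-6.0, 0.04])
    assert 0.0 < p < 1.0
```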


Step 2: The Odds
• The odds of an event are defined as:

  Odds = p / (1 − p),  where p = probability of the event   (eq. 10.3)

• Given the odds of an event, the probability of the event can be computed by:

  p = Odds / (1 + Odds)   (eq. 10.4)
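Equations 10.3 and 10.4 are inverses of each other, which a few lines of Python confirm (the 9.6% acceptance rate from the loan example below is used as an illustrative p):

```python
def odds_from_p(p):
    # eq. 10.3: Odds = p / (1 - p)
    return p / (1 - p)

def p_from_odds(odds):
    # eq. 10.4: p = Odds / (1 + Odds)
    return odds / (1 + odds)

# Round trip: converting to odds and back recovers p
p = 0.096  # loan-acceptance rate from the example data
odds = odds_from_p(p)
assert abs(p_from_odds(odds) - p) < 1e-12
```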
We can also relate the odds to the predictors:

  Odds = e^(β0 + β1x1 + β2x2 + ... + βqxq)   (eq. 10.5)

Recall that this follows from substituting the logistic response function (eq. 10.2) into the definition of the odds (eq. 10.3).
Step 3: Take the Log on Both Sides
• This gives us the logit:

  log(Odds) = β0 + β1x1 + β2x2 + ... + βqxq   (eq. 10.6)

• Log(odds) is called the logit, and it takes values from −∞ to +∞
• The logit is the dependent variable, and is a linear function of the predictors x1, x2, ..., xq
• This helps make interpretation easier
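As a quick check on the left-hand side of eq. 10.6, a sketch of the logit as a function of p: it is 0 at p = 0.5, positive above, negative below, heading toward ±∞ at the extremes:

```python
import math

def logit(p):
    # eq. 10.6 left-hand side: log(Odds) = log(p / (1 - p))
    return math.log(p / (1 - p))

# The logit is symmetric around p = 0.5 ...
assert logit(0.5) == 0.0
# ... and unbounded in both directions as p nears 0 or 1
assert logit(0.9) > 0 and logit(0.1) < 0
assert logit(0.999) > logit(0.99) > logit(0.9)
```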
Example: Acceptance of Personal Loan Offer
• Outcome variable: accept bank loan (0/1)
• Predictors: demographics (age, income, etc.) and information about the customer's bank relationship (mortgage, securities account, etc.)
• Data: 5,000 customers – 480 (9.6%) accepted the loan offer previously
• Goal: find characteristics of customers who are most likely to accept the loan offer in future mailings

Data preprocessing:
• Partition: 60% training, 40% validation
• Create 0/1 dummy variables for categorical predictors
Single Predictor Model
• Modeling loan acceptance on income (x)
• Fitted coefficients: b0 = −6.3525, b1 = 0.0392
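Plugging these fitted coefficients into the logistic response function gives the estimated acceptance probability at any income level. A minimal sketch:

```python
import math

# Fitted single-predictor model from the slide:
# logit = -6.3525 + 0.0392 * income
b0, b1 = -6.3525, 0.0392

def p_accept(income):
    """Estimated P(accept loan | income) via eq. 10.2."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * income)))

# Acceptance probability rises with income, staying in (0, 1)
assert p_accept(200) > p_accept(100) > p_accept(50)
assert 0.0 < p_accept(100) < 1.0  # roughly 0.08 at income 100
```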


Last Step: Classification
• The model produces an estimated probability of being a "1"
  Example: P(accept loan | income)
• Convert to a classification by establishing a cutoff level
• If the estimated probability > cutoff, classify as "1"
• Thus the model helps with classification as well as with predicting the probability of belonging to one class
• Default cut-off value: 0.50, but it can be changed to maximize classification accuracy
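The cutoff step can be sketched in a few lines (the probabilities below are illustrative, not model output):

```python
def classify(p, cutoff=0.50):
    # default cut-off is 0.50; moving it trades off the two error types
    return 1 if p > cutoff else 0

probs = [0.03, 0.47, 0.51, 0.92]
assert [classify(p) for p in probs] == [0, 0, 1, 1]

# A lower cutoff flags more customers as likely acceptors
assert [classify(p, cutoff=0.30) for p in probs] == [0, 1, 1, 1]
```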
Example: Parameter Estimation
• Estimates of the β's are derived through an iterative process called maximum likelihood estimation
• Let us now include all 12 predictors in the model
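Maximum likelihood estimation is handled by statistical software; purely as an illustration of the iterative idea, here is a toy gradient-ascent fit on made-up single-predictor data (real packages use faster Newton-type iterations and also report standard errors):

```python
import math

def fit_logistic(xs, ys, lr=0.1, iters=5000):
    """Toy maximum-likelihood estimation by gradient ascent on the
    log-likelihood of a single-predictor logistic model."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += (y - p)        # gradient w.r.t. the intercept
            g1 += (y - p) * x    # gradient w.r.t. the slope
        b0 += lr * g0 / len(xs)
        b1 += lr * g1 / len(xs)
    return b0, b1

# Made-up data where y tends toward 1 for larger x
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
assert b1 > 0  # larger x raises the estimated probability
```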


Estimated Equation for Logit

• Interpreting binary predictor effects:
  • The odds of accepting the loan offer for those who already have a CD account with the bank are 32.1 times the odds for those who do not have a CD account (p-value < 0.001).

• Interpreting continuous predictor effects:
  • The odds of accepting the loan offer increase by 77.1% if family size increases by one (p-value < 0.001).
  • The odds of accepting the loan offer decrease by 4.4% if a client is 1 year older (p-value = 0.624).

http://www.ats.ucla.edu/stat/mult_pkg/faq/general/odds_ratio.htm
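These percentage interpretations come from exponentiating a coefficient: an estimated coefficient b corresponds to an odds ratio of e^b, so the odds change by (e^b − 1) × 100% per unit increase. A sketch (the b values below are back-computed from the stated odds ratios, not taken from the actual model output):

```python
import math

# family size: odds ratio e^b = 1.771 -> odds up 77.1% per member
b_family = math.log(1.771)
assert abs(math.exp(b_family) - 1.771) < 1e-12

# age: odds ratio e^b = 0.956 -> odds down 4.4% per extra year
b_age = math.log(0.956)
assert round((math.exp(b_age) - 1) * 100, 1) == -4.4
```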
Variable Selection
Problems:
• As in linear regression, correlated predictors (multicollinearity) cause problems for the estimation
• Overly complex models run the danger of overfitting

Solution: remove extreme redundancies by dropping predictors via automated selection of variable subsets (as in linear regression) or by data reduction methods such as PCA
P-values for Predictors
• Test the null hypothesis that a coefficient = 0
• P-values reported with the coefficients display the results of these tests
• Coefficients with low p-values (close to 0) are statistically significant
• Useful for reviewing whether to include a variable in the model
• Key in profiling tasks, but less important in predictive classification
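Such p-values typically come from a Wald test on each coefficient. A minimal sketch using the normal approximation (the coefficient/standard-error pairs below are hypothetical):

```python
import math

def wald_p_value(coef, se):
    """Two-sided p-value for H0: coefficient = 0, using the
    normal approximation to the Wald z-statistic."""
    z = coef / se
    # standard-normal CDF via the error function
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

# A coefficient many standard errors from zero is significant ...
assert wald_p_value(0.57, 0.10) < 0.001
# ... while one within a fraction of a standard error is not
assert wald_p_value(-0.045, 0.092) > 0.05
```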
Summary
• Logistic regression is similar to linear regression, except that it is used with a categorical response
• It can be used for explanatory tasks (= profiling) or predictive tasks (= classification)
• The predictors are related to the response Y via a nonlinear function called the logit
• As in linear regression, reducing predictors can be done via variable selection
• Logistic regression can be generalized to more than two classes
