• When and Why do we Use Logistic
Regression?
• Binary
• Multinomial
• Theory Behind Logistic Regression
• Assessing the Model
• Assessing predictors
• Things that can go Wrong
• Interpreting Logistic Regression
When And Why
• To predict an outcome variable that is
categorical from one or more categorical or
continuous predictor variables.
• Used because having a categorical outcome
variable violates the assumption of linearity in
normal regression.
• Does not assume a linear relationship
between DV and IV
No assumptions about the distributions of the predictor
variables.
Predictors do not have to be normally distributed.
Logistic regression does not make any assumptions of normality, linearity,
and homogeneity of variance for the independent variables.
Because it does not impose these requirements, logistic regression is preferred to
discriminant analysis when the data do not satisfy these assumptions.
• Logistic regression is used to analyze relationships between a
dichotomous dependent variable and continuous or dichotomous
independent variables.
• Logistic regression combines the independent variables to
estimate the probability that a particular event will occur, i.e., that a
subject will be a member of one of the groups defined by the
dichotomous dependent variable.
$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + \varepsilon_i)}}$$
• Outcome
• We predict the probability of the outcome
occurring
• b0 and b1
• Can be thought of in much the same way as
multiple regression
• Note the normal regression equation forms part
of the logistic regression equation
$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + \varepsilon_i)}}$$
• Outcome
• We still predict the probability of the
outcome occurring
• Differences
• Note the multiple regression equation forms
part of the logistic regression equation
• This part of the equation expands to
accommodate additional predictors
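To make the equation concrete, here is a minimal Python sketch of the logistic function; the coefficient and predictor values are invented for illustration and are not taken from any example in these slides.

```python
import math

def logistic_probability(b0, coefs, xs):
    """P(Y) = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn)).
    coefs and xs are parallel lists of coefficients and predictor values."""
    linear_part = b0 + sum(b * x for b, x in zip(coefs, xs))  # the regression equation
    return 1 / (1 + math.exp(-linear_part))

# Single predictor (hypothetical values): b0 = -2.0, b1 = 0.05, X1 = 60
print(logistic_probability(-2.0, [0.05], [60]))             # ≈ 0.73
# Two predictors (also hypothetical):
print(logistic_probability(-1.5, [0.8, -0.3], [2.0, 1.0]))  # ≈ 0.45
```

Note how the familiar regression equation sits inside the exponent, and adding predictors only lengthens that linear part.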
$$\text{logit}(p) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$
α represents the baseline disease risk: the log odds of disease when all predictors are zero.
β1 represents the amount by which the log odds of disease change for a unit
change in X1.
β2 is the amount by which the log odds of disease change for a unit change in
X2.
……. and so on.
What changes is the log odds. The odds themselves are multiplied by e^β for each unit change in the predictor.
If β = 1.6, the odds are multiplied by e^1.6 ≈ 4.95.
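As a quick check of this arithmetic in code (β = 1.6 is the slide's own example):

```python
import math

beta = 1.6
odds_ratio = math.exp(beta)  # e^1.6
print(round(odds_ratio, 2))  # 4.95: a unit change in the predictor
                             # multiplies the odds by about 4.95
```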
Measuring the Probability of Outcome
The probability of the outcome is measured by the odds of occurrence
of an event.
If P is the probability of an event, then (1-P) is the probability of it not
occurring.
$$\text{Odds of success} = \frac{P}{1 - P}$$
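A one-line sketch of this conversion (the probability values are just illustrations):

```python
def odds(p):
    """Convert a probability P into odds: P / (1 - P)."""
    return p / (1 - p)

print(odds(0.8))  # 4.0, i.e. odds of 4 to 1
print(odds(0.5))  # 1.0, i.e. even odds
```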
• Forced Entry: All variables entered simultaneously.
• Hierarchical: Variables entered in blocks.
• Blocks should be based on past research or the theory being tested. This is a
good method.
• Stepwise: Variables entered on the basis of statistical criteria
(i.e. relative contribution to predicting outcome).
• Should be used only for exploratory analysis.
Stage 1:
Objectives of Logistic Regression
• Identify the independent variables that impact the
dependent variable.
• Establish a classification system, based on the
logistic model, for determining group
membership.
• BINARY LOGISTIC REGRESSION
It is used when the dependent variable is dichotomous.
• MULTINOMIAL LOGISTIC REGRESSION
It is used when the dependent (outcome) variable has more than
two categories.
[Diagrams: independent variable(s) → dependent variable, for the binary and multinomial cases]
Binary logistic regression expression:
$$\text{logit}(Y) = \beta_0 + \beta_1 X_1 + \varepsilon$$
Y = dependent variable
β0 = constant
β1 = coefficient of variable X1
X1 = independent variable
ε = error term
• Very small samples carry a great deal of sampling
error.
• A very large sample size decreases the chance of
error.
• Logistic regression requires a larger sample size than
multiple regression.
• Researchers recommend a sample size greater
than 400.
Sample Size Per Category of the Independent Variable
The recommended sample size for each group is at
least 10 observations per estimated parameter.
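A small helper that applies this rule of thumb; counting the intercept as an estimated parameter is an assumption here, since conventions vary:

```python
def minimum_group_size(n_predictors, per_parameter=10):
    """Rule of thumb: at least 10 observations per estimated parameter
    (predictors plus intercept, by assumption) in each outcome group."""
    n_parameters = n_predictors + 1  # +1 for the intercept (assumed counted)
    return per_parameter * n_parameters

print(minimum_group_size(2))  # 30 observations per group for 2 predictors
```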
The logistic relationship described earlier is used both in estimating the logistic
model and in establishing the relationship between the dependent
and independent variables.
The result is a unique transformation of the dependent variable, which
impacts not only the estimation process but also the resulting
coefficients of the independent variables.
Maximum Likelihood Estimation (MLE)
MLE is a statistical method for estimating the
coefficients of a model.
The likelihood function (L) measures the
probability of observing the particular set of
dependent variable values (p1, p2, ..., pn) that
occur in the sample:
$$L = p_1 \times p_2 \times \dots \times p_n$$
The higher L is, the higher the probability of
observing these values in the sample.
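In practice the product is computed on the log scale to avoid numerical underflow; a minimal sketch for a binary outcome, with made-up outcomes and fitted probabilities:

```python
import math

def log_likelihood(y, p):
    """Log-likelihood of observed binary outcomes y (0/1) given
    predicted probabilities p; summing logs is equivalent to
    multiplying the individual probabilities."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

y = [1, 0, 1, 1]            # hypothetical observed outcomes
p = [0.8, 0.3, 0.6, 0.9]    # hypothetical model probabilities
print(log_likelihood(y, p))  # ≈ -1.20; MLE chooses coefficients maximizing this
```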
[Figure: the logit model produces an S-shaped curve whose range is bounded between 0 and 1, unlike the linear probability (LP) model.]
The data used to conduct logistic regression is from a survey of 30
homeowners conducted by an electricity company about an offer of roof
solar panels with a 50% subsidy from the state government as part of the
state’s environmental policy.
The variables are:
IVs: household income (measured in units of a thousand dollars), age of
householder, monthly mortgage, size of family household
DV: whether the householder would take or decline the offer.
Taking the offer was coded as 1 and declining the offer was coded as 0.
To determine whether household income and monthly
mortgage will predict taking or declining the solar
panel offer
Independent Variables: household income and monthly
mortgage
Dependent Variable: take the offer or decline the offer
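A sketch of how this analysis could be run in Python with statsmodels; the file name and column names (take_offer, income, mortgage) are assumptions for illustration, as the survey data are not reproduced in these slides:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names; the actual survey data are not shown.
df = pd.read_csv("solar_survey.csv")  # take_offer (1/0), income, mortgage

# Binary logistic regression of take_offer on income and mortgage
model = smf.logit("take_offer ~ income + mortgage", data=df).fit()
print(model.summary())        # coefficients are on the log-odds scale
print(np.exp(model.params))   # exponentiate to get odds ratios
```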
Two hypotheses to be tested
• There are two hypotheses to test in relation to the overall fit of the
model:
H0: The model is a good-fitting model.
H1: The model is not a good-fitting model (i.e., the predictors have a
significant effect).
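One standard way to test these hypotheses is a likelihood-ratio test of the fitted model against the intercept-only (null) model; a sketch, continuing the hypothetical statsmodels fit above:

```python
from scipy import stats

# Likelihood-ratio statistic: twice the gain in log-likelihood over
# the intercept-only model (statsmodels exposes both values).
lr_stat = 2 * (model.llf - model.llnull)
p_value = stats.chi2.sf(lr_stat, df=model.df_model)  # df = number of predictors
print(f"LR chi-square = {lr_stat:.2f}, p = {p_value:.4f}")
# A small p-value supports H1: the predictors significantly improve the model.
```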