Chp2 Logistic Regression

Logistic regression is a supervised classification algorithm used to predict binary outcomes based on a set of independent variables. It estimates the probability of a dependent variable being in one of two classes using a logit transformation and a sigmoid function. The chapter also includes examples of applying logistic regression to predict outcomes based on student exam results and customer savings, along with accuracy calculations.

Logistic Regression

 Logistic regression is basically a supervised classification algorithm.


 In a classification problem, the target variable (or output), y, can take only discrete values
for a given set of features (or inputs), X.
 In logistic regression, the dependent variable is binary or dichotomous, i.e. it only
contains data coded as 1 (TRUE, success, infectious, etc.) or 0 (FALSE, failure,
non-infectious, etc.).
 The goal of logistic regression is to find the best fitting model to describe the relationship
between the dichotomous characteristic of interest (dependent variable = response or
outcome variable) and a set of independent (predictor or explanatory) variables.
 Logistic regression generates the coefficients (and its standard errors and significance
levels) of a formula to predict a logit transformation of the probability of dependent
variable:
logit(p)=b0+b1X1+b2X2+b3X3+...+bkXk
where p is the probability of presence of the dependent variable.
 The logit transformation is defined as the logged odds, where

odds = p / (1 − p) = (probability of presence of characteristic) / (probability of absence of characteristic)

and logit(p) = ln(p / (1 − p))
 Rather than choosing parameters that minimize the sum of squared errors (like in ordinary
regression), estimation in logistic regression chooses parameters that maximize the
likelihood of observing the sample values.
 Logistic regression’s ability to provide probabilities and classify new samples using
continuous and discrete measurements makes it a popular machine learning method.
 The logistic regression technique involves a dependent variable that can be represented
in binary (0 or 1, true or false, yes or no) values, meaning that the outcome can take only
one of two forms. For example, it can be used when we need to find the probability of an
event succeeding or failing.
 Here, the same linear combination as in linear regression is used, with an additional
sigmoid function, so that the output P ranges from 0 to 1.
 Logistic regression equations:

Linear combination: Y = b0 + b1X1 + b2X2 + ... + bkXk

Sigmoid function: P = 1 / (1 + e^(−Y))

Putting Y into the sigmoid function and solving for the log-odds, we get:

ln(p / (1 − p)) = b0 + b1X1 + b2X2 + ... + bkXk
 The logistic function has horizontal asymptotes at 0 and 1, and it crosses the y-axis at 0.5,
as shown in the figure below.
 A prediction function in logistic regression returns the probability of our observation being
positive, True, or “Yes”. We call this class 1 and its notation is P(class=1).
 In order to map this to a discrete class (true/false, cat/dog), we select a threshold value, or
tipping point, above which we classify values into class 1 and below which we classify
values into class 0:
 p ≥ 0.5 → class = 1
 p < 0.5 → class = 0
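The sigmoid-and-threshold pipeline described above can be sketched in Python (a minimal illustration; the function and variable names are my own, and the coefficients reuse the values from Q1 below):

```python
import math

def sigmoid(y):
    """Map a real-valued linear score Y to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def predict(b, x, threshold=0.5):
    """Compute b0 + b1*x1 + ... + bk*xk, pass it through the sigmoid,
    then threshold the probability into class 1 (p >= threshold) or class 0."""
    y = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    p = sigmoid(y)
    return p, (1 if p >= threshold else 0)

# One student from the Q1 example: 8.62 hours studied, 3.23 hours slept.
p, cls = predict([-0.406, 0.8525, -1.105], [8.62, 3.23])
print(p, cls)  # a probability close to 1, so class 1
```

The threshold of 0.5 is conventional but adjustable: lowering it trades false negatives for false positives, which matters when the two error types have different costs.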

Q1. Say we’re given data on student exam results and our goal is to predict whether a student will
pass or fail based on number of hours slept and hours spent studying. We have two features (hours
slept, hours studied) and two classes: passed (1) and failed (0).

Studied (x1)  Slept (x2)  Passed (y)
4.85          9.63        0
8.62          3.23        1
5.43          8.23        0
9.21          6.34        1
Let b0 = -0.406, b1 = 0.8525, b2 = -1.105. Determine the predicted value of y.
For the 1st input tuple, x1 = 4.85 and x2 = 9.63:
y = 1 / (1 + e^−(b0 + b1·x1 + b2·x2)) = 1 / (1 + e^−(−0.406 + 0.8525·4.85 − 1.105·9.63)) ≈ 0.001

For the 2nd input tuple, x1 = 8.62 and x2 = 3.23:
y = 1 / (1 + e^−(−0.406 + 0.8525·8.62 − 1.105·3.23)) ≈ 0.96

For the 3rd input tuple, x1 = 5.43 and x2 = 8.23:
y = 1 / (1 + e^−(−0.406 + 0.8525·5.43 − 1.105·8.23)) ≈ 0.0076

For the 4th input tuple, x1 = 9.21 and x2 = 6.34:
y = 1 / (1 + e^−(−0.406 + 0.8525·9.21 − 1.105·6.34)) ≈ 0.6082

The predicted and actual values are shown in the table below:

Studied (x1)  Slept (x2)  Passed (y)  Fitted Value  Prediction
4.85          9.63        0           0.001         0
8.62          3.23        1           0.96          1
5.43          8.23        0           0.0076        0
9.21          6.34        1           0.6082        1
Confusion Matrix: summarizes how many values were correctly and incorrectly predicted by
the model built.
Predicted 0 Predicted 1
Actual 0 True Negative (TN) False Positive (FP)
Actual 1 False Negative (FN) True Positive (TP)

From the table above for the example:


Predicted 0 Predicted 1
Actual 0 2 (TN) 0 (FP)
Actual 1 0 (FN) 2 (TP)

Accuracy = (TP + TN) / N = (2 + 2) / 4 = 1, i.e. 100%

Precision (Positive Predictive Value) = TP / (TP + FP) = 2 / (2 + 0) = 1

Negative Predictive Value = TN / (TN + FN) = 2 / (2 + 0) = 1

Sensitivity (Recall) = TP / (TP + FN) = 2 / (2 + 0) = 1

Specificity = TN / (TN + FP) = 2 / (2 + 0) = 1
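Q1's fitted values, predictions, and confusion-matrix counts can be checked with a short pure-Python sketch (variable names are mine):

```python
import math

# Coefficients and data from the worked example above.
b0, b1, b2 = -0.406, 0.8525, -1.105
data = [  # (hours studied, hours slept, passed)
    (4.85, 9.63, 0),
    (8.62, 3.23, 1),
    (5.43, 8.23, 0),
    (9.21, 6.34, 1),
]

preds, actuals = [], []
for x1, x2, y in data:
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))  # fitted value
    preds.append(1 if p >= 0.5 else 0)                      # threshold at 0.5
    actuals.append(y)

# Confusion-matrix counts and accuracy.
tp = sum(1 for p, a in zip(preds, actuals) if p == 1 and a == 1)
tn = sum(1 for p, a in zip(preds, actuals) if p == 0 and a == 0)
accuracy = (tp + tn) / len(data)
print(preds, accuracy)  # [0, 1, 0, 1] 1.0
```

Every prediction matches its actual label, reproducing the 100% accuracy computed above.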

Q2. The table below shows the savings of individual customers (in lakhs) and whether they
are loan non-defaulters (0 indicates a loan defaulter and 1 indicates a loan non-defaulter).
Determine the predicted value for each savings amount using logistic regression. Let
b0 = −4.07778, b1 = 1.5046. Also calculate the accuracy.
Amount in Savings (in lakhs) Loan Non-Defaulter
0.50 0
0.75 0
1.00 0
1.25 0
1.50 0
1.75 0
1.75 1
2.00 0
2.25 1
2.50 0
2.75 1
3.00 0
3.25 1
3.50 0
4.00 1
4.25 1
4.50 1
4.75 1
5.00 1
5.50 1
Solution:

Given b0 = −4.07778, b1 = 1.5046.

Using logistic regression, the probability of a customer being a loan non-defaulter is given by

y = 1 / (1 + e^−(b0 + b1·x)) = 1 / (1 + e^−(−4.07778 + 1.5046·savings))
Amount in Savings (in lakhs)  Loan Non-Defaulter (y)  Fitted Value  Prediction
0.50 0 0.035 0
0.75 0 0.049 0
1.00 0 0.071 0
1.25 0 0.100 0
1.50 0 0.139 0
1.75 0 0.191 0
1.75 1 0.191 0
2.00 0 0.256 0
2.25 1 0.334 0
2.50 0 0.422 0
2.75 1 0.515 1
3.00 0 0.607 1
3.25 1 0.693 1
3.50 0 0.766 1
4.00 1 0.874 1
4.25 1 0.910 1
4.50 1 0.937 1
4.75 1 0.956 1
5.00 1 0.969 1
5.50 1 0.985 1
From the table above for the example, the confusion matrix is:
Predicted 0 Predicted 1
Actual 0 8 (TN) 2 (FP)
Actual 1 2 (FN) 8 (TP)

Accuracy = (TP + TN) / N = (8 + 8) / 20 = 0.8, i.e. 80%
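The Q2 table and accuracy can likewise be verified programmatically (a minimal sketch using only the standard library):

```python
import math

b0, b1 = -4.07778, 1.5046
rows = [  # (savings in lakhs, loan non-defaulter)
    (0.50, 0), (0.75, 0), (1.00, 0), (1.25, 0), (1.50, 0),
    (1.75, 0), (1.75, 1), (2.00, 0), (2.25, 1), (2.50, 0),
    (2.75, 1), (3.00, 0), (3.25, 1), (3.50, 0), (4.00, 1),
    (4.25, 1), (4.50, 1), (4.75, 1), (5.00, 1), (5.50, 1),
]

tp = tn = fp = fn = 0
for savings, actual in rows:
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * savings)))  # fitted value
    pred = 1 if p >= 0.5 else 0                       # threshold at 0.5
    if pred == 1 and actual == 1:
        tp += 1
    elif pred == 0 and actual == 0:
        tn += 1
    elif pred == 1 and actual == 0:
        fp += 1
    else:
        fn += 1

accuracy = (tp + tn) / len(rows)
print(tn, fp, fn, tp, accuracy)  # 8 2 2 8 0.8
```

Unlike Q1, the single savings feature cannot separate the classes perfectly: two defaulters (savings 3.00 and 3.50) fall above the decision boundary near 2.71 lakhs, and two non-defaulters (1.75 and 2.25) fall below it.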
Linear Regression Vs Logistic Regression

 Linear regression is used to predict a continuous dependent variable from a given set of
independent variables; logistic regression is used to predict a categorical dependent variable.
 Linear regression is used for solving regression problems; logistic regression is used for
solving classification problems.
 In linear regression, we predict the values of continuous variables; in logistic regression,
we predict the values of categorical variables.
 In linear regression, we find the best-fit line, with which we can easily predict the output;
in logistic regression, we find the S-curve, with which we can classify the samples.
 Linear regression estimates its coefficients by the least-squares method; logistic regression
estimates its coefficients by maximum likelihood.
 The output of linear regression must be a continuous value, such as price or age; the output
of logistic regression must be a categorical value such as 0 or 1, Yes or No.
 Linear regression requires the relationship between the dependent and independent
variables to be linear; logistic regression does not require a linear relationship between them.
 In linear regression, there may be collinearity between the independent variables; in
logistic regression, there should not be collinearity between the independent variables.
Exercise Questions

1. Apply logistic regression algorithm to classify the following data. Use b0 = -0.4, b1 = 0.8
and b2 = -1.1.
X1 X2 Y
1.64 2.63 0
3.4 4.1 0
7.2 2.7 1
6.5 1.8 1
7.6 3.5 1
Also find the accuracy after classification.
Solution:
X1 X2 Y Fitted Value Predicted Value
1.64 2.63 0 0.1212 0
3.4 4.1 0 0.1006 0
7.2 2.7 1 0.9160 1
6.5 1.8 1 0.9437 1
7.6 3.5 1 0.8617 1
Accuracy = 100%
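As a check, the exercise's fitted values, predictions, and accuracy can be recomputed with a brief sketch (variable names are mine):

```python
import math

# Coefficients and data from the exercise.
b0, b1, b2 = -0.4, 0.8, -1.1
data = [(1.64, 2.63, 0), (3.4, 4.1, 0), (7.2, 2.7, 1), (6.5, 1.8, 1), (7.6, 3.5, 1)]

# Fitted probabilities via the sigmoid, then thresholded at 0.5.
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2))) for x1, x2, _ in data]
preds = [1 if p >= 0.5 else 0 for p in fitted]
accuracy = sum(p == y for p, (_, _, y) in zip(preds, data)) / len(data)
print(preds, accuracy)  # [0, 0, 1, 1, 1] 1.0
```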
