Chp2 Logistic Regression
The logit (log-odds) of a probability p is defined as

$\text{logit}(p) = \ln\left(\frac{p}{1-p}\right)$
Rather than choosing parameters that minimize the sum of squared errors (like in ordinary
regression), estimation in logistic regression chooses parameters that maximize the
likelihood of observing the sample values.
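For reference, the quantity being maximized is the log-likelihood of the observed 0/1 outcomes. Written out for n observations (a standard form, not tied to any particular example in this chapter):

$\ell(b_0,\dots,b_k) = \sum_{i=1}^{n}\Big[\,y_i \ln p_i + (1-y_i)\ln(1-p_i)\,\Big], \qquad p_i = \frac{1}{1+e^{-(b_0 + b_1 X_{i1} + \dots + b_k X_{ik})}}$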
Logistic regression’s ability to provide probabilities and classify new samples using
continuous and discrete measurements makes it a popular machine learning method.
Logistic regression deals with a dependent variable that takes binary values (0 or 1, true or false, yes or no), meaning the outcome can be in only one of two forms. For example, it can be used when we need to find the probability of an event succeeding or failing.
Here, the same linear combination as in linear regression is passed through an additional sigmoid function, so the resulting probability P ranges from 0 to 1.
Logistic regression equation:

Linear regression: $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$

Sigmoid function: $P = \frac{1}{1 + e^{-Y}}$

By putting Y into the sigmoid function and rearranging, we get the following result:

$\ln\left(\frac{p}{1-p}\right) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
The logistic function has asymptotes at 0 and 1, and it crosses the y-axis at 0.5, as shown in the figure below.
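A minimal Python sketch of the sigmoid illustrating this behaviour (the function name is my own, not from the text):

```python
import math

def sigmoid(y):
    """Logistic (sigmoid) function: maps any real-valued y to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

# The curve approaches 0 for very negative y, approaches 1 for very positive y,
# and equals exactly 0.5 at y = 0, where it crosses the vertical axis.
print(sigmoid(-10))  # ~0.000045
print(sigmoid(0.0))  # 0.5
print(sigmoid(10))   # ~0.999955
```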
A prediction function in logistic regression returns the probability that an observation is positive, true, or "yes". We call this class 1 and write its probability as P(class=1).
In order to map this probability to a discrete class (true/false, cat/dog), we select a threshold value, or tipping point, above which we classify observations into class 1 and below which we classify them into class 0:
p≥0.5, class=1
p<0.5, class=0
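A hedged sketch of this prediction-and-threshold rule in Python (the names predict_probability and classify are assumptions for illustration, not part of the text):

```python
import math

def predict_probability(x, coefficients):
    """P(class = 1) for one observation x = [x1, ..., xk], given coefficients [b0, b1, ..., bk]."""
    y = coefficients[0] + sum(b * xi for b, xi in zip(coefficients[1:], x))
    return 1.0 / (1.0 + math.exp(-y))

def classify(x, coefficients, threshold=0.5):
    """Map the probability onto a discrete class using the chosen threshold."""
    return 1 if predict_probability(x, coefficients) >= threshold else 0
```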
Q1. Say we’re given data on student exam results and our goal is to predict whether a student will
pass or fail based on number of hours slept and hours spent studying. We have two features (hours
slept, hours studied) and two classes: passed (1) and failed (0).
Solution: Suppose the fitted model classifies all four test observations correctly, with two passes and two fails, so TP = 2, TN = 2, FP = 0, FN = 0 and N = 4.

$\text{Accuracy} = \frac{TP + TN}{N} = \frac{2 + 2}{4} = \frac{4}{4} = 1, \text{ i.e. } 100\%$

$\text{Precision (Positive Predictive Value)} = \frac{TP}{TP + FP} = \frac{2}{2 + 0} = 1$

$\text{Negative Predictive Value} = \frac{TN}{TN + FN} = \frac{2}{2 + 0} = 1$

$\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{2}{2 + 0} = 1$

$\text{Specificity} = \frac{TN}{TN + FP} = \frac{2}{2 + 0} = 1$
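A small Python helper that reproduces these metrics from confusion-matrix counts (an illustrative sketch; the function name and dictionary keys are my own):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the standard confusion-matrix metrics used above."""
    n = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / n,
        "precision": tp / (tp + fp),                  # positive predictive value
        "negative_predictive_value": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),                # true positive rate
        "specificity": tn / (tn + fp),                # true negative rate
    }

# Q1: two passes and two fails, all classified correctly.
print(classification_metrics(tp=2, tn=2, fp=0, fn=0))
# every metric equals 1.0, i.e. 100%
```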
Q2. The table below shows the savings of individual customers (in lakhs) and whether they are loan non-defaulters (0 indicates a loan defaulter and 1 indicates a loan non-defaulter). Determine the predicted value for each savings amount using logistic regression, with b0 = -4.07778 and b1 = 1.5046. Also calculate the accuracy.
Amount in Savings (in lakhs) Loan Non-Defaulter
0.50 0
0.75 0
1.00 0
1.25 0
1.50 0
1.75 0
1.75 1
2.00 0
2.25 1
2.50 0
2.75 1
3.00 0
3.25 1
3.50 0
4.00 1
4.25 1
4.50 1
4.75 1
5.00 1
5.50 1
Solution:
With the 0.5 threshold, the model predicts non-defaulter (1) whenever $-4.07778 + 1.5046 \times \text{savings} \ge 0$, i.e. for savings of roughly 2.71 lakhs or more. Four customers are misclassified, giving TP = 8, TN = 8, FP = 2, FN = 2, so

$\text{Accuracy} = \frac{TP + TN}{N} = \frac{8 + 8}{20} = \frac{16}{20} = 0.8, \text{ i.e. } 80\%$
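A short Python check of this calculation, using the table values and the given coefficients (the variable and function names are my own, illustrative only):

```python
import math

# Savings (in lakhs) and actual labels from the Q2 table (1 = loan non-defaulter).
savings = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
           2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
actual  = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
           1, 0, 1, 0, 1, 1, 1, 1, 1, 1]

b0, b1 = -4.07778, 1.5046

def predict(x):
    """Fitted probability, then a 0/1 prediction at the 0.5 threshold."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return p, (1 if p >= 0.5 else 0)

predictions = [predict(x)[1] for x in savings]
accuracy = sum(p == a for p, a in zip(predictions, actual)) / len(actual)
print(accuracy)  # 0.8, i.e. 80%
```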
Linear Regression Vs Logistic Regression
Linear regression is used to predict a continuous dependent variable using a given set of independent variables, whereas logistic regression is used to predict a categorical dependent variable using a given set of independent variables.
Linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
In linear regression, we predict the values of continuous variables; in logistic regression, we predict the values of categorical variables.
In linear regression, we find the best-fit line, by which we can easily predict the output; in logistic regression, we find the S-curve, by which we can classify the samples.
Linear regression estimates its coefficients by the least squares method, whereas logistic regression estimates them by maximum likelihood.
The output of linear regression must be a continuous value, such as price or age; the output of logistic regression must be a categorical value, such as 0 or 1, yes or no, etc.
Linear regression requires the relationship between the dependent and independent variables to be linear; logistic regression does not require a linear relationship between the dependent and independent variables.
In linear regression, there may be collinearity between the independent variables; in logistic regression, there should not be collinearity between the independent variables.
Exercise Questions
1. Apply the logistic regression algorithm to classify the following data, using b0 = -0.4, b1 = 0.8, and b2 = -1.1.
X1 X2 Y
1.64 2.63 0
3.4 4.1 0
7.2 2.7 1
6.5 1.8 1
7.6 3.5 1
Also find the accuracy after classification.
Solution:
X1 X2 Y Fitted Value Predicted Value
1.64 2.63 0 0.1212 0
3.4 4.1 0 0.1006 0
7.2 2.7 1 0.9160 1
6.5 1.8 1 0.9437 1
7.6 3.5 1 0.8617 1
Accuracy = 100%
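For completeness, a small Python script that reproduces the fitted values and the accuracy above (an illustrative sketch; the variable names are my own):

```python
import math

# Exercise 1 data as (X1, X2, Y) and the given coefficients b0, b1, b2.
rows = [(1.64, 2.63, 0), (3.4, 4.1, 0), (7.2, 2.7, 1), (6.5, 1.8, 1), (7.6, 3.5, 1)]
b0, b1, b2 = -0.4, 0.8, -1.1

correct = 0
for x1, x2, y in rows:
    fitted = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))  # P(class = 1)
    predicted = 1 if fitted >= 0.5 else 0
    correct += (predicted == y)
    print(f"x1={x1}, x2={x2}: fitted={fitted:.4f}, predicted={predicted}")

print("Accuracy:", correct / len(rows))  # 1.0, i.e. 100%
```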