Logistic Regression
Classification
In a classification problem a classifier (a piece of software) maps the inputs to a small and discrete set of
outputs. For example, classifying incoming emails into (i) the spam folder, or (ii) the inbox folder.
In a binary classification problem there are only two possible outputs. We can label these groups as 1 (the positive class) and 0 (the negative class), y ∈ {0, 1}. For instance, a binary classification problem can be classifying incoming emails as spam (y = 1) or not spam (y = 0).
Since classification is not a linear function of the inputs, using linear regression to solve a classification problem can be a bad choice.
Hypothesis Representation
We are going to use logistic regression to solve classification problems. The logistic regression hypothesis accepts arbitrary inputs and bounds the output between zero and one, $0 \le h_\theta(x) \le 1$.
The logistic regression hypothesis function is constructed by feeding a sigmoid (or logistic) function a linear model. A sigmoid function, $g(z) = \frac{1}{1 + e^{-z}}$, has the following shape.
By replacing $z$ with a linear (or polynomial) function $\theta_0 + \theta_1 x_1 + \ldots$ (written in linear algebra notation as $\theta^T x$) we construct the logistic regression hypothesis function,

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
As can be seen from the figure and the functional form above, our new hypothesis takes any number of features, accepts any input values, and bounds the output between 0 and 1.
To turn this hypothesis function into a binary classifier we need to set a boundary and map part of the output interval to 1 and the rest to 0. Choosing 0.5 as the boundary results in: predict $y = 1$ if $h_\theta(x) \ge 0.5$, and $y = 0$ if $h_\theta(x) < 0.5$. The boundary value $h_\theta(x) = 0.5$ itself can be considered as part of the area where $y = 0$ or $y = 1$; here we choose to make it belong to the area where $y = 1$.
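As a concrete illustration, here is a minimal sketch in Python with NumPy of the hypothesis and the 0.5 decision boundary (the names sigmoid, hypothesis, and predict, and the example numbers, are my own, not from the tutorial; X is assumed to carry a leading column of ones for the intercept term θ0):

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    # h_theta(x) = g(theta^T x), one example per row of X
    return sigmoid(X @ theta)

def predict(theta, X, boundary=0.5):
    # Map the bounded output to a class: y = 1 if h >= boundary, else y = 0
    return (hypothesis(theta, X) >= boundary).astype(int)

X = np.array([[1.0, 2.0], [1.0, -3.0]])  # first column is the intercept term
theta = np.array([0.5, 1.0])
print(hypothesis(theta, X))              # outputs lie strictly between 0 and 1
print(predict(theta, X))                 # [1 0]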
Cost Function
The squared error cost function, used with linear regression, results in many local minima when applied to logistic regression. This prevents gradient descent from finding the global minimum of the cost function. Therefore, we choose the following cost function, which can be shown, using maximum likelihood theory, to be convex: a bowl-shaped function with a single (global) minimum.
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\!\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta(x^{(i)})\right) \right]$$
The intuition behind choosing this cost function is the following: since the output y is either 1 or 0, the cost function reduces to one of the following two cases.
case 1: where y = 0
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \log\!\left(1 - h_\theta(x^{(i)})\right)$$
As we can see from the figure above, when the output of $h_\theta(x)$ is ≈ 0 the error is ≈ 0 as well, and as the prediction of $h_\theta(x)$ approaches 1 the error approaches ∞.
case 2: where y = 1
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \log\!\left(h_\theta(x^{(i)})\right)$$
Using an analogous argument we can see that, in case 2, when the output of $h_\theta(x)$ is ≈ 1 the error is ≈ 0, and when the output of $h_\theta(x)$ is ≈ 0 the error approaches ∞.
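A minimal vectorized sketch of this cost function in Python with NumPy (the function name logistic_cost and the small eps guard against log(0) are my own additions; X is assumed to hold one example per row):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, eps=1e-12):
    # J(theta) = -(1/m) * sum[ y*log(h) + (1 - y)*log(1 - h) ]
    m_examples = len(y)
    h = sigmoid(X @ theta)
    # eps keeps log() away from zero when h saturates at 0 or 1
    return -(1.0 / m_examples) * np.sum(
        y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))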
Gradient Descent
The gradient descent algorithm is

repeat {
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
}

Working out the partial derivative of the logistic regression cost function gives

repeat {
$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
}
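A minimal sketch of this update rule in Python with NumPy (the function name gradient_descent and the default learning rate and iteration count are illustrative choices, not from the tutorial; X is assumed to include a leading column of ones):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    # Simultaneously update every theta_j with the rule above
    m_examples, n_features = X.shape
    theta = np.zeros(n_features)
    for _ in range(iterations):
        h = sigmoid(X @ theta)                    # h_theta(x^(i)) for all examples
        gradient = (X.T @ (h - y)) / m_examples   # (1/m) * sum (h - y) * x_j
        theta -= alpha * gradient
    return theta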
Multi-class Classification
In a multi-class classification problem there are more than two possible classes. For example:
Email tagging: tag incoming emails with labels like 'friends', 'work', and 'hobby'.
Classifying patients with a stuffy nose as fine, having a cold, or having the flu.
Using the One-vs-all method we can turn a multi-class classification problem into a set of binary classification problems. Let us consider the email tagging problem. We need to find, using the same procedure as for binary classification problems, three hypotheses: (i) one to classify friends vs. the rest, (ii) one to classify work vs. the rest, and (iii) one to classify hobby vs. the rest. Any test input is fed to the three classifiers (hypotheses). The hypothesis that produces the highest output value (as the output of a logistic regression function is bounded between 0 and 1) assigns the input to its group.
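A minimal sketch of the one-vs-all procedure in Python with NumPy, reusing any binary trainer such as the gradient descent sketch above (the helper names train_one_vs_all and predict_one_vs_all are my own):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, classes, train_binary):
    # Train one binary classifier per class: class k vs. the rest
    thetas = {}
    for k in classes:
        y_binary = (y == k).astype(int)   # 1 for class k, 0 for everything else
        thetas[k] = train_binary(X, y_binary)
    return thetas

def predict_one_vs_all(X, thetas):
    # For every input, pick the class whose hypothesis outputs the highest value
    classes = list(thetas.keys())
    scores = np.column_stack([sigmoid(X @ thetas[k]) for k in classes])
    return [classes[i] for i in np.argmax(scores, axis=1)]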
Regularization
Overfitting refers to the problem of generating a hypothesis function that is too complex: it fits the training data very well, but it does not generalize to new test data because the hypothesis curve is very wavy.
This terminology applies to both linear and logistic regression. There are two main options to address the overfitting problem:
1) Reduce the number of features.
2) Regularization.
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{cost}\!\left(h_\theta(x^{(i)}) - y^{(i)}\right) + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

where $\mathrm{cost}(h_\theta(x) - y)$ can be the cost term of either linear regression or logistic regression, and $\lambda$ is the regularization parameter that controls how strongly the parameters $\theta$ are shrunk.
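A minimal sketch of the regularized logistic regression cost in Python with NumPy (the function name regularized_cost is my own; following the usual convention, θ0 is assumed to be left out of the penalty):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost(theta, X, y, lam=1.0, eps=1e-12):
    # Unregularized logistic cost plus (lambda / 2m) * sum_{j>=1} theta_j^2
    m_examples = len(y)
    h = sigmoid(X @ theta)
    unreg = -(1.0 / m_examples) * np.sum(
        y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    penalty = (lam / (2.0 * m_examples)) * np.sum(theta[1:] ** 2)  # skip theta_0
    return unreg + penalty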
Advanced Optimization
Example: minimize the cost function

$$J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$$

over $\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}$, i.e. $\min_\theta J(\theta)$.
To use advanced optimization algorithms we need to follow these steps:
1) Write a function that takes θ and returns the value of the cost function and its partial derivatives at the given θ.
2) Set the maximum number of iterations and the initial θ values, and enable the gradient option.
3) Call fminunc (which stands for function minimization unconstrained); @costFunction is a pointer to the costFunction() function.
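The procedure above targets Octave's fminunc; as a rough Python analogue (an assumption on my part, since the original code listing is not shown here), the same three steps could look like this with scipy.optimize.minimize:

import numpy as np
from scipy.optimize import minimize

# Step 1: a function that returns J(theta) and its partial derivatives
def cost_function(theta):
    j = (theta[0] - 5) ** 2 + (theta[1] - 5) ** 2
    gradient = np.array([2 * (theta[0] - 5), 2 * (theta[1] - 5)])
    return j, gradient

# Step 2: initial theta values and the maximum number of iterations
initial_theta = np.zeros(2)
options = {'maxiter': 100}

# Step 3: call the optimizer; jac=True means cost_function also returns the gradient
result = minimize(cost_function, initial_theta, jac=True,
                  method='BFGS', options=options)
print(result.x)  # should be close to [5, 5]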
%matplotlib inline
import math as m
import numpy as np
import matplotlib.pyplot as plt

# Plot the sigmoid function g(z) = 1 / (1 + e^(-z)) over [-10, 10]
y = []
x = np.arange(-10, 10, 0.2)
for i in x:
    y.append(1 / (1 + m.exp(-i)))
plt.figure(figsize=(8, 3))
plt.plot(x, y)
plt.grid()
plt.savefig('sigmoid.png')
%matplotlib inline
import math as m
import numpy as np
import matplotlib.pyplot as plt

# Plot -log(h_theta(x)): the cost of a single example when y = 1
y = []
x = np.arange(0.01, 1, 0.01)
for i in x:
    y.append(-m.log(i))
plt.figure(figsize=(8, 5))
plt.plot(x, y)
plt.grid()
plt.xlabel(r'$h_{\theta}(x)$', fontsize=20)
plt.savefig('minusLog.png')
%matplotlib inline
import math as m
import numpy as np
import matplotlib.pyplot as plt

# Plot -log(1 - h_theta(x)): the cost of a single example when y = 0
y = []
x = np.arange(0, 1, 0.01)
for i in x:
    y.append(-m.log(1 - i))
plt.figure(figsize=(8, 5))
plt.plot(x, y)
plt.grid()
plt.xlabel(r'$h_{\theta}(x)$', fontsize=20)
plt.savefig('oneMinusLog.png')