W8 - Logistic Regression

Logistic regression is a statistical method for classification problems. It uses the logistic function to model the probability that an input belongs to a class. The model computes a hypothesis function from weights and a bias, and applies the sigmoid activation function to convert the output into a probability between 0 and 1. It uses a logistic cost function instead of mean squared error for optimization, and gradient descent updates the weights iteratively to minimize that cost, adjusting them to increase the likelihood of the training data. Once trained, the model predicts class 1 if the probability is above 0.5 and class 0 otherwise. A worked example trains a logistic regression model on sample data to classify articles as Technical or Non-technical.

LOGISTIC REGRESSION

Likelihood Vs Probability
Probability vs Statistics
• In probability theory we consider some underlying process which has
some randomness or uncertainty modeled by random variables, and
we figure out what happens.
• In statistics we observe something that has happened, and try to
figure out what underlying process would explain those observations.
• The likelihood function is a fundamental concept in statistical inference.
• It indicates how likely a particular population is to produce an observed sample.
• Probability refers to the chance of a future outcome, while likelihood measures how well particular parameter values explain data that has already been observed.
Likelihood Vs Probability
• Probability is simply how likely something is to happen.
• The occurrence of discrete values yk is expressed by the probability P(yk).
• The distribution of all possible values of discrete random variable y is
expressed as probability distribution.
• We assume that there is some a priori probability (or simply prior) P(yk) that the next feature vector belongs to class k.
• P(x | yk) is called the class likelihood: the conditional probability that a pattern belonging to class yk has the associated observation value x.
• The class yk that maximizes P(x | yk) is called the Maximum Likelihood (ML) class.
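As a toy illustration of this idea, picking the ML class amounts to taking an argmax over the class likelihoods. The values below are invented for demonstration, not taken from any dataset:

# Toy illustration: choosing the Maximum Likelihood (ML) class.
# The class-likelihood values P(x | y_k) below are invented for demonstration.
class_likelihoods = {
    "technical": 0.30,       # assumed P(x | y = technical)
    "non_technical": 0.55,   # assumed P(x | y = non_technical)
}

# The ML class is the one whose class likelihood is largest.
ml_class = max(class_likelihoods, key=class_likelihoods.get)
print(ml_class)  # -> non_technical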
Likelihood Vs Probability
• Probability follows clear parameters and computations, while likelihood is based on the observed data.
• P(data; μ, σ) means "the probability density of observing the data with model parameters μ and σ." This generalises to any number of parameters and any distribution.
• On the other hand, L(μ, σ; data) means "the likelihood of the parameters μ and σ taking certain values given that we've observed a bunch of data."
• These two quantities are numerically equal: L(μ, σ; data) = P(data; μ, σ).
• But despite being equal, the likelihood and the probability density are fundamentally asking different questions: one asks about the data, the other asks about the parameter values.
Example of Probability
• Consider a dataset containing the heights of the people of a particular country. Let's say the mean of the data is 170 cm and the standard deviation is 3.5 cm.
• When a probability is calculated from this dataset, the characteristics of the distribution are held constant, i.e. the mean and standard deviation are fixed and cannot be altered.
• Say the probability of height > 170 cm has to be calculated for a random record in the dataset. It is computed with the parameters fixed: P(height > 170; μ = 170, σ = 3.5).
• While calculating probability, the feature value can be varied, but the characteristics (mean and standard deviation) of the data distribution cannot be altered.
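A minimal sketch of this calculation in Python, assuming the heights follow a normal distribution with the fixed parameters from the example (μ = 170, σ = 3.5):

from scipy.stats import norm

# The distribution characteristics are held constant.
mu, sigma = 170, 3.5

# P(height > 170; mu, sigma) = 1 - CDF(170), i.e. the survival function.
p = norm.sf(170, loc=mu, scale=sigma)
print(p)  # 0.5, since 170 cm is exactly the mean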
Example of Likelihood
• Likelihood calculation involves finding the best distribution, or best characteristics of the data, given a particular feature value or situation.
• Consider exactly the same dataset as in the probability example. If the likelihood for height > 170 cm has to be calculated, the conditioning flips compared with the probability calculation: the data are fixed and the parameters vary, written L(μ, σ; height > 170).
• Here the dataset characteristics are varied, i.e. the mean and standard deviation of the distribution are adjusted to get the maximum likelihood for height > 170 cm.
• In very simple terms, maximizing likelihood means increasing the chance of a particular situation occurring by varying the characteristics of the dataset distribution.
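A minimal sketch of this search in Python; the candidate parameter grids are illustrative assumptions, not values from the example:

import numpy as np
from scipy.stats import norm

# The observed event (height > 170 cm) stays fixed; the distribution
# parameters are varied to see which (mu, sigma) makes it most likely.
best_mu, best_sigma, best_l = None, None, -1.0
for mu in np.arange(165.0, 181.0, 1.0):        # candidate means
    for sigma in np.arange(1.0, 6.0, 0.5):     # candidate standard deviations
        l = norm.sf(170, loc=mu, scale=sigma)  # L(mu, sigma; height > 170)
        if l > best_l:
            best_mu, best_sigma, best_l = mu, sigma, l

print(best_mu, best_sigma, best_l)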
Logistic Regression Implementation
Hypothesis function
• In logistic regression, we apply the sigmoid activation function to the hypothesis function of linear regression.
• The resulting hypothesis function for logistic regression is:
h(x) = sigmoid(wx + b)
Here, w is the weight vector,
x is the feature vector,
b is the bias, and
sigmoid(z) = 1 / (1 + e^(-z))
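A minimal sketch of the hypothesis function in Python; the weights, bias, and feature values below are arbitrary illustrations:

import numpy as np

def sigmoid(z):
    # sigmoid(z) = 1 / (1 + e^(-z)): squashes z into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(w, x, b):
    # h(x) = sigmoid(w . x + b): the predicted probability of class 1
    return sigmoid(np.dot(w, x) + b)

w = np.array([0.4, -0.25])   # arbitrary weight vector
x = np.array([1.9, 3.1])     # feature vector
print(hypothesis(w, x, b=-0.1))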
Cost function
• The cost function of linear regression (mean squared error) can't be used in logistic regression, because with the sigmoid it becomes a non-convex function of the weights.
• Optimization algorithms such as gradient descent are only guaranteed to converge to the global minimum of a convex function.
• So we use the simplified logistic cost function (derived in the last class):
J = -y log(h(x)) - (1 - y) log(1 - h(x))
Here, y is the real target value and h(x) = sigmoid(wx + b).
For y = 0: J = -log(1 - h(x))
For y = 1: J = -log(h(x))
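A minimal sketch of this cost for a single training example; the eps clamp is an added numerical safeguard, not part of the formula:

import numpy as np

def logistic_cost(h_x, y, eps=1e-12):
    # J = -y*log(h(x)) - (1 - y)*log(1 - h(x)); eps guards against log(0)
    h_x = np.clip(h_x, eps, 1.0 - eps)
    return -y * np.log(h_x) - (1.0 - y) * np.log(1.0 - h_x)

print(logistic_cost(0.9, 1))  # small cost: confident and correct
print(logistic_cost(0.9, 0))  # large cost: confident but wrong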
Gradient Descent Calculation
repeat until convergence {
    tmp_i = w_i - alpha * dw_i
    w_i = tmp_i
}
where alpha is the learning rate.
• The chain rule is used to calculate the gradients such as dw_i = ∂J/∂w_i.
• Here, a = sigmoid(z) and z = wx + b, which gives dw_i = (a - y) * x_i and db = a - y.
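A minimal sketch of the full update loop in Python, vectorised over a training set; the learning rate and iteration count are illustrative choices:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, alpha=0.1, iterations=1000):
    # X: (m, n) feature matrix; y: (m,) labels in {0, 1}
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iterations):
        a = sigmoid(X @ w + b)   # a = sigmoid(z), z = wx + b
        dz = a - y               # error term from the chain rule
        dw = X.T @ dz / m        # averaged gradient for each w_i
        db = dz.mean()           # averaged gradient for the bias
        w -= alpha * dw          # tmp_i = w_i - alpha * dw_i; w_i = tmp_i
        b -= alpha * db
    return w, b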


Next?
• Update the weights in an iterative process.
• After completing all iterations, evaluate the hypothesis function h(x).
• Threshold the classifier output h(x) at 0.5:
If h(x) >= 0.5, predict "y = 1"
If h(x) < 0.5, predict "y = 0"
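A minimal sketch of the thresholding step, using an arbitrary weight vector and bias for illustration:

import numpy as np

def predict(X, w, b):
    # h(x) = sigmoid(wx + b); predict 1 where h(x) >= 0.5, else 0
    h = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return (h >= 0.5).astype(int)

X = np.array([[1.9, 3.1], [4.0, 1.0]])  # two illustrative samples
print(predict(X, w=np.array([0.41, -0.25]), b=-0.11))  # -> [0 1]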
Logistic Regression Numerical
Example 1
• Some samples from two classes of articles, Technical (1) and Non-technical (0), are given.
• Each sample has two features:
• Time, the average time required to read the article, in hours;
• Sentences, the number of sentences in the article.
• First, we need to train our logistic regression model.

(Training samples table not reproduced; query sample: Time = 1.9, Sentences = 3.1, Class = ?)
Example 1
• Training involves finding the optimal values of the coefficients B0, B1, and B2.
• While training, we find some values of the coefficients in one step and use those coefficients in the next step to improve them.
• We continue to do this until the model gives consistent accuracy.
• The model being fitted is:
ŷ = 1 / (1 + e^(-(B0 + B1*X1 + B2*X2)))
Example 1
• After 20 iterations, we get:
B0 = -0.1068913
B1 = 0.41444855
B2 = -0.2486209
• Thus, the decision boundary is given as:
Z = B0 + B1*X1 + B2*X2
Z = -0.1068913 + 0.41444855*Time - 0.2486209*Sentences
Example 1
• For X1 = 1.9 and X2 = 3.1, we get:
Z = -0.1068913 + 0.41444855*1.9 - 0.2486209*3.1
Z = -0.090163845
• Now we apply the sigmoid function to turn Z into a probability and predict the class of the given sample:
y = 1 / (1 + e^(-Z)) = 1 / (1 + e^(0.090163845)) ≈ 0.477
• Since y ≈ 0.477 is less than 0.5, we can safely classify the given sample as Non-technical.
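The arithmetic can be checked with a few lines of Python, using the coefficients from the slide:

import math

# Coefficients after 20 iterations (from the slide)
B0, B1, B2 = -0.1068913, 0.41444855, -0.2486209
x1, x2 = 1.9, 3.1  # query sample: Time, Sentences

z = B0 + B1 * x1 + B2 * x2
y = 1.0 / (1.0 + math.exp(-z))
print(z)  # ~ -0.0902
print(y)  # ~ 0.477 < 0.5, so predict class 0 (Non-technical)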
Examples 2 & 3
• Further worked examples can be found at the following links:
• https://machinelearningmastery.com/logistic-regression-tutorial-for-
machine-learning/
• https://courses.lumenlearning.com/introstats1/chapter/introduction-
to-logistic-regression/
