Experiment No 8
Title: Logistic Regression
Aim: Implement and evaluate the Logistic Regression algorithm on a binary classification problem.
Outcomes: At the end of the experiment the student should be able to:
1. Understand the Logistic Regression algorithm.
2. Implement and evaluate the Logistic Regression algorithm on a binary classification
problem.
Contents:
Logistic regression is a supervised machine learning algorithm mainly used for
classification tasks, where the goal is to predict the probability that an instance belongs to a
given class. It is a statistical algorithm that analyzes the relationship between a set of
independent variables and a binary dependent variable. It is a useful tool for
decision-making, for example, deciding whether an email is spam or not.
Logistic Regression:
Logistic regression is a supervised machine learning algorithm mainly used for binary
classification. It uses a logistic function, also known as the sigmoid function, which takes the
independent variables as input and produces a probability value between 0 and 1. For example,
if we have two classes, Class 0 and Class 1, and the value of the logistic function for an input is
greater than 0.5 (the threshold value), the input belongs to Class 1; otherwise it belongs to
Class 0. It is called regression because it extends linear regression, but it is mainly used for
classification problems. The difference between linear regression and logistic regression is that
the output of linear regression is a continuous value that can be anything, while logistic
regression predicts the probability that an instance belongs to a given class.
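To make the contrast with linear regression concrete, here is a minimal Python sketch for a single input feature. The coefficient w = 2.0 and intercept b = -1.0 are hypothetical values chosen only for illustration, and the helper names (`sigmoid`, `linear_output`, `predict_class`) are our own. The same linear score is shown once as a raw linear-regression output and once passed through the logistic function, with the 0.5 threshold rule described above applied to the probability.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_output(x: float, w: float, b: float) -> float:
    """What linear regression would return for input x: an unbounded real number."""
    return w * x + b

def predict_class(x: float, w: float, b: float, threshold: float = 0.5) -> int:
    """Class 1 if the predicted probability is greater than the threshold, else Class 0."""
    return 1 if sigmoid(linear_output(x, w, b)) > threshold else 0

if __name__ == "__main__":
    w, b = 2.0, -1.0  # hypothetical coefficient and intercept for one feature x
    for x in [-3.0, 0.0, 0.5, 1.0, 4.0]:
        z = linear_output(x, w, b)
        print(f"x={x:5.1f}  linear={z:6.2f}  probability={sigmoid(z):.3f}  "
              f"class={predict_class(x, w, b)}")
```

An input whose probability is exactly 0.5 (here x = 0.5, where the linear score is 0) falls in Class 0, because the rule requires the probability to be greater than 0.5.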
Understanding Logistic Regression
It is used for predicting a categorical dependent variable from a given set of independent
variables.
Logistic regression predicts the output of a categorical dependent variable.
Therefore, the outcome must be a categorical or discrete value.
It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the
exact value 0 or 1, it gives probabilistic values that lie between 0 and 1.
Logistic Regression is very similar to Linear Regression except in how
they are used. Linear Regression is used for solving regression problems, whereas
Logistic Regression is used for solving classification problems.
In Logistic Regression, instead of fitting a regression line, we fit an “S”-shaped
logistic function, whose output approaches two limiting values (0 and 1).
The curve of the logistic function indicates the likelihood of something, such
as whether cells are cancerous or not, or whether a mouse is obese or not based on its
weight.
Logistic Regression is a significant machine learning algorithm because it
provides probabilities and can classify new data using both continuous and
discrete inputs.
Logistic Regression can classify observations using different types of data, and
its fitted coefficients help identify which variables are most influential for the
classification.
Logistic Function (Sigmoid Function):
The sigmoid function is a mathematical function used to map the predicted
values to probabilities.
It maps any real value into a value within the range 0 to 1. The output of
logistic regression must lie between 0 and 1 and cannot go beyond this limit,
so it forms an “S”-shaped curve.
The S-shaped curve is called the sigmoid function or the logistic function.
In logistic regression, we use a threshold value to convert the predicted
probability into a class of either 0 or 1: probabilities above the threshold are
assigned to class 1, and probabilities below it are assigned to class 0.
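The bounded “S” behaviour and the threshold rule can be sketched in Python as follows; this is a hedged illustration, not a reference implementation. The direct formula 1 / (1 + e^(-z)) makes Python's `math.exp` raise an `OverflowError` for large negative z (below roughly -709), so this version assumes the common rewrite e^z / (1 + e^z) for negative inputs. The function names and the example threshold of 0.9 are illustrative choices, not part of the original text.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function; the result always lies in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use e^z / (1 + e^z): math.exp(z) then underflows to 0.0
    # instead of math.exp(-z) overflowing.
    e = math.exp(z)
    return e / (1.0 + e)

def classify(probabilities, threshold=0.5):
    """Label 1 if a probability is above the threshold, otherwise 0."""
    return [1 if p > threshold else 0 for p in probabilities]

if __name__ == "__main__":
    zs = [-1000.0, -2.0, 0.0, 2.0, 1000.0]
    probs = [sigmoid(z) for z in zs]
    print([round(p, 3) for p in probs])     # bounded between 0 and 1
    print(classify(probs))                  # default threshold 0.5
    print(classify(probs, threshold=0.9))   # stricter threshold
```

Raising the threshold from 0.5 to 0.9 labels fewer inputs as class 1; in this sketch only the input with z = 1000 keeps label 1.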
Terminologies involved in Logistic Regression:
Independent variables: The input characteristics or predictor factors used to
predict the dependent variable.
Dependent variable: The target variable in a logistic regression model, which
we are trying to predict.
Logistic function: The formula used to represent how the independent and
dependent variables relate to one another. The logistic function transforms the
input variables into a probability value between 0 and 1, which represents the
probability that the dependent variable is 1 (and, by complement, that it is 0).
Odds: The ratio of the probability of something occurring to the probability of it
not occurring. It is different from probability, which is the ratio of something
occurring to everything that could possibly occur.
Log-odds: The log-odds, also known as the logit function, is the natural
logarithm of the odds. In logistic regression, the log odds of the dependent
variable are modeled as a linear combination of the independent variables and the
intercept.
Coefficient: The estimated parameters of the logistic regression model, which show
how the independent and dependent variables relate to one another.
Intercept: A constant term in the logistic regression model, which represents
the log odds when all independent variables are equal to zero.
Maximum likelihood estimation: The method used to estimate the coefficients
of the logistic regression model, which maximizes the likelihood of observing the
data given the model.
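The terms above come together when the model is fitted. The following pure-Python sketch estimates the coefficient and intercept by maximum likelihood, using plain batch gradient ascent on the log-likelihood (one simple way to maximize it; libraries typically use more sophisticated solvers), and then evaluates the result with accuracy. The toy dataset `X_TOY`/`Y_TOY`, the learning rate 0.5, the 5000 epochs, and all function names are hypothetical choices for illustration only. The sketch also includes `odds` and `log_odds` helpers and reads exp(coefficient) as the factor by which the odds change per unit increase of the feature.

```python
import math

# Hypothetical toy data (one feature x, binary label y), made up for illustration only
X_TOY = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0]]
Y_TOY = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def odds(p):
    """Odds: probability of the event occurring divided by probability of it not occurring."""
    return p / (1.0 - p)

def log_odds(p):
    """Log-odds (logit): natural logarithm of the odds; the inverse of the sigmoid."""
    return math.log(odds(p))

def linear(xi, w, b):
    """Linear combination of the features plus the intercept, i.e. the modeled log-odds."""
    return sum(wj * xj for wj, xj in zip(w, xi)) + b

def log_likelihood(X, y, w, b):
    """Sum of y*log(p) + (1 - y)*log(1 - p) over all samples; MLE maximizes this."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(sigmoid(linear(xi, w, b)), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def fit(X, y, lr=0.5, epochs=5000):
    """Estimate coefficients w and intercept b by batch gradient ascent on the log-likelihood."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # (y - p) is the derivative of one sample's log-likelihood w.r.t. z
            error = yi - sigmoid(linear(xi, w, b))
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        # Step uphill along the average gradient
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b

def predict(X, w, b, threshold=0.5):
    """Class 1 when the predicted probability is greater than the threshold, else class 0."""
    return [1 if sigmoid(linear(xi, w, b)) > threshold else 0 for xi in X]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

if __name__ == "__main__":
    w, b = fit(X_TOY, Y_TOY)
    print("coefficient:", w[0], " intercept:", b)
    print("odds multiplier per unit increase in x:", math.exp(w[0]))
    print("probability at x = 0 (from the intercept's log-odds):", sigmoid(b))
    print("decision boundary (p = 0.5) at x =", -b / w[0])
    print("log-likelihood:", log_likelihood(X_TOY, Y_TOY, w, b))
    print("training accuracy:", accuracy(Y_TOY, predict(X_TOY, w, b)))
```

Because the toy labels overlap around x = 2.5 to 3.0, no single threshold on x classifies all ten points correctly, so the training accuracy stays below 1. In a real experiment the accuracy should be measured on a held-out test set rather than on the training data.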
How does Logistic Regression work?
The logistic regression model transforms the continuous output of the linear regression function
into a categorical output using the sigmoid function, which maps the linear combination of any
real-valued independent variables into a value between 0 and 1. This function is known as
the logistic function.
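Let the independent input features be $x_1, x_2, \ldots, x_n$. The model first forms a linear
combination of these features using the coefficients $w_1, w_2, \ldots, w_n$ and the intercept $b$:
$z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$
Sigmoid function
The value $z$ is then passed through the sigmoid function:
$\sigma(z) = \frac{1}{1 + e^{-z}}$
As shown in the figure, the sigmoid function converts the continuous value $z$ into
a probability, i.e. a value between 0 and 1.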
$\sigma(z)$ tends towards 1 as $z \to \infty$
$\sigma(z)$ tends towards 0 as $z \to -\infty$
$\sigma(z)$ is always bounded between 0 and 1
where the probability of belonging to each class can be measured as: