Logistic Regression

Uploaded by Vidhi Tanwar
Logistic regression

Logistic regression is a supervised machine learning algorithm used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class. It is a statistical algorithm that analyzes the relationship between a set of independent variables and a binary outcome.
What is Logistic Regression?
Logistic regression is used for binary classification, where we apply the sigmoid function to a linear combination of the independent variables to produce a probability value between 0 and 1.
For example, suppose we have two classes, Class 0 and Class 1. If the value of the logistic function for an input is greater than 0.5 (the threshold value), the instance belongs to Class 1; otherwise it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.
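The idea can be sketched end to end in plain Python: fit a weight and bias by gradient descent on the log-loss, then classify with the 0.5 threshold. The toy data, learning rate, and iteration count below are illustrative assumptions, not part of the original text.

```python
import math

def sigmoid(z):
    # Map a real value z to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D training data: hours studied -> passed (1) or failed (0)
X = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
y = [0, 0, 0, 1, 1, 1]

# Fit weight w and bias b by full-batch gradient descent on the log-loss
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    grad_w = grad_b = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(w * xi + b)
        grad_w += (p - yi) * xi
        grad_b += (p - yi)
    w -= lr * grad_w
    b -= lr * grad_b

# Classify with the 0.5 threshold described above
def predict(x):
    return 1 if sigmoid(w * x + b) > 0.5 else 0

print(predict(1.0), predict(3.8))  # -> 0 1
```

For larger problems one would normally use a library implementation (e.g. scikit-learn's `LogisticRegression`) rather than hand-rolled gradient descent; this sketch just makes the mechanics visible.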
Logistic Function – Sigmoid Function
 The sigmoid function is a mathematical function used to map the predicted
values to probabilities.
 It maps any real value into a value within the range 0 to 1. The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so the function forms an “S”-shaped curve.
 The S-form curve is called the Sigmoid function or the logistic function.
 In logistic regression, we use a threshold value to turn the probability into a class label: values above the threshold are mapped to 1, and values below the threshold are mapped to 0.
 Sigmoid Function: σ(z) = 1 / (1 + e^(−z))
The sigmoid function converts a continuous input z into a probability, i.e. a value between 0 and 1.
 σ(z) tends towards 1 as z→∞
 σ(z) tends towards 0 as z→−∞
 σ(z) is always bounded between 0 and 1
Assumptions of Logistic Regression:
1. Independent observations: Each observation is independent of the others, meaning there is no correlation between observations.
2. Binary dependent variable: The dependent variable must be binary or dichotomous, meaning it can take only two values. For more than two categories, the softmax function is used.
3. Linear relationship between independent variables and log odds: The relationship between the independent variables and the log odds of the dependent variable should be linear.
4. No outliers: There should be no outliers in the dataset.
5. Large sample size: The sample size is sufficiently large.
Types of Logistic Regression
On the basis of the categories, Logistic Regression can be classified into three
types:
1. Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as “cat”, “dog”, or “sheep”.
3. Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as “low”, “medium”, or “high”.
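For the multinomial case, the sigmoid is replaced by the softmax function, which turns a vector of scores (one per class) into probabilities that sum to 1. A minimal sketch in plain Python, with illustrative scores:

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Scores for three unordered classes: cat, dog, sheep
probs = softmax([2.0, 1.0, 0.1])
print(probs)       # three probabilities in the same order as the scores
print(sum(probs))  # they sum to 1
```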
How to Evaluate Logistic Regression Model?
We can evaluate the logistic regression model using the following metrics:

 Confusion matrix:

A confusion matrix summarizes the performance of a machine learning model on a set of test data. It displays the number of accurate and inaccurate instances based on the model’s predictions, and is often used to evaluate classification models, which aim to predict a categorical label for each input instance.
1. True Positive (TP): The model correctly predicted a positive outcome
(the actual outcome was positive).
2. True Negative (TN): The model correctly predicted a negative
outcome (the actual outcome was negative).
3. False Positive (FP): The model incorrectly predicted a positive
outcome (the actual outcome was negative). Also known as a Type I
error.
4. False Negative (FN): The model incorrectly predicted a negative
outcome (the actual outcome was positive). Also known as a Type II
error.
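The four counts can be tallied directly from the actual and predicted labels. A minimal sketch in plain Python; the ten labels below are illustrative toy data:

```python
# Actual labels and model predictions for ten test instances (toy data)
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

TP = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
TN = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
FP = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # Type I error
FN = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # Type II error

print(TP, TN, FP, FN)  # -> 4 4 1 1
```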

 Accuracy: provides the proportion of correctly classified instances.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
 Precision: focuses on the accuracy of positive predictions.
Precision = TP / (TP + FP)
 Recall (Sensitivity or True Positive Rate): measures the proportion of
correctly predicted positive instances among all actual positive instances.
Recall = TP / (TP + FN)
 F1 Score: is the harmonic mean of precision and recall.
F1 Score = (2 × Precision × Recall) / (Precision + Recall)
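Given the four confusion-matrix counts, the metrics above follow directly. A minimal sketch; the counts are illustrative:

```python
TP, TN, FP, FN = 4, 4, 1, 1  # toy counts from a confusion matrix

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)  # -> 0.8 0.8 0.8 0.8
```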

 Area Under the Receiver Operating Characteristic Curve (AUC-ROC): The ROC curve plots the true positive rate against the false positive rate at various thresholds. AUC-ROC measures the area under this curve, providing an aggregate measure of a model’s performance across different classification thresholds.
 Area Under the Precision-Recall Curve (AUC-PR): Similar to AUC-ROC, it
measures the area under the precision-recall curve, providing a summary of
a model’s performance across different precision-recall trade-offs.
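AUC-ROC can be sketched by sweeping a threshold over the predicted probabilities, collecting (FPR, TPR) points, and integrating with the trapezoidal rule. The four-instance test set below is an illustrative assumption:

```python
# Toy test set: true labels and predicted probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

pos = sum(labels)
neg = len(labels) - pos

# One (FPR, TPR) point per distinct threshold, highest threshold first
points = [(0.0, 0.0)]
for t in sorted(set(scores), reverse=True):
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= t)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= t)
    points.append((fp / neg, tp / pos))
points.append((1.0, 1.0))

# Trapezoidal integration of TPR over FPR
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(auc)  # -> 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positives above negatives.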
