Logistic regression
Logistic regression is a supervised machine learning algorithm used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class. It is a statistical algorithm that analyzes the relationship between one or more independent variables and a categorical outcome.

What is Logistic Regression?
Logistic regression is used for binary classification. It applies the sigmoid function, which takes the independent variables as input and produces a probability value between 0 and 1. For example, with two classes, Class 0 and Class 1, if the value of the logistic function for an input is greater than 0.5 (the threshold value), the instance belongs to Class 1; otherwise it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.

Logistic Function – Sigmoid Function
The sigmoid function is a mathematical function used to map predicted values to probabilities. It maps any real value to a value between 0 and 1. Because the output of logistic regression must stay between 0 and 1 and cannot go beyond this limit, its graph forms an "S"-shaped curve, called the sigmoid function or the logistic function. In logistic regression we use a threshold value: probabilities above the threshold tend towards 1, and probabilities below the threshold tend towards 0 (a short code sketch after the list of types below illustrates the sigmoid and this thresholding step).

Sigmoid Function: σ(z) = 1 / (1 + e^(−z))
The sigmoid function converts continuous input values into probabilities between 0 and 1.
σ(z) tends towards 1 as z → ∞
σ(z) tends towards 0 as z → −∞
σ(z) is always bounded between 0 and 1

Assumptions of Logistic Regression:
1. Independent observations: Each observation is independent of the others, meaning the observations are not correlated with one another.
2. Binary dependent variable: The dependent variable is assumed to be binary or dichotomous, meaning it can take only two values. For more than two categories, the softmax function (multinomial logistic regression) is used instead.
3. Linear relationship between independent variables and log odds: The relationship between the independent variables and the log odds of the dependent variable should be linear.
4. No outliers: There should be no extreme outliers in the dataset.
5. Large sample size: The sample size should be sufficiently large.

Types of Logistic Regression
On the basis of the categories of the dependent variable, logistic regression can be classified into three types:
1. Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
3. Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".
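As a minimal illustration of how this looks in code (the synthetic dataset, feature count, and use of scikit-learn are assumptions for the sketch, not part of the original text), the snippet below fits a binary logistic regression, reproduces the sigmoid computation by hand, and applies the 0.5 threshold:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data (Class 0 vs Class 1) -- illustrative assumption
X, y = make_classification(n_samples=500, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = LogisticRegression().fit(X_train, y_train)

# The model computes z = w.x + b; the sigmoid maps z to a probability in (0, 1)
z = model.decision_function(X_test)
probs_manual = 1.0 / (1.0 + np.exp(-z))      # sigmoid(z), computed by hand
probs = model.predict_proba(X_test)[:, 1]    # the same probabilities from scikit-learn

# Apply the 0.5 threshold: probability > 0.5 -> Class 1, otherwise Class 0
preds = (probs > 0.5).astype(int)
print(np.allclose(probs_manual, probs), preds[:10])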
How to Evaluate a Logistic Regression Model?
We can evaluate a logistic regression model using the following metrics:

Confusion matrix:
A confusion matrix is a matrix that summarizes the performance of a machine learning model on a set of test data. It is a means of displaying the number of accurate and inaccurate instances based on the model's predictions, and it is often used to measure the performance of classification models, which aim to predict a categorical label for each input instance. For a binary classifier, the matrix breaks the test predictions down into four counts:
1. True Positive (TP): The model correctly predicted a positive outcome (the actual outcome was positive).
2. True Negative (TN): The model correctly predicted a negative outcome (the actual outcome was negative).
3. False Positive (FP): The model incorrectly predicted a positive outcome (the actual outcome was negative). Also known as a Type I error.
4. False Negative (FN): The model incorrectly predicted a negative outcome (the actual outcome was positive). Also known as a Type II error.
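As a small, hypothetical illustration (the label arrays below are made up, not from the original text), scikit-learn's confusion_matrix returns these four counts directly for a binary problem:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual outcomes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

# For binary labels the matrix is laid out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")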
Accuracy: provides the proportion of correctly classified instances.
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision: focuses on the accuracy of positive predictions.
Precision = TP / (TP + FP)

Recall (Sensitivity or True Positive Rate): measures the proportion of correctly predicted positive instances among all actual positive instances.
Recall = TP / (TP + FN)

F1 Score: the harmonic mean of precision and recall.
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
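A quick sketch of these formulas in code, using hypothetical confusion-matrix counts (the numbers are placeholders, not real results):

# Hypothetical counts, for illustration only
tp, tn, fp, fn = 40, 45, 5, 10

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy={accuracy:.3f}  Precision={precision:.3f}  Recall={recall:.3f}  F1={f1:.3f}")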
Area Under the Receiver Operating Characteristic Curve (AUC-ROC): The
ROC curve plots the true positive rate against the false positive rate at various classification thresholds. AUC-ROC measures the area under this curve, providing an aggregate measure of a model's performance across different classification thresholds.

Area Under the Precision-Recall Curve (AUC-PR): Similar to AUC-ROC, it measures the area under the precision-recall curve, providing a summary of a model's performance across different precision-recall trade-offs.
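A minimal sketch, assuming scikit-learn and hypothetical predicted probabilities (the values below are made up): roc_auc_score gives AUC-ROC, and the area under the precision-recall curve can be computed from precision_recall_curve:

from sklearn.metrics import roc_auc_score, precision_recall_curve, auc

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.3]  # predicted probabilities for class 1

# AUC-ROC: area under the TPR-vs-FPR curve
print("AUC-ROC:", roc_auc_score(y_true, y_score))

# AUC-PR: area under the precision-recall curve
precision, recall, _ = precision_recall_curve(y_true, y_score)
print("AUC-PR :", auc(recall, precision))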