ML | Why Logistic Regression in Classification ?
Last Updated :
06 Mar, 2023
With Linear Regression, one could treat every prediction >= 0.5 as class 1 and every prediction < 0.5 as class 0. Why, then, is Linear Regression not used for classification? Problem - Suppose we are classifying a mail as spam or not spam, and our output y can be 0 (spam) or 1 (not spam). In Linear Regression, hθ(x) is unbounded: it can be > 1 or < 0, even though a prediction that is meant to represent a class probability should lie between 0 and 1. That is why Logistic (Sigmoid) Regression is used for classification tasks: it maps the output into the range (0, 1).
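The out-of-range problem can be seen on a tiny example. The sketch below fits an ordinary least-squares line to a hypothetical one-feature dataset with 0/1 labels (the data and the helper names `fit_line` and `linear_hypothesis` are made up for illustration) and checks the predictions on the training points themselves.

```python
# Hypothetical toy data: one feature, binary labels (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]

def fit_line(xs, ys):
    """Closed-form least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(xs, ys)

def linear_hypothesis(x):
    """Linear regression output; nothing keeps it inside [0, 1]."""
    return a + b * x

predictions = [linear_hypothesis(x) for x in xs]
print(min(predictions), max(predictions))  # smallest is below 0, largest is above 1
```

Even on the points it was fitted to, the line gives a value below 0 for the smallest x and above 1 for the largest x, so the output cannot be read as a probability.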
Logistic regression is a statistical method commonly used in machine learning for binary classification problems, where the goal is to predict one of two possible outcomes, such as true/false or yes/no. Here are some reasons why logistic regression is widely used in classification tasks:
Simple and interpretable: Logistic regression is a relatively simple algorithm that is easy to understand and interpret. It can provide insights into the relationship between the independent variables and the probability of a particular outcome.
Linear decision boundary: Logistic regression can be used to model linear decision boundaries, which makes it useful for separating data points that belong to different classes.
Efficient training: Logistic regression can be trained quickly, even with large datasets, and is less computationally expensive than more complex models like neural networks.
Relatively robust to noise: Because it has few parameters and a simple linear form, logistic regression is often less prone to overfitting than more flexible models, especially when regularization is applied.
Works well with small datasets: Logistic regression can perform well even when there is limited data available, making it a useful algorithm when dealing with small datasets.
Overall, logistic regression is a popular and effective method for binary classification problems. In its basic form, however, it handles only two classes and a linear decision boundary; problems with multiple classes or nonlinear relationships between the input variables and the outcome need extensions such as one-vs-rest or multinomial variants, or added nonlinear features.

h_{\Theta}(x) = g(\Theta^{T}x)
z = \Theta^{T}x
g(z) = \frac{1}{1+e^{-z}}
Here, we plug θᵀx into the logistic function, where θ is the vector of weights/parameters, x is the input, and hθ(x) is the hypothesis function. g() is the sigmoid function.
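A minimal sketch of these two functions in plain Python follows. The function names, the assumption that x already includes the bias term x_0 = 1, and the example values of `theta` and `x` are illustrative choices, not part of the original text. The sigmoid is written in a two-branch form only to avoid overflow in `math.exp` for large negative z; it computes the same g(z).

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe: z < 0, so e is in (0, 1)
    return e / (1.0 + e)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x is assumed to start with the bias term 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# Hypothetical parameters and input, chosen so that theta^T x = 0.
theta = [-4.0, 1.0]
x = [1.0, 4.0]
print(hypothesis(theta, x))  # theta^T x = 0, so this prints 0.5
```

Whatever the value of θᵀx, the result stays between 0 and 1, which is what lets it be read as a probability.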
h_{\Theta} (x) = P( y =1 | x ; \Theta )
That is, hθ(x) is the probability that y = 1 given x, parameterized by θ. To get the discrete values 0 or 1 for classification, a decision threshold is defined. The hypothesis function can be translated as
h_{\Theta}(x) \geq 0.5 \rightarrow y = 1
h_{\Theta}(x) < 0.5 \rightarrow y = 0
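Because g(0) = 0.5 and g(z) increases with z, the condition on the hypothesis becomes a condition on z:
{g(z) \geq 0.5} \\ {\Rightarrow \Theta ^{T}x \geq 0} \\ {\Rightarrow z \geq 0 }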
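Since g(0) = 0.5 and g is increasing, thresholding the probability at 0.5 is the same as checking whether z = θᵀx is at least 0. The short check below compares the two rules on a few hand-picked values of z; the function names and the list `zs` are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); fine for moderate z."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_from_probability(z):
    """Class 1 when the predicted probability g(z) is at least 0.5."""
    return 1 if sigmoid(z) >= 0.5 else 0

def predict_from_score(z):
    """Equivalent rule applied directly to z = theta^T x."""
    return 1 if z >= 0 else 0

# Hypothetical scores on both sides of the threshold.
zs = [-3.0, -0.5, 0.0, 0.25, 3.0]
agree = all(predict_from_probability(z) == predict_from_score(z) for z in zs)
print(agree)
```

In practice this means the class can be decided from the sign of θᵀx without evaluating the sigmoid at all.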
The Decision Boundary is the line (or curve) that separates the region where y = 0 from the region where y = 1. It is determined by the hypothesis function under consideration. Understanding Decision Boundary with an example - Let our hypothesis function be
h_{\Theta}(x)= g[\Theta_{0}+ \Theta_1x_1+\Theta_2x_2+ \Theta_3x_1^2 + \Theta_4x_2^2 ]
Let our weights or parameters be -
\Theta=\begin{bmatrix} -1\\ 0\\ 0\\ 1\\ 1 \end{bmatrix}
So, it predicts y = 1 if
-1 + x_{1}^2 + x_{2}^2 \geqslant 0
\Rightarrow x_{1}^2 + x_{2}^2 \geqslant 1
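The boundary case, x_1^2 + x_2^2 = 1, is the equation of a circle with radius = 1 and the origin as its center. The model predicts y = 1 on and outside this circle and y = 0 inside it. This circle is the Decision Boundary for our defined hypothesis.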
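The circular boundary can be checked numerically. The sketch below uses the weights from the example and a hypothetical helper `features` that builds the vector [1, x_1, x_2, x_1^2, x_2^2]; the test points are arbitrary choices for illustration.

```python
# Weights from the example: theta_0 = -1, theta_3 = theta_4 = 1, the rest 0.
theta = [-1.0, 0.0, 0.0, 1.0, 1.0]

def features(x1, x2):
    """Feature vector matching the hypothesis: bias, x1, x2, x1^2, x2^2."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2]

def predict(x1, x2):
    """Predict 1 when theta^T features >= 0, i.e. x1^2 + x2^2 >= 1."""
    z = sum(t * f for t, f in zip(theta, features(x1, x2)))
    return 1 if z >= 0 else 0

print(predict(0.0, 0.0))  # inside the unit circle  -> 0
print(predict(0.5, 0.5))  # inside the unit circle  -> 0
print(predict(1.0, 0.0))  # on the unit circle      -> 1
print(predict(2.0, 0.0))  # outside the unit circle -> 1
```

The model is still linear in its parameters; the curved boundary comes only from the squared features.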