Types of Regression Techniques in ML

Last Updated : 15 Jan, 2025

Regression Analysis is a fundamental concept in machine learning used to model relationships between dependent and independent variables. Various regression techniques are tailored to different data structures and objectives. Below is an exploration of key regression techniques, their significance, and practical examples.

Types of Regression Techniques

1. Linear Regression

Linear regression is used for predictive analysis. It is a linear approach to modeling the relationship between a scalar response (the dependent variable, or criterion) and one or more predictors (explanatory variables). Linear regression focuses on the conditional probability distribution of the response given the values of the predictors. It can overfit, particularly when there are many predictors relative to the number of observations. The formula for linear regression is:

Formula:
y = θx + b
where,
θ - the model weights (parameters)
b - the bias (intercept)

This is the most basic form of regression analysis and is used to model a linear relationship between a single dependent variable and one or more independent variables.

Here, a linear regression model is instantiated, fitted to input features (X) and target values (y), and then used to predict on new inputs (X_new). The snippet assumes X, y, and X_new are already defined, and is meant as a simple demonstration of the approach.

Python

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a linear regression model for predictive modeling tasks.
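To make the formula concrete, here is a minimal sketch that solves for θ and b directly with NumPy's least-squares solver, using small hypothetical data generated from y = 3x + 2. It illustrates the kind of fit LinearRegression performs; the data and variable names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical toy data generated from y = 3x + 2 (no noise)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0

# Append a column of ones so the bias b is learned alongside theta
A = np.column_stack([x, np.ones_like(x)])

# Least-squares solution of A @ [theta, b] ≈ y
(theta, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"theta={theta:.3f}, b={b:.3f}")  # theta=3.000, b=2.000
```

Because the toy data has no noise, the recovered θ and b match the generating values exactly (up to floating-point error).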

2. Polynomial Regression

Polynomial regression is an extension of linear regression used to model a non-linear relationship between the dependent variable and the independent variables. The formula keeps the same form, but the input variables now also include polynomial (higher-degree) terms of existing features, such as x^2. Linear regression on the raw features can only fit a straight line (or hyperplane) to the data, whereas with polynomial features it can fit curved relationships between the input features and the target. The model is still linear in its parameters, so it is fitted in the same way as ordinary linear regression.

Here is code for a simple demonstration of the polynomial regression approach.

Python

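from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# scikit-learn has no PolynomialRegression class: expand the inputs into
# degree-2 polynomial terms, then fit an ordinary linear regression on them
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

model.fit(X, y)
y_pred = model.predict(X_new)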

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Polynomial regression model for predictive modeling tasks.
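The sketch below, using hypothetical quadratic data, shows what the polynomial features look like and how a degree-2 model fits a curve that a straight line cannot. It builds the model by chaining scikit-learn's PolynomialFeatures with LinearRegression; the data is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# PolynomialFeatures turns each input x into [1, x, x^2]
print(PolynomialFeatures(degree=2).fit_transform(np.array([[2.0]])))  # [[1. 2. 4.]]

# Hypothetical quadratic data: y = x^2 - 2x + 1
X = np.arange(-3, 4, dtype=float).reshape(-1, 1)
y = X.ravel() ** 2 - 2 * X.ravel() + 1

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

# 5^2 - 2*5 + 1 = 16, so the prediction should be (approximately) 16
pred = model.predict(np.array([[5.0]]))
print(pred)
```

The regression itself is still linear; only the feature space has been expanded, which is why an ordinary LinearRegression can fit the curve.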

3. Stepwise Regression

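Stepwise regression fits a regression model while automatically choosing which explanatory variables to include. At each step, a variable is added to or removed from the set of explanatory variables according to a predefined criterion. The main approaches are forward selection, backward elimination, and bidirectional elimination. The standardized coefficient of predictor j, which puts predictors on a common scale for comparison, is:

b_{j,\textrm{std}} = b_{j}\,\frac{s_{x_j}}{s_{y}}

where b_j is the raw coefficient of predictor j, s_{x_j} is the standard deviation of that predictor, and s_y is the standard deviation of the response.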
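As a quick check of the standardized-coefficient formula, the sketch below rescales raw least-squares coefficients by s_x / s_y and compares them with the coefficients of a fit on z-scored data; the two agree. The data is hypothetical, and NumPy's default population standard deviation (ddof=0) is used throughout.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data with predictors on very different scales
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 50, 100)])
y = 4 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.5, 100)

# Raw coefficients b_j from an ordinary least-squares fit
b = LinearRegression().fit(X, y).coef_

# Standardized coefficients: b_j * s_{x_j} / s_y
b_std = b * X.std(axis=0) / y.std()

# Same result as refitting on z-scored X and y
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = (y - y.mean()) / y.std()
b_refit = LinearRegression().fit(Xz, yz).coef_
print(np.allclose(b_std, b_refit))  # True
```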

Here is code for a simple demonstration of the stepwise regression approach.

Python

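from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# scikit-learn has no StepwiseLinearRegression class. SequentialFeatureSelector
# performs forward selection (or backward elimination with direction="backward");
# bidirectional elimination is not built in. X must have more than 2 features here.
selector = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2,
                                     direction="forward")
model = make_pipeline(selector, LinearRegression())

model.fit(X, y)
y_pred = model.predict(X_new)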

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Stepwise regression model for predictive modeling tasks.
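To show what happens at each step, here is a minimal hand-written sketch of forward selection. It assumes training-set R² as the selection criterion and a hypothetical stopping rule (stop when the best gain falls below min_gain); practical stepwise procedures often use other criteria, such as p-values or information criteria. The function name and data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_selection(X, y, min_gain=0.01):
    """Hypothetical helper: greedily add the feature that most improves
    training R^2, stopping when the improvement drops below min_gain."""
    def r2_with(cols):
        Xc = X[:, cols]
        return LinearRegression().fit(Xc, y).score(Xc, y)

    selected, remaining = [], list(range(X.shape[1]))
    best_score = 0.0
    while remaining:
        # Score every candidate set "selected + [j]"
        scores = {j: r2_with(selected + [j]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_score < min_gain:
            break  # no remaining feature helps enough
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return selected

# Hypothetical data: only columns 0 and 2 influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 0] + 3 * X[:, 2] + 0.1 * rng.normal(size=200)

# Column 2 (larger coefficient) is picked first, then column 0
print(forward_selection(X, y))
```

Backward elimination is the mirror image: start from all features and repeatedly drop the one whose removal hurts the criterion least.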

4. Decision Tree Regression

A decision tree is one of the most popular tools for classification and prediction. It is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a prediction: a class label for classification, or a numeric value for regression. Decision tree regression is a non-parametric method that uses such a tree to predict a continuous outcome; each leaf typically predicts the mean target value of the training samples that reach it.

Here is code for a simple demonstration of the decision tree regression approach.

Python

from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor()
model.fit(X, y)

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Decision Tree regression model for predictive modeling tasks.
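The sketch below makes the leaf-value idea visible on tiny hypothetical data: a depth-1 tree makes one split, and each of its two leaves predicts the mean target of the training rows it contains, so the model is piecewise constant.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data with two clusters of target values
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])

# A depth-1 tree makes a single split, giving two leaves
tree = DecisionTreeRegressor(max_depth=1).fit(X, y)

# Leaf means: 2.0 (mean of 1, 2, 3) and 11.0 (mean of 10, 11, 12)
pred = tree.predict(np.array([[0.0], [20.0]]))
print(pred)

# The split point lies midway between 3 and 10
print(tree.tree_.threshold[0])  # 6.5
```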

5. Random Forest Regression

Random Forest is an ensemble technique that can perform both regression and classification by combining multiple decision trees with a technique called Bootstrap Aggregation, commonly known as bagging. The basic idea is to combine the outputs of many decision trees to determine the final output rather than relying on a single tree.

Each decision tree in the forest is a base learner trained on its own random sample of the data: rows are drawn randomly with replacement (the bootstrap step), and a random subset of features is considered when splitting. For regression, the forest's final prediction is the average of the individual trees' predictions (the aggregation step).

Here is code for a simple demonstration of the random forest regression approach.

Python

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(n_estimators=100)
model.fit(X, y)

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Random Forest regression model for predictive modeling tasks.
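The bootstrap and aggregation steps can be sketched by hand. The function below (a hypothetical helper, not a scikit-learn API) trains each tree on rows sampled with replacement, limits each split to a random subset of features via max_features, and averages the trees. The last lines check that RandomForestRegressor's prediction equals the mean of its own trees' predictions. The data is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)
X_new = rng.uniform(0, 10, size=(5, 3))

def bagged_trees(X, y, n_trees=50, seed=0):
    """Hypothetical bagging sketch: bootstrap the rows for each tree and
    consider only 2 random features at every split."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # rows drawn with replacement
        tree = DecisionTreeRegressor(max_features=2,
                                     random_state=int(rng.integers(1 << 31)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

# Aggregation: average the individual trees' predictions
trees = bagged_trees(X, y)
manual_pred = np.mean([t.predict(X_new) for t in trees], axis=0)

# scikit-learn's forest also predicts the mean of its individual trees
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
tree_mean = np.mean([t.predict(X_new) for t in forest.estimators_], axis=0)
print(np.allclose(forest.predict(X_new), tree_mean))  # True
```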

6. Support Vector Regression (SVR)

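Support vector regression (SVR) is a type of support vector machine (SVM) used for regression tasks. It fits a function that predicts a continuous output value for a given input, treating deviations smaller than a margin ε (epsilon) as acceptable and penalizing only larger errors, while keeping the function as flat as possible.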

SVR can use both linear and non-linear kernels. A linear kernel is a simple dot product between two input vectors, while a non-linear kernel is a more complex function that can capture more intricate patterns in the data. The choice of kernel depends on the data’s characteristics and the task’s complexity.

Here is code for a simple demonstration of the support vector regression approach.

Python

from sklearn.svm import SVR

model = SVR(kernel='linear')
model.fit(X, y)

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Support vector regression model for predictive modeling tasks.
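The sketch below illustrates two ideas from this section, under hypothetical data: the epsilon-insensitive loss (errors inside the ε-tube cost nothing) and the effect of the kernel choice, comparing a linear kernel with the non-linear RBF kernel on a sine curve.

```python
import numpy as np
from sklearn.svm import SVR

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    # Errors inside the epsilon-tube cost nothing; larger errors grow linearly
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# Error 0.05 is inside the tube (loss 0); error 0.5 costs about 0.4
loss = epsilon_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.5]))
print(loss)

# Hypothetical non-linear data: y = sin(x)
X = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y = np.sin(X).ravel()

# Training R^2 for each kernel; a straight line cannot follow the sine wave
linear_r2 = SVR(kernel="linear").fit(X, y).score(X, y)
rbf_r2 = SVR(kernel="rbf").fit(X, y).score(X, y)
print(f"linear R^2={linear_r2:.2f}, rbf R^2={rbf_r2:.2f}")
```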

7. Ridge Regression

Ridge regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true values. Ridge regression is a regularized linear regression model: it reduces model complexity by adding a penalty term to the cost function. This adds a degree of bias to the regression estimates, and in exchange ridge regression reduces their standard errors.

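\hat{\beta}_{\textrm{ridge}} = \underset{\beta \in \mathbb{R}^{p}}{\textrm{argmin}} \left( \left\| y - X\beta \right\|^2 + \lambda \left\| \beta \right\|^2 \right)

where y is the response vector, X is the feature matrix with p columns, and λ ≥ 0 is the penalty strength (called alpha in scikit-learn).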

Here is code for a simple demonstration of the ridge regression approach.

Python

from sklearn.linear_model import Ridge

model = Ridge(alpha=0.1)
model.fit(X, y)

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Ridge regression model for predictive modeling tasks.
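Ridge regression has a closed-form solution, β = (XᵀX + λI)^(-1) Xᵀy. The sketch below computes it with NumPy on hypothetical, nearly collinear data and checks it against scikit-learn's Ridge with the intercept disabled, since the cost function above has no intercept term.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data where column 2 is nearly collinear with column 0
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=50)
y = X @ np.array([1.0, 2.0, 0.0]) + 0.1 * rng.normal(size=50)

lam = 1.0
# Closed-form minimizer of ||y - X beta||^2 + lam * ||beta||^2
beta_closed = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# scikit-learn's Ridge minimizes the same objective when fit_intercept=False
beta_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_
print(np.allclose(beta_closed, beta_sklearn))  # True
```

Adding λI makes XᵀX + λI invertible and well conditioned even when columns are nearly collinear, which is where the reduced variance comes from.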

8. Lasso Regression

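Lasso regression is a regression analysis method that performs both variable selection and regularization. Its coefficient estimates can be computed with a soft-thresholding operator, which shrinks values toward zero and sets small ones exactly to zero. As a result, Lasso regression selects only a subset of the provided covariates for use in the final model.

Like Ridge, Lasso is a regularized linear regression model that adds a penalty term to the cost function. Its penalty is proportional to the sum of the absolute values of the coefficients (the L1 norm), which tends to drive some features' coefficients exactly to zero and makes Lasso useful for feature selection.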

Here is code for a simple demonstration of the lasso regression approach.

Python

from sklearn.linear_model import Lasso

model = Lasso(alpha=0.1)
model.fit(X, y)

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Lasso regression model for predictive modeling tasks.
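The sketch below shows the soft-thresholding operator on a few numbers, then fits scikit-learn's Lasso on hypothetical data in which only two of five features matter, so the coefficients of the noise features are expected to come out exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(z, t):
    # Shrinks z toward zero by t; values with |z| <= t become exactly 0
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

st = soft_threshold(np.array([-3.0, 0.5, 2.0]), 1.0)
print(st)  # -2, 0, 1

# Hypothetical data: only columns 0 and 1 affect y; columns 2-4 are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=100)

coef = Lasso(alpha=0.1).fit(X, y).coef_
print(np.round(coef, 2))  # noise-column coefficients are driven to 0
```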

9. ElasticNet Regression

Ordinary linear regression can overfit and becomes unstable when features are highly correlated (collinear). This is especially likely when the dataset has many features, some of which are irrelevant to the prediction: the model becomes more complex than necessary and predicts poorly on the test set. Such a high-variance model does not generalize well to new data. To address these issues, Elastic-Net regression includes both L1 and L2 norm regularization, combining the benefits of Lasso and Ridge. The resulting model can achieve better predictive accuracy than Lasso, for example when several features are correlated, while still performing feature selection and keeping the hypothesis simpler. The modified cost function for Elastic-Net Regression is given below:

\frac{1}{m}\left[\sum_{i=1}^{m}\left(y^{(i)}-h\left(x^{(i)}\right)\right)^{2}+\lambda_{1} \sum_{j=1}^{n} \left|w_{j}\right|+\lambda_{2} \sum_{j=1}^{n} w_{j}^{2}\right]

where,

m is the number of training examples.
h(x^(i)) is the model's prediction for the i-th example x^(i).
w_j represents the weight for the j-th feature.
n is the number of features in the dataset.
lambda1 is the regularization strength for the L1 norm.
lambda2 is the regularization strength for the L2 norm.
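The cost function can be transcribed directly into code. The sketch below assumes a linear hypothesis h(x) = w·x + b with an unpenalized bias b (the function name is hypothetical), and evaluates the cost on a tiny hand-checkable example: the residuals are 0 and 3, giving a squared error of 9; each penalty term adds 0.5 × 2 = 1; and (9 + 1 + 1) / 2 = 5.5.

```python
import numpy as np

def elastic_net_cost(X, y, w, b, lam1, lam2):
    """(1/m) * [sum of squared errors + lam1 * sum|w_j| + lam2 * sum w_j^2],
    assuming a linear hypothesis h(x) = X @ w + b; the bias b is not penalized."""
    m = len(y)
    residuals = y - (X @ w + b)
    return (np.sum(residuals ** 2)
            + lam1 * np.sum(np.abs(w))
            + lam2 * np.sum(w ** 2)) / m

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.array([1.0, -1.0])
cost = elastic_net_cost(X, y, w, b=0.0, lam1=0.5, lam2=0.5)
print(cost)  # 5.5
```

Note that scikit-learn's ElasticNet uses a different parametrization: it minimizes 1/(2n)·||y − Xw||² + alpha·l1_ratio·||w||₁ + 0.5·alpha·(1 − l1_ratio)·||w||², so its alpha and l1_ratio map onto lambda1 and lambda2 above only after rescaling.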

Here is code for a simple demonstration of the ElasticNet regression approach.

Python

from sklearn.linear_model import ElasticNet

model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)

y_pred = model.predict(X_new)

Note: This code demonstrates the basic workflow of creating, training, and utilizing an ElasticNet regression model for predictive modeling tasks.

10. Bayesian Linear Regression

As the name suggests, this algorithm is based on Bayes' theorem. Instead of estimating a single set of coefficients with the least-squares method, it places a prior distribution over the model weights and combines it with the likelihood of the observed data to obtain a posterior distribution over the weights. Predictions therefore come with an estimate of their uncertainty, and the prior acts as a form of regularization, which makes the model more stable, particularly when data are limited.

Here is code for a simple demonstration of the Bayesian linear regression approach.

Python

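from sklearn.linear_model import BayesianRidge

# scikit-learn has no BayesianLinearRegression class; BayesianRidge fits a
# Bayesian linear regression with Gaussian priors on the weights
model = BayesianRidge()
model.fit(X, y)

# return_std=True also returns the standard deviation of each prediction
y_pred, y_std = model.predict(X_new, return_std=True)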

Note: This code demonstrates the basic workflow of creating, training, and utilizing a Bayesian linear regression model for predictive modeling tasks.
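To show the Bayes' theorem step explicitly, here is a minimal sketch of the conjugate Gaussian case. It assumes a zero-mean Gaussian prior on the weights with a fixed precision alpha and a known noise precision beta (BayesianRidge instead estimates such precisions from the data); the helper names and data are hypothetical. The posterior over the weights is Gaussian, and the predictive variance grows for inputs far from the training data.

```python
import numpy as np

def bayesian_linear_posterior(X, y, alpha=1.0, beta=25.0):
    """Posterior N(mean, cov) over weights for prior w ~ N(0, alpha^-1 I)
    and Gaussian noise with known precision beta (both assumed fixed)."""
    precision = alpha * np.eye(X.shape[1]) + beta * X.T @ X
    cov = np.linalg.inv(precision)
    mean = beta * cov @ X.T @ y
    return mean, cov

def predictive(x_new, mean, cov, beta=25.0):
    # Predictive variance = noise variance + uncertainty about the weights
    mu = x_new @ mean
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", x_new, cov, x_new)
    return mu, var

# Hypothetical data: bias column + one feature, noise std 0.2 (beta = 25)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.uniform(-1, 1, 30)])
y = X @ np.array([0.5, 2.0]) + rng.normal(0, 0.2, 30)

mean, cov = bayesian_linear_posterior(X, y)
mu, var = predictive(np.array([[1.0, 0.0], [1.0, 5.0]]), mean, cov)
# Variance is larger at x=5, far outside the training range [-1, 1]
print(mean, var)
```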