Regression Analysis is a fundamental concept in machine learning used to model relationships between dependent and independent variables. Various regression techniques are tailored to different data structures and objectives. Below is an exploration of key regression techniques, their significance, and practical examples.
Types of Regression Techniques
- Linear Regression
- Polynomial Regression
- Stepwise Regression
- Decision Tree Regression
- Random Forest Regression
- Support Vector Regression
- Ridge Regression
- Lasso Regression
- ElasticNet Regression
- Bayesian Linear Regression
1. Linear Regression
Linear regression is used for predictive analysis. It models the relationship between a scalar response (the criterion) and one or more predictors (explanatory variables) as a linear function. Linear regression focuses on the conditional probability distribution of the response given the values of the predictors, usually its conditional mean. When there are many predictors relative to the number of observations, linear regression can overfit. The formula for linear regression is:
y = θx + b
where,
- y – the predicted output (the dependent variable)
- x – the input features (the independent variables)
- θ – the model weights or parameters
- b – the bias (intercept)
This is the most basic form of regression analysis and is used to model a linear relationship between a single dependent variable and one or more independent variables.
Here, a linear regression model is instantiated to fit a linear relationship between input features (X) and target values (y). This code is used for simple demonstration of the approach.
Python
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing a linear regression model for predictive modeling tasks.
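To make the formula y = θx + b concrete, the sketch below fits it on synthetic data (an assumption made purely for illustration) in two ways: by solving the least-squares problem directly with NumPy's `np.linalg.lstsq`, and with scikit-learn's `LinearRegression`. The learned `coef_` and `intercept_` play the roles of θ and b.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (assumed, for illustration): y = 2x + 1 with a little Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=100)

# Least squares by hand: append a column of ones so that b is learned
# alongside theta, then solve min ||A w - y||^2
A = np.hstack([X, np.ones((X.shape[0], 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
theta_manual, b_manual = w[0], w[1]

# The same model fitted with scikit-learn
model = LinearRegression().fit(X, y)
theta_sk, b_sk = model.coef_[0], model.intercept_

print(f"manual:  theta={theta_manual:.3f}, b={b_manual:.3f}")
print(f"sklearn: theta={theta_sk:.3f}, b={b_sk:.3f}")

# Predictions for new inputs follow y = theta * x + b
X_new = np.array([[4.0], [7.5]])
y_pred = model.predict(X_new)
print(y_pred)
```

Both routes solve the same least-squares problem, so the two pairs of estimates should agree up to floating-point error and land close to the θ = 2, b = 1 used to generate the toy data.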
2. Polynomial Regression
This is an extension of linear regression used to model a non-linear relationship between the dependent variable and the independent variables. The formula stays the same, but the inputs now also include polynomial (higher-degree) terms of existing features, such as x² or x₁x₂. Plain linear regression can only fit a straight line (or hyperplane) to the data, but with polynomial features the same model can fit a curved relationship between the inputs and the target, because it is still linear in its weights.
Here is the code for simple demonstration of the Polynomial regression approach.
Python
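from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# scikit-learn has no PolynomialRegression class: expand the inputs with
# polynomial terms, then fit an ordinary linear regression on them
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
y_pred = model.predict(X_new)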
Note: This code demonstrates the basic workflow of creating, training, and utilizing a Polynomial regression model for predictive modeling tasks.
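The idea that polynomial regression is just linear regression on expanded inputs can be checked directly. The sketch below assumes a synthetic quadratic target, uses scikit-learn's `PolynomialFeatures` to add an x² column, and compares R² for a straight-line fit against a fit on [x, x²].

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Toy non-linear data (assumed, for illustration): y = 0.5 x^2 - x + 2 plus noise
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(80, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + 2 + rng.normal(0, 0.2, size=80)

# Expand x into [x, x^2]; include_bias=False because LinearRegression fits b itself
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print(X_poly[:2])  # each row is [x, x^2]

# Ordinary linear regression, but on the expanded features
quad_model = LinearRegression().fit(X_poly, y)
line_model = LinearRegression().fit(X, y)  # straight line, for comparison

print("quadratic coefficients:", quad_model.coef_, "intercept:", quad_model.intercept_)
print("R^2 straight line:", line_model.score(X, y))
print("R^2 with x^2 term:", quad_model.score(X_poly, y))
```

The learned coefficients on [x, x²] should come out near the -1 and 0.5 used to generate the data, and the R² of the expanded model should be clearly higher than that of the straight line.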
3. Stepwise Regression
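Stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out automatically. At each step, a variable is added to or removed from the set of explanatory variables according to a chosen criterion, such as an F-test, a p-value threshold, or the change in R². The approaches for stepwise regression are forward selection, backward elimination, and bidirectional elimination. To compare predictors measured on different scales, stepwise procedures often report the standardized coefficient, computed from the raw coefficient b_j and the sample standard deviations s_x of the predictor and s_y of the response: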
[Tex]b_{j.std} = b_{j}(s_{x} s_{y}^{-1}) [/Tex]
Here is the code for simple demonstration of the stepwise regression approach.
Python
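from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
# scikit-learn has no StepwiseLinearRegression class; SequentialFeatureSelector
# performs forward selection (or backward elimination with direction="backward")
selector = SequentialFeatureSelector(LinearRegression(), direction="forward")
selector.fit(X, y)
X_sel, X_new_sel = selector.transform(X), selector.transform(X_new)
model = LinearRegression().fit(X_sel, y)
y_pred = model.predict(X_new_sel)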
Note: This code demonstrates the basic workflow of creating, training, and utilizing a Stepwise regression model for predictive modeling tasks.
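Forward selection, the simplest of the stepwise strategies, can be sketched in a few lines. The function `forward_selection` and its stopping threshold `min_gain` are hypothetical, written only for illustration; they use the gain in training R² as the selection criterion, whereas statistical packages often use F-tests, p-values, or information criteria instead. The data are synthetic, with only features 0 and 3 actually influencing y.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_selection(X, y, min_gain=0.01):
    """Greedy forward selection (hypothetical helper).

    At each step, add the feature that raises training R^2 the most;
    stop when the best gain is below min_gain (an assumed threshold).
    """
    remaining = list(range(X.shape[1]))
    selected, best_r2 = [], 0.0
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            r2 = LinearRegression().fit(X[:, cols], y).score(X[:, cols], y)
            scores.append((r2, j))
        r2, j = max(scores)  # candidate with the highest R^2
        if r2 - best_r2 < min_gain:
            break  # no remaining feature improves the fit enough
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return selected, best_r2

# Toy data (assumed, for illustration): only features 0 and 3 influence y
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = 4 * X[:, 0] - 3 * X[:, 3] + rng.normal(0, 1, size=200)

selected, r2 = forward_selection(X, y)
print("selected features:", selected, "R^2:", round(r2, 3))
```

Feature 0 (the larger effect) is picked first, then feature 3; adding any of the irrelevant columns raises R² by far less than `min_gain`, so the loop stops there. Backward elimination would run the same loop in reverse, starting from all features and dropping the least useful one at each step.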
4. Decision Tree Regression
A decision tree is a popular tool for both classification and prediction. It is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf (terminal) node holds a prediction: a class label for classification, or a numeric value (typically the mean of the training targets that reach that leaf) for regression. Decision tree regression is a non-parametric method that uses such a tree to predict a continuous outcome.
Here is the code for simple demonstration of the Decision Tree regression approach.
Python
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing a Decision Tree regression model for predictive modeling tasks.
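The core of training a regression tree is choosing splits. The hypothetical helper `best_split` below scans the possible thresholds on a single feature and keeps the one that minimizes the total squared error when each side is predicted by its own mean, which is the squared-error criterion scikit-learn's `DecisionTreeRegressor` uses by default. On synthetic step-shaped data (an assumption for illustration), a depth-1 tree should pick essentially the same threshold.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def best_split(x, y):
    """Return (sse, threshold) for the split of one feature that minimizes
    the summed squared error, predicting each side by its mean."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best = (np.inf, None)
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # cannot split between equal values
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            # Threshold halfway between neighbouring sorted values
            best = (sse, (x_sorted[i - 1] + x_sorted[i]) / 2)
    return best

# Toy data (assumed, for illustration): y jumps from about 1 to about 5 at x = 3
rng = np.random.default_rng(3)
x = rng.uniform(0, 6, size=100)
y = np.where(x < 3, 1.0, 5.0) + rng.normal(0, 0.3, size=100)

sse, threshold = best_split(x, y)
print("manual threshold:", round(threshold, 3))

# A depth-1 tree (a "stump") fitted with scikit-learn
stump = DecisionTreeRegressor(max_depth=1).fit(x.reshape(-1, 1), y)
print("sklearn threshold:", round(stump.tree_.threshold[0], 3))

# Each leaf predicts the mean of the training targets that fall into it
print("leaf predictions:", stump.predict([[1.0], [5.0]]))
```

A full tree simply applies this search recursively to each side, over every feature, until a stopping rule such as `max_depth` or `min_samples_leaf` is reached.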
5. Random Forest Regression
Random Forest is an ensemble technique that can perform both regression and classification tasks using multiple decision trees and a technique called Bootstrap Aggregation, commonly known as bagging. The basic idea is to combine the outputs of many decision trees to determine the final result rather than relying on a single tree.
Random Forest uses multiple decision trees as base learners. Each tree is trained on a bootstrap sample of the rows (drawn at random with replacement), and only a random subset of the features is considered while building it; in scikit-learn, this feature sampling happens at every split. This part is called Bootstrap. The Aggregation step then combines the trees' outputs: for regression, the forest's prediction is the average of the individual tree predictions.
Here is the code for simple demonstration of the Random Forest regression approach.
Python
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100)
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing a Random Forest regression model for predictive modeling tasks.
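Bootstrap and aggregation can be sketched by hand with individual scikit-learn trees. The helpers `fit_bagged_trees` and `predict_bagged` are hypothetical illustrations, not scikit-learn's own implementation; feature sampling is delegated to each tree's `max_features` setting, and the data are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_trees(X, y, n_trees=50, max_features="sqrt", seed=0):
    """Bootstrap: train each tree on rows drawn with replacement.

    max_features makes each split consider only a random subset of the
    features, which is how scikit-learn's trees do feature sampling.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)  # bootstrap sample of row indices
        tree = DecisionTreeRegressor(max_features=max_features,
                                     random_state=int(rng.integers(1_000_000)))
        trees.append(tree.fit(X[rows], y[rows]))
    return trees

def predict_bagged(trees, X):
    """Aggregation: average the individual tree predictions."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)

# Toy data (assumed, for illustration): a smooth additive target with noise
rng = np.random.default_rng(10)
X = rng.uniform(-3, 3, size=(400, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.25 * X[:, 2] ** 2 + rng.normal(0, 0.5, size=400)
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]

trees = fit_bagged_trees(X_train, y_train)
single = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

mse_forest = np.mean((predict_bagged(trees, X_test) - y_test) ** 2)
mse_tree = np.mean((single.predict(X_test) - y_test) ** 2)
print(f"single tree test MSE:   {mse_tree:.3f}")
print(f"bagged forest test MSE: {mse_forest:.3f}")
```

On noisy data like this, averaging many trees usually gives a noticeably lower test error than a single fully grown tree, because averaging cancels much of each tree's variance. In practice, `RandomForestRegressor` performs all of these steps internally.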
6. Support Vector Regression (SVR)
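Support vector regression (SVR) is a type of support vector machine (SVM) that is used for regression tasks. It looks for a function that predicts the continuous output for a given input while tolerating errors up to a margin ε: predictions within ε of the true value incur no loss, and only points outside this epsilon-insensitive tube affect the fit.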
SVR can use both linear and non-linear kernels. A linear kernel is a simple dot product between two input vectors, while a non-linear kernel is a more complex function that can capture more intricate patterns in the data. The choice of kernel depends on the data’s characteristics and the task’s complexity.
Here is the code for simple demonstration of the Support vector regression approach.
Python
from sklearn.svm import SVR
model = SVR(kernel='linear')
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing a Support vector regression model for predictive modeling tasks.
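The difference between linear and non-linear kernels can be seen directly. The sketch below writes a linear kernel (a dot product) and a Gaussian RBF kernel by hand, cross-checks them against `sklearn.metrics.pairwise.linear_kernel` and `rbf_kernel`, and then fits `SVR` with each kernel on a synthetic sine-shaped target. The data and the `gamma` value are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.svm import SVR

def linear_k(a, b):
    # Linear kernel: plain dot product of the two input vectors
    return a @ b

def rbf_k(a, b, gamma):
    # RBF (Gaussian) kernel: similarity decays with squared distance
    return np.exp(-gamma * np.sum((a - b) ** 2))

a = np.array([1.0, 2.0])
b = np.array([2.0, 0.5])
gamma = 0.5  # assumed value, for illustration
print(linear_k(a, b))      # 1*2 + 2*0.5 = 3.0
print(rbf_k(a, b, gamma))  # exp(-0.5 * (1 + 2.25))
# Cross-check against scikit-learn's implementations
print(linear_kernel([a], [b])[0, 0], rbf_kernel([a], [b], gamma=gamma)[0, 0])

# Toy curved target (assumed): an RBF kernel can follow the curve, a linear one cannot
rng = np.random.default_rng(4)
X = np.sort(rng.uniform(0, 6, size=(100, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=100)
svr_lin = SVR(kernel="linear").fit(X, y)
svr_rbf = SVR(kernel="rbf").fit(X, y)
print("R^2 linear:", svr_lin.score(X, y), "R^2 rbf:", svr_rbf.score(X, y))
```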
7. Ridge Regression
Ridge regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true values. Ridge regression is a regularized linear regression model: it reduces model complexity by adding a penalty on the squared size of the coefficients (an L2 penalty) to the cost function. This adds a degree of bias to the regression estimates, and in exchange ridge regression reduces their standard errors.
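[Tex]\hat{\beta}_{\text{ridge}} = \underset{\beta \in \mathbb{R}^{p}}{\operatorname{argmin}} \left\| y - X\beta \right\|^2 + \lambda \left\| \beta \right\|^2[/Tex]
where y is the target vector, X the matrix of input features, β the vector of p coefficients, and λ ≥ 0 the regularization strength (the alpha parameter of scikit-learn's Ridge).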
Here is the code for simple demonstration of the Ridge regression approach.
Python
from sklearn.linear_model import Ridge
model = Ridge(alpha=0.1)
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing a Ridge regression model for predictive modeling tasks.
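Without an intercept, the ridge objective has the closed-form minimizer β = (XᵀX + λI)⁻¹Xᵀy. The sketch below computes it with NumPy on synthetic data containing two nearly collinear columns (an assumption chosen to show the multicollinearity case) and checks it against scikit-learn's `Ridge` with `fit_intercept=False`, whose `alpha` corresponds to λ.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data (assumed, for illustration) with two nearly collinear columns
rng = np.random.default_rng(5)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.01, size=100)  # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(0, 0.5, size=100)

lam = 1.0
# Closed-form minimizer of ||y - X b||^2 + lam * ||b||^2 (no intercept)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
# Ordinary least squares for comparison (lam = 0)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# scikit-learn's Ridge minimizes the same objective, with alpha playing the role of lam
sk = Ridge(alpha=lam, fit_intercept=False).fit(X, y)

print("OLS:          ", beta_ols)    # often large, offsetting values
print("ridge:        ", beta_ridge)  # shrunk and shared between x1 and x2
print("sklearn ridge:", sk.coef_)
```

Because x1 and x2 carry almost the same information, least squares can split the effect between them in unstable ways; the penalty pulls both coefficients toward a shared, smaller value, and the norm of the ridge solution is always below that of the least-squares one.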
8. Lasso Regression
Lasso regression is a regression analysis method that performs both variable selection and regularization. When it is fitted by coordinate descent (as in scikit-learn), each coefficient update applies soft thresholding, which shrinks coefficients toward zero and sets small ones exactly to zero, so lasso regression selects only a subset of the provided covariates for use in the final model.
This is another regularized linear regression model: it adds a penalty on the absolute size of the coefficients (an L1 penalty) to the cost function. Because this penalty tends to zero out some features' coefficients, lasso is useful for feature selection.
Here is the code for simple demonstration of the Lasso regression approach.
Python
from sklearn.linear_model import Lasso
model = Lasso(alpha=0.1)
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing a Lasso regression model for predictive modeling tasks.
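Soft thresholding appears naturally when the lasso is solved by coordinate descent. The sketch below is a minimal, unoptimized coordinate-descent loop (the helpers `soft_threshold` and `lasso_cd` are hypothetical) for the objective (1/(2n))·‖y − Xw‖² + α·‖w‖₁, which is the form scikit-learn's `Lasso` minimizes. The intercept is omitted and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(rho, alpha):
    # Shrink rho toward zero by alpha; return exactly 0 if |rho| <= alpha
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_cd(X, y, alpha, n_iter=1000):
    """Coordinate descent for (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1
    (hypothetical helper, no intercept)."""
    n, p = X.shape
    w = np.zeros(p)
    z = (X ** 2).sum(axis=0) / n  # per-feature curvature
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha) / z[j]
    return w

# Toy data (assumed, for illustration): only features 0 and 3 matter
rng = np.random.default_rng(6)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(0, 0.1, size=100)

w_cd = lasso_cd(X, y, alpha=0.1)
w_sk = Lasso(alpha=0.1, fit_intercept=False, tol=1e-8, max_iter=10_000).fit(X, y).coef_
print("coordinate descent:", np.round(w_cd, 3))
print("sklearn Lasso:     ", np.round(w_sk, 3))
```

The coefficients of the irrelevant features come out exactly zero, because soft thresholding returns 0 whenever |ρ| ≤ α; the relevant coefficients are shrunk slightly below 3 and -2 in magnitude, which is the bias the L1 penalty introduces.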
9. ElasticNet Regression
Ordinary linear regression can overfit, and its estimates become unstable when features are highly collinear. This is especially likely when the dataset has many features, some of which are not relevant to the prediction: the model becomes more complex and its predictions on the test set become inaccurate (overfitting). Such a high-variance model does not generalize well to new data. To address these issues, Elastic-Net combines L1 and L2 norm regularization to get the benefits of both Lasso and Ridge at the same time. The resulting model can perform feature selection like Lasso while handling groups of correlated features more stably, and it often predicts better than Lasso alone when features are correlated. The modified cost function for Elastic-Net regression is given below:
[Tex]\frac{1}{m}\left[\sum_{i=1}^{m}\left(y^{(i)}-h\left(x^{(i)}\right)\right)^{2}+\lambda_{1} \sum_{j=1}^{n} \left|w_{j}\right|+\lambda_{2} \sum_{j=1}^{n} w_{j}^{2}\right][/Tex]
where,
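- m is the number of training examples, and h(x^(i)) is the model's prediction for the i-th example x^(i), whose target is y^(i).
- w_j represents the weight for the jth feature.
- n is the number of features in the dataset.
- λ1 is the regularization strength for the L1 norm.
- λ2 is the regularization strength for the L2 norm.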
Here is the code for simple demonstration of the ElasticNet regression approach.
Python
from sklearn.linear_model import ElasticNet
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and utilizing an Elastic Net regression model for predictive modeling tasks.
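scikit-learn parameterizes Elastic-Net with `alpha` and `l1_ratio` rather than separate λ1 and λ2. The sketch below implements the Elastic-Net cost function from this section (assuming h(x) = w·x with no intercept, and writing the L1 term as Σ|w_j|), converts (λ1, λ2) into scikit-learn's parameters by matching the two objectives, and checks on synthetic data that random perturbations of the fitted weights never lower the cost. The helpers `elastic_net_cost` and `to_sklearn_params` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def elastic_net_cost(X, y, w, lam1, lam2):
    """Squared error plus L1 and L2 penalties, all divided by m
    (assumes h(x) = X @ w, no intercept)."""
    m = len(y)
    residuals = y - X @ w
    return (residuals @ residuals + lam1 * np.abs(w).sum() + lam2 * (w @ w)) / m

def to_sklearn_params(lam1, lam2, m):
    """Map (lam1, lam2) to scikit-learn's (alpha, l1_ratio).

    scikit-learn minimizes
        1/(2m)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2,
    which equals half the cost above when alpha*l1_ratio = lam1/(2m)
    and alpha*(1 - l1_ratio) = lam2/m.
    """
    a1, a2 = lam1 / (2 * m), lam2 / m
    alpha = a1 + a2
    return alpha, a1 / alpha

# Toy data (assumed, for illustration)
rng = np.random.default_rng(7)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, 0.0, -1.0]) + rng.normal(0, 0.3, size=50)

lam1, lam2 = 5.0, 5.0
alpha, l1_ratio = to_sklearn_params(lam1, lam2, m=len(y))
model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False,
                   tol=1e-8, max_iter=10_000).fit(X, y)
w_hat = model.coef_

# Sanity check: small random moves away from w_hat should not lower the cost
base = elastic_net_cost(X, y, w_hat, lam1, lam2)
worse = [elastic_net_cost(X, y, w_hat + 0.05 * rng.normal(size=3), lam1, lam2)
         for _ in range(100)]
print("alpha, l1_ratio:", alpha, l1_ratio)
print("cost at sklearn solution:", base, "lowest perturbed cost:", min(worse))
```

Since the two objectives differ only by a constant factor of 2 under this mapping, they share the same minimizer, so the weights scikit-learn returns also minimize the cost written in this section.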
10. Bayesian Linear Regression
As the name suggests, this algorithm is based on Bayes' theorem. Instead of finding a single set of coefficients with the least squares method, it treats the model weights as random variables: it places a prior distribution on them and combines it with the data to obtain a posterior distribution over the weights. Predictions therefore come with an estimate of their uncertainty, and the prior acts as a regularizer that can make the model more stable, especially with small or noisy datasets.
Here is the code for simple demonstration of the Bayesian Linear regression approach.
Python
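from sklearn.linear_model import BayesianRidge
# scikit-learn's Bayesian linear regression estimator is BayesianRidge
model = BayesianRidge()
model.fit(X, y)
# return_std=True also returns the predictive standard deviation
y_pred, y_std = model.predict(X_new, return_std=True)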
Note: This code demonstrates the basic workflow of creating, training, and utilizing a Bayesian linear regression model for predictive modeling tasks.
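With fixed hyperparameters, the posterior over the weights has a closed form. The sketch below assumes a Gaussian prior w ~ N(0, α⁻¹I) and Gaussian noise with precision β (the textbook conjugate setup). Note that scikit-learn's `BayesianRidge` estimates both precisions from the data and names them `lambda_` (weights) and `alpha_` (noise), the reverse of the convention used here. The helpers `bayesian_linear_posterior` and `predictive` are hypothetical, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import Ridge

def bayesian_linear_posterior(X, y, alpha, beta):
    """Posterior over weights for y = X w + noise, assuming
    prior w ~ N(0, alpha^-1 I) and noise ~ N(0, beta^-1)."""
    p = X.shape[1]
    S_inv = alpha * np.eye(p) + beta * X.T @ X  # posterior precision
    S = np.linalg.inv(S_inv)                    # posterior covariance
    m = beta * S @ X.T @ y                      # posterior mean
    return m, S

def predictive(X_new, m, S, beta):
    """Predictive mean and variance (noise + weight uncertainty)."""
    mean = X_new @ m
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", X_new, S, X_new)
    return mean, var

# Toy data (assumed, for illustration)
rng = np.random.default_rng(8)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(0, 0.5, size=30)

alpha, beta = 1.0, 1 / 0.5 ** 2  # assumed prior precision and noise precision
m, S = bayesian_linear_posterior(X, y, alpha, beta)
mean, var = predictive(np.array([[1.0, 1.0], [10.0, -10.0]]), m, S, beta)

print("posterior mean:", m)
print("posterior std of each weight:", np.sqrt(np.diag(S)))
print("predictive mean:", mean, "predictive std:", np.sqrt(var))

# With fixed alpha and beta, the posterior mean equals ridge with lambda = alpha / beta
ridge = Ridge(alpha=alpha / beta, fit_intercept=False).fit(X, y)
print("ridge coefficients:", ridge.coef_)
```

The posterior mean coincides with the ridge solution for λ = α/β, which is one way to see the prior as a regularizer. What the Bayesian view adds is the covariance S, which makes the predictive variance grow for inputs far from the training data, such as the second query point.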
Types of Regression Techniques in ML – FAQs
What are the 2 main types of regression?
Linear regression and logistic regression are often cited as the two main types of regression. Linear regression is used to predict a continuous numerical outcome, while logistic regression, despite its name, is a classification method that predicts the probability of a binary categorical outcome (e.g., yes or no, pass or fail).
What are the two types of variables in regression?
The two types of variables in regression are independent variables and dependent variables. Independent variables are the inputs to the regression model, while the dependent variable is the output that the model is trying to predict.
Why is regression called regression?
The term “regression” was coined by Sir Francis Galton in the late 19th century. He used the term to describe the phenomenon of children’s heights tending to regress towards the mean of the population, meaning that taller-than-average parents tend to have children who are closer to the average height, and shorter-than-average parents tend to have children who are closer to the average height.
How to calculate regression?
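There are several ways to estimate a regression model. For linear regression, the most common method is ordinary least squares, which chooses the parameters that minimize the sum of squared errors and can be solved in closed form through the normal equations. Gradient descent is an iterative alternative that repeatedly updates the parameters in the direction that reduces the error between the predicted and actual values of the dependent variable; it is useful for very large datasets and for models that have no closed-form solution.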
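For linear regression, the closed-form least-squares solution and gradient descent arrive at the same parameters. The sketch below, on synthetic data, solves the normal equations with NumPy and runs plain batch gradient descent on the mean squared error. The learning rate and iteration count are assumptions that suit this particular data, not general defaults.

```python
import numpy as np

# Toy data (assumed, for illustration): y = 4x + 0.5 plus noise
rng = np.random.default_rng(9)
X = rng.uniform(0, 1, size=(200, 1))
y = 4.0 * X[:, 0] + 0.5 + rng.normal(0, 0.1, size=200)

A = np.hstack([X, np.ones((len(X), 1))])  # column of ones for the bias b

# 1) Ordinary least squares in closed form: solve A^T A w = A^T y
w_ols = np.linalg.solve(A.T @ A, A.T @ y)

# 2) Batch gradient descent on the mean squared error
w_gd = np.zeros(2)
learning_rate = 0.5  # assumed; too large a step makes the updates diverge
for _ in range(5000):
    grad = 2 / len(y) * A.T @ (A @ w_gd - y)  # gradient of mean((A w - y)^2)
    w_gd -= learning_rate * grad

print("closed form:     ", w_ols)
print("gradient descent:", w_gd)
```

Both print [slope, bias] near [4, 0.5]. The closed form is usually preferred when the number of features is modest, since it needs no step size or stopping rule; gradient descent trades that convenience for memory and compute costs that scale better to very large problems.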
Why use regression?
Regression is a powerful tool for understanding and predicting relationships between variables. It is used in a wide variety of applications, including finance, economics, marketing, and medicine.