Univariate Linear Regression in Python
Last Updated: 22 Aug, 2024
In this article, we will explain univariate linear regression, one of the simplest types of regression. In this regression, we predict the target value from only one independent variable.
Univariate Linear Regression is a type of regression in which the target variable depends on only one independent variable. For univariate regression, we use univariate data. For instance, a dataset of points on a line can be considered univariate data, where the abscissa (x-coordinate) serves as the input feature and the ordinate (y-coordinate) serves as the output/target.
Example Of Univariate Linear Regression
For the line Y = 2X + 3, the input feature is X and the target is Y.
Concept: For univariate linear regression, there is only one input feature vector. The regression line takes the following form:
Y = b0 + b1 * X
where b0 (the intercept) and b1 (the slope) are the regression coefficients.
Here, we train a model to find the values of b0 and b1 that minimize the difference between the predicted y and the actual y.
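For comparison, the coefficients that minimize the squared error in the univariate case also have a well-known closed-form solution (ordinary least squares): b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x)) ** 2) and b0 = mean(y) - b1 * mean(x). This is a swapped-in reference technique, not the gradient-descent model built in this article, and the helper name `closed_form_coefficients` is hypothetical. It is a handy sanity check for the values gradient descent should converge to.

```python
from statistics import fmean


def closed_form_coefficients(x, y):
    """Hypothetical helper: ordinary least squares for Y = b0 + b1 * X."""
    x_mean, y_mean = fmean(x), fmean(y)
    # Slope: covariance of x and y divided by the variance of x (unnormalized sums)
    numerator = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    denominator = sum((xi - x_mean) ** 2 for xi in x)
    b1 = numerator / denominator
    # Intercept: the fitted line passes through (mean(x), mean(y))
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Points on the example line Y = 2X + 3
xs = list(range(12))
ys = [2 * xi + 3 for xi in xs]
print(closed_form_coefficients(xs, ys))  # → (3.0, 2.0)
```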
A univariate linear regression model consists of several utility functions. We will define each function one by one and, at the end, combine them in a class to form a working univariate linear regression model object.
Utility Functions in Univariate Linear Regression Model
- Prediction with linear regression
- Cost function
- Gradient Descent For Parameter Estimation
- Update Coefficients
- Stop Iterations
Prediction with linear regression
This function predicts the value of y for a given x by multiplying x by the coefficient b1 and adding the coefficient b0.
Python
# Y = b0 + b1 * X
def predict(x, b0, b1):
    return b0 + b1 * x
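As a quick check, plugging the coefficients of the example line Y = 2X + 3 (b0 = 3, b1 = 2) into `predict` should reproduce points on that line. The function is repeated so the snippet runs on its own.

```python
def predict(x, b0, b1):
    # Y = b0 + b1 * X
    return b0 + b1 * x


# Coefficients of the example line Y = 2X + 3
b0, b1 = 3, 2
print([predict(x, b0, b1) for x in range(5)])  # → [3, 5, 7, 9, 11]
```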
Cost function For Univariate Linear Regression
The cost function computes the error for the current values of the regression coefficients. It quantifies how far the model's predicted values are from the actual values; training searches for the regression coefficients that give the lowest error.
Mean Squared Error (MSE) = average of the squared differences between the predicted and actual values:
J(b_1, b_0) = \frac{1}{n} \sum_{i=1}^{n} (y_{p,i} - y_i)^2
where y_{p,i} = b_0 + b_1 x_i is the predicted value for the i-th data point and n is the number of data points.
We square the differences so that positive and negative errors do not cancel each other out.
Here:
- y is the list of expected (actual) values
- x is the independent variable
- b0 and b1 are the regression coefficients
Python
def cost(x, y, b0, b1):
    errors = []
    for xi, yi in zip(x, y):
        prediction = predict(xi, b0, b1)
        expected = yi
        difference = prediction - expected
        errors.append(difference)
    mse = sum([error * error for error in errors]) / len(errors)
    return mse
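A small worked example, assuming the three points (0, 3), (1, 5) and (2, 7) taken from Y = 2X + 3: with the true coefficients (b0 = 3, b1 = 2) every error is 0, so the MSE is 0; with b0 = 0 and b1 = 2 every prediction is 3 too low, so the MSE is (9 + 9 + 9) / 3 = 9. The sketch uses a compact equivalent of `cost`.

```python
def predict(x, b0, b1):
    return b0 + b1 * x


def cost(x, y, b0, b1):
    # Mean squared error: average of (prediction - expected) ** 2
    errors = [predict(xi, b0, b1) - yi for xi, yi in zip(x, y)]
    return sum(error * error for error in errors) / len(errors)


x = [0, 1, 2]
y = [3, 5, 7]
print(cost(x, y, 3, 2))  # → 0.0  (perfect fit)
print(cost(x, y, 0, 2))  # → 9.0  (every prediction is off by -3)
```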
Gradient Descent For Parameter Estimation
We will use gradient descent to update our regression coefficients. It is an optimization algorithm that we use to train our model. In gradient descent, we take the partial derivative of the cost function with respect to each regression coefficient, multiply it by the learning rate alpha, and subtract the result from that coefficient.
For simplicity, we differentiate a single term of the sum, i.e. one data point (x, y) with prediction y_p = b_0 + b_1 x, to estimate our univariate linear regression coefficients. Because differentiation is linear, the gradient of the full cost is the sum of these per-point terms over all n data points.
\begin {aligned} J'_{b_1} &=\frac{\partial J(b_1,b_0)}{\partial b_1} \\ &= \frac{\partial}{\partial b_1} \left[\frac{1}{n} (y_p-y)^2 \right] \\ &= \frac{2(y_p-y)}{n}\frac{\partial}{\partial b_1}\left [(y_p-y) \right ] \\ &= \frac{2(y_p-y)}{n}\frac{\partial}{\partial b_1}\left [((xb_1+b_0)-y) \right ] \\ &= \frac{2(y_p-y)}{n}\left[\frac{\partial(xb_1+b_0)}{\partial b_1}-\frac{\partial(y)}{\partial b_1}\right] \\ &= \frac{2(y_p-y)}{n}\left [ x - 0 \right ] \\ &= \frac{1}{n}(y_p-y)[2x] \end {aligned}
\begin {aligned} J'_{b_0} &=\frac{\partial J(b_1,b_0)}{\partial b_0} \\ &= \frac{\partial}{\partial b_0} \left[\frac{1}{n} (y_p-y)^2 \right] \\ &= \frac{2(y_p-y)}{n}\frac{\partial}{\partial b_0}\left [(y_p-y) \right ] \\ &= \frac{2(y_p-y)}{n}\frac{\partial}{\partial b_0}\left [((xb_1+b_0)-y) \right ] \\ &= \frac{2(y_p-y)}{n}\left[\frac{\partial(xb_1+b_0)}{\partial b_0}-\frac{\partial(y)}{\partial b_0}\right] \\ &= \frac{2(y_p-y)}{n}\left [ 1 - 0 \right ] \\ &= \frac{1}{n}(y_p-y)[2] \end {aligned}
Since the cost function has two parameters, b_1 and b_0, we take its partial derivative with respect to b_1 and then with respect to b_0.
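Summing these per-point derivatives over all n data points gives the full-batch gradients, with y_{p,i} = b_0 + b_1 x_i. These are the quantities grad_fun computes: its i == 0 branch is the b_0 gradient and its i == 1 branch is the b_1 gradient.
\begin {aligned} \frac{\partial J(b_1,b_0)}{\partial b_0} &= \frac{2}{n}\sum_{i=1}^{n}(y_{p,i}-y_i) \\ \frac{\partial J(b_1,b_0)}{\partial b_1} &= \frac{2}{n}\sum_{i=1}^{n}(y_{p,i}-y_i)\,x_i \end {aligned}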
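The Python function below computes these gradients averaged over all data points: it returns the derivative with respect to b0 when i == 0 and with respect to b1 when i == 1.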
Python
def grad_fun(x, y, b0, b1, i):
    return sum([
        2 * (predict(xi, b0, b1) - yi) * 1
        if i == 0
        else 2 * (predict(xi, b0, b1) - yi) * xi
        for xi, yi in zip(x, y)
    ]) / len(x)
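To confirm that the analytic gradient matches the derivation, one can compare it with a central finite-difference approximation of the cost, (J(b + h) - J(b - h)) / (2h). This is a standard debugging technique rather than part of the model; the helper `numerical_grad`, the step size h = 1e-6 and the test point (b0 = 0.5, b1 = 1.0) are assumptions for illustration. Because the MSE is quadratic in b0 and b1, the two values should agree up to floating-point rounding.

```python
def predict(x, b0, b1):
    return b0 + b1 * x


def cost(x, y, b0, b1):
    errors = [predict(xi, b0, b1) - yi for xi, yi in zip(x, y)]
    return sum(e * e for e in errors) / len(errors)


def grad_fun(x, y, b0, b1, i):
    # Analytic gradient: i == 0 -> dJ/db0, i == 1 -> dJ/db1
    return sum(
        2 * (predict(xi, b0, b1) - yi) * (1 if i == 0 else xi)
        for xi, yi in zip(x, y)
    ) / len(x)


def numerical_grad(x, y, b0, b1, i, h=1e-6):
    # Hypothetical helper: central difference of the cost in one coefficient
    if i == 0:
        return (cost(x, y, b0 + h, b1) - cost(x, y, b0 - h, b1)) / (2 * h)
    return (cost(x, y, b0, b1 + h) - cost(x, y, b0, b1 - h)) / (2 * h)


x = [0, 1, 2, 3]
y = [3, 5, 7, 9]
for i in (0, 1):
    analytic = grad_fun(x, y, 0.5, 1.0, i)
    approx = numerical_grad(x, y, 0.5, 1.0, i)
    print(i, analytic, approx)  # the two columns should agree closely
```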
Update Coefficients Of Univariate Linear Regression
At each iteration (epoch), the values of the regression coefficients are updated based on the error from the previous iteration. This update step is crucial: it is the crux of training most machine learning models. Each coefficient is updated by subtracting its gradient (the partial derivative of the cost with respect to that coefficient) scaled by a factor called the learning rate. The learning rate controls how fast the model approaches the point of convergence (the point where the error is minimized, ideally 0).
b_i = b_i - \alpha \frac{\partial J(b_1, b_0)}{\partial b_i}
Python
def update_coeff(x, y, b0, b1, i, alpha):
    bi = b0 if i == 0 else b1  # b0 when i == 0, b1 when i == 1
    bi -= alpha * grad_fun(x, y, b0, b1, i)  # step against the gradient
    return bi
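A single epoch worked by hand, assuming the points (0, 3), (1, 5), (2, 7) and a learning rate alpha = 0.1: starting from b0 = b1 = 0, the b0 gradient is 2 * mean(-3, -5, -7) = -10, so b0 becomes 1.0. As in the pseudocode later in the article, b1 is then updated using the already-updated b0; its gradient is 2 * (0 - 4 - 12) / 3 = -32/3, so b1 becomes about 1.0667. The helpers are repeated so the snippet runs on its own.

```python
def predict(x, b0, b1):
    return b0 + b1 * x


def grad_fun(x, y, b0, b1, i):
    return sum(
        2 * (predict(xi, b0, b1) - yi) * (1 if i == 0 else xi)
        for xi, yi in zip(x, y)
    ) / len(x)


def update_coeff(x, y, b0, b1, i, alpha):
    bi = b0 if i == 0 else b1
    bi -= alpha * grad_fun(x, y, b0, b1, i)
    return bi


x, y, alpha = [0, 1, 2], [3, 5, 7], 0.1
b0, b1 = 0.0, 0.0
b0 = update_coeff(x, y, b0, b1, 0, alpha)  # gradient -10, so b0 = 1.0
b1 = update_coeff(x, y, b0, b1, 1, alpha)  # uses the new b0, b1 ≈ 1.0667
print(b0, b1)
```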
Stop Iterations
This function specifies when the iterations should stop. Depending on the user's choice, stop_iteration generally returns True under the following conditions:
- Max Iteration: Model is trained for a specified number of iterations.
- Error value: Depending upon the value of the previous error, the algorithm decides whether to continue or stop.
- Accuracy: If the model's latest accuracy exceeds a specified threshold, the algorithm returns True.
- Hybrid: This is the most commonly used option. It combines two or more of the above conditions with an exceptional break, which stops training early when something goes wrong, such as a numerical overflow or an exceeded time limit.
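A minimal sketch of the hybrid strategy, assuming four checks: an exceptional break when the cost is no longer a finite number (for example after an overflow), an optional wall-clock deadline, a maximum number of epochs, and a tolerance on how much the cost changed since the previous epoch. The function name `should_stop` and all of its parameters are hypothetical, not part of the article's model.

```python
import math
import time


def should_stop(epoch, prev_cost, curr_cost, max_epochs=1000, tol=1e-12, deadline=None):
    """Hypothetical hybrid stopping rule; returns True when training should end."""
    # Exceptional break: the cost overflowed to inf or became NaN
    if not math.isfinite(curr_cost):
        return True
    # Exceptional break: time constraint exceeded (deadline from time.monotonic())
    if deadline is not None and time.monotonic() > deadline:
        return True
    # Max iteration: trained for the specified number of epochs
    if epoch >= max_epochs:
        return True
    # Error value: the cost barely changed since the previous epoch
    if prev_cost is not None and abs(prev_cost - curr_cost) < tol:
        return True
    return False
```

A training loop would compute the cost once per epoch and pass the previous and current values, for example with deadline = time.monotonic() + 5.0 for a five-second budget.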
With all the utility functions defined, let's look at the pseudocode followed by its implementation:
Pseudocode for linear regression:
x, y is the given data.
(b0, b1) <-- (0, 0)
i = 0
while True:
    if stop_iteration(i):
        break
    else:
        b0 = update_coeff(x, y, b0, b1, 0, alpha)
        b1 = update_coeff(x, y, b0, b1, 1, alpha)
        i += 1
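The pseudocode translates into the runnable sketch below, which repeats the functional helpers so it runs on its own and assumes the simplest stopping rule, a fixed number of epochs. As in the pseudocode, b1 is updated with the freshly updated b0. The learning rate of 0.02 is an assumption chosen for this data (x = 0..11): with the full gradient and this update order, a value such as 0.03 makes the updates overshoot and diverge.

```python
def predict(x, b0, b1):
    return b0 + b1 * x


def grad_fun(x, y, b0, b1, i):
    # i == 0 -> dJ/db0, i == 1 -> dJ/db1 (mean squared error)
    return sum(
        2 * (predict(xi, b0, b1) - yi) * (1 if i == 0 else xi)
        for xi, yi in zip(x, y)
    ) / len(x)


def update_coeff(x, y, b0, b1, i, alpha):
    bi = b0 if i == 0 else b1
    return bi - alpha * grad_fun(x, y, b0, b1, i)


def stop_iteration(i, max_epochs=1000):
    # Simplest rule: stop after a fixed number of epochs
    return i >= max_epochs


def fit(x, y, alpha=0.02):
    b0, b1 = 0.0, 0.0
    i = 0
    while not stop_iteration(i):
        b0 = update_coeff(x, y, b0, b1, 0, alpha)
        b1 = update_coeff(x, y, b0, b1, 1, alpha)  # uses the updated b0
        i += 1
    return b0, b1


x = list(range(12))
y = [2 * xi + 3 for xi in x]
b0, b1 = fit(x, y)
print(round(b0, 3), round(b1, 3))  # → 3.0 2.0
```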
Full Implementation of univariate using Python
Python
class LinearRegressor:
    def __init__(self, x, y, alpha=0.01, b0=0, b1=0):
        """
        x: input feature
        y: result / target
        alpha: learning rate, default is 0.01
        b0, b1: linear regression coefficients
        """
        if len(x) != len(y):
            raise ValueError("x and y should have the same number of rows.")
        self.i = 0
        self.x = x
        self.y = y
        self.alpha = alpha
        self.b0 = b0
        self.b1 = b1
    def cost(self):
        """Mean squared error for the current regression coefficients."""
        errors = [self.predict(xi) - yi for xi, yi in zip(self.x, self.y)]
        return sum(error * error for error in errors) / len(errors)
    def predict(self, x):
        """Predicts the target for input x based on the
        current values of the regression coefficients."""
        return self.b0 + self.b1 * x
    def grad_fun(self, i):
        """Gradient of the MSE w.r.t. b0 (i == 0) or b1 (i == 1)."""
        return sum(
            2 * (self.predict(xi) - yi) * (1 if i == 0 else xi)
            for xi, yi in zip(self.x, self.y)
        ) / len(self.x)
    def update_coeff(self, i):
        """One gradient descent step on b0 (i == 0) or b1 (i == 1)."""
        if i == 0:
            self.b0 -= self.alpha * self.grad_fun(i)
        elif i == 1:
            self.b1 -= self.alpha * self.grad_fun(i)
    def stop_iteration(self, max_epochs=1000):
        """Stops after a fixed number of epochs."""
        self.i += 1
        return self.i == max_epochs
    def fit(self):
        self.i = 0
        while not self.stop_iteration():
            self.update_coeff(0)  # update b0 first
            self.update_coeff(1)  # then b1, using the updated b0
Initializing the Model object
Python
linearRegressor = LinearRegressor(
    x=[i for i in range(12)],
    y=[2 * i + 3 for i in range(12)],
    alpha=0.02,  # with the full gradient, 0.03 overshoots and diverges on this data
)
linearRegressor.fit()
print(round(linearRegressor.predict(12), 2))
Output:
27.0