Case Study on

Linear Regression

Aim: To study Linear Regression

Theory:

Linear regression:

Linear regression is a type of supervised machine learning algorithm that models the linear
relationship between a dependent variable and one or more independent features by fitting a
linear equation to observed data. When there is only one independent feature, it is known
as Simple Linear Regression; when there is more than one feature, it is known as Multiple
Linear Regression.

Types of Linear Regression:

1. Simple Linear Regression


This is the simplest form of linear regression, and it involves only one independent
variable and one dependent variable. The equation for simple linear regression is:

y = β0 + β1X

where:
• y is the dependent variable
• X is the independent variable
• β0 is the intercept
• β1 is the slope
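
As an illustration, a minimal sketch of fitting a simple linear regression with scikit-learn is shown below; the synthetic data, the true coefficients (3.0 and 2.0), and the variable names are assumptions made only for this example.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))              # one independent variable
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 1, 100)    # y = beta0 + beta1*X + noise

model = LinearRegression().fit(X, y)
print("intercept (beta0):", model.intercept_)      # approx. 3.0
print("slope (beta1):", model.coef_[0])            # approx. 2.0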

2. Multiple Linear Regression


This involves more than one independent variable and one dependent variable. The
equation for multiple linear regression is:

y = β0 + β1X1 + β2X2 + … + βnXn

where:
• y is the dependent variable
• X1, X2, …, Xn are the independent variables
• β0 is the intercept
• β1, β2, …, βn are the slopes
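
A corresponding sketch for multiple linear regression, again on assumed synthetic data with three independent variables and illustrative coefficient values:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                        # three independent variables X1, X2, X3
true_betas = np.array([1.5, -2.0, 0.5])
y = 4.0 + X @ true_betas + rng.normal(0, 0.5, 200)   # y = beta0 + beta1*X1 + beta2*X2 + beta3*X3 + noise

model = LinearRegression().fit(X, y)
print("intercept (beta0):", model.intercept_)        # approx. 4.0
print("slopes (beta1..betan):", model.coef_)         # approx. [1.5, -2.0, 0.5]
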
Cost function for Linear Regression:

The cost function (or loss function) is the error, i.e., the difference between the predicted
value ŷ and the true value y.
In Linear Regression, the Mean Squared Error (MSE) cost function is employed, which
calculates the average of the squared errors between the predicted values ŷi and the actual
values yi.
The MSE cost function can be written as:

Cost function (J) = (1/n) Σi=1..n (ŷi − yi)²
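
A minimal sketch of this cost function in NumPy; the sample values are made up for illustration.

import numpy as np

def mse_cost(y_true, y_pred):
    """Mean squared error cost: J = (1/n) * sum((y_hat_i - y_i)^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

print(mse_cost([3.0, 5.0, 7.0], [2.5, 5.5, 6.0]))   # -> 0.5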

Gradient Descent for Linear Regression:

A linear regression model can be trained using the optimization algorithm gradient descent by
iteratively modifying the model's parameters to reduce the mean squared error (MSE) of the
model on a training dataset. To achieve the best-fit line, the model uses Gradient Descent to
update the parameters θ1 (intercept) and θ2 (slope) so as to minimize the cost function J:

θ1 = θ1 − α · ∂J/∂θ1
θ2 = θ2 − α · ∂J/∂θ2

where α is the learning rate.
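
A minimal gradient-descent sketch for simple linear regression, assuming θ1 is the intercept and θ2 is the slope; the learning rate, number of iterations, and synthetic data are illustrative choices.

import numpy as np

def gradient_descent(x, y, alpha=0.05, epochs=1000):
    """Fit y = theta1 + theta2*x by repeatedly stepping both parameters
    down the gradient of the MSE cost function."""
    theta1, theta2 = 0.0, 0.0                        # intercept and slope
    n = len(x)
    for _ in range(epochs):
        error = (theta1 + theta2 * x) - y            # predicted minus actual
        grad_theta1 = (2.0 / n) * np.sum(error)      # dJ/dtheta1
        grad_theta2 = (2.0 / n) * np.sum(error * x)  # dJ/dtheta2
        theta1 -= alpha * grad_theta1                # theta1 = theta1 - alpha * dJ/dtheta1
        theta2 -= alpha * grad_theta2                # theta2 = theta2 - alpha * dJ/dtheta2
    return theta1, theta2

x = np.linspace(0, 5, 50)
y = 1.0 + 2.0 * x + np.random.default_rng(2).normal(0, 0.2, 50)
print(gradient_descent(x, y))                        # approx. (1.0, 2.0)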

Evaluation Metrics for Linear Regression:

The most common measurements are:

Mean Square Error (MSE)

Mean Squared Error (MSE) is an evaluation metric that calculates the average of the squared
differences between the actual and predicted values for all the data points. The difference is
squared to ensure that negative and positive differences don’t cancel each other out.

MSE = (1/n) Σi=1..n (yi − ŷi)²
Here,
• n is the number of data points.
• yi is the actual or observed value for the ith data point.
• ŷi is the predicted value for the ith data point.
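
For illustration, MSE computed both manually and with scikit-learn's mean_squared_error; the sample values are assumed.

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 7.0, 9.0])                 # assumed actual values
y_pred = np.array([2.8, 5.4, 6.5, 9.3])                 # assumed predicted values

mse_manual = np.mean((y_true - y_pred) ** 2)
print(mse_manual, mean_squared_error(y_true, y_pred))   # both 0.135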

Mean Absolute Error (MAE)

Mean Absolute Error is an evaluation metric used to calculate the accuracy of a regression
model. MAE measures the average absolute difference between the predicted values and actual
values.
Mathematically, MAE is expressed as:

MAE = (1/n) Σi=1..n |yi − ŷi|
Here,
• n is the number of observations.
• yi represents the actual values.
• ŷi represents the predicted values.
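
A corresponding sketch for MAE, using the same assumed sample values:

import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.4, 6.5, 9.3])

mae_manual = np.mean(np.abs(y_true - y_pred))
print(mae_manual, mean_absolute_error(y_true, y_pred))  # both 0.35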

Root Mean Squared Error (RMSE)

The Root Mean Squared Error is the square root of the mean of the squared residuals (i.e., the
square root of MSE). It describes how well the observed data points match the predicted values,
or the model's absolute fit to the data.
In mathematical notation, it can be expressed as:

RMSE = √(RSS / n)
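
A short sketch of RMSE computed from the residual sum of squares, on the same assumed values:

import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.4, 6.5, 9.3])

rss = np.sum((y_true - y_pred) ** 2)    # residual sum of squares
rmse = np.sqrt(rss / len(y_true))       # RMSE = sqrt(RSS / n) = sqrt(MSE)
print(rmse)                             # approx. 0.367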

Coefficient of Determination (R-squared)

R-Squared is a statistic that indicates how much variation the developed model can explain or
capture.
R² = 1 − (RSS / TSS)

• Residual Sum of Squares (RSS): The sum of the squared residuals over all data
points is known as the residual sum of squares, or RSS. It measures the difference
between the observed output and the output predicted by the model.

RSS = Σi=1..n (yi − b0 − b1xi)²

• Total Sum of Squares (TSS): The sum of the squared deviations of the data points
from the mean of the response variable is known as the total sum of squares, or TSS.

TSS = Σi=1..n (yi − ȳ)²

where ȳ is the mean of the observed values.
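
A sketch computing R² from RSS and TSS, checked against scikit-learn's r2_score; the sample values are assumed.

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.4, 6.5, 9.3])

rss = np.sum((y_true - y_pred) ** 2)             # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares
r2_manual = 1 - rss / tss
print(r2_manual, r2_score(y_true, y_pred))       # both approx. 0.973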

Adjusted R-Squared Error

Adjusted R² measures the proportion of variance in the dependent variable that is explained by
the independent variables in a regression model. Adjusted R² accounts for the number of
predictors in the model and penalizes the model for including irrelevant predictors that do not
contribute significantly to explaining the variance in the dependent variable.
Mathematically, adjusted R² is expressed as:

Adjusted R² = 1 − (1 − R²) · (n − 1) / (n − k − 1)


Here,
• n is the number of observations
• k is the number of predictors in the model
• R2 is coefficient of determination

Adjusted R² helps to prevent overfitting: it penalizes models that include additional predictors
which do not contribute significantly to explaining the variance in the dependent variable.
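
A small helper sketch for adjusted R²; the example values of R², n, and k are assumptions.

def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# e.g. R^2 = 0.90 from a model with n = 50 observations and k = 5 predictors
print(adjusted_r2(0.90, n=50, k=5))   # approx. 0.8886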

Regularization Techniques for Linear Models:

Lasso Regression (L1 Regularization):


Lasso Regression is a technique used for regularizing a linear regression model; it adds an L1
penalty term (proportional to the sum of the absolute values of the coefficients) to the linear
regression objective function to prevent overfitting.
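
A minimal sketch using scikit-learn's Lasso; the synthetic data and the alpha value are illustrative assumptions.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.1, 100)   # only 2 informative features

# alpha sets the strength of the L1 penalty; a larger alpha drives more
# coefficients exactly to zero, which acts as a form of feature selection.
lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)   # coefficients of the uninformative features are (near) zero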

Ridge Regression (L2 Regularization):


Ridge regression is a linear regression technique that adds an L2 regularization term
(proportional to the sum of the squared coefficients) to the standard linear objective. Again,
the goal is to prevent overfitting by penalizing large coefficients in the linear regression
equation. It is useful when the dataset has multicollinearity, i.e., when predictor variables
are highly correlated.
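
A minimal sketch using scikit-learn's Ridge on assumed, nearly collinear synthetic data:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.01, 100)       # nearly collinear with x1 (multicollinearity)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(0, 0.1, 100)

# The L2 penalty shrinks the coefficients, stabilising the estimates
# when the predictors are highly correlated.
ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)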

Elastic Net Regression:


Elastic Net Regression is a hybrid regularization technique that combines both the L1 and L2
penalties in the linear regression objective.
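
A minimal sketch using scikit-learn's ElasticNet; alpha and l1_ratio are illustrative values, and the data is again synthetic.

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.1, 100)

# l1_ratio blends the two penalties: 1.0 is pure Lasso (L1), 0.0 is pure Ridge (L2).
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)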

Applications of Linear Regression:

Linear regression is used in many different fields, including finance, economics, and psychology,
to understand and predict the behavior of a particular variable. For example, in finance, linear
regression might be used to understand the relationship between a company’s stock price and its
earnings or to predict the future value of a currency based on its past performance.
Output:
Conclusion: Hence, we have executed Linear Regression on a random dataset and evaluated it with performance metrics.
