Linear regression case study
Theory:
Linear regression:
Linear regression is a supervised machine learning algorithm that models the linear relationship between a dependent variable and one or more independent features by fitting a linear equation to observed data. When there is only one independent feature, it is known as Simple Linear Regression; when there is more than one, it is known as Multiple Linear Regression. For Simple Linear Regression, the model is:
y = \beta_0 + \beta_1 X
where:
• y is the dependent variable
• X is the independent variable
• \beta_0 is the intercept
• \beta_1 is the slope
For Multiple Linear Regression, the model is:
y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n
where:
• y is the dependent variable
• X_1, X_2, \dots, X_n are the independent variables
• \beta_0 is the intercept
• \beta_1, \beta_2, \dots, \beta_n are the slopes (coefficients)
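To make the two model forms concrete, here is a minimal sketch that fits both a simple and a multiple linear regression with scikit-learn; the synthetic data and the true coefficient values are illustrative assumptions, not part of the case study.

```python
# Minimal sketch: fitting simple and multiple linear regression with scikit-learn.
# The synthetic data and true coefficients below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simple linear regression: one feature, y = 2 + 3x + noise.
X_simple = rng.uniform(0, 10, size=(100, 1))
y_simple = 2 + 3 * X_simple[:, 0] + rng.normal(0, 1, 100)
simple_model = LinearRegression().fit(X_simple, y_simple)
print(simple_model.intercept_, simple_model.coef_)  # estimates of beta_0, beta_1

# Multiple linear regression: three features X_1..X_3.
X_multi = rng.uniform(0, 10, size=(100, 3))
y_multi = 1 + X_multi @ np.array([0.5, -2.0, 1.5]) + rng.normal(0, 1, 100)
multi_model = LinearRegression().fit(X_multi, y_multi)
print(multi_model.intercept_, multi_model.coef_)  # estimates of beta_0..beta_3
```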
Cost function for Linear Regression:
The cost function (or loss function) is the error between the predicted value \hat{Y} and the true value Y.
In Linear Regression, the Mean Squared Error (MSE) cost function is employed, which calculates the average of the squared errors between the predicted values \hat{y}_i and the actual values y_i.
The MSE cost function is:
\text{Cost function } (J) = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2
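As a quick illustration, this cost can be computed in a couple of lines of NumPy; the argument names below are placeholders.

```python
# A minimal sketch of the MSE cost function J above; y_hat and y are
# placeholder NumPy arrays of predictions and true values.
import numpy as np

def mse_cost(y_hat, y):
    return np.mean((y_hat - y) ** 2)
```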
A linear regression model can be trained using the optimization algorithm gradient descent, which iteratively modifies the model's parameters to reduce the mean squared error (MSE) of the model on a training dataset. To achieve the best-fit line, the model updates the intercept \beta_0 and the slope \beta_1 in the direction that reduces the cost function:
\beta_j = \beta_j - \alpha \frac{\partial J}{\partial \beta_j}
where \alpha is the learning rate.
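Below is a minimal gradient-descent sketch for simple linear regression that implements this update rule for the MSE cost; the learning rate and iteration count are illustrative assumptions.

```python
# Gradient descent for simple linear regression, minimizing the MSE cost J.
# alpha (learning rate) and n_iters are illustrative choices.
import numpy as np

def gradient_descent(x, y, alpha=0.01, n_iters=1000):
    b0, b1 = 0.0, 0.0                       # initialize intercept and slope
    n = len(x)
    for _ in range(n_iters):
        y_hat = b0 + b1 * x                 # current predictions
        error = y_hat - y
        # Partial derivatives of J = (1/n) * sum((y_hat - y)^2)
        grad_b0 = (2.0 / n) * error.sum()
        grad_b1 = (2.0 / n) * (error * x).sum()
        b0 -= alpha * grad_b0               # beta_j = beta_j - alpha * dJ/dbeta_j
        b1 -= alpha * grad_b1
    return b0, b1
```

Each iteration moves the parameters a small step against the gradient of the cost, so J decreases until the estimates settle near the least-squares solution.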
Mean Squared Error (MSE) is an evaluation metric that calculates the average of the squared
differences between the actual and predicted values for all the data points. The difference is
squared to ensure that negative and positive differences don’t cancel each other out.
MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2
Here,
• n is the number of data points.
• y_i is the actual or observed value for the ith data point.
• \hat{y}_i is the predicted value for the ith data point.
Mean Absolute Error (MAE) is an evaluation metric used to assess the accuracy of a regression model. MAE measures the average absolute difference between the predicted values and the actual values.
Mathematically, MAE is expressed as:
MAE = \frac{1}{n}\sum_{i=1}^{n}|Y_i - \hat{Y}_i|
Here,
• n is the number of observations
• Y_i represents the actual values.
• \hat{Y}_i represents the predicted values.
Root Mean Squared Error (RMSE) is the square root of the residuals' variance. It describes how well the observed data points match the predicted values, that is, the model's absolute fit to the data.
In mathematical notation, it can be expressed as:
RMSE = \sqrt{\frac{RSS}{n}}
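All three error metrics can be computed directly; the sketch below uses scikit-learn for MSE and MAE and derives RMSE as the square root of MSE. The y_true and y_pred arrays are placeholder values.

```python
# Computing MSE, MAE, and RMSE for a set of predictions.
# y_true and y_pred are placeholder arrays for illustration.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.3, 2.9, 6.4])

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mse)                        # RMSE is the square root of MSE
print(mse, mae, rmse)
```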
R-Squared is a statistic that indicates how much variation the developed model can explain or
capture.
R^2 = 1 - \frac{RSS}{TSS}
• Residual Sum of Squares (RSS): The sum of the squared residuals over all data points is known as the residual sum of squares, or RSS. It measures the difference between the observed output and the predicted output.
RSS = \sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2
• Total Sum of Squares (TSS): The sum of the squared deviations of the data points from the response variable's mean is known as the total sum of squares, or TSS.
TSS = \sum_{i=1}^{n}(y_i - \bar{y})^2
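The sketch below computes R^2 directly from RSS and TSS as defined above; the arrays are placeholder values, and the result matches scikit-learn's r2_score.

```python
# Computing R^2 from RSS and TSS, matching the formulas above.
# y_true and y_pred are placeholder arrays for illustration.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.3, 2.9, 6.4])

rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - rss / tss
print(r2)  # same value as sklearn.metrics.r2_score(y_true, y_pred)
```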
Adjusted R^2 measures the proportion of variance in the dependent variable that is explained by the independent variables in a regression model. It accounts for the number of predictors and penalizes the model for including irrelevant predictors that do not contribute significantly to explaining the variance in the dependent variable.
Mathematically, adjusted R^2 is expressed as:
\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}
where n is the number of observations and k is the number of predictors. Adjusted R-square therefore helps to prevent overfitting: adding predictors that do not explain additional variance lowers the score.
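A minimal helper for the adjusted R^2 formula above; the example values for r2, n, and k are illustrative assumptions.

```python
# Adjusted R^2 from R^2, with n observations and k predictors.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(r2=0.90, n=100, k=5))  # ~0.895: slightly below the raw R^2
```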
Linear regression is used in many different fields, including finance, economics, and psychology,
to understand and predict the behavior of a particular variable. For example, in finance, linear
regression might be used to understand the relationship between a company’s stock price and its
earnings or to predict the future value of a currency based on its past performance.
Output:
Conclusion: Hence, we have executed linear regression on a random dataset and evaluated it with performance metrics.