
#1. Introduction to Simple Linear Regression
The realm of predictive analysis is vast, yet at its heart lies Linear Regression — the simplest
method to make sense of data trends.
🎯 The main goal?
Find a linear relationship between:
• The independent variable or predictor.
• The dependent variable or output.
In plain talk, Linear Regression is all about finding a straight line that shows how two things are
connected — like how much you study (that’s the independent bit) and your test scores (that’s the
dependent bit).
The big idea is to see how one thing can predict the other.
Sounds interesting, right?
So now… let’s try to make some sense of Linear Regression, starting with a natural question…

#2 How to compute it mathematically?


Think of it as a team effort where two things work together:
• One thing that depends on another — we’ll call it the outcome.
• Another thing that is independent — we’ll call it the predictor.
They need to have a straight-line relationship, kind of like following a straight path on a map that
leads us to the answers we’re looking for.
Now, finding the perfect path — or in our case, the best fit line — isn’t about wild guesses.
When working with numerical data, we use a specific formula to identify the optimal line in Linear
Regression. Knowing that there is a linear correlation between the variables, we can
straightforwardly apply a linear function to determine our outcome from our predictor.
In this little math recipe, Y = A·x + B, there are 4 main variables:
• Y is our outcome.
• x is our predictor.
• A is the slope of the line.
• B is the intercept, which sets up where the line starts.
Using this, we can draw a straight line right through our data points that shows us the connection
between the two variables we’re looking at.
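To make this concrete, here is a minimal Python sketch of that straight-line relationship; the study-hours example and the specific values chosen for A and B are purely hypothetical.

```python
# A minimal sketch of the linear equation Y = A*x + B.
# The slope (A) and intercept (B) values here are hypothetical, chosen only for illustration.

def predict(x, A, B):
    """Predict the outcome Y for a predictor value x, given slope A and intercept B."""
    return A * x + B

# Example: if each extra hour of study (x) adds about 5 points (A)
# on top of a base score of 40 (B), then 3 hours of study predicts:
print(predict(3, A=5, B=40))  # -> 55
```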
So now that we know how to define such a relationship, you might be wondering…

#3. How to define this best fit?


So the ultimate idea is quite clear: We want a line that doesn’t make many mistakes.
And what do we consider a mistake?
A mistake, in this case, means how far off our guess is from what really happens. So we will define
a mistake as the difference between our predicted value and our actual one.

So, the best line is the one that has the smallest errors. Or to put it simply, the one that minimizes
the little differences between what we thought would happen (our prediction) and what actually did
happen (the actual value).
We’ve got a special name for these mistakes: we call them residuals. We want those residuals, or
errors, to be as tiny as possible to make our predictions super accurate.
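As a tiny illustration, residuals are simply the actual values minus the predicted ones; the numbers below are made up purely for the example.

```python
# Residuals are "actual minus predicted"; toy numbers for illustration only.
actual    = [3, 5, 7]
predicted = [2.5, 5.5, 8]
residuals = [a - p for a, p in zip(actual, predicted)]
print(residuals)  # -> [0.5, -0.5, -1]
```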

And this leads us to the next natural question…

#4. How to obtain this best-fit line mathematically?
If we recall the linear equation, there are two important variables that we need to figure out, which
we will call weights from now on:
• A, the slope of the line.
• B, the intercept that defines where the line starts.
To do this, we use a special tool called a cost function. Think of the cost function as a method that
helps us find the optimal values for our weights.
For linear regression, we use the Mean Squared Error (MSE) as the cost function: a metric that
captures the average squared deviation between predicted values and actual outcomes.
By squaring each error before computing the average, we ensure that all discrepancies, regardless of
their direction — positive or negative — contribute equally to the overall measure.
This process accentuates larger errors, providing a clear picture of the model’s performance.
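Here is a minimal sketch of the MSE cost function in Python; the toy numbers in the example are hypothetical and only there to show the calculation.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Toy example: errors of 0.5, -0.5 and -1 give an MSE of (0.25 + 0.25 + 1) / 3 = 0.5
print(mean_squared_error([3, 5, 7], [2.5, 5.5, 8]))  # -> 0.5
```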
Once we have our cost function, there are two main ways to solve this optimization problem:

#4.1 Ordinary Least Squares or OLS


The objective of Ordinary Least Squares (OLS) is to determine the optimal coefficients
A and B by minimizing the aggregate of squared prediction errors.
Leveraging calculus, we exploit the properties of partial derivatives to locate the minima of the cost
function, where these derivatives equal zero.
By solving these equations, we get an exact closed-form formula for both A and B,
providing us with a direct route to the most accurate linear model.
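As a rough sketch, the closed-form OLS solution can be written in a few lines of Python; the toy dataset below is hypothetical and chosen so the true slope and intercept are easy to verify.

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS estimates of the slope A and intercept B in y ≈ A*x + B."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    A = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    B = y.mean() - A * x.mean()
    return A, B

# Toy dataset generated from y = 2x + 1, so we expect A ≈ 2 and B ≈ 1:
A, B = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(A, B)  # -> 2.0 1.0
```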
#4.2 Gradient Descent
Gradient descent is a pivotal optimization algorithm used to minimize the cost function, aiding us in
our aim to find the most accurate weight values for our predictive model.
Envision standing atop a hill; your objective is the valley below, which represents our cost
function’s minimum point.
To reach it, we begin with initial guesses for our weights, A and B, and iteratively refine these
guesses.
The process is akin to descending a hill: with each step, we assess our surroundings and adjust our
trajectory to ensure each subsequent step brings us closer to the valley floor.
These steps are guided by the learning rate — a vital hyperparameter symbolized as lr in the
equations. This learning rate controls the size of our steps or adjustments to the parameters A and B,
ensuring that we do not overshoot the minimum.

As we take each step, we calculate the partial derivatives of the cost function with respect to A and
B, denoted as dA and dB respectively. These derivatives point us in the direction where the cost
function decreases the fastest, akin to finding the steepest descent on our metaphorical hill.
The update equations for A and B in each iteration, factoring in the learning rate, are as follows: A is replaced by A − lr · dA, and B is replaced by B − lr · dB.
This meticulous process is repeated until we reach a point where the cost function’s decrease is
negligible, suggesting we’ve arrived at or near the global minimum — our destination where the
predictive error is minimized, and our model’s accuracy is maximized.
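Below is a minimal gradient descent sketch, assuming the MSE cost function described above; the learning rate, iteration count, and toy data are hypothetical choices made only for illustration.

```python
import numpy as np

def gradient_descent(x, y, lr=0.05, n_iters=5000):
    """Iteratively adjust the slope A and intercept B to minimize the MSE cost."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    A, B = 0.0, 0.0                                # initial guesses for the weights
    n = len(x)
    for _ in range(n_iters):
        y_pred = A * x + B
        dA = (-2 / n) * np.sum(x * (y - y_pred))   # partial derivative of MSE w.r.t. A
        dB = (-2 / n) * np.sum(y - y_pred)         # partial derivative of MSE w.r.t. B
        A -= lr * dA                               # step downhill, scaled by the learning rate
        B -= lr * dB
    return A, B

# Same toy data as before (y = 2x + 1); the estimates should land close to 2 and 1:
A, B = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(round(A, 3), round(B, 3))
```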
#5 Evaluation
Alright, let’s talk about how we check if our Simple Linear Regression model is doing a good job.
There are two main ways to do this:

5.1 Coefficient of Determination or R-Squared (R²):


This is a fancy way of saying how much of the changes in what we’re trying to predict can be
explained by our independent variable — the one we think is causing the change. It’s like a score
between 0 and 1.
• If it’s close to 1, it means our model is explaining a lot of what’s happening.
• If it’s closer to 0, not so much.
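As a small, purely illustrative sketch, R² could be computed by hand like this; real projects would typically rely on a library implementation.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Share of the variation in y explained by the model (1 is perfect, 0 is no better than the mean)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1 - ss_res / ss_tot

print(r_squared([3, 5, 7], [2.5, 5.5, 8]))  # -> 0.8125
```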

5.2 Root Mean Squared Error (RMSE):


This one is about how much error, or mistake, there is in our predictions. It’s like taking all our
errors, squaring them (which makes them all positive), averaging them, and then taking the square
root of that average. It gives us a number that tells us, on average, how far off our predictions are.
The smaller this number, the better our model is at predicting.
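And a matching sketch for RMSE, again with hypothetical toy numbers:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Square root of the mean squared error: the typical size of a prediction error."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

print(rmse([3, 5, 7], [2.5, 5.5, 8]))  # -> ~0.707
```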

#6 Assumptions for applying linear regression


For Linear Regression to work well, there are a few key things we need to check before
applying it to any given dataset.
6.1 Linearity of the variables:
Both variables, independent and dependent, need to be connected in a straight-line way. This means
if one goes up or down, the other tends to follow in a predictable, straight-line pattern.

6.2 Independence of residuals:


Regarding errors — these are the little mistakes we make when predicting stuff. We want to make
sure these errors aren’t following any pattern or depending on each other. They should just happen
randomly.

6.3 Normal distribution of residuals


Also, when we look at all these errors (or residuals, as they’re called), they should
spread out in a normal, bell-shaped way, with most of them bunching up around the middle (which should be
close to zero).

6.4 Equal variance of residuals:


Lastly, these errors need to stay consistent in their spread — they shouldn’t get bigger or smaller as
we move across our data. They should vary, but the amount of variation should stay the same all the
way through.
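One rough way to eyeball these assumptions is to plot the residuals. The sketch below assumes hypothetical data and already-fitted weights A and B (for example, from the OLS sketch earlier), and simply draws a residuals-versus-fitted plot plus a histogram.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data and already-fitted weights (e.g. from ols_fit above).
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.9, 5.1, 7.2, 8.8, 11.1])
A, B = 2.0, 1.0
fitted = A * x + B
residuals = y - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(fitted, residuals)        # look for no pattern and a constant spread
ax1.axhline(0, linestyle="--")
ax1.set(xlabel="Fitted values", ylabel="Residuals", title="Independence / equal variance")
ax2.hist(residuals, bins=10)          # should look roughly bell-shaped and centered near zero
ax2.set(xlabel="Residuals", title="Normality")
plt.tight_layout()
plt.show()
```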

Wrapping Up
We’ve explored the relationship between independent and dependent variables, emphasizing error
minimization for accurate predictions.
Our discussion included mathematical approaches like Ordinary Least Squares (OLS) and Gradient
Descent for optimizing the model.
We evaluated the model’s effectiveness using R² and RMSE metrics and stressed the importance of
meeting key assumptions for successful application.
