REGRESSION

da notes

Uploaded by

pujiswathy


AGENDA:

1. What is regression, and is it related to ANOVA?
2. What is the difference between correlation and regression?
3. What is the difference between regression analysis and a regression model in ML?
4. What are the different types of regression?
5. What is the equation of a line?
6. What is the sum of squared errors?
7. What does the curve look like for a quadratic equation?
8. Optimization in linear regression (LR)
9. How do we evaluate LR models?
10. Is there any problem or branch in data analytics that involves regression?

Credits: This presentation also contains slides from DeepLearning.AI.
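CORRELATION vs REGRESSION
ANOVA vs REGRESSION
Correlation vs Covariance

Pearson's correlation coefficient, when applied to a population, is commonly represented by the Greek letter ρ (rho) and is defined as

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

where cov(X, Y) is the covariance of X and Y, and σX and σY are their standard deviations.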
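As a small illustration of the definition above, the sketch below computes the covariance and Pearson's ρ directly from the formula, using three hypothetical points. It also shows the practical difference between the two quantities: rescaling x changes the covariance but leaves the correlation unchanged. The function name `pearson_r` is introduced here for illustration only.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    cov = np.mean(dx * dy)            # population covariance (divide by N)
    return cov / (x.std() * y.std())  # np.std defaults to ddof=0, i.e. also divide by N

# Hypothetical sample points.
x = [1, 2, 3]
y = [2, 5, 3]

cov_xy = np.mean((np.array(x) - np.mean(x)) * (np.array(y) - np.mean(y)))
r = pearson_r(x, y)
print(f"cov = {cov_xy:.4f}, r = {r:.4f}")  # cov = 0.3333, r = 0.3273

# Multiplying x by 100 multiplies the covariance by 100, but r stays the same.
print(f"{pearson_r(np.array(x) * 100, y):.4f}")  # 0.3273
```

The same ratio would result if both the covariance and the standard deviations used N − 1, because that factor cancels.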
LINEAR REGRESSION (LR)
A linear regression model can be defined as a function approximation that represents a continuous response variable as a function of one or more predictor variables. When building a linear regression model, the goal is to identify the linear equation that best predicts or models the relationship between the response or dependent variable (y) and one or more predictor or independent variables (here, x).

The formula for a straight line is usually written like this:

$$y = mx + c$$

The two variables are x and y, and there are two coefficients, m and c. The coefficient m represents the slope of the line, and the coefficient c represents the y-intercept of the line.

Slope: a slope of m means that if you increase the x-value by 1 unit, the y-value goes up by m units; a negative slope means that the y-value goes down rather than up.

If Y is the outcome variable (the DV) and X is the predictor variable (the IV), then the formula that describes our regression is written like this:

$$\hat{Y}_i = b_0 + b_1 X_i$$

b0 always refers to the intercept term, and b1 refers to the slope. Ŷi refers to the estimate, or prediction, that the regression line makes for the i-th data point. We call the difference between the actual data point and the model prediction a residual, and refer to it as ϵi:

$$\epsilon_i = Y_i - \hat{Y}_i$$

Thus our LR model is:

$$Y_i = b_0 + b_1 X_i + \epsilon_i$$
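A minimal sketch of the model above, assuming hypothetical values b0 = 1 and b1 = 2 and three made-up data points. It computes the predictions Ŷi, the residuals ϵi = Yi − Ŷi, and checks the slope interpretation: increasing X by one unit changes the prediction by exactly b1.

```python
import numpy as np

# Hypothetical coefficients for illustration: intercept b0 and slope b1.
b0, b1 = 1.0, 2.0

def predict(x, b0, b1):
    """Regression line: Y_hat = b0 + b1 * X."""
    return b0 + b1 * np.asarray(x, dtype=float)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.5, 5.0, 8.0])   # hypothetical observed outcomes

y_hat = predict(x, b0, b1)      # predictions: [3.0, 5.0, 7.0]
residuals = y - y_hat           # epsilon_i = Y_i - Y_hat_i: [-0.5, 0.0, 1.0]

# Slope interpretation: one more unit of X moves the prediction by b1.
print(predict(4.0, b0, b1) - predict(3.0, b0, b1))  # 2.0
print(residuals)
```

By construction, prediction plus residual reproduces each observed value, which is exactly the model Yi = b0 + b1·Xi + ϵi.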
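What is the sum of squared errors (SSE or RSS)?

$$\mathrm{RSS} = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \epsilon_i^2$$

In the case of a perfect fit, the RSS would be 0, meaning every estimated value is the same as the actual value.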
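The RSS formula translates directly into code. This sketch uses hypothetical actual and predicted values. It shows that the RSS adds up the squared residuals, and that a perfect fit gives an RSS of 0.

```python
import numpy as np

def rss(y, y_hat):
    """Residual sum of squares: sum over i of (y_i - y_hat_i)^2."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sum((y - y_hat) ** 2))

y = [3.0, 5.0, 7.0]              # hypothetical actual values
print(rss(y, [2.5, 5.0, 8.0]))   # 0.25 + 0 + 1 = 1.25
print(rss(y, y))                 # perfect fit: 0.0
```

Squaring the residuals keeps positive and negative errors from cancelling, and it penalizes large misses more heavily than small ones.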
What is OLS?
The ordinary least squares (OLS) method is a linear regression technique used to estimate the unknown parameters of a model. It works by minimizing the sum of squared residuals between the actual values (the observed values of the dependent variable) and the values predicted by the model. A residual is the difference between the actual value and the predicted value; residuals are often loosely called errors. The sum of the squared residuals is also known as the residual sum of squares (RSS). OLS chooses the coefficient values that give the smallest possible RSS. The resulting line is called the regression line, and it represents the best fit for the data.
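A minimal OLS sketch on hypothetical data, using NumPy's least-squares solver on a design matrix with a column of ones for the intercept. With a single predictor, the solution is equivalent to the familiar closed form b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. The loop at the end checks the defining property from the text: moving the coefficients away from the OLS solution always increases the RSS.

```python
import numpy as np

# Hypothetical data for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix: a column of ones (intercept b0) and the predictor x (slope b1).
X = np.column_stack([np.ones_like(x), x])

# OLS: choose b = (b0, b1) that minimizes ||y - X b||^2, i.e. the RSS.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = b

def rss(b0, b1):
    """RSS of the line b0 + b1 * x on the data above."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, RSS = {rss(b0, b1):.4f}")  # b0 = 1.040, b1 = 1.990

# Any nudge away from the OLS coefficients increases the RSS.
for db0, db1 in [(0.1, 0.0), (0.0, 0.1), (-0.1, 0.05)]:
    assert rss(b0 + db0, b1 + db1) > rss(b0, b1)
```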
Fitting the line – best fit
Types of regression
ALTERNATIVES TO OLS

While OLS is a popular method for estimating linear regression models, several alternative methods can be used depending on the specific requirements of the analysis. Let's discuss some popular alternatives to OLS:
•Ridge regression
•Lasso regression
•Elastic net regression

Ridge Regression
Ridge regression adds a penalty term to the OLS cost function to prevent overfitting when there are many independent variables or when the independent variables are highly correlated. The penalty is proportional to the sum of the squared coefficients (an L2 penalty), and its strength is controlled by the shrinkage parameter (often written λ). The penalty reduces the magnitude of the coefficients and can help prevent the model from becoming too complex.

Lasso Regression
Lasso regression is similar to ridge regression, but its penalty is proportional to the sum of the absolute values of the coefficients (an L1 penalty). This penalty can set some coefficients exactly to zero, which can simplify the model and reduce the risk of overfitting.

Elastic Net Regression
Elastic net regression combines ridge and lasso regression by adding both an L1 and an L2 penalty term to the OLS cost function. This can balance the advantages of both methods, and it can be particularly useful when there are many independent variables with varying degrees of importance.
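A from-scratch sketch that contrasts the ridge and lasso penalties, using hypothetical random data in which only the first two of four features matter.

- Ridge uses its closed form b = (XᵀX + λI)⁻¹Xᵀy on centered data, so the intercept is not penalized. The objective assumed here is ||y − Xb||² + λΣbj².
- Lasso is solved by cyclic coordinate descent with soft-thresholding. The objective assumed here is (1/(2n))||y − Xb||² + λΣ|bj|.

The helper names (`ridge`, `lasso`, `soft_threshold`) are illustrative. In practice, scikit-learn's `Ridge`, `Lasso` and `ElasticNet` classes in `sklearn.linear_model` are the usual tools.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge on centered data: b = (Xc^T Xc + lam*I)^(-1) Xc^T yc.
    Centering X and y first leaves the intercept unpenalized."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    b0 = y.mean() - X.mean(axis=0) @ b
    return b0, b

def soft_threshold(z, t):
    """Shrink z toward 0 by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso(X, y, lam, n_iter=500):
    """Lasso by cyclic coordinate descent on centered data, minimizing
    (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n, p = Xc.shape
    b = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's fit except feature j's.
            r_j = yc - Xc @ b + Xc[:, j] * b[j]
            rho = Xc[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    b0 = y.mean() - X.mean(axis=0) @ b
    return b0, b

# Hypothetical data: y depends on features 0 and 1 only; features 2 and 3 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

for lam in [0.0, 10.0, 100.0]:
    # Larger lam shrinks all ridge coefficients, but none become exactly 0.
    print("ridge", lam, np.round(ridge(X, y, lam)[1], 3))

# Lasso typically sets the irrelevant coefficients exactly to 0.
print("lasso", np.round(lasso(X, y, 0.3)[1], 3))
```

With λ = 0, ridge reduces to ordinary OLS. As λ grows, ridge shrinks every coefficient toward zero, while lasso drops the irrelevant features altogether. That difference is why lasso is described above as simplifying the model.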
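Evaluating LR model
We need to be able to measure how good our model is (its accuracy). There are many ways to do this, but here we use the root mean squared error (RMSE) and the coefficient of determination (R² score).

Root mean squared error is the square root of the mean of the squared errors, that is, the sum of all squared errors divided by the number of values. Mathematically,

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{j=1}^{N} \left(y_j - \hat{y}_j\right)^2}$$

Here ŷj is the j-th predicted output value, yj is the corresponding actual value, and N is the number of values.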
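Both metrics can be computed in a few lines. The sketch below uses hypothetical actual and predicted values. For R², it uses the standard definition 1 − RSS/TSS, where TSS = Σ(yj − ȳ)² is the total sum of squares around the mean. The slides name R² but do not give its formula, so this definition is supplied here.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error: square root of the mean of squared residuals."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2_score(y, y_hat):
    """Coefficient of determination: 1 - RSS / TSS."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return float(1.0 - rss / tss)

y = [3.0, 5.0, 7.0, 9.0]       # hypothetical actual values
y_hat = [2.0, 5.0, 8.0, 9.0]   # hypothetical predictions
print(rmse(y, y_hat))      # sqrt((1 + 0 + 1 + 0) / 4) = sqrt(0.5), about 0.7071
print(r2_score(y, y_hat))  # 1 - 2 / 20 = 0.9
```

RMSE is in the same units as y, so smaller is better. R² is unitless: 1 means a perfect fit, and 0 means the model does no better than always predicting the mean.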

Evaluating the LR model
After fitting the line, how significant is your model?
Hypothesis testing for a regression model:

After fitting the line, there are two different (but related) kinds of hypothesis tests that we use in our regression model:
• those in which we test whether the regression model as a whole performs significantly better than a null model, i.e. an intercept-only model (F-test); and
• those in which we test whether a particular regression coefficient is significantly different from zero (t-test).
Often, the reasonableness of the model is first indicated by the data itself: a scatter plot exhibiting a substantial linear pattern.
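For simple linear regression (one predictor), the two test statistics can be computed by hand. This sketch uses hypothetical data and follows the standard textbook definitions:

- t = b1 / SE(b1), with SE(b1) = √(MSE / Σ(x − x̄)²) and MSE = SSE / (n − 2).
- F = (SSR / 1) / MSE.

It only computes the statistics. The p-values would come from a t distribution with n − 2 degrees of freedom and an F distribution with (1, n − 2) degrees of freedom, for example via `scipy.stats.t` and `scipy.stats.f`, which are not used here.

```python
import numpy as np

def simple_regression_tests(x, y):
    """Return (t, F) for a one-predictor linear regression.
    t tests H0: b1 = 0; F tests H0: the model is no better than intercept-only."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    sse = np.sum((y - y_hat) ** 2)          # residual (error) sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)   # regression (model) sum of squares
    mse = sse / (n - 2)                     # residual variance estimate, n - 2 df
    t = b1 / np.sqrt(mse / sxx)             # slope divided by its standard error
    f = (ssr / 1) / mse                     # 1 model df (one predictor)
    return float(t), float(f)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.8, 4.1, 5.9, 8.2, 9.7, 12.1]   # hypothetical data
t, f = simple_regression_tests(x, y)
print(f"t = {t:.2f}, F = {f:.2f}, t^2 = {t ** 2:.2f}")
```

This also shows why the two kinds of test are related. With a single predictor, the F statistic equals the square of the slope's t statistic, so both tests reach the same conclusion. They differ only once there are several predictors.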
Applications of regression
Sum of Squared Error: let's start with one powerline, then move on to two powerlines.
Two Powerlines problem:
To find the minimum point of a curve, you find the point where the slope of the tangent is 0. To find that point, we take the derivative of the cost function and set it equal to 0.
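The slides do not spell out the cost function. This sketch assumes the common setup for this example: the powerlines sit at hypothetical positions ai on a line, and the cost of placing a house at x is the sum of squared distances Σ(x − ai)². The derivative is 2Σ(x − ai). Setting it to 0 gives x = the mean of the ai. With one powerline, the house goes right at it; with two, it goes at the midpoint.

```python
# Assumed setup (hypothetical positions): powerlines at fixed points a_i on a line,
# and the cost of a house at x is the sum of squared distances to them.
def cost(x, lines):
    return sum((x - a) ** 2 for a in lines)

def d_cost(x, lines):
    # Derivative of the cost: d/dx sum (x - a_i)^2 = 2 * sum (x - a_i)
    return 2 * sum(x - a for a in lines)

def best_position(lines):
    # Solving 2 * sum (x - a_i) = 0 for x gives the mean of the a_i.
    return sum(lines) / len(lines)

one = [2.0]
two = [1.0, 5.0]
print(best_position(one))               # 2.0: at the single powerline
print(best_position(two))               # 3.0: midway between the two powerlines
print(d_cost(best_position(two), two))  # 0.0: tangent slope is zero at the minimum

# Numerical check on a grid from 0 to 7: no grid point beats the analytic minimum.
grid = [i / 100 for i in range(0, 701)]
assert all(cost(x, two) >= cost(best_position(two), two) for x in grid)
```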
Linear Regression: Analytical Approach
Data points (locations on the x,y plane): (1, 2), (2, 5), (3, 3)
[Figure: the three points plotted one at a time on x and y axes]
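The analytical approach can be carried out for the three points (1, 2), (2, 5) and (3, 3). Fit the line y = mx + c by minimizing the cost E(m, c) = (m + c − 2)² + (2m + c − 5)² + (3m + c − 3)². Setting both partial derivatives to zero gives two equations:

- ∂E/∂m = 28m + 12c − 42 = 0
- ∂E/∂c = 12m + 6c − 20 = 0

Solving them gives m = 1/2 and c = 7/3. The sketch below builds the same 2×2 system from sums over the data (the normal equations) and solves it with NumPy.

```python
import numpy as np

points = [(1, 2), (2, 5), (3, 3)]
xs = np.array([p[0] for p in points], dtype=float)
ys = np.array([p[1] for p in points], dtype=float)

def sse(m, c):
    """Cost E(m, c): sum of squared errors of the line y = m*x + c."""
    return float(np.sum((m * xs + c - ys) ** 2))

# dE/dm = 0 and dE/dc = 0, each divided by 2, give the normal equations:
#   m*sum(x^2) + c*sum(x) = sum(x*y)   ->  14m + 6c = 21
#   m*sum(x)   + c*n      = sum(y)     ->   6m + 3c = 10
A = np.array([[np.sum(xs ** 2), np.sum(xs)],
              [np.sum(xs),      len(xs)]])
rhs = np.array([np.sum(xs * ys), np.sum(ys)])
m, c = np.linalg.solve(A, rhs)
print(f"m = {m:.3f}, c = {c:.3f}")  # m = 0.500, c = 2.333
```

At this (m, c), both partial derivatives are zero, so E is at its minimum. This is the same OLS line a least-squares routine such as `np.polyfit(xs, ys, 1)` would return.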