5. Linear Regression
REGRESSION
Linear regression is one of the easiest and most
popular Machine Learning algorithms.
WHAT IS LINEAR REGRESSION?
Linear regression makes predictions for continuous/real or numeric variables.
Main uses of regression analysis include:
1. Predicting share price
2. Analyzing the impact of price changes
LINEAR REGRESSION
Linear regression models the linear relationship between variables: it describes how the value of the dependent variable (y) changes with the value of the independent variable (x).
The linear regression model provides a sloped
straight line representing the relationship
between the variables.
LINEAR REGRESSION
Mathematically, we can represent a linear regression as:
y = a0 + a1x + ε
Here,
y = Dependent Variable (Target Variable)
x = Independent Variable (Predictor Variable)
a0 = Intercept of the line (gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor applied to each input value)
ε = Random error
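As a concrete sketch of the model above, prediction is just the line a0 + a1x (the values of a0 and a1 below are illustrative, not fitted from data):

```python
# Prediction with a simple linear regression model: y = a0 + a1*x.
def predict(x, a0, a1):
    """Return the predicted y for input x given intercept a0 and slope a1."""
    return a0 + a1 * x

a0, a1 = 2.0, 0.5           # hypothetical intercept and slope
print(predict(10, a0, a1))  # 2.0 + 0.5*10 = 7.0
```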
TYPES OF LINEAR REGRESSION
Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Simple Linear
Regression.
Multiple Linear Regression:
If more than one independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Multiple Linear Regression.
LINEAR REGRESSION LINE
Positive Linear Relationship: If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, the relationship is termed a positive linear relationship.
Negative Linear Relationship: If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, the relationship is called a negative linear relationship.
COST FUNCTION
Goal: find the best-fit line, i.e. the line for which the error between the predicted values and the actual values is minimized. The best-fit line has the least error.
Different values for the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best-fit line. To calculate this we use a cost function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. It can be written as:
MSE = (1/N) Σ (yi − (a1xi + a0))²
Where,
N = Total number of observations
yi = Actual value
(a1xi + a0) = Predicted value
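A minimal sketch of this cost function on toy data (the points below are made up for illustration):

```python
# Mean Squared Error for a candidate line (a0, a1) over points (xs, ys).
def mse(xs, ys, a0, a1):
    n = len(xs)
    return sum((y - (a1 * x + a0)) ** 2 for x, y in zip(xs, ys)) / n

xs = [1, 2, 3]
ys = [2, 4, 6]                 # exactly y = 2x
print(mse(xs, ys, 0.0, 2.0))   # perfect fit -> 0.0
print(mse(xs, ys, 0.0, 1.0))   # errors 1, 2, 3 -> (1 + 4 + 9) / 3
```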
GRADIENT DESCENT
Gradient descent is a method of updating a_0 and a_1 to reduce the cost
function(MSE). The idea is that we start with some values for a_0 and a_1 and then
we change these values iteratively to reduce the cost.
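The iterative updates can be sketched as follows, using the partial derivatives of the MSE with respect to a0 and a1 (toy data assumed):

```python
# Gradient descent for simple linear regression on toy data y = 2x.
# Gradients of MSE: dJ/da0 = -(2/N) * sum(y - yhat)
#                   dJ/da1 = -(2/N) * sum((y - yhat) * x)
def gradient_descent(xs, ys, lr=0.05, steps=1000):
    a0, a1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - (a1 * x + a0) for x, y in zip(xs, ys)]
        a0 += lr * (2 / n) * sum(residuals)
        a1 += lr * (2 / n) * sum(r * x for r, x in zip(residuals, xs))
    return a0, a1

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]                   # exactly y = 2x
a0, a1 = gradient_descent(xs, ys)
print(round(a0, 3), round(a1, 3))   # converges close to a0 = 0, a1 = 2
```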
GOODNESS OF FIT
R² measures goodness of fit as the variation explained by the model over the total variation. For the example on the slide:
R² = Σ(yp − ȳ)² / Σ(y − ȳ)² = 1.6 / 5.2 ≈ 0.3
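Assuming the explained-over-total form of R² used on the slide, a minimal sketch on toy values:

```python
# R-squared as explained variation over total variation.
def r_squared(ys, preds):
    mean_y = sum(ys) / len(ys)
    ss_explained = sum((p - mean_y) ** 2 for p in preds)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return ss_explained / ss_total

ys = [1.0, 2.0, 3.0]
preds = [1.0, 2.0, 3.0]      # perfect predictions
print(r_squared(ys, preds))  # 1.0
```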
MULTIVARIATE LINEAR
REGRESSION
More than one independent variable is used to predict the value of a numerical dependent variable.
Y = f(x, z)
y = m1.x + m2.z + c
y is the dependent variable, i.e. the variable that needs to be estimated and predicted.
x is the first independent variable, i.e. a controllable variable. It is the first input.
m1 is the slope of x. It determines the angle of the line with respect to x.
z is the second independent variable, i.e. a controllable variable. It is the second input.
m2 is the slope of z. It determines the angle of the line with respect to z.
c is the intercept: a constant that determines the value of y when x and z are both 0.
A model with two input variables can be expressed as:
y = β0 + β1.x1 + β2.x2
In the machine learning world, there can be many dimensions. A model with three
input variables can be expressed as:
y = β0 + β1.x1 + β2.x2 + β3.x3
A generalized equation for the multivariate regression model can be:
y = β0 + β1.x1 + β2.x2 + … + βn.xn
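The generalized equation can be sketched directly as a dot product over the coefficients (the β values below are hypothetical):

```python
# Generalized multivariate prediction: y = b0 + b1*x1 + ... + bn*xn.
def predict_multi(betas, xs):
    """betas = [b0, b1, ..., bn]; xs = [x1, ..., xn]."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))

betas = [1.0, 2.0, 3.0]                  # hypothetical: b0=1, b1=2, b2=3
print(predict_multi(betas, [10.0, 100.0]))  # 1 + 2*10 + 3*100 = 321.0
```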
When we have multiple features and we want to train a model that can predict the price given those features, we can use multivariate linear regression. The model has to learn the parameters (θ0 to θn) on the training dataset such that, if we want to predict the price for a house that has not been sold yet, it gives a prediction close to what the house will actually sell for.
COST FUNCTION AND GRADIENT DESCENT FOR MULTIVARIATE LINEAR REGRESSION
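The formulas on this slide did not survive extraction; as a sketch, the simple-regression update generalizes to one partial derivative per parameter, assuming the same MSE cost as before (toy data below is made up):

```python
# Batch gradient descent for multivariate linear regression (pure Python).
# theta[0] is the intercept; theta[j] multiplies feature j-1.
def fit(X, y, lr=0.02, steps=5000):
    n, d = len(X), len(X[0])
    theta = [0.0] * (d + 1)
    for _ in range(steps):
        preds = [theta[0] + sum(t * x for t, x in zip(theta[1:], row)) for row in X]
        errs = [p - yi for p, yi in zip(preds, y)]
        theta[0] -= lr * (2 / n) * sum(errs)
        for j in range(d):
            theta[j + 1] -= lr * (2 / n) * sum(e * row[j] for e, row in zip(errs, X))
    return theta

X = [[1, 2], [2, 1], [3, 3], [4, 5]]
y = [1 + 2 * a + 3 * b for a, b in X]   # exact model y = 1 + 2*x1 + 3*x2
theta = fit(X, y)
print([round(t, 2) for t in theta])     # converges close to [1, 2, 3]
```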
PRACTICAL IDEAS FOR MAKING GRADIENT DESCENT WORK WELL
Use feature scaling to help gradient descent converge faster. Get every feature into roughly the -1 to +1 range. It doesn't have to be exactly within -1 to +1, but it should be close to that range.
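One way to sketch this scaling (min-max style, mapping each feature into -1..+1; the house-size values are illustrative):

```python
# Min-max style scaling of a feature into the -1..+1 range.
def scale_feature(values):
    lo, hi = min(values), max(values)
    mid, half_range = (hi + lo) / 2, (hi - lo) / 2
    return [(v - mid) / half_range for v in values]

sizes = [500, 1000, 1500, 2000]   # e.g. house sizes in square feet
print(scale_feature(sizes))       # [-1.0, -0.333..., 0.333..., 1.0]
```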
MODEL INTERPRETATION
y = -85090 + 102.85 * x1 + 43.79 * x2 + 1.52 * x3 - 37.91 * x4 + 908.12 * x5 + 364.33 * x6
x1: With all other predictors held constant, if x1 is increased by one unit, the average price increases by $102.85.
x2: With all other predictors held constant, if x2 is increased by one unit, the average price increases by $43.79.
x3: With all other predictors held constant, if x3 is increased by one unit, the average price increases by $1.52.
x4: With all other predictors held constant, if x4 is increased by one unit, the average price decreases by $37.91 (length has a negative coefficient).
x5: With all other predictors held constant, if x5 is increased by one unit, the average price increases by $908.12.
x6: With all other predictors held constant, if x6 is increased by one unit, the average price increases by $364.33.
REGULARIZATION
PROBLEM OF OVERFITTING
Regularization is a technique used to reduce errors by fitting the function appropriately on the given training set and avoiding overfitting.
REGULARIZATION IN ML
L1 regularization
L2 regularization
A regression model which uses the L1 regularization technique is called LASSO (Least Absolute Shrinkage and Selection Operator) regression.
REGULARIZATION
Lasso Regression adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function (L):
L = Σ (yi − ŷi)² + λ Σ |βj|
During regularization the output function (ŷ) does not change. The change is only in the loss function.
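A minimal sketch of that penalized loss, showing that only the loss changes while the predictions stay the same (coefficients and λ below are made up):

```python
# L1-penalized (Lasso-style) loss: squared error plus lam * sum(|coef|).
# The predictions themselves do not change; only the loss value does.
def lasso_loss(ys, preds, coefs, lam):
    sq_err = sum((y - p) ** 2 for y, p in zip(ys, preds))
    penalty = lam * sum(abs(c) for c in coefs)
    return sq_err + penalty

ys, preds = [1.0, 2.0], [1.0, 2.0]   # perfect predictions: zero squared error
print(lasso_loss(ys, preds, [3.0, -4.0], lam=0.0))  # 0.0: no penalty
print(lasso_loss(ys, preds, [3.0, -4.0], lam=0.5))  # 0.5 * (3 + 4) = 3.5
```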
LOSS FUNCTION
PROBLEM
A clinical trial gave the following data for BMI and cholesterol level for 10 patients. Predict the likely value of cholesterol level for someone who has a BMI of 27.

BMI   Cholesterol
17    140
21    189
24    210
28    240
14    130
16    100
19    135
22    166
15    130
18    170

SOLUTION:
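The slide's worked solution is not included in the extracted text; a sketch of it using the closed-form least-squares slope and intercept (rather than gradient descent) on the data above:

```python
# Simple least-squares fit of cholesterol on BMI, then prediction at BMI = 27.
bmi  = [17, 21, 24, 28, 14, 16, 19, 22, 15, 18]
chol = [140, 189, 210, 240, 130, 100, 135, 166, 130, 170]

n = len(bmi)
mean_x = sum(bmi) / n    # 19.4
mean_y = sum(chol) / n   # 161.0
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bmi, chol))
sxx = sum((x - mean_x) ** 2 for x in bmi)
a1 = sxy / sxx           # slope ~ 8.83
a0 = mean_y - a1 * mean_x  # intercept ~ -10.27
print(round(a0 + a1 * 27, 1))  # predicted cholesterol ~ 228.1
```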