Unit 3c Linear Regression
• Classification Problems
• Prediction of cancer
• Election-win prediction for Sh. Narendra Modi ji
• Diabetes prediction
• Classification of e-mail
Understand the Data……..
Classification & Regression
Data Set: UCI Machine Learning Repository
Google “uci dataset”
Linear Regression
A statistical method that is used for predictive analysis
Makes predictions for continuous/real or numeric variables such as sales, salary,
age, product price, etc.
Linear regression shows a linear relationship, which means it finds how the
value of the dependent variable changes with the value of the
independent variable
Linear regression predicts a
dependent variable value (y) based on a given
independent variable (x)
If there is a single input variable (x), such linear regression is called simple linear regression
y = b*x + a
or
y = a_0 + a_1*x
• Y= Dependent Variable (Target Variable) (ESTIMATED OUTPUT)
• X= Independent Variable (predictor Variable) (INPUT)
• a0= Intercept of the line (Gives an additional degree of freedom) (regression coefficient)
• a1 = Linear regression coefficient (scale factor to each input value) (SLOPE) (regression coefficient)
The goal of the linear regression algorithm is to find the best values for a_0 and a_1
X: amount of fertilizer, y: size of crop (every time we add a unit to X, the dep variable y increases
proportionally)
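As a small worked illustration with made-up (hypothetical) coefficients: if the fitted line is
y = 2 + 0.5*x, then x = 10 units of fertilizer predicts a crop size of y = 2 + 0.5*10 = 7.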
Multiple Linear Regression
If there is more than one input variable, such linear regression is called multiple linear
regression
y= a + b_1*x_1 + b_2*x_2 + b_3*x_3+… + b_n*x_n
or
y= a_0 + a_1*x_1 + a_2*x_2+.. + a_n*x_n
Eg: Add amount of sunlight and rainfall in a growing season to the fertilizer variable, with all 3
affecting y
Averaging the squared errors over all the data points gives the cost function, which is therefore also known as
the Mean Squared Error (MSE) function
Now, using this MSE function we are going to change the values of a_0 and a_1 such that the MSE value settles
at its minimum
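Written out and computed in code (a minimal sketch; the helper below and its inputs are illustrative, not from the slides):

MSE = (1/n) * Σ (y_i − (a_0 + a_1*x_i))²

import numpy as np

def mse(y_true, y_pred):
    # average of the squared differences between actual and predicted values
    return np.mean((y_true - y_pred) ** 2)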
Linear regression: A sample curve-fitting
• Y = m*x + b
• Method of least squares
• Gradient descent approach
Ordinary Least Squares (OLS) Method
y= m*x+b
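A minimal sketch of the ordinary least squares solution for y = m*x + b, using the standard closed-form estimates (the small arrays below are hypothetical):

import numpy as np

def ols_fit(x, y):
    # slope m = cov(x, y) / var(x); intercept b = mean(y) - m * mean(x)
    m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b = y.mean() - m * x.mean()
    return m, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
m, b = ols_fit(x, y)     # m ≈ 1.94, b ≈ 1.15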
• Gradient descent is a method of updating a_0 and a_1 to reduce the cost function (MSE)
• The idea is that we start with some values for a_0 and a_1 and then we change these
values iteratively to reduce the cost
To find these gradients, we take partial derivatives with respect to a_0 and a_1
The partial derivates are the gradients and they are used to update the values of
a_0 and a_1
Alpha is the learning rate which is a hyper-parameter that you must specify
A smaller learning rate could get you closer to the minima but takes more time to
reach the minima, a larger learning rate converges sooner but there is a chance
that you could overshoot the minima
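A minimal gradient-descent sketch for simple linear regression, assuming NumPy arrays x and y; the gradient expressions come from differentiating the MSE with respect to a_0 and a_1:

import numpy as np

def gradient_descent(x, y, alpha=0.01, epochs=1000):
    a0, a1 = 0.0, 0.0                       # start with some values for a_0 and a_1
    n = len(x)
    for _ in range(epochs):
        y_pred = a0 + a1 * x
        # partial derivatives of the MSE with respect to a_0 and a_1
        d_a0 = (2 / n) * np.sum(y_pred - y)
        d_a1 = (2 / n) * np.sum((y_pred - y) * x)
        a0 -= alpha * d_a0                  # update using the learning rate alpha
        a1 -= alpha * d_a1
    return a0, a1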
Multiple/Multivariate Linear Regression
Simple Linear Regression
It’s the simplest form of Linear Regression that is used when there is a single input
variable for the output variable
Multiple Linear Regression
If we have an additional variable (let’s say “experience”) in the previous equation, then it
becomes a multiple regression:
It’s a form of linear regression that is used when there are two or more predictors
Multiple Linear Regression
Multiple linear regression is used to estimate the relationship between two or more
independent variables and one dependent variable. You can use multiple linear regression
when you want to know:
How strong the relationship is between two or more independent variables and one
dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added
affect crop growth)
The value of the dependent variable at a certain value of the independent variables
(e.g. the expected yield of a crop at certain levels of rainfall, temperature, and
fertilizer addition)
Applications
• Prediction of used-car prices based on make, model, year, shift and color
[make, model, year, shift, color] → car prices
• Prediction of the price for a house in the market based on location, lot
size, number of bedrooms, neighborhood characteristics etc.
[location, lot size, # bedrooms, crime rate, school ratings] → house prices
Simple vs. Multi Linear Regression
Can we use simple linear regression to study our output against all independent
variables separately? NO
Some of these factors will affect the price of the house positively
For example: the larger the area, the higher the price
On the other hand, factors like distance from the workplace, and the crime rate can
influence your estimate of the house negatively
Running separate simple linear regressions will lead to a different outcome for each predictor, when we
are actually interested in a single combined prediction
There may be an input variable that is itself correlated with or dependent on some
other predictor. This can cause wrong predictions and unsatisfactory results
Multiple Linear Regression
The cost is the summation of the squared differences between our predicted values and the actual values,
divided by twice the length of the data set
A smaller mean squared error implies a better performance
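Written out (a sketch consistent with the description above, for m data points with predictions y'_i and actual values y_i):

J = (1 / (2*m)) * Σ (y'_i − y_i)²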
In Simple Linear Regression, we can see how each advertising medium affects
sales when applied without the other two media
However, in practice, all three might be working together to impact net sales. We
did not consider the combined effect of these media on sales.
Multiple Linear Regression solves the problem by taking account of all the
variables in a single expression.
Applying Multi-Regression
This is done by minimizing the Residual Sum of Squares (RSS), which is obtained
by summing the squared differences between actual and predicted outcomes.
Applying Multi Regression to get the coefficients
Output:
Intercept     2.938889
TV            0.045765
radio         0.188530
newspaper    -0.001037
• If we fix the budget for TV & newspaper, then increasing the radio budget by $1000 will lead to an
increase in sales by around 189 units (0.189 * 1000)
• Similarly, by fixing the radio & newspaper budgets, we infer an approximate rise of 46 units of products
per $1000 increase in the TV budget
Applying Multi Regression to get the coefficients
Output:
Intercept     2.938889
TV            0.045765
radio         0.188530
newspaper    -0.001037
• For a unit increase in the TV budget, sales increase by about 0.045 units, and so on.
• For the newspaper budget, since the coefficient is quite negligible (close to zero), it’s evident that
newspaper advertising is not affecting the sales.
• In fact, it’s on the negative side of zero (-0.001) which, if the magnitude were big enough, could have
meant that this agent is rather causing the sales to fall.
• So we don’t have to spend much on newspaper advertising.
Applying Multi Regression to get the coefficients
Output:
Intercept     2.938889
TV            0.045765
radio         0.188530
newspaper    -0.001037
If we run Simple Linear Regression using just the newspaper budget against sales, we’ll observe a
coefficient value of around 0.055, which is quite significant in comparison to what we saw above.
Collinearity
How are these variables correlated with each other?
If two independent variables are too highly correlated, then only one of them
should be used in the regression model.
import pandas as pd

ad = pd.read_csv("Advertising.csv")
ad.corr()   # pairwise correlations between the advertising variables
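A minimal sketch of how the coefficient table shown above could be reproduced with scikit-learn, assuming the Advertising.csv columns are named TV, radio, newspaper and sales:

import pandas as pd
from sklearn.linear_model import LinearRegression

ad = pd.read_csv("Advertising.csv")
X = ad[["TV", "radio", "newspaper"]]        # independent variables (assumed column names)
y = ad["sales"]                             # dependent variable

model = LinearRegression().fit(X, y)
print(model.intercept_)                     # ≈ 2.938889
print(dict(zip(X.columns, model.coef_)))    # ≈ {TV: 0.0458, radio: 0.1885, newspaper: -0.0010}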
Backward Selection: We start with all variables in the model, and remove the variable
that is the least statistically significant. This is repeated until a stopping rule is reached.
For instance, we may stop when there is no further improvement in the model score.
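A minimal backward-selection sketch using statsmodels, assuming significance is judged by p-values and we stop once every remaining predictor falls below a chosen threshold (one possible stopping rule):

import statsmodels.api as sm

def backward_select(X, y, threshold=0.05):
    cols = list(X.columns)
    while cols:
        model = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvalues = model.pvalues.drop("const")   # significance of each remaining predictor
        worst = pvalues.idxmax()
        if pvalues[worst] > threshold:
            cols.remove(worst)                  # remove the least significant variable
        else:
            break
    return cols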
R squared
The coefficient of determination (R-squared) is a statistical metric that is used to measure
how much of the variation in outcome can be explained by the variation in the
independent variables
R² by itself thus can't be used to identify which predictors should be included in a model
and which should be excluded
0 indicates that the outcome cannot be predicted by any of the independent variables and
1 indicates that the outcome can be predicted without error from the independent
variables
R squared
R² closer to 1 indicates that the model is good and explains the variance in data well. A value
closer to zero indicates a poor model.
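A minimal sketch of computing R² with scikit-learn (reusing the hypothetical Advertising column names from above):

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

ad = pd.read_csv("Advertising.csv")
X, y = ad[["TV", "radio", "newspaper"]], ad["sales"]
model = LinearRegression().fit(X, y)
print(r2_score(y, model.predict(X)))   # closer to 1 => the model explains the variance well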
Implement multi-linear regression to calculate
price
https://github.com/codebasics/py/blob/master/ML/2_linear_reg_multivariate/2_linear_regression_multivariate.ipynb
Nonlinear Regression
Simple linear regression relates two variables (X and Y) with a straight line (y =
mx + b), while nonlinear regression relates the two variables in a nonlinear
(curved) relationship
Nonlinear Regression
If the data shows a curvy trend, then linear regression will not produce very
accurate results when compared to a non-linear regression because, as the
name implies, linear regression presumes that the data is linear
Why Nonlinear Regression?
The scatter plot shows that there
seems to be a strong relationship
between GDP and time, but the
relationship is not linear
The growth starts off slowly, then
from 2005 onward, the growth is
very significant
Finally, it decelerates slightly in
the 2010s
It looks like either a logistic or
exponential function
So, it requires a special estimation
method of the non-linear
regression procedure
Methods to handle nonlinear data
Polynomial Regression
Polynomial Regression
Essentially any relationship that is not linear can be termed non-linear and is
usually represented by a polynomial of degree k (the maximum power of x):
y = a_0 + a_1*x + a_2*x^2 + … + a_k*x^k
Implementation
https://www.geeksforgeeks.org/python-implementation-of-polynomial-regression/
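A minimal polynomial-regression sketch with scikit-learn, using hypothetical synthetic data; PolynomialFeatures expands x into the columns x and x² before the ordinary linear fit:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical curvy data: y roughly follows 1 + 2x + 3x^2 plus noise
x = np.linspace(0, 5, 30).reshape(-1, 1)
y = 1 + 2 * x.ravel() + 3 * x.ravel() ** 2 + np.random.normal(0, 2, 30)

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)  # columns: x, x^2
model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)   # estimates close to 1 and [2, 3]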
Logistic Regression
Classification
Classification is a process of categorizing a given set of data into classes
The process starts with predicting the class of given data points
One method of calculation for measuring accuracy: the root mean square error
For example,
To predict whether an email is spam (1) or not (0)
Logistic Regression builds a regression model to predict the probability that a
given data entry belongs to the category numbered as “1”
Linear vs. Logistic Regression
Sigmoid Function (Threshold=0.5)
Sigmoid Function
Sigmoid Function: σ(z) = 1 / (1 + e^(-z))
z = Σ w_i*x_i + b
  = w_1*x_1 + w_2*x_2 + … + w_n*x_n + b
  = w^T x + b
y' = σ(z) = σ(w^T x + b)
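A minimal sketch of the sigmoid and the resulting prediction rule (threshold 0.5, as above), with hypothetical weights and input:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.array([0.8, -1.2])      # hypothetical trained weights
b = 0.1                        # hypothetical bias
x = np.array([2.0, 1.5])       # one input example

z = np.dot(w, x) + b           # z = w^T x + b
y_prob = sigmoid(z)            # probability that the example belongs to class "1"
y_class = int(y_prob >= 0.5)   # apply the 0.5 threshold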
Sigmoid Function
If ‘z’ goes to +infinity, y' (predicted) becomes 1;
if ‘z’ goes to −infinity, y' becomes 0
The Weights (W) are trained so as to minimize the cost defined as Mean
Squared Error (MSE)
Loss (Error) Function
But we won't use this (MSE) notation, because it leads to a non-convex optimization
problem, which means it contains local optimum points
Loss (Error) Function
CASE 1: If y = 1 ==> L(y',1) = -log(y') ==> we want y' to be as large as possible; but y' cannot be
larger than 1 (sigmoid output), so we want it to be close to 1
CASE 2: If y = 0 ==> L(y',0) = -log(1-y') ==> we want 1-y' to be as large as possible, i.e. y' to be as
small as possible (close to 0)
Loss (Error) Function
For a single training example, the loss function for logistic regression is defined as:
−log(y') if y = 1
−log(1−y') if y = 0
The loss function computes the error for a single training example
The cost function is the average of the loss functions of the entire
training set
It is also known as logarithmic loss or log loss
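Combined into a single expression (a standard form consistent with the two cases above), the loss for one example and the cost over m training examples are:

L(y', y) = −( y*log(y') + (1 − y)*log(1 − y') )
J(w, b) = (1/m) * Σ_i L(y'^(i), y^(i))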
Gradient Descent
• To minimize the cost function, we need to run the gradient descent function on
each parameter i.e.
w = w - alpha * dw
Gradient Descent
• First we initialize w and b to 0,0 (or to random values) and then iteratively improve these
values to reach the minimum of the convex cost function
w = w - alpha * d(J(w,b) / dw) (how much the function slopes in the w direction)
b = b - alpha * d(J(w,b) / db) (how much the function slopes in the b direction)
Gradient Descent
A lower value of “alpha” is preferred, because if the learning rate is too big we may
overshoot the minimum point and keep oscillating in the convex curve
Derivatives (slope of function)
• For example, for f(a) = 3a: if a = 2 then f(a) = 6, and the slope (derivative) is 3
To conclude, the derivative is the slope, and the slope can be different at different points of a
function
Computational Graph
J(a,b,c) = 3(a+bc)
u = bc
v = a+u
J = 3v
Computational Graph
But for derivatives there is a right-to-left (backward) computation to yield the derivative of
the final output variable
Derivatives with Computational Graph
dJ/dv = ?
If we take the value of v and change it a little bit, how would the value of J change?
Derivatives with Computational Graph
dJ/dv = 3 (since J = 3v)
Derivatives with Computational Graph
a = 5 → 5.001
v = a + u = 11 → 11.001
J = 3v = 33 → 33.003
dv/da = 1, dJ/dv = 3
dJ/da = (dJ/dv) * (dv/da) = 3 * 1 = 3
Logistic Regression Derivatives
Then, from right to left, we calculate the derivatives compared to the result:
w1 = w1 - alpha * dw1
w2 = w2 - alpha * dw2
b = b - alpha * db
This computes the derivatives and implements gradient descent w.r.t. a single training example
For m training examples -> accumulate over the examples and divide by m
Logistic Regression Pseudo code
import numpy as np

# One pass of gradient descent over m training examples (two features x1, x2)
# Assumes x1, x2, Y are arrays of length m and alpha is the learning rate
J = 0; dw1 = 0; dw2 = 0; db = 0   # cost and gradient accumulators
w1 = 0; w2 = 0; b = 0             # weights
for i in range(m):
    # Forward pass
    z = w1 * x1[i] + w2 * x2[i] + b
    a = 1 / (1 + np.exp(-z))                                # sigmoid(z)
    J += -(Y[i] * np.log(a) + (1 - Y[i]) * np.log(1 - a))   # log loss
    # Backward pass
    dz = a - Y[i]
    dw1 += dz * x1[i]
    dw2 += dz * x2[i]
    db += dz
J /= m
dw1 /= m
dw2 /= m
db /= m
# Gradient descent update
w1 = w1 - alpha * dw1
w2 = w2 - alpha * dw2
b = b - alpha * db
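For reference, a vectorized sketch of the same step over all m examples at once (assuming X is an n-by-m matrix of inputs, Y a 1-by-m array of labels, and alpha the learning rate):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_step(w, b, X, Y, alpha):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                           # forward pass for all examples
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # average log loss
    dZ = A - Y                                                # backward pass
    dw = np.dot(X, dZ.T) / m
    db = np.sum(dZ) / m
    return w - alpha * dw, b - alpha * db, cost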
Practical: Implement logistic regression to
predict Heart Disease
https://www.w3schools.com/python/python_ml_logistic_regression.asp
https://www.kaggle.com/datasets/dileep070/heart-disease-prediction-using-logistic-regression