
Om Namo Bhagavate Vasudevaya

23ECE216 Machine Learning


Linear Regression using Gradient Descent

Dr. Binoy B Nair


Regression using gradient descent
• We are given a dataset consisting of a set of known inputs that were applied to a system, with the corresponding outputs noted down.

• We are now supposed to find the equation that governs the relationship between the input and the output.

• i.e., we are supposed to fit a best-fit curve.

• Typically, these types of problems are called 'regression' or 'curve-fitting' problems.
Regression using gradient descent
• Consider the dataset:

  Sample No. | x  | y
  -----------|----|----
  1          | x1 | y1
  2          | x2 | y2
  3          | x3 | y3
  4          | x4 | y4

• We are supposed to find the equation that represents the relation between x and y, i.e. of the form y = f(x).

• The simplest equation that can be used to represent this kind of relationship between x and y is a linear relation: y = mx + c (a small numeric illustration follows below).
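As that numeric illustration, here is a minimal sketch (not from the original slides) that reuses the sample data from the code listing at the end of these notes; for that data the relation happens to be exactly linear, y = 2x + 3:

% Minimal sketch: evaluate a candidate line y = m*x + c on the sample
% data from the code section of these notes.
x = [1, 2, 3, 4]';
y = [5, 7, 9, 11]';
m = 2; c = 3;            % for this data, y = 2x + 3 fits exactly
y_hat = m*x + c;         % predicted outputs
disp([y, y_hat])         % the two columns match exactly here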
Regression using gradient descent
• Now, since we have decided that the relationship is linear, we have to find the optimal values of m and c that ensure that the best-fit line passes through all the (x, y) points in the dataset, or at least passes as close as possible to all the points in the dataset.

• Hence, the problem is now that of finding an optimal m and c that will minimize the error between the best-fit line and the actual data points.

• Typically, the error measure used in such cases is the Mean Squared Error (MSE).
Regression using gradient descent
• Now, considering the dataset given in the table earlier, the MSE (denoted by E below) can be expressed as:

$$E = \frac{1}{N}\cdot\frac{1}{2}\sum_{i=1}^{N}\left(mx_i + c - y_i\right)^2$$

where N is the number of samples and $mx_i + c$ is the estimated y, typically denoted by $\hat{y}_i$.

i.e., for the four-sample dataset:

$$E = \frac{1}{N}\cdot\frac{1}{2}\left[\left(mx_1 + c - y_1\right)^2 + \left(mx_2 + c - y_2\right)^2 + \left(mx_3 + c - y_3\right)^2 + \left(mx_4 + c - y_4\right)^2\right]$$
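As a sketch (reusing the sample data and the initial guess m = c = 0.5 that appear in the code at the end of these notes), the MSE for one candidate (m, c) can be evaluated directly:

% Evaluate E = (1/N)*(1/2)*sum((m*x + c - y).^2) for one candidate (m, c)
x = [1, 2, 3, 4]';
y = [5, 7, 9, 11]';
N = length(x);
m = 0.5; c = 0.5;        % initial guess used later in the code section
E = (1/N)*(1/2)*sum((m*x + c - y).^2)   % displays E = 20.9375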
Regression using gradient descent
• Now, the two parameters that need to be optimized are m and c.

• Hence, putting this in vector form, we get, say, $P = \begin{bmatrix} m \\ c \end{bmatrix}$.

• Starting with a random initial estimate of m and c, say $m_0$ and $c_0$, the initial vector is:

$$P(0) = \begin{bmatrix} m_0 \\ c_0 \end{bmatrix}$$

and the subsequent updates will be of the form:

$$P(k+1) = P(k) - \alpha\,\nabla E(P)\,\big|_{P = P(k)}$$
How to calculate ∇E?

$$\nabla E = \left[\frac{\partial E}{\partial m},\ \frac{\partial E}{\partial c}\right]$$

Here, with N = 4,

$$E = \frac{1}{4}\cdot\frac{1}{2}\left[\left(mx_1 + c - y_1\right)^2 + \left(mx_2 + c - y_2\right)^2 + \left(mx_3 + c - y_3\right)^2 + \left(mx_4 + c - y_4\right)^2\right]$$

so

$$\frac{\partial E}{\partial m} = \frac{1}{4}\left[x_1\left(mx_1 + c - y_1\right) + x_2\left(mx_2 + c - y_2\right) + x_3\left(mx_3 + c - y_3\right) + x_4\left(mx_4 + c - y_4\right)\right]$$

$$= \frac{1}{4}\left[\left(mx_1^2 + cx_1 - x_1y_1\right) + \left(mx_2^2 + cx_2 - x_2y_2\right) + \left(mx_3^2 + cx_3 - x_3y_3\right) + \left(mx_4^2 + cx_4 - x_4y_4\right)\right]$$

$$= \frac{1}{4}\left[m\left(x_1^2 + x_2^2 + x_3^2 + x_4^2\right) + c\left(x_1 + x_2 + x_3 + x_4\right) - \left(x_1y_1 + x_2y_2 + x_3y_3 + x_4y_4\right)\right]$$

Hence, generalizing:

$$\frac{\partial E}{\partial m} = \frac{1}{N}\left[m\sum_{i=1}^{N}x_i^2 + c\sum_{i=1}^{N}x_i - \sum_{i=1}^{N}x_iy_i\right]$$
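As a numeric check (a worked example added here, not from the original slides), using the sample data x = [1, 2, 3, 4], y = [5, 7, 9, 11] from the code section, we have $\sum x_i^2 = 30$, $\sum x_i = 10$ and $\sum x_iy_i = 90$, so at the guess m = c = 0.5:

$$\frac{\partial E}{\partial m} = \frac{1}{4}\left[0.5(30) + 0.5(10) - 90\right] = \frac{15 + 5 - 90}{4} = -17.5$$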
How to calculate ∇E? (contd.)

$$\frac{\partial E}{\partial c} = \frac{1}{4}\left[\left(mx_1 + c - y_1\right) + \left(mx_2 + c - y_2\right) + \left(mx_3 + c - y_3\right) + \left(mx_4 + c - y_4\right)\right]$$

$$= \frac{1}{4}\left[m\left(x_1 + x_2 + x_3 + x_4\right) + 4c - \left(y_1 + y_2 + y_3 + y_4\right)\right]$$

Generalizing:

$$\frac{\partial E}{\partial c} = \frac{1}{N}\left[m\sum_{i=1}^{N}x_i + Nc - \sum_{i=1}^{N}y_i\right]$$
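Continuing the same numeric check ($\sum x_i = 10$, $\sum y_i = 32$, N = 4, m = c = 0.5), and then taking one update step with α = 0.1:

$$\frac{\partial E}{\partial c} = \frac{1}{4}\left[0.5(10) + 4(0.5) - 32\right] = \frac{5 + 2 - 32}{4} = -6.25$$

$$P(k+1) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.1\begin{bmatrix} -17.5 \\ -6.25 \end{bmatrix} = \begin{bmatrix} 2.25 \\ 1.125 \end{bmatrix}$$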
Summarizing
• Given a dataset (X, Y):
• Initialize: α, P(0), tol
• k = 0

Do
    P(k+1) = P(k) - α∇E(P) at P = P(k)
    k = k + 1
While ‖P(k) - P(k-1)‖ > tol
Code

% Code to find the optimal m and c values to fit a regression line y = m*x + c,
% when the inputs x and the corresponding outputs y are given, using gradient descent.
clc
clear all

% Input and corresponding outputs (usually supplied by the user)
x = [1, 2, 3, 4]';
y = [5, 7, 9, 11]';
scatter(x, y, 'filled');       % plot the raw data points

alpha = 0.1;                   % learning rate
tol = 0.001;                   % stopping tolerance on the parameter change
N = length(x);                 % number of samples

% Precompute the sums that appear in the gradient expressions
sum_x = sum(x);
sum_y = sum(y);
x_squared = x.^2;
sum_x_squared = sum(x_squared);
xy = x.*y;
sum_xy = sum(xy);

% Initializing the parameter vector [m; c].
% The first column will never be used; it is just to ensure that in the first
% iteration m_c_vector(:,count-1) does not show up an error, because
% count = 0 would result in an indexing error.
m_c_vector(:,1) = [0.4, 0.4]';
m_c_vector(:,2) = [0.5, 0.5]';
count = 2;

diff(:,count) = m_c_vector(:,count) - m_c_vector(:,count-1);
change = sum(abs(diff(:,count)));

while change > tol
    m = m_c_vector(1,count);
    c = m_c_vector(2,count);
    % Gradient components derived earlier
    doE_by_dom = ((m*sum_x_squared) + (c*sum_x) - sum_xy)/N;
    doE_by_doc = ((m*sum_x) + (c*N) - sum_y)/N;
    grad_E(:,count) = [doE_by_dom, doE_by_doc]';
    % Gradient descent update: P(k+1) = P(k) - alpha*grad_E(P(k))
    m_c_vector(:,count+1) = m_c_vector(:,count) - (alpha*grad_E(:,count));
    count = count + 1;
    diff(:,count) = m_c_vector(:,count) - m_c_vector(:,count-1);
    change = sum(abs(diff(:,count)));
end

% Plot the fitted line at each stored iterate to visualize convergence
hold on
for i = 1:1:length(m_c_vector)
    y_cap(:,i) = (m_c_vector(1,i)*x) + m_c_vector(2,i);
    plot(x, y_cap(:,i))
    pause(0.1)
end
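For this dataset the exact relation is y = 2x + 3, so the final column of m_c_vector should end up close to m = 2 and c = 3 (not exactly, since the loop stops as soon as the parameter change drops below tol). A quick check after the loop, as a sketch:

% Inspect the final estimate; for this data it should be close to [2; 3]
m_c_vector(:,end)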
Thank You
