08_23ECE216_LinearRegressionUsingGradientDescent
• Hence, the problem is now that of finding the optimal m and c that minimize the error between the best-fit line y = mx + c and the actual data points.
• Typically, the error measure used in such cases is the Mean Squared Error (MSE).
Regression using gradient descent
• Now, considering the dataset given in the table earlier, the MSE (denoted by $E$ below) can be expressed as:

$$E = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \left( m x_i + c - y_i \right)^2$$

where $N$ is the number of samples and $m x_i + c$ is the estimated $y$, typically denoted by $\hat{y}_i$. (The factor $\frac{1}{2}$ is a convenience that cancels against the exponent when differentiating; it does not change where the minimum lies.)

Sample No.   x     y
1            x1    y1
2            x2    y2
3            x3    y3
4            x4    y4

i.e.,

$$E = \frac{1}{2N} \left[ \left( m x_1 + c - y_1 \right)^2 + \left( m x_2 + c - y_2 \right)^2 + \left( m x_3 + c - y_3 \right)^2 + \left( m x_4 + c - y_4 \right)^2 \right]$$
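A minimal MATLAB sketch of this definition (the sample values are the same ones used in the program at the end of these slides; the candidate m and c are arbitrary):

% Sketch: evaluating the error E for one candidate (m, c) pair
x = [1; 2; 3; 4];                    % sample inputs
y = [5; 7; 9; 11];                   % corresponding outputs
m = 0.5; c = 0.5;                    % an arbitrary candidate slope and intercept
N = length(x);
E = sum((m*x + c - y).^2) / (2*N);   % the error E defined above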
Regression using gradient descent
• Now, the two parameters that need to be optimized are $m$ and $c$.
• Hence, putting this in vector form, we get, say, $P = \begin{bmatrix} m \\ c \end{bmatrix}$.
• Starting with a random initial estimate of $m$ and $c$, say, $m_0$ and $c_0$, the initial vector is:

$$P(0) = \begin{bmatrix} m_0 \\ c_0 \end{bmatrix}$$

• Each iteration then updates the parameter vector against the gradient:

$$P(k+1) = P(k) - \alpha \, \nabla E(P) \,\big|_{\,P = P(k)}$$
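Written out component-by-component, this is just a restatement of the update rule above: each iteration nudges both parameters simultaneously against their respective partial derivatives.

$$\begin{bmatrix} m_{k+1} \\ c_{k+1} \end{bmatrix} = \begin{bmatrix} m_k \\ c_k \end{bmatrix} - \alpha \begin{bmatrix} \partial E / \partial m \\ \partial E / \partial c \end{bmatrix}_{P = P(k)}$$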
How to calculate $\nabla E$?

$$\nabla E = \left[ \frac{\partial E}{\partial m}, \ \frac{\partial E}{\partial c} \right]$$
Here,

$$E = \frac{1}{2 \cdot 4} \left[ \left( m x_1 + c - y_1 \right)^2 + \left( m x_2 + c - y_2 \right)^2 + \left( m x_3 + c - y_3 \right)^2 + \left( m x_4 + c - y_4 \right)^2 \right]$$

$$\frac{\partial E}{\partial m} = \frac{1}{4} \left[ x_1 \left( m x_1 + c - y_1 \right) + x_2 \left( m x_2 + c - y_2 \right) + x_3 \left( m x_3 + c - y_3 \right) + x_4 \left( m x_4 + c - y_4 \right) \right]$$

$$= \frac{1}{4} \left[ \left( m x_1^2 + c x_1 - x_1 y_1 \right) + \left( m x_2^2 + c x_2 - x_2 y_2 \right) + \left( m x_3^2 + c x_3 - x_3 y_3 \right) + \left( m x_4^2 + c x_4 - x_4 y_4 \right) \right]$$

$$= \frac{1}{4} \left[ m \left( x_1^2 + x_2^2 + x_3^2 + x_4^2 \right) + c \left( x_1 + x_2 + x_3 + x_4 \right) - \left( x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4 \right) \right]$$

Hence, generalizing:

$$\frac{\partial E}{\partial m} = \frac{1}{N} \left[ m \sum_{i=1}^{N} x_i^2 + c \sum_{i=1}^{N} x_i - \sum_{i=1}^{N} x_i y_i \right]$$
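In MATLAB, this generalized expression is a single vectorized line (a sketch reusing x, y, m, c, and N from the sketch after the MSE definition; the full program below precomputes these sums once as sum_x_squared, sum_x, and sum_xy):

% Sketch: dE/dm from the generalized formula above
dE_by_dm = (m*sum(x.^2) + c*sum(x) - sum(x.*y)) / N;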
How to calculate $\nabla E$? (contd.)
$$\frac{\partial E}{\partial c} = \frac{1}{4} \left[ \left( m x_1 + c - y_1 \right) + \left( m x_2 + c - y_2 \right) + \left( m x_3 + c - y_3 \right) + \left( m x_4 + c - y_4 \right) \right]$$

$$= \frac{1}{4} \left[ m \left( x_1 + x_2 + x_3 + x_4 \right) + 4c - \left( y_1 + y_2 + y_3 + y_4 \right) \right]$$

Generalizing:

$$\frac{\partial E}{\partial c} = \frac{1}{N} \left[ m \sum_{i=1}^{N} x_i + N c - \sum_{i=1}^{N} y_i \right]$$
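A quick numeric check of both formulas, using the data and the starting point from the program below ($x = [1,2,3,4]$, $y = [5,7,9,11]$, $m = c = 0.5$, $\alpha = 0.1$, so $\sum x_i = 10$, $\sum x_i^2 = 30$, $\sum x_i y_i = 90$, $\sum y_i = 32$):

$$\frac{\partial E}{\partial m} = \frac{0.5 \cdot 30 + 0.5 \cdot 10 - 90}{4} = -17.5, \qquad \frac{\partial E}{\partial c} = \frac{0.5 \cdot 10 + 4 \cdot 0.5 - 32}{4} = -6.25$$

$$P(1) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.1 \begin{bmatrix} -17.5 \\ -6.25 \end{bmatrix} = \begin{bmatrix} 2.25 \\ 1.125 \end{bmatrix}$$

Both parameters move toward the values $m = 2$, $c = 3$ that fit this particular dataset exactly.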
Summarizing
• Given a dataset $(X, Y)$:
• Initialize: $\alpha$, $P(0)$, $tol$
• $k = 0$

Do
    $P(k+1) = P(k) - \alpha \, \nabla E(P) \,\big|_{\,P = P(k)}$
    $k = k + 1$
While $\lVert P(k) - P(k-1) \rVert > tol$

(The program below measures the change as the sum of absolute parameter differences, i.e., the $\ell_1$ norm.)
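MATLAB has no do-while construct, so the program on the next slide emulates one by seeding a dummy first parameter column and computing the first change before the loop. A compact, self-contained variant of the same loop (a sketch, not the author's code) avoids the dummy column by forcing the first pass with change = inf:

% Sketch: same gradient descent loop, do-while emulated via change = inf
x = [1; 2; 3; 4];  y = [5; 7; 9; 11];
alpha = 0.1;  tol = 0.001;  N = length(x);
P = [0.5; 0.5];                       % initial [m; c]
change = inf;                         % guarantees at least one iteration
while change > tol
    g = [ (P(1)*sum(x.^2) + P(2)*sum(x) - sum(x.*y))/N ;   % dE/dm
          (P(1)*sum(x)    + P(2)*N     - sum(y))/N ];      % dE/dc
    P_new = P - alpha*g;
    change = sum(abs(P_new - P));     % l1 norm of the parameter change
    P = P_new;
end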
Code

% Code to find the optimal m and c values to fit a regression line y = m*x + c
% when the inputs x and the corresponding outputs y are given, using gradient descent.

clc
clear all

% input and corresponding outputs (usually supplied by user)
x = [1, 2, 3, 4]';
y = [5, 7, 9, 11]';
scatter(x, y, 'filled');

alpha = 0.1;     % learning rate
tol = 0.001;     % stopping tolerance on the parameter change

% precompute the sums that appear in the gradient expressions
N = length(x);
sum_x = sum(x);
sum_y = sum(y);
x_squared = x.^2;
sum_x_squared = sum(x_squared);
xy = x.*y;
sum_xy = sum(xy);

% initializing the parameter vector [m; c]
% the first column will never be used; it only ensures that, in the first
% iteration, m_c_vector(:,count-1) does not throw an error, because
% count = 0 would be an invalid index
m_c_vector(:,1) = [0.4, 0.4]';
m_c_vector(:,2) = [0.5, 0.5]';
count = 2;

diff(:,count) = m_c_vector(:,count) - m_c_vector(:,count-1);
change = sum(abs(diff(:,count)));

while change > tol
    m = m_c_vector(1,count);
    c = m_c_vector(2,count);
    % gradient components derived earlier
    doE_by_dom = ((m*sum_x_squared) + (c*sum_x) - sum_xy)/N;
    doE_by_doc = (m*sum_x + (c*N) - sum_y)/N;
    grad_E(:,count) = [doE_by_dom, doE_by_doc]';
    % gradient descent update
    m_c_vector(:,count+1) = m_c_vector(:,count) - (alpha*grad_E(:,count));
    count = count + 1;
    diff(:,count) = m_c_vector(:,count) - m_c_vector(:,count-1);
    change = sum(abs(diff(:,count)));
end

% replot the fitted line at each stored iterate to visualize convergence
hold on
for i = 1:1:length(m_c_vector)
    y_cap(:,i) = (m_c_vector(1,i)*x) + m_c_vector(2,i);
    plot(x, y_cap(:,i))
    pause(0.1)
end
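As a sanity check (run after the program above, in the same workspace), the converged parameters can be compared with MATLAB's built-in least-squares fit; for this dataset the points lie exactly on y = 2x + 3, so both methods should report m ≈ 2 and c ≈ 3:

% Optional check: compare the converged parameters against polyfit
p = polyfit(x, y, 1);    % p(1) is the slope, p(2) the intercept
fprintf('gradient descent: m = %.4f, c = %.4f\n', m_c_vector(1,end), m_c_vector(2,end));
fprintf('polyfit:          m = %.4f, c = %.4f\n', p(1), p(2));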
Thank You