Linear Regression With Gradient Descent
Linear Regression With Gradient Descent
Now that we have understood how the gradient descent algorithm finds the optimal parameters of the model, in
this section, we will understand how can we use gradient descent in linear regression and find the optimal
parameter.
Thus, we have two parameters 𝑚 and 𝑏 . We will see how can we use gradient descent and find the optimal
values for these two parameters 𝑚 and 𝑏 .
Data Preparation
References https://medium.com/data-science-365/linear-regression-with-gradient-descent-895bb7d18d52
(https://medium.com/data-science-365/linear-regression-with-gradient-descent-895bb7d18d52)
In [2]: df = pd.read_csv('Advertising.csv')
df.head()
Out[2]:
TV Radio Newspaper Sales
The Advertising dataset captures sales revenue generated with respect to advertisement spends across multiple
channels like radio, TV and newspaper. As you can see, there are four columns in the dataset. Since our
problem definition involves only sales and TV columns in the dataset, we do not need radio and newspaper
columns.
file:///Users/anishroychowdhury/Downloads/Linear_Reg_Gradient_Descent.html 1/8
13/06/2021 Linear_Reg_Gradient_Descent
Out[3]:
TV Sales
0 230.1 22.1
1 44.5 10.4
2 17.2 9.3
3 151.5 18.5
4 180.8 12.9
In [4]: df.isnull().sum()
Out[4]: TV 0
Sales 0
dtype: int64
Parameter Initialization
We know that equation of a simple linear regression is expressed as:
𝑦̂ = 𝑚𝑥 + 𝑏
Thus, we have two parameters 𝑚 and 𝑏 . We store both of these parameter 𝑚 and 𝑏 in an array called theta.
First, we initialize theta with zeros
file:///Users/anishroychowdhury/Downloads/Linear_Reg_Gradient_Descent.html 2/8
13/06/2021 Linear_Reg_Gradient_Descent
Loss function
Mean Squared Error (MSE) of Regression is given as:
𝑁
1 2
𝐽 = ̂
(𝑦 − 𝑦) − −(2)
2𝑁 ∑
𝑖=1
Where 𝑁 is the number of training samples, 𝑦 is the actual value and 𝑦̂ is the predicted value.
We feed the data and the model parameter theta to the loss function which returns the MSE. Remember,
data[,0] has 𝑥 value and data[,1] has 𝑦 value. Similarly, theta [0] has a value of 𝑚 and theta[1] has a value of 𝑏 .
Now, we need to minimize this loss. In order to minimize the loss, we need to calculate the gradient of the loss
function 𝐽 with respect to the model parameters 𝑚 and 𝑏 and update the parameter according to the parameter
update rule. So, first, we will calculate the gradient of the loss function.
file:///Users/anishroychowdhury/Downloads/Linear_Reg_Gradient_Descent.html 3/8
13/06/2021 Linear_Reg_Gradient_Descent
Update Rule
After computing gradients we need to update our model paramater according to our update rule as given
below:
𝑑𝐽
𝑚 = 𝑚 − 𝛼 − −(5)
𝑑𝑚
𝑑𝐽
𝑏 = 𝑏 − 𝛼 − −(6)
𝑑𝑏
Since we stored 𝑚 in theta[0] and 𝑏 in theta[1], we can write our update equation as:
𝑑𝐽
𝜃 = 𝜃 − 𝛼 − −(7)
𝑑𝜃
As we learned in the previous section, updating gradients for just one time will not lead us to the convergence
i.e minimum of the cost function, so we need to compute gradients and the update the model parameter for
several iterations:
MODEL TRAIN
[[3.10205147]
[0.06843237]]
file:///Users/anishroychowdhury/Downloads/Linear_Reg_Gradient_Descent.html 4/8
13/06/2021 Linear_Reg_Gradient_Descent
In [11]: plot_loss(losses,alpha)
file:///Users/anishroychowdhury/Downloads/Linear_Reg_Gradient_Descent.html 5/8
13/06/2021 Linear_Reg_Gradient_Descent
MODEL TEST
Train Accuracy
R_sq RMSE
0 0.63 3.72
Test Accuracy
file:///Users/anishroychowdhury/Downloads/Linear_Reg_Gradient_Descent.html 6/8
13/06/2021 Linear_Reg_Gradient_Descent
R_sq RMSE
0 0.58 3.96
file:///Users/anishroychowdhury/Downloads/Linear_Reg_Gradient_Descent.html 7/8
13/06/2021 Linear_Reg_Gradient_Descent
plt.style.use('ggplot')
fig,ax = plt.subplots(figsize = (9,6))
# plot scatter of raw data
sns.scatterplot(x=xdata,y=ydata,ax=ax,color='red')
In [ ]:
file:///Users/anishroychowdhury/Downloads/Linear_Reg_Gradient_Descent.html 8/8