Gradient Descent With RMSProp from Scratch
RMSprop modifies traditional gradient descent by adapting the learning rate for each parameter based on the magnitude of its recent gradients. The key advantage of RMSprop is that it smooths the parameter updates and helps avoid oscillations, particularly when gradient magnitudes vary widely across parameters or over time.
The update rule for RMSprop is given by:
\theta_{new} = \theta_{old} - \frac{\eta}{\sqrt{E[(\nabla_\theta J(\theta))^2] + \epsilon}} \cdot \nabla_\theta J(\theta)
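Here \eta is the learning rate, \epsilon is a small constant added for numerical stability, and E[(\nabla_\theta J(\theta))^2] is an exponentially decaying average of the squared gradients, updated at each step as:
E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2
where g_t = \nabla_\theta J(\theta_t) and the decay rate \gamma (typically around 0.9) controls how quickly older gradients are forgotten.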
Key Steps of RMSprop:
- Compute the gradient: As in gradient descent, calculate the gradient of the objective function with respect to each parameter.
- Maintain an exponentially decaying average of the squared gradients: This helps adjust the step size dynamically for each parameter.
- Update parameters: Instead of using a fixed learning rate, RMSprop uses the moving average of the squared gradients to normalize the updates.
Implementation of RMSprop from Scratch
Let’s implement the RMSprop optimizer from scratch and use it to minimize a simple quadratic objective function.
1. Defining the Objective Function
We will begin by defining a simple quadratic objective function:
f(x_1, x_2) = 5x_1^2 + 7x_2^2
This function is convex and has a global minimum at x_1 = 0, x_2 = 0, which makes it an ideal candidate for demonstrating optimization techniques.
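Its gradient, which the optimizer needs at every step, follows directly by differentiation:
\nabla f(x_1, x_2) = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}\right) = (10x_1, 14x_2)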
Python
import numpy as np
import matplotlib.pyplot as plt
from numpy import arange, meshgrid

# Objective function
def objective(x1, x2):
    return 5 * x1**2.0 + 7 * x2**2.0

# Derivative of the objective function w.r.t x1
def derivative_x1(x1, x2):
    return 10.0 * x1

# Derivative of the objective function w.r.t x2
def derivative_x2(x1, x2):
    return 14.0 * x2
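As a quick sanity check (an optional snippet, not part of the original walkthrough), we can confirm that the objective and both partial derivatives vanish at the known minimum x_1 = 0, x_2 = 0:
Python
# The objective and both partial derivatives should be zero at the minimum (0, 0)
print(objective(0.0, 0.0))       # 0.0
print(derivative_x1(0.0, 0.0))   # 0.0
print(derivative_x2(0.0, 0.0))   # 0.0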
2. Visualizing the Objective Function
To better understand the optimization landscape, let's visualize the objective function using both a 3D surface plot and a contour plot.
Python
x1 = arange(-5.0, 5.0, 0.1)
x2 = arange(-5.0, 5.0, 0.1)
# Creating meshgrid for x1 and x2
x1, x2 = meshgrid(x1, x2)
# Calculating the objective function
y = objective(x1, x2)
# Plotting the objective function
fig = plt.figure(figsize=(12, 4))
# 3D Surface Plot
ax = fig.add_subplot(1, 2, 1, projection='3d')
ax.plot_surface(x1, x2, y, cmap='viridis')
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('y')
ax.set_title('3D plot of the objective function')
# 2D Contour Plot
ax = fig.add_subplot(1, 2, 2)
ax.contour(x1, x2, y, cmap='viridis', levels=20)
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_title('Contour plot of the objective function')
plt.show()
Output:
3D and contour plot of the objective function
3. Implementing RMSprop
Next, we'll implement the RMSprop optimization algorithm. It iteratively updates the parameters x_1 and x_2 using their gradients, while the running average of squared gradients adapts the effective step size for each parameter.
Python
def rmsprop(x1, x2, derivative_x1, derivative_x2, learning_rate, gamma, epsilon, max_epochs):
    x1_trajectory = []
    x2_trajectory = []
    y_trajectory = []

    x1_trajectory.append(x1)
    x2_trajectory.append(x2)
    y_trajectory.append(objective(x1, x2))

    # Running averages of the squared gradients for x1 and x2
    e1 = 0
    e2 = 0

    # Gradient descent loop
    for _ in range(max_epochs):
        # Calculating derivatives of the objective function
        gt_x1 = derivative_x1(x1, x2)
        gt_x2 = derivative_x2(x1, x2)

        # Updating the exponentially weighted averages of the squared gradients
        e1 = gamma * e1 + (1 - gamma) * gt_x1**2.0
        e2 = gamma * e2 + (1 - gamma) * gt_x2**2.0

        # Updating the values of x1 and x2
        x1 = x1 - learning_rate * gt_x1 / np.sqrt(e1 + epsilon)
        x2 = x2 - learning_rate * gt_x2 / np.sqrt(e2 + epsilon)

        # Appending the values of x1, x2, and y
        x1_trajectory.append(x1)
        x2_trajectory.append(x2)
        y_trajectory.append(objective(x1, x2))

    return x1_trajectory, x2_trajectory, y_trajectory
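The implementation above tracks each parameter separately; the same logic extends naturally to any number of parameters. Below is a minimal vectorized sketch (an illustrative variant, with the helper name grad_fn being an assumption rather than part of the tutorial) that applies RMSprop to a NumPy parameter vector:
Python
import numpy as np

def rmsprop_vectorized(x, grad_fn, learning_rate=0.1, gamma=0.9, epsilon=1e-8, max_epochs=50):
    # x: initial parameter vector, grad_fn: function returning the gradient of the objective at x
    x = np.asarray(x, dtype=float).copy()
    avg_sq_grad = np.zeros_like(x)  # running average of squared gradients, one entry per parameter

    for _ in range(max_epochs):
        g = grad_fn(x)
        # Exponentially weighted average of squared gradients
        avg_sq_grad = gamma * avg_sq_grad + (1 - gamma) * g**2
        # Per-parameter update normalized by the root of the running average
        x = x - learning_rate * g / np.sqrt(avg_sq_grad + epsilon)

    return x

# Example with the same objective as above:
# x_opt = rmsprop_vectorized([-4.0, 3.0], lambda x: np.array([10.0 * x[0], 14.0 * x[1]]))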
4. Running the RMSprop Algorithm
Let’s now run the RMSprop algorithm for 50 iterations starting from an initial guess of x_1 = -4.0 and x_2 = 3.0.
Python
# Defining the initial values of x1, x2, and other hyperparameters
x1_initial = -4.0
x2_initial = 3.0
learning_rate = 0.1
gamma = 0.9
epsilon = 1e-8
max_epochs = 50
# RMSprop algorithm
x1_trajectory, x2_trajectory, y_trajectory = rmsprop(
    x1_initial,
    x2_initial,
    derivative_x1,
    derivative_x2,
    learning_rate,
    gamma,
    epsilon,
    max_epochs
)
# Printing the optimal values of x1, x2, and y
print('The optimal value of x1 is:', x1_trajectory[-1])
print('The optimal value of x2 is:', x2_trajectory[-1])
print('The optimal value of y is:', y_trajectory[-1])
Output:
The optimal value of x1 is: -0.10352260359924752
The optimal value of x2 is: 0.0025296212056016548
The optimal value of y is: 0.05362944016394148
5. Visualizing the Optimization Path
Finally, we will plot the path taken by the RMSprop optimizer on the contour plot of the objective function to visualize how it converges to the minimum.
Python
# Displaying the path taken by the optimizer on the contour plot
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)
# Plotting the contour plot
ax.contour(x1, x2, y, cmap='viridis', levels=20)
# Plotting the trajectory of y in each iteration
ax.plot(x1_trajectory, x2_trajectory, '*',
        markersize=7, color='dodgerblue')
# Setting the labels and title of the plot
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_title('RMSprop Optimization path for ' + str(max_epochs) + ' iterations')
plt.show()
Output:

The contour plot shows the trajectory taken by the RMSprop optimizer: starting from (-4.0, 3.0), the parameters gradually approach the minimum of the objective function at the origin.