Gradient Descent With RMSProp from Scratch
RMSprop modifies traditional gradient descent by adapting the learning rate for each parameter based on the magnitude of its recent gradients. The key advantage of RMSprop is that it smooths the parameter updates and helps avoid oscillations, particularly when gradient magnitudes vary widely across parameters or over time.
The update rule for RMSprop is given by:
\theta_{new} = \theta_{old} - \frac{\eta}{\sqrt{E[(\nabla_\theta J(\theta))^2] + \epsilon}} \cdot \nabla_\theta J(\theta)
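Here \eta is the learning rate, \epsilon is a small constant added for numerical stability, and E[(\nabla_\theta J(\theta))^2] is an exponentially decaying average of the squared gradients, updated at each step as:
E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2
where g_t = \nabla_\theta J(\theta_t) and the decay rate \gamma (typically around 0.9) controls how quickly older gradients are forgotten.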
Key Steps of RMSprop:
- Compute the gradient: As in gradient descent, calculate the gradient of the objective function with respect to each parameter.
- Maintain an exponentially decaying average of the squared gradients: This helps adjust the step size dynamically for each parameter.
- Update parameters: Instead of using a fixed learning rate, RMSprop uses the moving average of the squared gradients to normalize the updates.
Implementation of RMSprop from Scratch
Let’s implement the RMSprop optimizer from scratch and use it to minimize a simple quadratic objective function.
1. Defining the Objective Function
We will begin by defining a simple quadratic objective function:
f(x_1, x_2) = 5x_1^2 + 7x_2^2
This function is convex and has a global minimum at x_1 = 0, x_2 = 0, which makes it an ideal candidate for demonstrating optimization techniques.
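Its gradient, which the optimizer needs at every step, follows directly by differentiation:
\nabla f(x_1, x_2) = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}\right) = (10x_1, 14x_2)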
Python
import numpy as np
import matplotlib.pyplot as plt
from numpy import arange, meshgrid

# Objective function
def objective(x1, x2):
    return 5 * x1**2.0 + 7 * x2**2.0

# Derivative of the objective function w.r.t x1
def derivative_x1(x1, x2):
    return 10.0 * x1

# Derivative of the objective function w.r.t x2
def derivative_x2(x1, x2):
    return 14.0 * x2
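As a quick sanity check (an optional snippet, not part of the original walkthrough), we can confirm that the objective and both partial derivatives vanish at the known minimum x_1 = 0, x_2 = 0:
Python
# The objective and both partial derivatives should be zero at the minimum (0, 0)
print(objective(0.0, 0.0))       # 0.0
print(derivative_x1(0.0, 0.0))   # 0.0
print(derivative_x2(0.0, 0.0))   # 0.0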
2. Visualizing the Objective Function
To better understand the optimization landscape, let's visualize the objective function using both a 3D surface plot and a contour plot.
Python
x1 = arange(-5.0, 5.0, 0.1)
x2 = arange(-5.0, 5.0, 0.1)
# Creating meshgrid for x1 and x2
x1, x2 = meshgrid(x1, x2)
# Calculating the objective function
y = objective(x1, x2)
# Plotting the objective function
fig = plt.figure(figsize=(12, 4))
# 3D Surface Plot
ax = fig.add_subplot(1, 2, 1, projection='3d')
ax.plot_surface(x1, x2, y, cmap='viridis')
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('y')
ax.set_title('3D plot of the objective function')
# 2D Contour Plot
ax = fig.add_subplot(1, 2, 2)
ax.contour(x1, x2, y, cmap='viridis', levels=20)
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_title('Contour plot of the objective function')
plt.show()
Output:
3D and contour plot of the objective function
3. Implementing RMSprop
Next, we'll implement the RMSprop optimization algorithm. It iteratively updates the parameters x_1 and x_2 using their gradients, while the running average of squared gradients adapts the effective step size for each parameter.
Python
def rmsprop(x1, x2, derivative_x1, derivative_x2, learning_rate, gamma, epsilon, max_epochs):
    x1_trajectory = []
    x2_trajectory = []
    y_trajectory = []

    x1_trajectory.append(x1)
    x2_trajectory.append(x2)
    y_trajectory.append(objective(x1, x2))

    # Running averages of the squared gradients for x1 and x2
    e1 = 0
    e2 = 0

    # Gradient descent loop
    for _ in range(max_epochs):
        # Calculating derivatives of the objective function
        gt_x1 = derivative_x1(x1, x2)
        gt_x2 = derivative_x2(x1, x2)

        # Updating the exponentially weighted averages of the squared gradients
        e1 = gamma * e1 + (1 - gamma) * gt_x1**2.0
        e2 = gamma * e2 + (1 - gamma) * gt_x2**2.0

        # Updating the values of x1 and x2
        x1 = x1 - learning_rate * gt_x1 / np.sqrt(e1 + epsilon)
        x2 = x2 - learning_rate * gt_x2 / np.sqrt(e2 + epsilon)

        # Appending the values of x1, x2, and y
        x1_trajectory.append(x1)
        x2_trajectory.append(x2)
        y_trajectory.append(objective(x1, x2))

    return x1_trajectory, x2_trajectory, y_trajectory
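The implementation above tracks each parameter separately; the same logic extends naturally to any number of parameters. Below is a minimal vectorized sketch (an illustrative variant, with the helper name grad_fn being an assumption rather than part of the tutorial) that applies RMSprop to a NumPy parameter vector:
Python
import numpy as np

def rmsprop_vectorized(x, grad_fn, learning_rate=0.1, gamma=0.9, epsilon=1e-8, max_epochs=50):
    # x: initial parameter vector, grad_fn: function returning the gradient of the objective at x
    x = np.asarray(x, dtype=float).copy()
    avg_sq_grad = np.zeros_like(x)  # running average of squared gradients, one entry per parameter

    for _ in range(max_epochs):
        g = grad_fn(x)
        # Exponentially weighted average of squared gradients
        avg_sq_grad = gamma * avg_sq_grad + (1 - gamma) * g**2
        # Per-parameter update normalized by the root of the running average
        x = x - learning_rate * g / np.sqrt(avg_sq_grad + epsilon)

    return x

# Example with the same objective as above:
# x_opt = rmsprop_vectorized([-4.0, 3.0], lambda x: np.array([10.0 * x[0], 14.0 * x[1]]))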
4. Running the RMSprop Algorithm
Let’s now run the RMSprop algorithm for 50 iterations starting from an initial guess of x_1 = -4.0 and x_2 = 3.0.
Python
# Defining the initial values of x1, x2, and other hyperparameters
x1_initial = -4.0
x2_initial = 3.0
learning_rate = 0.1
gamma = 0.9
epsilon = 1e-8
max_epochs = 50
# RMSprop algorithm
x1_trajectory, x2_trajectory, y_trajectory = rmsprop(
    x1_initial,
    x2_initial,
    derivative_x1,
    derivative_x2,
    learning_rate,
    gamma,
    epsilon,
    max_epochs
)
# Printing the optimal values of x1, x2, and y
print('The optimal value of x1 is:', x1_trajectory[-1])
print('The optimal value of x2 is:', x2_trajectory[-1])
print('The optimal value of y is:', y_trajectory[-1])
Output:
The optimal value of x1 is: -0.10352260359924752
The optimal value of x2 is: 0.0025296212056016548
The optimal value of y is: 0.05362944016394148
5. Visualizing the Optimization Path
Finally, we will plot the path taken by the RMSprop optimizer on the contour plot of the objective function to visualize how it converges to the minimum.
Python
# Displaying the path taken by the optimizer on the contour plot
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)
# Plotting the contour plot
ax.contour(x1, x2, y, cmap='viridis', levels=20)
# Plotting the trajectory of y in each iteration
ax.plot(x1_trajectory, x2_trajectory, '*',
        markersize=7, color='dodgerblue')
# Setting the labels and title of the plot
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_title('RMSprop Optimization path for ' + str(max_epochs) + ' iterations')
plt.show()
Output:

The contour plot shows the trajectory taken by the RMSprop optimizer: starting from (-4.0, 3.0), the parameters gradually approach the minimum of the objective function at the origin.