IMPLEMENT THE GRADIENT DESCENT ALGORITHM FOR A THREE-STAGE FULLY
CONNECTED NEURAL NETWORK USING PYTHON AND
BASIC PACKAGES SUCH AS NUMPY AND MATPLOTLIB
EXPERIMENT 1
STRUCTURE OF THE NEURAL NETWORK
• Three-Stage Network (a minimal code sketch follows this list):
• Input Layer: Takes the input data.
• Hidden Layer: Processes input from the input layer.
• Output Layer: Produces the final output.
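• A minimal sketch of the three stages in NumPy (the layer sizes and the sigmoid activation here are assumptions for illustration, not prescribed by the experiment):

    import numpy as np

    n_input, n_hidden, n_output = 3, 4, 1            # assumed layer sizes
    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((n_input, n_hidden))    # input -> hidden weights
    W2 = rng.standard_normal((n_hidden, n_output))   # hidden -> output weights

    x = rng.standard_normal((1, n_input))            # one input sample
    hidden = 1 / (1 + np.exp(-(x @ W1)))             # hidden layer (sigmoid)
    output = hidden @ W2                             # output layer
    print(output.shape)                              # (1, 1)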
GRADIENT DESCENT
• Definition: Gradient Descent is an optimization algorithm used to
minimize the cost function by iteratively adjusting the weights.
• A first-order optimization algorithm, used primarily to find the minima of a
given function (finding maxima instead is called gradient ascent).
• Steps Involved:
1. Calculate the Predicted Output (Forward Propagation)
2. Compute the Loss/Error
3. Calculate the Gradient of the Loss (Backward Propagation)
4. Update the Weights Using the Gradients (update rule below)
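• Step 4 applies the standard update rule (η is the learning rate); each weight moves a small step against its own gradient:

    w_new = w_old − η · ∂L/∂w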
REAL-TIME WORKING OF THE GRADIENT DESCENT ALGORITHM
• Initialize Weights: Start with random weights.
• Forward Propagation: Calculate the predicted output using current weights.
• Compute Loss: Measure the error between predicted and actual output.
• Back Propagation: Calculate the gradient of the loss function with respect to the weights.
• Update Weights: Adjust weights in the direction of the negative gradient.
• Repeat: Iterate the above steps until convergence (a runnable sketch follows below).
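• A runnable sketch of this loop on the simplest possible model, a single weight w fitting y = 2x (the toy data, starting weight, and learning rate are assumptions for illustration):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = 2.0 * x                  # targets: the true weight is 2
    w = 0.0                      # initialize the weight (often random)
    learning_rate = 0.05

    for step in range(100):
        y_pred = w * x                          # forward propagation
        loss = np.mean((y_pred - y) ** 2)       # compute loss (MSE)
        grad = np.mean(2 * (y_pred - y) * x)    # gradient of loss w.r.t. w
        w -= learning_rate * grad               # step against the gradient

    print(w)  # approaches 2.0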
THE MATHEMATICS BEHIND GRADIENT DESCENT
• We assume a parabola here, as it is helpful in visualizing the basic principles of
gradient descent.
• Parameters are updated based on the gradient.
• The core concept is moving, step by step, towards a minimum.
• A parabola has only one minimum, and that is therefore the global minimum.
• We take the cost (loss) function in Mean Squared Error (MSE) form.
• The aim is to minimize the cost.
• Along with the input and output, certain parameters (the weights) are involved;
these are what gradient descent adjusts (written out below).
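• Written out (a standard formulation, added here for reference): for predictions ŷᵢ and targets yᵢ over n samples,

    Cost (MSE):   J = (1/n) · Σᵢ (ŷᵢ − yᵢ)²
    Update:       w ← w − η · ∂J/∂w        (η is the learning rate)

• For the parabola J(w) = (w − a)², the slope ∂J/∂w = 2(w − a) is zero exactly at the single (global) minimum w = a.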
FINDING MINIMA
• Gradient Descent in Action:
• Start at an initial point.
• Compute the gradient (slope) of the cost function.
• Take steps proportional to the negative gradient.
• Continue until the slope is near zero, indicating a minimum (see the sketch after this list).
• Challenges:
• May get stuck in local minima.
• Requires careful tuning of parameters.
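• A runnable sketch on the parabola J(w) = (w − 3)², stopping when the slope is near zero (the start point, learning rate, and tolerance are illustrative assumptions):

    # Gradient descent on J(w) = (w - 3)^2, whose only (global) minimum is w = 3
    w = -5.0              # initial point (assumed)
    learning_rate = 0.1
    tolerance = 1e-6

    grad = 2 * (w - 3)               # slope of the cost at w
    while abs(grad) > tolerance:     # stop when the slope is near zero
        w -= learning_rate * grad    # step in the negative gradient direction
        grad = 2 * (w - 3)

    print(w)  # ~3.0, the minimum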
LEARNING RATE
• Definition: The learning rate is a hyperparameter that controls how much
the model changes (i.e., how big a step is taken) in response to the estimated
error each time the model weights are updated.
• Importance: Determines the size of the steps taken towards the minimum
of the cost function.
• Effects (compared in the sketch below):
• Too High: Can cause the algorithm to overshoot the minimum, converge to a
suboptimal solution, or even diverge.
• Too Low: Results in slow convergence, and the optimization may stall in a local minimum.
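• A small sketch comparing step sizes on the parabola J(w) = w² (the three rates are arbitrary choices to show the effect):

    # Effect of the learning rate on J(w) = w^2 (minimum at w = 0)
    for lr in (0.01, 0.1, 1.1):      # too low, reasonable, too high (assumed)
        w = 4.0
        for _ in range(20):
            w -= lr * 2 * w          # gradient of w^2 is 2w
        print(f"lr={lr}: w after 20 steps = {w:.4f}")
    # lr=0.01 crawls towards 0, lr=0.1 converges, lr=1.1 diverges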
BATCH SIZE
• Definition: The number of training examples utilized in one iteration.
• Variants:
• Batch Gradient Descent: Uses the entire dataset to compute the gradient.
• Stochastic Gradient Descent (SGD): Uses one training example per iteration.
• Mini-batch Gradient Descent: Uses a subset of the training data (a mini-batch) for each iteration.
• Importance: Affects the stability and speed of convergence (see the sketch below, which also shows the epoch loop).
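• A sketch of how mini-batches are drawn inside an epoch loop (the toy dataset and batch size are assumptions; the weight update itself is omitted):

    import numpy as np

    X = np.arange(12.0).reshape(6, 2)    # toy dataset: 6 examples (assumed)
    batch_size = 2                       # mini-batch size (assumed)

    for epoch in range(3):                        # each epoch = one full pass over the data
        perm = np.random.permutation(len(X))      # shuffle example order each epoch
        for start in range(0, len(X), batch_size):
            batch = X[perm[start:start + batch_size]]
            print("epoch", epoch, "batch shape", batch.shape)
            # forward pass, loss, gradient, and weight update would go here
            # batch GD: batch = X (all rows);  SGD: batch_size = 1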
NUMBER OF ITERATIONS (EPOCHS)
• Definition: The number of times the entire training dataset is passed
forward and backward through the neural network.
• Importance: Determines how many times the weights are updated.
• Effects:
• More Iterations: Typically improve training accuracy, but require more
computational resources and can eventually overfit.
• Fewer Iterations: Can lead to underfitting, where the model fails to learn
adequately from the training data.
CODING
• Importing Libraries
1. NumPy: Provides support for arrays and matrices.
2. Matplotlib: Used for plotting graphs.
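• The two imports, under their conventional aliases:

    import numpy as np                 # arrays, matrices, and vectorized math
    import matplotlib.pyplot as plt    # plotting (e.g., the loss curve during training)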
CODING HINTS
• Defining the Neural Network Class: encapsulate the neural network logic within a class for better organization.
• Initialization of Weights: Randomly initialize the weights and set a learning rate for gradient descent.
• Forward Propagation: Compute the dot product of inputs and weights, followed by the activation function.
• Backward Propagation and Gradient Calculation: Use the derivative of the sigmoid activation to calculate the gradients of the loss with respect to the weights.
• Gradient Descent Method: Iteratively update the weights to minimize the cost function.
• Training the Network: Define the dataset and train the neural network using gradient descent (an end-to-end sketch follows below).
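• Putting the hints together, a minimal end-to-end sketch. Assumptions not prescribed above: a sigmoid activation, an MSE cost, a toy XOR dataset, the layer sizes, and bias terms (added as standard practice):

    import numpy as np
    import matplotlib.pyplot as plt

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_derivative(a):
        # derivative of the sigmoid, written in terms of its output a = sigmoid(z)
        return a * (1.0 - a)

    class NeuralNetwork:
        def __init__(self, n_input, n_hidden, n_output, learning_rate=0.5):
            rng = np.random.default_rng(42)
            self.W1 = rng.standard_normal((n_input, n_hidden))    # input -> hidden
            self.b1 = np.zeros((1, n_hidden))
            self.W2 = rng.standard_normal((n_hidden, n_output))   # hidden -> output
            self.b2 = np.zeros((1, n_output))
            self.lr = learning_rate

        def forward(self, X):
            self.hidden = sigmoid(X @ self.W1 + self.b1)          # hidden activations
            self.output = sigmoid(self.hidden @ self.W2 + self.b2)
            return self.output

        def backward(self, X, y):
            # gradients of the MSE cost via the chain rule
            d_out = (self.output - y) * sigmoid_derivative(self.output)
            d_hid = (d_out @ self.W2.T) * sigmoid_derivative(self.hidden)
            # gradient descent updates: step against each gradient
            self.W2 -= self.lr * (self.hidden.T @ d_out)
            self.b2 -= self.lr * d_out.sum(axis=0, keepdims=True)
            self.W1 -= self.lr * (X.T @ d_hid)
            self.b1 -= self.lr * d_hid.sum(axis=0, keepdims=True)

        def train(self, X, y, epochs=5000):
            losses = []
            for _ in range(epochs):
                self.forward(X)
                losses.append(np.mean((self.output - y) ** 2))    # MSE cost
                self.backward(X, y)
            return losses

    # Toy dataset (XOR; chosen only for demonstration)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    net = NeuralNetwork(n_input=2, n_hidden=4, n_output=1)
    losses = net.train(X, y)
    print(net.forward(X).round(2))   # should approach [[0], [1], [1], [0]]

    plt.plot(losses)                 # the cost should fall as training proceeds
    plt.xlabel("epoch")
    plt.ylabel("MSE cost")
    plt.show()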
CONCLUSION
• Implemented a three-stage neural network in Python using NumPy
and Matplotlib.
• Applied gradient descent to optimize weights and minimize the cost
function.
• Understanding gradient descent provides a foundation for deeper
machine learning exploration, after which we can explore more
advanced optimization techniques.
THANK YOU