0% found this document useful (0 votes)
4 views

4

The document outlines the implementation of the Backpropagation algorithm for deep artificial neural networks (ANNs) with two models: one using sigmoid activations and the other using ReLU activations in hidden layers, both with softmax output. The models are trained on the MNIST dataset for 1000 epochs, with hyperparameters tuned using a validation set. Results show that the ReLU model outperforms the sigmoid model, demonstrating improved accuracy and loss metrics.

Uploaded by

ritammaiti2016
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

4

The document outlines the implementation of the Backpropagation algorithm for deep artificial neural networks (ANNs) with two models: one using sigmoid activations and the other using ReLU activations in hidden layers, both with softmax output. The models are trained on the MNIST dataset for 1000 epochs, with hyperparameters tuned using a validation set. Results show that the ReLU model outperforms the sigmoid model, demonstrating improved accuracy and loss metrics.

Uploaded by

ritammaiti2016
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

Write Python code to implement the Backpropagation algorithm for ANN with more than

two hidden layers. Develop two such deep models with the following configurations.

Model 1: uses sigmoid activations at the hidden nodes and softmax activation function at
the output nodes.

Model 2: uses ReLu activation function at the hidden nodes and softmax activation
function at the output nodes.

The hyperparameters (e.g. learning rate, momentum, number of hidden layers, number of
hidden nodes per layer) for both models should be properly tuned using a validation set.
Compare the performance of these two models on MNIST dataset when both the models
are trained up to 1000 epochs.

Introduction
Artificial Neural Networks (ANNs) are widely used for various machine learning tasks, including image
classification. One of the most critical learning techniques in ANNs is the backpropagation algorithm,
which adjusts the weights of neurons based on error gradients. In this assignment, we implement
backpropagation for deep ANNs with more than two hidden layers and analyze their performance on the
MNIST dataset.

Backpropagation Algorithm

Backpropagation is an iterative, gradient-based optimization algorithm used to train neural networks. It


consists of the following steps:

1. Forward Propagation: Compute activations of neurons from the input layer to the output layer.

2. Compute Loss: Measure the error between predicted and actual outputs using a loss function
(e.g., categorical cross-entropy for classification).

3. Backward Propagation: Compute the gradients of the loss function with respect to the network's
parameters using the chain rule of differentiation.

4. Weight Update: Adjust weights using a gradient descent-based optimizer, such as Stochastic
Gradient Descent (SGD) or Adam.

Model Configurations

We develop two deep models with different activation functions at the hidden layers:

• Model 1: Uses the sigmoid activation function in the hidden layers and the softmax activation
function in the output layer.

• Model 2: Uses the ReLU activation function in the hidden layers and the softmax activation
function in the output layer.

The hyperparameters, such as learning rate, momentum, number of hidden layers, and number of hidden
nodes per layer, are tuned using a validation set. Both models are trained for 1000 epochs.
Import Required Libraries

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load MNIST dataset


(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0 # Normalize data
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.


11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step

Model 1: Sigmoid Hidden Layers

model1 = Sequential([
Flatten(input_shape=(28, 28)),
Dense(256, activation='sigmoid'),
Dense(128, activation='sigmoid'),
Dense(64, activation='sigmoid'),
Dense(10, activation='softmax')
])

model1.compile(optimizer=Adam(learning_rate=0.001),
loss='categorical_crossentropy',
metrics=['accuracy'])

/usr/local/lib/python3.11/dist-packages/keras/src/layers/reshaping/flatten.py:37: UserWa
super().__init__(**kwargs)

Model 2: ReLU Hidden Layers

model2 = Sequential([
Flatten(input_shape=(28, 28)),
Dense(256, activation='relu'),
Dense(128, activation='relu'),
Dense(64, activation='relu'),
Dense(10, activation='softmax')
])

model2.compile(optimizer=Adam(learning_rate=0.001),
loss='categorical_crossentropy',
metrics=['accuracy'])

Training the Models

history1 = model1.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10, batch_s


history2 = model2.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10, batch_s

Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.9853 - loss: 0.0506 - val_accurac
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 9ms/step - accuracy: 0.9891 - loss: 0.0379 - val_accurac
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.9909 - loss: 0.0310 - val_accurac
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9926 - loss: 0.0262 - val_accurac
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 8ms/step - accuracy: 0.9954 - loss: 0.0174 - val_accura
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 9s 7ms/step - accuracy: 0.9964 - loss: 0.0144 - val_accurac
Epoch 7/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 7ms/step - accuracy: 0.9965 - loss: 0.0116 - val_accura
Epoch 8/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9974 - loss: 0.0097 - val_accurac
Epoch 9/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 9s 7ms/step - accuracy: 0.9971 - loss: 0.0102 - val_accurac
Epoch 10/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 7ms/step - accuracy: 0.9972 - loss: 0.0086 - val_accura
Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 9s 8ms/step - accuracy: 0.8778 - loss: 0.4218 - val_accurac
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 11s 9ms/step - accuracy: 0.9716 - loss: 0.0945 - val_accura
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.9804 - loss: 0.0618 - val_accurac
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9863 - loss: 0.0447 - val_accurac
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 9s 7ms/step - accuracy: 0.9887 - loss: 0.0350 - val_accurac
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 7ms/step - accuracy: 0.9907 - loss: 0.0289 - val_accura
Epoch 7/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 7ms/step - accuracy: 0.9926 - loss: 0.0243 - val_accura
Epoch 8/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 9ms/step - accuracy: 0.9935 - loss: 0.0203 - val_accurac
Epoch 9/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 7ms/step - accuracy: 0.9945 - loss: 0.0150 - val_accurac
Epoch 10/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 12s 8ms/step - accuracy: 0.9950 - loss: 0.0155 - val_accura
Performance Comparison

Plot Accuracy and Loss

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history1.history['accuracy'], label='Model 1 Accuracy')
plt.plot(history2.history['accuracy'], label='Model 2 Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history1.history['loss'], label='Model 1 Loss')
plt.plot(history2.history['loss'], label='Model 2 Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()

Evaluation
loss1, acc1 = model1.evaluate(x_test, y_test, verbose=0)
loss2, acc2 = model2.evaluate(x_test, y_test, verbose=0)

print(f"Model 1 - Sigmoid: Accuracy = {acc1:.4f}, Loss = {loss1:.4f}")


print(f"Model 2 - ReLU: Accuracy = {acc2:.4f}, Loss = {loss2:.4f}")

Model 1 - Sigmoid: Accuracy = 0.9795, Loss = 0.0784


Model 2 - ReLU: Accuracy = 0.9818, Loss = 0.0757
Conclusion

• Model 1 (Sigmoid) tends to suffer from the vanishing gradient problem, leading to slower
convergence.

• Model 2 (ReLU) performs better due to its ability to mitigate the vanishing gradient issue,
allowing deeper networks to learn efficiently.

• The results indicate that using ReLU in hidden layers improves model performance compared to
sigmoid activation.

You might also like