4
4
two hidden layers. Develop two such deep models with the following configurations.
Model 1: uses sigmoid activations at the hidden nodes and softmax activation function at
the output nodes.
Model 2: uses ReLu activation function at the hidden nodes and softmax activation
function at the output nodes.
The hyperparameters (e.g. learning rate, momentum, number of hidden layers, number of
hidden nodes per layer) for both models should be properly tuned using a validation set.
Compare the performance of these two models on MNIST dataset when both the models
are trained up to 1000 epochs.
Introduction
Artificial Neural Networks (ANNs) are widely used for various machine learning tasks, including image
classification. One of the most critical learning techniques in ANNs is the backpropagation algorithm,
which adjusts the weights of neurons based on error gradients. In this assignment, we implement
backpropagation for deep ANNs with more than two hidden layers and analyze their performance on the
MNIST dataset.
Backpropagation Algorithm
1. Forward Propagation: Compute activations of neurons from the input layer to the output layer.
2. Compute Loss: Measure the error between predicted and actual outputs using a loss function
(e.g., categorical cross-entropy for classification).
3. Backward Propagation: Compute the gradients of the loss function with respect to the network's
parameters using the chain rule of differentiation.
4. Weight Update: Adjust weights using a gradient descent-based optimizer, such as Stochastic
Gradient Descent (SGD) or Adam.
Model Configurations
We develop two deep models with different activation functions at the hidden layers:
• Model 1: Uses the sigmoid activation function in the hidden layers and the softmax activation
function in the output layer.
• Model 2: Uses the ReLU activation function in the hidden layers and the softmax activation
function in the output layer.
The hyperparameters, such as learning rate, momentum, number of hidden layers, and number of hidden
nodes per layer, are tuned using a validation set. Both models are trained for 1000 epochs.
Import Required Libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
model1 = Sequential([
Flatten(input_shape=(28, 28)),
Dense(256, activation='sigmoid'),
Dense(128, activation='sigmoid'),
Dense(64, activation='sigmoid'),
Dense(10, activation='softmax')
])
model1.compile(optimizer=Adam(learning_rate=0.001),
loss='categorical_crossentropy',
metrics=['accuracy'])
/usr/local/lib/python3.11/dist-packages/keras/src/layers/reshaping/flatten.py:37: UserWa
super().__init__(**kwargs)
model2 = Sequential([
Flatten(input_shape=(28, 28)),
Dense(256, activation='relu'),
Dense(128, activation='relu'),
Dense(64, activation='relu'),
Dense(10, activation='softmax')
])
model2.compile(optimizer=Adam(learning_rate=0.001),
loss='categorical_crossentropy',
metrics=['accuracy'])
Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.9853 - loss: 0.0506 - val_accurac
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 9ms/step - accuracy: 0.9891 - loss: 0.0379 - val_accurac
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.9909 - loss: 0.0310 - val_accurac
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9926 - loss: 0.0262 - val_accurac
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 8ms/step - accuracy: 0.9954 - loss: 0.0174 - val_accura
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 9s 7ms/step - accuracy: 0.9964 - loss: 0.0144 - val_accurac
Epoch 7/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 7ms/step - accuracy: 0.9965 - loss: 0.0116 - val_accura
Epoch 8/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9974 - loss: 0.0097 - val_accurac
Epoch 9/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 9s 7ms/step - accuracy: 0.9971 - loss: 0.0102 - val_accurac
Epoch 10/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 7ms/step - accuracy: 0.9972 - loss: 0.0086 - val_accura
Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 9s 8ms/step - accuracy: 0.8778 - loss: 0.4218 - val_accurac
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 11s 9ms/step - accuracy: 0.9716 - loss: 0.0945 - val_accura
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.9804 - loss: 0.0618 - val_accurac
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9863 - loss: 0.0447 - val_accurac
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 9s 7ms/step - accuracy: 0.9887 - loss: 0.0350 - val_accurac
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 7ms/step - accuracy: 0.9907 - loss: 0.0289 - val_accura
Epoch 7/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 7ms/step - accuracy: 0.9926 - loss: 0.0243 - val_accura
Epoch 8/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 9ms/step - accuracy: 0.9935 - loss: 0.0203 - val_accurac
Epoch 9/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 7ms/step - accuracy: 0.9945 - loss: 0.0150 - val_accurac
Epoch 10/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 12s 8ms/step - accuracy: 0.9950 - loss: 0.0155 - val_accura
Performance Comparison
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history1.history['accuracy'], label='Model 1 Accuracy')
plt.plot(history2.history['accuracy'], label='Model 2 Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history1.history['loss'], label='Model 1 Loss')
plt.plot(history2.history['loss'], label='Model 2 Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
Evaluation
loss1, acc1 = model1.evaluate(x_test, y_test, verbose=0)
loss2, acc2 = model2.evaluate(x_test, y_test, verbose=0)
• Model 1 (Sigmoid) tends to suffer from the vanishing gradient problem, leading to slower
convergence.
• Model 2 (ReLU) performs better due to its ability to mitigate the vanishing gradient issue,
allowing deeper networks to learn efficiently.
• The results indicate that using ReLU in hidden layers improves model performance compared to
sigmoid activation.