exp3
1. Architecture of an MLP
● Input layer: The first layer where the input features are fed into the network. The
number of nodes in this layer is equal to the number of input features. For
example, for an image input of size 28x28 pixels (like in MNIST), the input layer
will have 784 nodes (28x28=784).
● Hidden layers: One or more layers of neurons between the input and output
layers. These layers are where most of the computation happens. The number
of neurons in each hidden layer is a hyperparameter, and its selection often
depends on the complexity of the problem and is tuned through experimentation.
● Output layer: The final layer, which produces the network's prediction. The
number of nodes depends on the task, for example one node for binary
classification or one node per class for multi-class classification. (A minimal
architecture sketch follows this list.)
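As a concrete illustration of this architecture, the sketch below builds a small MLP for 28x28 images using Keras; the framework choice and the hidden-layer sizes (128 and 64 neurons) are assumptions made for the example, not values fixed by the text.

# Illustrative MLP for 28x28 images: 784 input nodes, two hidden layers, 10 output classes
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

model = Sequential([
    Input(shape=(784,)),             # input layer: one node per pixel
    Dense(128, activation='relu'),   # first hidden layer (size chosen for illustration)
    Dense(64, activation='relu'),    # second hidden layer
    Dense(10, activation='softmax')  # output layer: one node per class
])
model.summary()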
2. Working of an MLP
a) Weighted Sum and Activation
Each neuron in the hidden layer receives inputs from all neurons in the previous layer
(input layer or another hidden layer), applies a weighted sum, and passes the result
through an activation function. Mathematically, this can be written as:
z = w1x1 + w2x2 + ... + wnxn + b
Where:
● x1, x2, ..., xn are the inputs from the previous layer
● w1, w2, ..., wn are the weights of the connections into the neuron
● b is the bias term
● z is the weighted sum passed to the activation function
This weighted sum z is then passed through an activation function, such as ReLU
(Rectified Linear Unit), Sigmoid, or Tanh, to introduce non-linearity. The output of this
activation is the input to the next layer.
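As a sketch of this computation for a single layer in NumPy (the layer sizes, random weights, and the choice of ReLU are illustrative assumptions):

import numpy as np

def relu(z):
    # ReLU activation: max(0, z) applied element-wise
    return np.maximum(0, z)

# Example layer: 3 inputs feeding 4 neurons (sizes chosen only for illustration)
x = np.array([0.5, -1.2, 3.0])   # inputs from the previous layer
W = np.random.randn(4, 3)        # one row of weights per neuron
b = np.random.randn(4)           # one bias per neuron

z = W @ x + b                    # weighted sum z for every neuron at once
a = relu(z)                      # activation output, fed to the next layer
print(a)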
b) Activation Functions
The role of activation functions is crucial in allowing the network to learn complex
patterns. Without activation functions, the neural network would just be a linear function,
regardless of how many layers it has, because a composition of linear transformations is
itself a linear transformation.
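For reference, the three activations named above can be written directly in NumPy; this is a plain sketch, independent of any particular framework:

import numpy as np

def sigmoid(z):
    # squashes values into (0, 1); useful for probabilities
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes values into (-1, 1), zero-centred
    return np.tanh(z)

def relu(z):
    # keeps positive values, zeroes out negative ones
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))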
c) Output Layer
The output layer is where the final prediction is made. For classification, if the problem is
binary, the output can be a single neuron with a Sigmoid activation function to output a
probability between 0 and 1. For multi-class classification, the output layer typically has
one neuron per class with a Softmax activation, so the outputs form a probability
distribution over the classes.
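A small sketch of the Softmax step for a multi-class output layer; the raw scores (logits) below are made-up values used only for illustration:

import numpy as np

def softmax(z):
    # subtracting the max improves numerical stability; the outputs sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw scores from the output layer
probs = softmax(logits)
print(probs, probs.argmax())         # predicted class = index of the largest probability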
3. Training of an MLP
The training of an MLP involves adjusting the weights and biases to minimize the error
between predicted and actual outputs. This is done through backpropagation, which
computes the gradients of the loss with respect to the weights, combined with gradient
descent, which uses those gradients to update the model.
a) Forward Propagation
1. Pass the input through the network (from the input layer to the output layer) to
generate predictions.
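Putting the pieces together, a forward pass through a tiny two-layer network can be sketched as follows; the layer sizes and random weights are placeholder assumptions:

import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Tiny illustrative network: 4 inputs -> 5 hidden neurons -> 3 output classes
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)

def forward(x):
    h = relu(W1 @ x + b1)          # input layer -> hidden layer
    return softmax(W2 @ h + b2)    # hidden layer -> output probabilities

x = rng.normal(size=4)             # one example input
print(forward(x))                  # the prediction for this input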
b) Loss Calculation
1. Calculate the loss (or error) by comparing the network's predictions to the true
labels. Common loss functions include:
○ Cross-entropy loss (for classification tasks)
○ Mean squared error (MSE) (for regression tasks)
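Both loss functions are easy to sketch in NumPy; the labels and predictions below are illustrative values only:

import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-12):
    # y_true: one-hot labels, y_prob: predicted probabilities (each row sums to 1)
    return -np.mean(np.sum(y_true * np.log(y_prob + eps), axis=1))

def mse(y_true, y_pred):
    # mean squared error for regression targets
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([[0, 1, 0], [1, 0, 0]])              # one-hot labels
y_prob = np.array([[0.2, 0.7, 0.1], [0.8, 0.1, 0.1]])  # predicted probabilities
print(cross_entropy(y_true, y_prob))
print(mse(np.array([3.0, -0.5]), np.array([2.5, 0.0])))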
c) Backpropagation
1. Backpropagate the error by computing the gradient of the loss with respect to
each weight and bias. This is done using the chain rule of calculus to compute
gradients layer by layer.
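A minimal sketch of the chain rule for a single sigmoid neuron with a squared-error loss; the input, target, and starting weights are arbitrary illustrative values:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid neuron, squared-error loss: L = (a - y)^2 with a = sigmoid(w.x + b)
x, y = np.array([0.5, -1.0]), 1.0
w, b = np.array([0.3, -0.2]), 0.1

z = w @ x + b
a = sigmoid(z)

# Chain rule, factor by factor: dL/dw = dL/da * da/dz * dz/dw
dL_da = 2 * (a - y)       # derivative of the loss w.r.t. the activation
da_dz = a * (1 - a)       # derivative of the sigmoid
dz_dw = x                 # derivative of the weighted sum w.r.t. each weight
grad_w = dL_da * da_dz * dz_dw
grad_b = dL_da * da_dz    # since dz/db = 1

print(grad_w, grad_b)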
d) Weight Update
1. Update the weights using gradient descent, which aims to reduce the loss by
adjusting the weights in the direction that minimizes the error. The update rule for
the weights w and biases b is:
w := w − η · ∂L/∂w
b := b − η · ∂L/∂b
where η is the learning rate and ∂L/∂w, ∂L/∂b are the gradients computed during
backpropagation.
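A compact sketch of this update rule applied to a single linear neuron with an MSE loss; the learning rate, epoch count, and synthetic data are assumptions chosen only for the demonstration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X @ np.array([2.0, -1.0]) + 0.5          # synthetic targets: true weights [2, -1], bias 0.5

w, b, eta = np.zeros(2), 0.0, 0.1            # eta is the learning rate
for epoch in range(200):
    y_hat = X @ w + b
    grad_w = 2 * X.T @ (y_hat - y) / len(y)  # dL/dw for the MSE loss
    grad_b = 2 * np.mean(y_hat - y)          # dL/db
    w -= eta * grad_w                        # w := w - eta * dL/dw
    b -= eta * grad_b                        # b := b - eta * dL/db

print(w, b)                                  # should approach [2, -1] and 0.5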
# Evaluation with scikit-learn; y_true_classes and y_pred are assumed to hold
# integer class labels (e.g. obtained via argmax on one-hot or probability outputs)
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Confusion matrix
cm = confusion_matrix(y_true_classes, y_pred)
print(cm)

# Precision, Recall, F1 ('macro' averages the per-class scores across all classes)
precision = precision_score(y_true_classes, y_pred, average='macro')
recall = recall_score(y_true_classes, y_pred, average='macro')
f1 = f1_score(y_true_classes, y_pred, average='macro')

print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")
Evaluation Metrics:
Confusion Matrix: a table that compares the predicted class labels against the true labels;
diagonal entries count correct predictions for each class, while off-diagonal entries count
misclassifications.