ANN-TP (1)
Introduction
Comparison
BNN vs. ANN:
• Processing: BNN is massively parallel and slow, but superior to ANN; ANN is massively parallel and fast, but inferior to BNN.
• Size: BNN has about 10^11 neurons and 10^15 interconnections; ANN has 10^2 to 10^4 nodes.
• Storage capacity: BNN stores information in the synapses; ANN stores information in contiguous memory locations.
McCulloch-Pitts Neuron
• Input Layer
• Hidden Layer
• Output Layer
• Activation functions
• Loss functions
• Backpropagation and Optimizer
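As a quick sketch of how these building blocks fit together, the snippet below defines a tiny network in Keras; the framework choice, layer sizes and data shape (4 inputs, 3 classes) are illustrative assumptions, not part of the slides.

# Illustrative sketch only: framework (TensorFlow/Keras), layer sizes and
# number of classes are assumptions, not taken from the slides.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                      # input layer: 4 features
    tf.keras.layers.Dense(8, activation="relu"),     # hidden layer + activation function
    tf.keras.layers.Dense(3, activation="softmax"),  # output layer: 3 classes
])

# Loss function and optimizer; model.fit() would run backpropagation.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()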
Hidden Layer
Activation Functions
Loss Function & Cost Function
• A loss function is computed for a single training example.
• A cost function computes an average loss over the entire training dataset.
• Binary cross entropy compares each of the predicted probabilities to the actual class output, which can be either 0 or 1.
• Binary cross entropy is the negative average of the log of the corrected predicted probabilities.
• L1 Loss function
• L2 Loss function
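A minimal NumPy sketch of these quantities, using made-up labels and predicted probabilities, taking L1 as the mean absolute error and L2 as the mean squared error, and binary cross entropy as the negative average of the log of the corrected predicted probabilities.

import numpy as np

# Illustrative values (not from the slides).
y_true = np.array([1.0, 0.0, 1.0, 1.0])   # actual class outputs (0 or 1)
y_prob = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probabilities

# L1 loss (mean absolute error) and L2 loss (mean squared error)
l1 = np.mean(np.abs(y_true - y_prob))
l2 = np.mean((y_true - y_prob) ** 2)

# Binary cross entropy: "correct" each probability so it refers to the true
# class, then take the negative average of the logs.
corrected = np.where(y_true == 1, y_prob, 1.0 - y_prob)
bce = -np.mean(np.log(corrected))

print(l1, l2, bce)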
Cross Entropy Loss Function
import numpy as np

# The probabilities predicted by the neural network
y_pred = [0.1, 0.3, 0.4, 0.2]

# One-hot-encoded ground truth label
y_hat = [0, 1, 0, 0]

cross_entropy = -np.sum(np.log(y_pred) * y_hat)
print(cross_entropy)
# The result is about 1.20
Type of Data
• A simple perceptron is capable of learning a linear decision boundary.
• In machine learning, feature engineering techniques are used to learn non-linear decision boundaries, as sketched after the figures below.
• The network will be fragile in the presence of noise in the training data; this is called overfitting.
http://playground.tensorflow.org/
Linearly Separable Dataset
Dataset with circular boundary
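A minimal sketch of that feature-engineering idea on a synthetic circular-boundary dataset; the dataset and the added feature r^2 = x^2 + y^2 are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset with a circular decision boundary:
# points inside radius 1 are class 1, outside are class 0.
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)

# Feature engineering: add r^2 = x^2 + y^2 as an extra feature.
# In the new 3-D feature space the classes are separated by the plane r^2 = 1,
# so even a simple perceptron can learn the boundary.
X_eng = np.column_stack([X, X[:, 0] ** 2 + X[:, 1] ** 2])

print(X.shape, X_eng.shape)   # (200, 2) (200, 3)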
Introduction
• The backpropagation algorithm is a fundamental building block of a neural network.
• It trains a neural network by applying the chain rule of calculus.
• Backpropagation aims to minimize the cost function by adjusting the network's weights and biases.
Backpropagation Algorithm
• Initialization:
• Initialize the weights and biases of the neural network randomly.
• Set the learning rate (a hyperparameter that determines the size of the steps
taken during optimization).
• Forward Pass:
• Input a training example into the network and perform a forward pass through
the network to compute the predicted output.
• Calculate the error (loss) between the predicted output and the actual output
using a loss function.
• Backward Pass (Backpropagation):
• Calculate the gradient of the loss with respect to the weights and biases using the chain
rule of calculus.
• Update the weights and biases in the opposite direction of the gradient to minimize the
loss.
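As a minimal illustration of these three steps, the sketch below applies the chain rule by hand to a single sigmoid neuron with a squared-error loss; the input, target, initial weights and learning rate are made-up values.

import numpy as np

# One training example for a single sigmoid neuron: y_pred = sigmoid(w*x + b)
x, y_true = 1.5, 1.0
w, b, lr = 0.2, 0.1, 0.5          # initial weight, bias, learning rate

# Forward pass
z = w * x + b
y_pred = 1.0 / (1.0 + np.exp(-z))
loss = 0.5 * (y_pred - y_true) ** 2

# Backward pass (chain rule): dL/dw = dL/dy_pred * dy_pred/dz * dz/dw
dL_dy = y_pred - y_true
dy_dz = y_pred * (1.0 - y_pred)
dL_dw = dL_dy * dy_dz * x
dL_db = dL_dy * dy_dz * 1.0

# Update step in the opposite direction of the gradient
w -= lr * dL_dw
b -= lr * dL_db
print(loss, dL_dw, dL_db, w, b)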
Optimization in Deep Learning
• Optimizers are algorithms or methods used to minimize an error/loss function.
• Optimizers are mathematical functions that depend on the model's learnable parameters, i.e. weights and biases.
• Optimizers determine how to change the weights and learning rate of the neural network in order to reduce the losses.
Common optimizers:
• Gradient Descent
• Stochastic Gradient Descent
• Mini-Batch Gradient Descent
• SGD with Momentum
• AdaGrad (Adaptive Gradient Descent)
• AdaDelta
• Adam (Adaptive Moment Estimation)
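To make the role of an optimizer concrete, here is a minimal sketch of plain gradient descent minimizing a toy loss L(w) = (w - 3)^2; the loss, starting point and learning rate are illustrative assumptions. The variants in the list above differ mainly in how each update step is computed.

# Gradient descent on L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3).
w = 0.0          # initial parameter
lr = 0.1         # learning rate

for step in range(50):
    grad = 2.0 * (w - 3.0)   # gradient of the loss at the current w
    w -= lr * grad           # move opposite to the gradient to reduce the loss

print(w)  # approaches 3.0, the minimizer of the loss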
Learning rate
Backpropagation with Gradient Descent
Input: Training dataset (X, y)
Output: Trained neural network weights and biases

Initialize weights and biases randomly
Repeat until convergence:
    Perform forward propagation:
        Compute the weighted sum of inputs and the activation for each neuron in each layer
    Compute the output error:
        Calculate the derivative of the loss function with respect to the predicted output
    Perform backpropagation:
        Compute the gradients of the loss function with respect to the weights and biases in each layer
    Update the weights and biases:
        Update the weights and biases using the computed gradients and the learning rate
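A runnable NumPy sketch of this procedure, assuming a one-hidden-layer network with sigmoid activations, a squared-error loss and the XOR toy dataset; all of these choices are illustrative, not taken from the slides.

import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (XOR) -- purely illustrative.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Initialize weights and biases randomly; set the learning rate.
W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros((1, 1))
lr = 1.0

for epoch in range(10_000):
    # Forward pass: weighted sums and activations for each layer.
    a1 = sigmoid(X @ W1 + b1)
    y_pred = sigmoid(a1 @ W2 + b2)

    # Output error: derivative of the squared-error loss w.r.t. the prediction.
    d_out = (y_pred - y) * y_pred * (1 - y_pred)

    # Backpropagation: gradients w.r.t. weights and biases in each layer.
    dW2 = a1.T @ d_out
    db2 = d_out.sum(axis=0, keepdims=True)
    d_hidden = (d_out @ W2.T) * a1 * (1 - a1)
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0, keepdims=True)

    # Update weights and biases using the gradients and learning rate.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

loss = np.mean((y_pred - y) ** 2)
print(loss)                  # should be close to 0 after training
print(np.round(y_pred, 2))   # predictions should approach [[0], [1], [1], [0]]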
Stochastic and Mini-Batch Gradient Descent
• Stochastic GD is a variant of Gradient Descent. It updates the model parameters one example at a time.
• If the dataset has 10K examples, SGD will update the model parameters 10K times in each pass over the data.
• Mini-Batch Gradient Descent combines the concepts of SGD and batch gradient descent.
• It simply splits the training dataset into small batches and performs an update for each of those batches.
• This creates a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent.
• It reduces the variance of the parameter updates, so convergence is more stable.
• It splits the dataset into batches of between 50 and 256 examples, chosen at random.
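A minimal sketch of the mini-batch splitting itself; the dataset size, feature count and batch size of 64 are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative dataset: 10,000 examples with 20 features each.
X = rng.normal(size=(10_000, 20))
y = rng.integers(0, 2, size=10_000)

batch_size = 64

# Shuffle once per epoch, then take consecutive slices as mini-batches.
perm = rng.permutation(len(X))
for start in range(0, len(X), batch_size):
    idx = perm[start:start + batch_size]
    X_batch, y_batch = X[idx], y[idx]
    # ... compute gradients on (X_batch, y_batch) and update the parameters ...

# SGD corresponds to batch_size = 1 (one update per example);
# batch gradient descent corresponds to batch_size = len(X).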
SGD with Momentum
w(t) = w(t-1) - η · V(t),
where w(t) = value of w at the current iteration, w(t-1) = value of w at the previous iteration, η = learning rate, and V(t) is an exponentially weighted average of the past gradients (the momentum term).
AdaGrad (Adaptive Gradient Descent)
Each weight is updated with its own effective learning rate η / sqrt(alpha(t) + epsilon), so alpha(t) gives a different learning rate for each weight at each iteration. Here, η is a constant and epsilon is a small positive number that avoids a divide-by-zero error in case alpha(t) becomes 0.
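A minimal NumPy sketch of both update rules, assuming the common formulation in which V(t) is an exponentially weighted average of past gradients and alpha(t) is a running sum of squared gradients; beta, the learning rate and the toy loss are illustrative assumptions.

import numpy as np

def momentum_step(w, grad, V, lr=0.01, beta=0.9):
    # V(t) = beta * V(t-1) + (1 - beta) * gradient  (weighted average of past gradients)
    V = beta * V + (1 - beta) * grad
    # w(t) = w(t-1) - eta * V(t)
    w = w - lr * V
    return w, V

def adagrad_step(w, grad, alpha, lr=0.01, eps=1e-8):
    # alpha(t) accumulates the squared gradients, giving each weight its own scale.
    alpha = alpha + grad ** 2
    # Effective per-weight learning rate: eta / sqrt(alpha(t) + eps).
    w = w - lr * grad / np.sqrt(alpha + eps)
    return w, alpha

# Example usage on a toy quadratic loss L(w) = ||w||^2 / 2, whose gradient is w.
w_m, V = np.array([1.0, -2.0]), np.zeros(2)
w_a, alpha = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(100):
    w_m, V = momentum_step(w_m, w_m, V)
    w_a, alpha = adagrad_step(w_a, w_a, alpha)
print(w_m, w_a)  # both move toward the minimizer at [0, 0]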
Iris Dataset- Learning Rate=0.01
Accuracy when LR=0.01
MNIST Dataset