
Artificial Neural Networks

Introduction

[Figure: biological neural network vs. artificial neural network]


Comparison

Criteria: BNN vs ANN
• Processing: BNN is massively parallel and slow, but superior to ANN; ANN is massively parallel and fast, but inferior to BNN.
• Size: BNN has about 10^11 neurons and 10^15 interconnections; ANN has 10^2 to 10^4 nodes.
• Learning: BNN can tolerate ambiguity; ANN requires very precise, structured and formatted data to tolerate ambiguity.
• Fault tolerance: BNN performance degrades with even partial damage; ANN is capable of robust performance, hence has the potential to be fault tolerant.
• Storage capacity: BNN stores the information in the synapse; ANN stores the information in continuous memory locations.
McCulloch-Pitts Neuron

• The first computational model of a neuron was proposed by Warren McCulloch (neuroscientist) and Walter Pitts (logician) in 1943.
• It may be divided into two parts. The first part, g, takes the inputs and performs an aggregation; based on the aggregated value, the second part, f, makes a decision.
McCulloch-Pitts Neuron
• So, x_1 could be isPremierLeagueOn
• x_2 could be isItAFriendlyGame
• x_3 could be isNotHome
• x_4 could be isManUnitedPlaying
• Excitatory inputs are NOT the ones that will make the neuron fire on their own, but they might fire it when combined together.
• Inhibitory inputs are those that have maximum effect on the decision making, irrespective of other inputs.
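To make the model concrete, here is a minimal Python sketch of a McCulloch-Pitts neuron for the football example above. It is not from the slides: the threshold value and the choice of isItAFriendlyGame as the inhibitory input are assumptions.

def mcculloch_pitts_neuron(inputs, inhibitory, threshold):
    # g aggregates the binary inputs; an active inhibitory input blocks firing
    if any(x == 1 and inh for x, inh in zip(inputs, inhibitory)):
        return 0
    # f fires (outputs 1) only if the aggregated value reaches the threshold
    return 1 if sum(inputs) >= threshold else 0

# x_1 = isPremierLeagueOn, x_2 = isItAFriendlyGame,
# x_3 = isNotHome, x_4 = isManUnitedPlaying
x = [1, 0, 0, 1]
inhibitory = [False, True, False, False]   # assumed: a friendly game inhibits firing
print(mcculloch_pitts_neuron(x, inhibitory, threshold=2))   # -> 1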
Perceptron - cont.
• The Perceptron was introduced by Frank Rosenblatt in 1957.
• A Perceptron is an algorithm for supervised learning of binary classifiers.
• The algorithm enables neurons to learn and to process elements in the training set one at a time.
• There are two types of Perceptrons: single layer and multilayer.
  Single layer - single layer perceptrons can learn only linearly separable patterns.
  Multilayer - multilayer perceptrons, or feedforward neural networks with two or more layers, have greater processing power.
• The Perceptron algorithm learns the weights for the input signals in order to draw a linear decision boundary (see the sketch below).
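A minimal sketch (not from the slides) of the perceptron learning rule on a toy AND dataset; the learning rate, epoch count and step activation are assumptions.

import numpy as np

# Toy dataset: logical AND (linearly separable), an assumed example
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate (hyperparameter)

for epoch in range(10):
    for xi, target in zip(X, y):
        # step activation on the weighted sum
        prediction = 1 if np.dot(w, xi) + b > 0 else 0
        # perceptron update rule: move the boundary toward misclassified points
        error = target - prediction
        w = w + lr * error * xi
        b = b + lr * error

print(w, b)   # learned linear decision boundary parameters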
Perceptron
Perceptron vs McCulloch-Pitts Neuron
Model of Artificial Neural Network
• Artificial neural networks (ANNs) are computing systems inspired by the biological neural networks that constitute animal brains.
• An ANN is based on a collection of
connected units or nodes called artificial
neurons, which loosely model
the neurons in a biological brain.
• The "signal" at a connection is a real
number, and the output of each neuron is
computed by some non-linear function of
the sum of its inputs.
• Neurons and edges typically have
a weight that adjusts as learning proceeds.
The weight increases or decreases the
strength of the signal at a connection.
Artificial Neural Network - cont.
• An Artificial Neuron is the basic unit of a neural network.
• A neural network has input, output and hidden layers.
• A neuron calculates the weighted sum of its inputs and then applies an activation function to normalize the sum (see the sketch below).
• The activation functions can be linear or nonlinear.
• Weights are associated with each connection.
• The activation function is used as a decision-making body at the output of a neuron.
• The error is computed, and the updated weights are calculated using the chain rule.
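As a small illustration (not from the slides), the weighted sum and activation of a single artificial neuron can be written with NumPy; the sigmoid activation and the particular input, weight and bias values are assumptions.

import numpy as np

def sigmoid(z):
    # squashes the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])    # inputs (assumed values)
w = np.array([0.4, 0.1, -0.6])    # weights on each connection (assumed values)
b = 0.2                           # bias

z = np.dot(w, x) + b              # weighted sum of the inputs
a = sigmoid(z)                    # activation normalizes the sum
print(z, a)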
Architecture of a Neural Network

• Input Layer
• Hidden Layer
• Output Layer
• Activation functions
• Loss functions
• Backpropagation
and Optimizer
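To show how these components fit together, here is a hedged sketch using Keras; the choice of library, the layer sizes, and the Adam/cross-entropy settings are illustrative assumptions, not taken from the slides.

import tensorflow as tf

# Input layer -> hidden layer -> output layer, with activation functions;
# the loss function and optimizer drive backpropagation (sizes are assumed).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                      # input layer
    tf.keras.layers.Dense(8, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
model.compile(optimizer="adam", loss="binary_crossentropy")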
Hidden Layer
Activation Functions
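The activation functions appear on the slide as plots; as a hedged reference, a few widely used ones can be written in NumPy (the particular set - sigmoid, tanh, ReLU - is an assumption, since the original slide is an image).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)

def tanh(z):
    return np.tanh(z)                 # output in (-1, 1)

def relu(z):
    return np.maximum(0, z)           # zero for negative inputs, linear otherwise

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), tanh(z), relu(z))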
Loss Function & Cost Function
• A loss function is computed for a single training example.
• A cost function computes an average loss over the entire training dataset.
• L1 Loss function
• L2 Loss function
• Binary cross entropy compares each of the predicted probabilities to the actual class output, which can be either 0 or 1.
• Binary Cross Entropy is the negative average of the log of the corrected predicted probabilities.
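The formulas on the slide are images; as a hedged reference, the standard textbook forms of these losses can be written in NumPy (the toy labels and probabilities below are assumed).

import numpy as np

y_true = np.array([1, 0, 1, 1])           # actual class outputs (assumed toy values)
y_prob = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probabilities (assumed)

l1_loss = np.sum(np.abs(y_true - y_prob))    # L1: sum of absolute errors
l2_loss = np.sum((y_true - y_prob) ** 2)     # L2: sum of squared errors

# Binary cross entropy: negative average of the log of the "corrected"
# predicted probabilities (p for class 1, 1 - p for class 0)
bce = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
print(l1_loss, l2_loss, bce)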
Cross Entropy Loss Function
import numpy as np
# The probabilities predicted by the neural network
y_pred = [0.1, 0.3, 0.4, 0.2]
# One-hot-encoded ground truth label
y_hat = [0, 1, 0, 0]
cross_entropy = -np.sum(np.log(y_pred) * y_hat)
print(cross_entropy)
# The result is approximately 1.20
Type of Data
• A simple perceptron is capable of learning a linear decision boundary.
• In machine learning, feature engineering techniques are used to learn non-linear decision boundaries.
• The network will be fragile in the presence of noise; fitting this noise is called overfitting.
http://playground.tensorflow.org/
Linearly Separable Dataset
Dataset with circular boundary
Introduction
• The backpropagation algorithm is a fundamental building block of a neural network.
• It trains a neural network using a method based on the chain rule of calculus.
• Backpropagation aims to minimize the cost function by adjusting the network’s weights and biases.
Backpropagation Algorithm
• Initialization:
• Initialize the weights and biases of the neural network randomly.
• Set the learning rate (a hyperparameter that determines the size of the steps
taken during optimization).
• Forward Pass:
• Input a training example into the network and perform a forward pass through
the network to compute the predicted output.
• Calculate the error (loss) between the predicted output and the actual output
using a loss function.
• Backward Pass (Backpropagation):
• Calculate the gradient of the loss with respect to the weights and biases using the chain
rule of calculus.
• Update the weights and biases in the opposite direction of the gradient to minimize the
loss.
Optimization in Deep Learning
• Optimizers are algorithms or methods used to minimize an error function/loss function.
• Optimizers are mathematical functions which depend on the model’s learnable parameters, i.e. weights and biases.
• Optimizers help determine how to change the weights and learning rate of the neural network to reduce the losses.
Common optimizers:
• Gradient Descent
• Stochastic Gradient Descent
• Mini-Batch Gradient Descent
• SGD with Momentum
• AdaGrad (Adaptive Gradient Descent)
• AdaDelta
• Adam (Adaptive Moment Estimation)
Learning rate
Backpropagation with Gradient Descent
Input: Training dataset (X, y)
Output: Trained neural network weights and biases

Initialize weights and biases randomly
Repeat until convergence:
    Perform forward propagation:
        Compute the weighted sum of inputs and the activation for each neuron in each layer
    Compute the output error:
        Calculate the derivative of the loss function with respect to the predicted output
    Perform backpropagation:
        Compute the gradients of the loss function with respect to the weights and biases in each layer
    Update the weights and biases:
        Update the weights and biases using the computed gradients and the learning rate
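Below is a minimal, hedged NumPy sketch of the pseudocode above for a single-hidden-layer network with sigmoid activations and squared-error loss; the layer sizes, toy XOR dataset, learning rate and epoch count are all assumptions chosen for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (XOR), assumed for illustration
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Initialize weights and biases randomly
W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))
lr = 0.5   # learning rate

for epoch in range(5000):
    # Forward pass: weighted sums and activations for each layer
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)

    # Output error: derivative of the squared-error loss w.r.t. the prediction,
    # combined with the sigmoid derivative
    delta2 = (a2 - y) * a2 * (1 - a2)

    # Backpropagation: gradients of the loss via the chain rule
    dW2 = a1.T @ delta2
    db2 = delta2.sum(axis=0, keepdims=True)
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)
    dW1 = X.T @ delta1
    db1 = delta1.sum(axis=0, keepdims=True)

    # Update weights and biases in the opposite direction of the gradient
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(np.round(a2, 2))   # predictions should approach [0, 1, 1, 0]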
Stochastic and Mini-Batch Gradient Descent
• Stochastic GD is a variant of Gradient Descent. It updates the model parameters one example at a time.
• If the dataset has 10k examples, SGD will update the model parameters 10k times (once per example).
• Mini-Batch Gradient Descent combines the concepts of SGD and batch gradient descent.
• It simply splits the training dataset into small batches and performs an update for each of those batches.
• This creates a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent.
• It can reduce the variance of the parameter updates, and the convergence is more stable.
• It splits the dataset into batches of between 50 and 256 examples, chosen at random.
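A hedged sketch (not from the slides) of how mini-batches are drawn and used for updates; the batch size of 64, the linear model and the synthetic data are assumptions.

import numpy as np

def minibatch_indices(n_examples, batch_size, rng):
    # Shuffle once per epoch, then yield index blocks of batch_size
    order = rng.permutation(n_examples)
    for start in range(0, n_examples, batch_size):
        yield order[start:start + batch_size]

# Assumed toy data: a linear model trained with squared error
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch_size = 0.05, 64

for epoch in range(20):
    for idx in minibatch_indices(len(X), batch_size, rng):
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)   # gradient on this mini-batch only
        w -= lr * grad                               # one update per mini-batch

print(np.round(w, 2))   # should be close to [1.0, -2.0, 0.5]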
SGD with Momentum

• SGD with Momentum is a stochastic optimization method that adds a momentum term to regular stochastic gradient descent.
• Momentum simulates the inertia of a moving object: the direction of the previous update is retained to a certain extent, while the current gradient is used to fine-tune the final update direction.
• In this way, stability is increased to a certain extent, learning is faster, and there is some ability to escape local optima.
Cont.

• Velocity - a moving average of the gradient directions.
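The update equations on the slide are images; one common textbook form (an assumption about the exact variant used) keeps a velocity v_t = beta * v_(t-1) + (1 - beta) * grad and updates w_t = w_(t-1) - eta * v_t, sketched below.

import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # Velocity: moving average of gradient directions (one common formulation;
    # the exact variant on the slide is not shown, so this is an assumption)
    v = beta * v + (1 - beta) * grad
    w = w - lr * v
    return w, v

w = np.array([0.5, -0.3])
v = np.zeros_like(w)
grad = np.array([0.2, -0.1])   # assumed gradient from the current mini-batch
w, v = sgd_momentum_step(w, v, grad)
print(w, v)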


Adagrad Optimizer
• Adagrad stands for Adaptive Gradient Optimizer.
• As the number of iterations increases, the learning rate is reduced adaptively.
• In the update formula (shown as an image on the slide), w(t) is the value of w at the current iteration, w(t-1) is the value of w at the previous iteration, and η is the learning rate.
• alpha(t) denotes the different learning rate for each weight at each iteration.
• Here, η is a constant, and epsilon is a small positive number used to avoid a divide-by-zero error in case alpha(t) becomes 0.
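Since the formula is an image on the slide, here is a hedged reconstruction using the standard AdaGrad equations (this assumes the usual textbook form): w(t) = w(t-1) - η'(t) * ∂L/∂w(t-1), with η'(t) = η / sqrt(α(t) + ε) and α(t) the accumulated sum of squared gradients, sketched in NumPy below.

import numpy as np

def adagrad_step(w, g_sum_sq, grad, lr=0.01, eps=1e-8):
    # Accumulate the sum of squared gradients (alpha(t) in the slide's notation)
    g_sum_sq = g_sum_sq + grad ** 2
    # Per-weight effective learning rate shrinks as iterations increase
    w = w - (lr / np.sqrt(g_sum_sq + eps)) * grad
    return w, g_sum_sq

w = np.array([0.5, -0.3])
g_sum_sq = np.zeros_like(w)
grad = np.array([0.2, -0.1])   # assumed gradient for illustration
w, g_sum_sq = adagrad_step(w, g_sum_sq, grad)
print(w, g_sum_sq)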
Iris Dataset- Learning Rate=0.01
Accuracy when LR=0.01
MNIST Dataset
