ML Session 15 Backpropagation

The document discusses the multi-layer perceptron (MLP) neural network architecture and learning algorithm. It begins by defining an MLP as a fully connected feedforward artificial neural network with at least three layers - an input layer, a hidden layer, and an output layer. It then describes the MLP architecture and how it differs from a single perceptron by having multiple layers and nonlinear activation functions. Finally, it provides an example of how an MLP can be trained using backpropagation to learn the XOR logic function by adjusting weights between neurons in the network.


DERIVING BACK-PROPAGATION

SESSION 15
AIM

To familiarise students with the basic concepts of feedforward neural networks and their learning algorithms, and with the process of applying neural networks in various applications.

INSTRUCTIONAL OBJECTIVES

This unit is designed to:


1. Demonstrate an overview of Back Propagation
2. List out the activation functions of a BPNN
3. Describe accuracy and loss functions

LEARNING OUTCOMES

At the end of this unit, you should be able to:


1. Define the functions of the BPNN algorithm
2. Summarise the development of the Back Propagation neural network
3. Describe the various activation functions with loss values.
TOPICS TO BE COVERED

1. Back Propagation Overview


2. Back Propagation Algorithm
3. A Step By Step Back Propagation Example
1. Gradient Descent Rule
2. Delta learning Rule
WHAT IS BACK PROPAGATION?

• Backpropagation is an algorithm used in artificial intelligence (AI) to fine-tune mathematical weight functions and improve the accuracy of an artificial neural network's outputs. Backpropagation is the process of tuning a neural network's weights to improve the prediction accuracy.
• There are two directions in which information flows in a neural network. Forward propagation (also called inference) is when data goes into the neural network and out pops a prediction.
• A neural network can be thought of as a group of connected input/output (I/O) nodes. The
level of accuracy each node produces is expressed as a loss function (error rate).
BACK PROPAGATION OVERVIEW
Backpropagation is the essence of neural network training. It is the method of fine-tuning the weights of a neural network based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning of the weights allows you to reduce error rates and make the model reliable by increasing its generalization.

Backpropagation in a neural network is a short form for "backward propagation of errors." It is a standard method of training artificial neural networks. This method helps calculate the gradient of a loss function with respect to all the weights in the network.
HOW BACKPROPAGATION WORKS: SIMPLE ALGORITHM
• Inputs X arrive through the preconnected path.
• Input is modeled using real weights W. The weights are usually randomly selected.
• Calculate the output for every neuron, from the input layer, to the hidden layers, to the output layer.
• Calculate the error in the outputs:

Error = Actual Output – Desired Output

• Travel back from the output layer to the hidden layers to adjust the weights such that the error is decreased.
• Keep repeating the process until the desired output is achieved (a minimal code sketch of this loop follows below).
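Here is that loop as a minimal NumPy sketch. The network size (2 inputs, 2 hidden neurons, 1 output), the input values, and the desired output are illustrative assumptions for this example, not values from the slides; it shows one forward and backward pass with logistic activations.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Inputs X and desired output; weights W chosen at random (steps 1 and 2)
x = np.array([0.05, 0.10])           # inputs X (illustrative)
target = np.array([0.5])             # desired output (illustrative)
W1 = rng.uniform(-0.5, 0.5, (2, 2))  # input -> hidden weights
W2 = rng.uniform(-0.5, 0.5, (1, 2))  # hidden -> output weights

# Forward pass: the output of every neuron, layer by layer (step 3)
h = sigmoid(W1 @ x)
y = sigmoid(W2 @ h)

# Error in the outputs (step 4)
error = y - target                   # actual output - desired output

# Backward pass: adjust the weights so the error decreases (step 5)
eta = 0.5
delta_out = error * y * (1 - y)               # output-layer delta
delta_hid = (W2.T @ delta_out) * h * (1 - h)  # hidden-layer delta
W2 -= eta * np.outer(delta_out, h)
W1 -= eta * np.outer(delta_hid, x)

# Step 6: repeat the forward/backward passes until the error is small enough.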
A Step by Step BACK PROPAGATION Example

The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary
inputs to outputs.
We will work with a single training set: given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.
BACK PROPAGATION WORKING METHOD

Calculating the Total Error

We can now calculate the error for each output neuron using the squared error function and sum them to get the total error:

E_total = Σ ½ (target − output)²

For example, the target output for o1 is 0.01 but the neural network output 0.75136507, therefore its error is:

E_o1 = ½ (target_o1 − out_o1)² = ½ (0.01 − 0.75136507)² = 0.274811083

Repeating this process for o2 (remembering that the target is 0.99) gives its error E_o2.

The total error for the neural network is the sum of these errors:

E_total = E_o1 + E_o2
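As a quick numeric check, the error term can be computed directly in a few lines; this sketch uses only the output value given above, and leaves the second neuron's value symbolic since it does not appear in this extract.

# Squared error for output neuron o1, using the values from the example above
target_o1, out_o1 = 0.01, 0.75136507
E_o1 = 0.5 * (target_o1 - out_o1) ** 2
print(E_o1)   # approximately 0.274811083

# E_o2 is obtained the same way from the second neuron's output and its target of 0.99,
# and E_total = E_o1 + E_o2.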
The Backwards Pass

Our goal with backpropagation is to update each of the weights in the network so that they cause the actual output to be closer to the target output, thereby minimizing the error for each output neuron and the network as a whole.
Output Layer

Consider a single weight w connecting a hidden neuron h to the output neuron o1. We want to know how much a change in w affects the total error, ∂E_total/∂w.

By applying the chain rule we know that:

∂E_total/∂w = ∂E_total/∂out_o1 × ∂out_o1/∂net_o1 × ∂net_o1/∂w
BACK PROPAGATION

We need to figure out each piece in this equation.


First, how much does the total error change with respect to the output?
BACK PROPAGATION

When we take the partial derivative of the total error with respect to out_o1, the quantity ½ (target_o2 − out_o2)² becomes zero because out_o1 does not affect it, which means we're taking the derivative of a constant, which is zero. So:

∂E_total/∂out_o1 = −(target_o1 − out_o1) = out_o1 − target_o1

Next, how much does the output of o1 change with respect to its total net input? The partial derivative of the logistic function is the output multiplied by 1 minus the output:

∂out_o1/∂net_o1 = out_o1 × (1 − out_o1)

Finally, how much does the total net input of o1 change with respect to w? It is simply the output of the hidden neuron h that w connects from:

∂net_o1/∂w = out_h

Putting it all together:

∂E_total/∂w = (out_o1 − target_o1) × out_o1 × (1 − out_o1) × out_h

You'll often see this calculation combined in the form of the delta rule:

∂E_total/∂w = δ_o1 × out_h, where δ_o1 = (out_o1 − target_o1) × out_o1 × (1 − out_o1)

The Delta rule in machine learning and neural network environments is a specific type of backpropagation that
helps to refine connectionist ML/AI networks, making connections between inputs and outputs with layers of
artificial neurons.

In general, backpropagation has to do with recalculating input weights for artificial neurons using a gradient
method. Delta learning does this using the difference between a target activation and an actual obtained
activation. Using a linear activation function, network connections are adjusted.

Another way to explain the Delta rule is that it uses an error function to perform gradient descent learning.
Apply the Delta rule

To decrease the error, we then subtract this value from the current weight (optionally multiplied by some learning rate, eta, which we'll set to 0.5):

w_new = w − eta × ∂E_total/∂w

We can repeat this process to get the new values of the remaining output-layer weights.
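A minimal sketch of this update for one output-layer weight follows. The numeric values for the outputs, the hidden activation, and the current weight are illustrative placeholders, not values taken from the slides; only the learning rate of 0.5 comes from the text above.

# Delta rule update for a single output-layer weight w (hidden neuron h -> output neuron o1)
eta = 0.5                          # learning rate from the slide
out_o1, target_o1 = 0.75, 0.01     # illustrative values
out_h = 0.59                       # illustrative hidden-neuron output
w = 0.40                           # illustrative current weight

delta_o1 = (out_o1 - target_o1) * out_o1 * (1 - out_o1)  # delta term
grad_w = delta_o1 * out_h                                # dE_total/dw
w_new = w - eta * grad_w
print(w_new)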
WEB REFERENCES

[1] https://www.guru99.com/backpropogation-neural-network.html
[2] https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
[3] http://neuralnetworksanddeeplearning.com/chap2.html

Self-Assessment Questions

1. What do the gradients of backpropagation compute?

(a) Profit Function


(b) Loss function
(c) Negative Function
(d) Positive Function

2. Backpropagation works with ______________ neural networks.

(a) Single Layered


(b) Multi Layered
(c) Fixed Layered
(d) Dynamic Layered
TOPICS TO BE COVERED

1. Multi Layer Perceptron


2. MLP Architecture
3. MLP Learning Algorithm using XOR Gate
4. MLP in Practice
WHAT IS MULTI LAYER PERCEPTRON

• A multilayer perceptron (MLP) is a fully connected class of feedforward artificial neural network (ANN).
Multilayer perceptrons are sometimes colloquially referred to as "vanilla" neural networks, especially when
they have a single hidden layer.
• An MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. Except
for the input nodes, each node is a neuron that uses a nonlinear activation function.
• MLP utilizes a chain rule based supervised learning technique called backpropagation or reverse mode of
automatic differentiation for training.
• Its multiple layers and non-linear activation distinguish MLP from a linear perceptron. It can distinguish data
that is not linearly separable.
MLP ARCHITECTURE

 The term "multilayer perceptron" does not refer to a single perceptron that has multiple layers.
 MLP perceptrons can employ arbitrary activation functions.
 A true perceptron performs binary classification; an MLP neuron is free to either perform classification or regression, depending upon its activation function.
Activation function

If a multilayer perceptron has a linear activation function in all neurons, that is, a linear function that maps the weighted inputs to the output of each neuron, then linear algebra shows that any number of layers can be reduced to a two-layer input-output model. In MLPs some neurons use a nonlinear activation function that was developed to model the frequency of action potentials, or firing, of biological neurons.

The two historically common activation functions are both sigmoids, and are described by

y(v) = tanh(v)   and   y(v) = 1 / (1 + e^(−v)).

The first is a hyperbolic tangent that ranges from −1 to 1, while the other is the logistic function, which is similar in shape but ranges from 0 to 1.
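These two sigmoids and the logistic derivative used by backpropagation can be written down directly; the sketch below is an illustration with function names of my own choosing.

import numpy as np

def tanh_act(v):
    # Hyperbolic tangent: ranges from -1 to 1
    return np.tanh(v)

def logistic(v):
    # Logistic function: similar shape, but ranges from 0 to 1
    return 1.0 / (1.0 + np.exp(-v))

def logistic_deriv(v):
    # Used by backpropagation: the output multiplied by (1 - output)
    y = logistic(v)
    return y * (1 - y)

v = np.linspace(-5, 5, 11)
print(tanh_act(v).min(), tanh_act(v).max())   # close to -1 and 1
print(logistic(v).min(), logistic(v).max())   # close to 0 and 1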

MLP LEARNING ALGORITHM

• We can check that a prepared network can solve the two-dimensional XOR problem, something that we have seen is not possible for a linear model like the Perceptron.
• A suitable network is shown in the figure. We treat it as two different Perceptrons: first computing the activations of the neurons in the middle layer (labelled as C and D in Figure 4.2), and then using those activations as the inputs to the single neuron at the output.
• As an example, I'll work out what happens when you put in (1, 0) as an input; the job of checking the rest is up to you.
MLP Learning Algorithm using XOR Gate

 Input (1, 0) corresponds to node A being 1 and B being 0. The input to neuron C is therefore
−1 × 0.5 + 1 × 1 + 0 × 1 = −0.5 + 1 = 0.5.
This is above the threshold of 0, and so neuron C fires, giving output 1.
 For neuron D the input is
−1 × 1 + 1 × 1 + 0 × 1 = −1 + 1 = 0, and so it does not fire, giving output 0.
 Therefore the input to neuron E is
−1 × 0.5 + 1 × 1 + 0 × −1 = 0.5, so neuron E fires.
 Checking the result of the other inputs should persuade you that neuron E fires when inputs A and B are different to each other, but does not fire when they are the same, which is exactly the XOR function.
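The hand-worked calculation above is easy to check in code. This short sketch hard-codes the weights read off the figure (bias input fixed at −1, threshold at 0) and evaluates all four XOR inputs.

def step(x):
    # Threshold activation: fire (output 1) only when the input is above 0
    return 1 if x > 0 else 0

def xor_mlp(a, b):
    # Hidden layer: neurons C and D (bias input is -1)
    c = step(-1 * 0.5 + a * 1 + b * 1)
    d = step(-1 * 1.0 + a * 1 + b * 1)
    # Output neuron E takes C and D as its inputs
    e = step(-1 * 0.5 + c * 1 + d * -1)
    return e

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))   # prints 0, 1, 1, 0: exactly the XOR function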
MLP Implementation using Numpy
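The slide's original code is not reproduced in this extract, so the following is a minimal NumPy sketch of one possible implementation. It assumes a 2-2-1 architecture with logistic activations and a bias input fixed at −1 (matching the XOR network above); it is an illustration under those assumptions, not the original implementation.

import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training data, with a bias column of -1 appended to the inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
Xb = np.hstack([X, -np.ones((4, 1))])

n_in, n_hidden, n_out = 2, 2, 1
V = rng.uniform(-1, 1, (n_in + 1, n_hidden))   # input -> hidden weights (incl. bias)
W = rng.uniform(-1, 1, (n_hidden + 1, n_out))  # hidden -> output weights (incl. bias)
eta = 0.5

for epoch in range(20000):
    # Forward pass
    H = sigmoid(Xb @ V)
    Hb = np.hstack([H, -np.ones((4, 1))])
    Y = sigmoid(Hb @ W)

    # Backward pass: output deltas, then hidden deltas (bias row of W excluded)
    delta_o = (Y - T) * Y * (1 - Y)
    delta_h = (delta_o @ W[:-1].T) * H * (1 - H)

    # Gradient-descent weight updates
    W -= eta * Hb.T @ delta_o
    V -= eta * Xb.T @ delta_h

# Final forward pass to inspect the learned mapping
H = sigmoid(Xb @ V)
Y = sigmoid(np.hstack([H, -np.ones((4, 1))]) @ W)
print(np.round(Y))   # should approach [[0], [1], [1], [0]]; XOR can occasionally
                     # get stuck, in which case try another seed or more hidden nodes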
INITIALIZE THE WEIGHTS

• The MLP algorithm suggests that the weights are initialised to small random numbers,
both positive and negative.
• If the initial weight values are close to 1 or -1 then the inputs to the sigmoid are also
likely to be close to ±1 and so the output of the neuron is either 0 or 1.
• If we view the values of these inputs as having uniform variance, then the typical input
to the neuron will be w√n, where w is the initialization value of the weights. So a
common trick is to set the weights in the range −1/√n < w < 1/√n, where n is the
number of nodes in the input layer to those weights.
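In NumPy, this initialisation trick might look like the sketch below; the layer sizes are assumptions chosen purely for illustration.

import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_in, n_out):
    # Small random weights, both positive and negative, in the range -1/sqrt(n) .. 1/sqrt(n),
    # where n is the number of nodes in the layer feeding into these weights
    bound = 1.0 / np.sqrt(n_in)
    return rng.uniform(-bound, bound, size=(n_in, n_out))

W_hidden = init_weights(4, 3)             # e.g. 4 input nodes -> 3 hidden nodes
print(W_hidden.min(), W_hidden.max())     # both within +/- 0.5 here, since 1/sqrt(4) = 0.5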

THE MULTI-LAYER PERCEPTRON IN PRACTICE

In this section, we are going to look more at choices that can be made about the network in order to use it for solving real problems of four different types: regression, classification, time-series prediction, and data compression.
1. Amount of Training Data
For the MLP with one hidden layer there are (L + 1) × M + (M + 1) × N weights, where L, M, N are the number of nodes in the input, hidden, and output layers, respectively. The extra +1s come from the bias nodes, which also have adjustable weights (a small helper that computes this count is sketched after this list).
2. Number of Hidden Layers
Two hidden layers are sufficient: if the function that we want to learn (approximate) is continuous, the network can approximate it. It can therefore approximate any decision boundary, not just the linear one that the Perceptron computed.
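The weight count referenced in item 1 above is easy to compute; this is a small illustrative helper, not code from the slides.

def mlp_weight_count(L, M, N):
    # (L + 1) * M weights into the hidden layer plus (M + 1) * N into the output layer;
    # the +1s account for the bias nodes
    return (L + 1) * M + (M + 1) * N

print(mlp_weight_count(2, 2, 1))   # the 2-2-1 XOR network above has 9 adjustable weights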

THE MULTI-LAYER PERCEPTRON IN PRACTICE

3. When to Stop Learning


We monitor the generalisation ability of the network at its current stage of learning by also evaluating it on a separate validation set. If we plot the sum-of-squares error during training, it typically reduces fairly quickly during the first few training iterations.
At some stage the error on the validation set will start increasing again, because the network has stopped learning about the function that generated the data, and has started to learn about the noise that is in the data itself (shown in the figure of training and validation error).
At this stage we stop the training. This technique is called early stopping.
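A minimal sketch of early stopping is given below; train_one_epoch and validation_error are hypothetical callables standing in for whatever training and evaluation code is actually used, so this shows the stopping logic rather than a specific model.

def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=1000, patience=10):
    # Stop once the validation error has not improved for `patience` epochs
    best_err = float("inf")
    epochs_since_best = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        err = validation_error()
        if err < best_err:
            best_err = err
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break   # validation error is rising: stop training here
    return best_err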

WEB REFERENCES

[1] https://www.guru99.com/backpropogation-neural-network.html
[2] https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
[3] http://neuralnetworksanddeeplearning.com/chap2.html

Self-Assessment Questions

1. Why is the XOR problem exceptionally interesting to neural network researchers?

(a) Because it can be expressed in a way that allows you to use a neural network
(b) Because it is a complex binary operation that cannot be solved using neural networks
(c) Because it can be solved by a single layer perceptron
(d) Because it is the simplest linearly inseparable problem that exists.

2. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it
outputs a 1, otherwise it just outputs a 0.

a) True
b) False
