
Artificial Neural Network

Why did neural networks come into the field?

• What does the class think?
Neural Network Backpropagation

• Backpropagation is a common method for training a neural network


• We’re going to use a neural network with two inputs, two hidden
neurons, and two output neurons.
• Additionally, the hidden and output neurons will include a bias.
Basic Structure of Neural Network
Connections between nodes are weights.
[Figure: inputs → hidden units → outputs, with bias nodes feeding the hidden and output layers]
In order to have some numbers to work with, here are the initial weights, the biases, and the training inputs/outputs.
The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.
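For reference, the numbers used throughout the rest of these notes can be collected in a short Python sketch (a convenience added here, not part of the original slides; the variable names simply mirror the labels w1..w8, b1, b2, i1, i2 used below):

```python
# Initial values used in the worked example that follows.
i1, i2 = 0.05, 0.10                      # training inputs
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30  # input -> hidden weights
b1 = 0.35                                # hidden-layer bias
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55  # hidden -> output weights
b2 = 0.60                                # output-layer bias
target_o1, target_o2 = 0.01, 0.99        # desired outputs
```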
The Forward Pass

Lets see what the


neural network
currently predicts
given the weights
and biases above
and inputs of 0.05
and 0.10.
We figure out the total net input to
each hidden layer
neuron, squash the total net input
using an activation function (here
we use the logistic function), then
repeat the process with the output
layer neurons.
• Here we calculate the net input to H1:

• netH1 = w1*i1 + w2*i2 + b1*1

• netH1 = 0.15*0.05 + 0.2*0.1 + 0.35*1 = 0.3775
• Then we squash it using the logistic function to get the output of H1:
• outH1 = 1 / (1 + e^−netH1) = 1 / (1 + e^−0.3775) = 0.593269992
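As a quick check, this step can be reproduced in a couple of lines of Python (a small sketch using the values above):

```python
import math

def logistic(x):
    # Logistic (sigmoid) squashing function used in these notes
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
w1, w2, b1 = 0.15, 0.20, 0.35

net_h1 = w1 * i1 + w2 * i2 + b1 * 1   # 0.3775
out_h1 = logistic(net_h1)             # ~0.593269992
print(net_h1, out_h1)
```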
Class: please calculate this yourself.
Carrying out the same process for H2
• Here we calculate the net input to H2:

• netH2 = w3*i1 + w4*i2 + b1*1

• netH2 = 0.25*0.05 + 0.3*0.1 + 0.35*1 = 0.3925
• Then we squash it using the logistic function to get the output of H2:
• outH2 = 1 / (1 + e^−netH2) = 1 / (1 + e^−0.3925) = 0.596884378

• So the answer is:

• outH2 = 0.596884378
We repeat this process for the output layer neurons, using the output from the hidden layer neurons as inputs.
Here the output for O1 is:
netO1 = w5*outH1 + w6*outH2 + b2*1
netO1 = 0.4*0.593269992 + 0.45*0.596884378 + 0.6*1 = 1.105905967
outO1 = 1 / (1 + e^−netO1) = 1 / (1 + e^−1.105905967) = 0.75136507
Class: please calculate this yourself.
Carrying out the same process for O2
Here the output for O2 is:
netO2 = w7*outH1 + w8*outH2 + b2*1
netO2 = 0.5*0.593269992 + 0.55*0.596884378 + 0.6*1 = 1.2249214039
outO2 = 1 / (1 + e^−netO2) = 1 / (1 + e^−1.2249214039) = 0.772928465
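The output layer is the same computation again, with the hidden outputs acting as inputs; a minimal sketch reproducing the numbers above:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hidden-layer outputs from the previous slides
out_h1, out_h2 = 0.593269992, 0.596884378
# Output-layer weights and bias
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60

net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1   # ~1.105905967
net_o2 = w7 * out_h1 + w8 * out_h2 + b2 * 1   # ~1.224921404
out_o1 = logistic(net_o1)                     # ~0.75136507
out_o2 = logistic(net_o2)                     # ~0.772928465
```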
Wow, we have completed the forward pass!

What next?

Calculating the Total Error

• We can now calculate the error for each output neuron using the squared error function and sum them to get the total error:
• Etotal = Σ ½ (target − output)²
• The 1/2 is included so that the exponent is cancelled when we differentiate later on.
For example, the target output for O1 is 0.01 but the neural network
output 0.75136507, therefore its error is

Eout1 = ½ (targetO1 − outputO1)² = ½ (0.01 − 0.75136507)² = 0.274811083

Eout1 = 0.274811083
Class: please calculate this yourself.
• Repeating this process for O2 (remembering that the target is 0.99)
For example, the target output for O2 is 0.99 but the neural network
output 0.772928465, therefore its error is

Eout2 = ½ (targetO2 − outputO2)² = ½ (0.99 − 0.772928465)² = 0.023560026

Eout2 = 0.023560026

The total error for the neural network is the sum of these errors:

Etotal = Eout1 + Eout2 = 0.274811083 + 0.023560026 = 0.298371109
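In code, using the outputs and targets from the notes:

```python
# Squared error for each output neuron, then the total error.
out_o1, out_o2 = 0.75136507, 0.772928465   # forward-pass outputs
target_o1, target_o2 = 0.01, 0.99          # target outputs

e_o1 = 0.5 * (target_o1 - out_o1) ** 2     # ~0.274811083
e_o2 = 0.5 * (target_o2 - out_o2) ** 2     # ~0.023560026
e_total = e_o1 + e_o2                      # ~0.298371109
```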


So what's next?
The Backwards Pass
Our goal with backpropagation is to update each of the weights in the network so that they cause the actual output to be closer to the target output.
The Backwards Pass

• Consider w5. We want to know how much a change in w5 affects the total error.

• This quantity is ∂Etotal/∂w5, “the partial derivative of Etotal with respect to w5”. You can also say “the gradient with respect to w5”.
Gradient Descent
• Gradient descent is a commonly used optimization method that adjusts weights according to the error.
• Gradient is another word for slope.
• In its typical form on an x-y graph, slope represents how two variables relate to each other: rise over run, the change in money over the change in time, etc.
• The slope we care about describes the relationship between the network's error and a single weight; i.e., how does the error vary as the weight is adjusted.
• The relationship between the network's error and each of those weights is a derivative, dE/dw, that measures the degree to which a slight change in a weight causes a slight change in the error.
• Each weight is just one factor in a deep network that involves many transforms.
• The signal of the weight passes through activations and sums over several layers.
• So we use the chain rule of calculus to march back through the network's activations and outputs and finally arrive at the weight in question, and its relationship to the overall error, as sketched below.
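To make dE/dw concrete, here is a small illustrative sketch (not part of the original notes): it estimates the slope of the total error with respect to w5 by nudging w5 slightly and re-running the forward pass. The chain rule on the following slides gives the same quantity analytically.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def total_error(w5):
    # Forward pass with everything fixed except w5 (values from the notes)
    i1, i2 = 0.05, 0.10
    out_h1 = logistic(0.15 * i1 + 0.20 * i2 + 0.35)
    out_h2 = logistic(0.25 * i1 + 0.30 * i2 + 0.35)
    out_o1 = logistic(w5 * out_h1 + 0.45 * out_h2 + 0.60)
    out_o2 = logistic(0.50 * out_h1 + 0.55 * out_h2 + 0.60)
    return 0.5 * (0.01 - out_o1) ** 2 + 0.5 * (0.99 - out_o2) ** 2

# Numerical estimate of dE_total/dw5: nudge the weight and measure the error change
eps = 1e-6
slope = (total_error(0.40 + eps) - total_error(0.40 - eps)) / (2 * eps)
print(slope)   # roughly 0.082, matching the chain-rule value derived below
```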
Chain Rule:
• The chain rule in calculus states that if a variable z depends on y, and y depends on x, then dz/dx = (dz/dy) · (dy/dx).

• In a feedforward network, the relationship between the net's error and a single weight will look something like this:
∂Error/∂weight = ∂Error/∂activation × ∂activation/∂weight
• That is, given two variables, Error and weight, that are mediated by a third variable, activation, through which the weight is passed, you can calculate how a change in weight affects a change in Error by first calculating how a change in activation affects a change in Error, and how a change in weight affects a change in activation.
You know what?
The essence of learning in deep learning is nothing more than that: adjusting a model's weights in response to the error it produces.
By applying the chain rule:
∂Etotal/∂w5 = ∂Etotal/∂outO1 × ∂outO1/∂netO1 × ∂netO1/∂w5
Visually, here's what we're doing:
• We need to figure out each piece in this equation.
• First, how much does the total error change with respect to the output?
∂Etotal/∂outO1 = −(targetO1 − outO1) = −(0.01 − 0.75136507) = 0.74136507
• Next, how much does the output of O1 change with respect to its total net input?
• The partial derivative of the logistic function is the output multiplied by 1 minus the output:
∂outO1/∂netO1 = outO1 × (1 − outO1) = 0.75136507 × (1 − 0.75136507) = 0.186815602
• Finally, how much does the total net input of O1 change with respect to w5?
∂netO1/∂w5 = outH1 = 0.593269992
Putting it all together:
∂Etotal/∂w5 = 0.74136507 × 0.186815602 × 0.593269992 = 0.082167041
To decrease the error, we then subtract this value from the current weight (optionally multiplied by some learning rate, eta, which we'll set to 0.5):
w5(new) = w5 − eta × ∂Etotal/∂w5 = 0.4 − 0.5 × 0.082167041 = 0.35891648
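The same calculation as a short sketch (values taken from the slides above; eta = 0.5):

```python
# Chain rule for dEtotal/dw5, using values computed earlier in the notes.
out_h1    = 0.593269992   # output of H1
out_o1    = 0.75136507    # output of O1
target_o1 = 0.01
eta       = 0.5           # learning rate

dE_dout_o1   = -(target_o1 - out_o1)    # how the error changes with the output
dout_dnet_o1 = out_o1 * (1 - out_o1)    # derivative of the logistic function
dnet_dw5     = out_h1                   # netO1 = w5*outH1 + ..., so d(netO1)/d(w5) = outH1

dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5   # ~0.082167041
w5_new = 0.40 - eta * dE_dw5                    # ~0.35891648

# The same three-factor pattern, with each weight's own input, gives the
# updated w6, w7 and w8 listed on the next slide.
```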
We can repeat this process to get the new weights w6, w7, and w8.
So here are the new values:
• w6 = 0.408666186
• w7 = 0.511301270
• w8 = 0.561370121

• We perform the actual updates in the neural network only after we have the new weights leading into the hidden layer neurons (that is, we keep using the original weights, not the updated ones, while we continue the backwards pass below).
Hidden Layer

• Next, we’ll continue the backwards pass by calculating new values for
w1, w2, w3, and w4.
Visually, this is the same chain of factors, one layer further back:
∂Etotal/∂w1 = ∂Etotal/∂outH1 × ∂outH1/∂netH1 × ∂netH1/∂w1
Starting with ∂Etotal/∂outH1: because outH1 affects both output neurons, it is the sum
∂Etotal/∂outH1 = ∂Eout1/∂outH1 + ∂Eout2/∂outH1
We can calculate ∂Eout1/∂outH1 using values we calculated earlier, since
∂Eout1/∂outH1 = ∂Eout1/∂netO1 × ∂netO1/∂outH1
and ∂netO1/∂outH1 is equal to w5.
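A sketch of the hidden-layer step for w1, following the same pattern (an illustration built from the values already computed in these notes; the resulting new w1 is not listed in the slides, so the final comment is an expected value, not a quoted one):

```python
# Backwards pass for w1: the error reaches w1 through H1, and H1 feeds both outputs.
i1 = 0.05
out_h1 = 0.593269992
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99
w1, w5, w7, eta = 0.15, 0.40, 0.50, 0.5

# How the error at each output neuron changes with that neuron's net input
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# outH1 feeds both output neurons, so its effect on the total error is a sum
dE_dout_h1 = delta_o1 * w5 + delta_o2 * w7

# Back through H1's own activation, then down to w1 (netH1 = w1*i1 + ..., so d(netH1)/d(w1) = i1)
dout_dnet_h1 = out_h1 * (1 - out_h1)
dnet_dw1 = i1

dE_dw1 = dE_dout_h1 * dout_dnet_h1 * dnet_dw1
w1_new = w1 - eta * dE_dw1    # works out to roughly 0.1498 with these numbers
```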
