Backpropagation, Sigmoid Neuron & Gradient Descent

Neural Networks (Data Science)


Backpropagation

Sigmoid Neuron
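A sigmoid neuron computes the weighted sum of its inputs plus a bias and squashes the result into the open interval (0, 1) with the logistic function sigma(z) = 1 / (1 + e^(-z)); this is the standard definition, restated here since the slide gives only the title. A minimal Python sketch (the function and variable names are illustrative, not from the slides):

import math

def sigmoid(z):
    # Logistic function: maps any real z into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(inputs, weights, bias):
    # Weighted sum of inputs plus a bias, passed through the sigmoid
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Example: sigmoid_neuron([1.0, 0.5], [0.4, -0.2], 0.1) is about 0.599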
FEED FORWARD NEURAL NETWORK (MULTILAYER PERCEPTRON)
• In a single-layer feed-forward network, the sum of the products of the inputs and weights is calculated and fed to the output. The neuron fires with an activated output if this sum is above a certain value, i.e. the threshold; otherwise the deactivated value is emitted.
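A hedged sketch of the computation just described, assuming a simple step (threshold) activation; the names and the threshold value are illustrative, not from the slides:

def feed_forward(inputs, weights, threshold=0.5):
    # Sum of the products of inputs and weights
    total = sum(x * w for x, w in zip(inputs, weights))
    # The neuron fires (activated output 1) only if the sum exceeds
    # the threshold; otherwise the deactivated value 0 is emitted
    return 1 if total > threshold else 0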

Applications of feedforward neural networks are found in computer vision and speech recognition, where classifying the target classes is complicated.
• Gradient Descent is an iterative optimization algorithm for finding a local minimum of a function.
• To find a local minimum of a function using Gradient Descent, take steps proportional to the negative of the gradient of the function at the current point (i.e. move against the gradient).
• If we instead take steps proportional to the positive of the gradient (moving along the gradient), we approach a local maximum of the function. This procedure is called gradient ascent.
• The goal of the Gradient Descent algorithm is to minimize the given function. At each iteration:
1) Compute the gradient (slope), the first-order derivative of the function at the current point.
2) Make a step in the direction opposite to the direction of the gradient.
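In symbols (a standard formulation, not taken from the slides): with learning rate η, step 2 of gradient descent updates x_new = x - η · ∇f(x), while gradient ascent instead uses x_new = x + η · ∇f(x).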
• We now have the direction to move in; next is to decide the size of the step.
• If the learning rate is too small, training might turn out to be too long (and if it is too large, the steps can overshoot the minimum).
• Choose an optimal learning rate so that the model converges to the minimum.
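As a sketch of a single update on f(x) = x**2 (whose gradient is 2x; all values here are illustrative), showing how the learning rate scales the step:

x = 3.0
learning_rate = 0.1
gradient = 2 * x                    # slope of f(x) = x**2 at the current point
x = x - learning_rate * gradient    # step opposite to the gradient
# x is now 2.4, i.e. closer to the minimum at x = 0; a smaller learning rate
# would move more slowly, and a much larger one could overshoot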
Types of Gradient Descent:
• Batch Gradient Descent: processes all the training examples for each iteration of gradient descent. If the number of training examples is large, batch gradient descent is computationally very expensive and not preferred; instead, we prefer stochastic gradient descent or mini-batch gradient descent.

• Stochastic Gradient Descent: processes 1 training example per iteration, so the parameters are updated after every single example. This makes it quite a bit faster than batch gradient descent. But when the number of training examples is large, processing only one example at a time adds overhead for the system, as the number of iterations becomes quite large.

• Mini-Batch Gradient Descent: works faster than both batch gradient descent and stochastic gradient descent. Here b examples, where b < m, are processed per iteration, so even a large training set is processed in batches of b examples at a time. Thus it scales to larger training sets, and with a smaller number of iterations.
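The three variants differ only in how many of the m training examples feed each parameter update. A schematic sketch, in which the data, gradient function, and parameters are placeholders, not from the slides:

import random

def minibatch_indices(m, b):
    # Yield shuffled index batches of size b:
    # b = m reproduces batch gradient descent, b = 1 stochastic gradient descent
    order = list(range(m))
    random.shuffle(order)
    for start in range(0, m, b):
        yield order[start:start + b]

# Hypothetical usage:
# for batch in minibatch_indices(m=len(data), b=32):
#     grad = compute_gradient(params, [data[i] for i in batch])  # placeholder
#     params = params - learning_rate * grad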
Gradient Descent Algorithm

• Gradient Descent method's steps are:
1. Choose a starting point (initialization).
2. Calculate the gradient at this point.
3. Make a scaled step in the opposite direction to the gradient (objective: minimize).
4. Repeat steps 2 and 3 until one of the criteria is met:
   - the maximum number of iterations is reached;
   - the step size is smaller than the tolerance.

This function takes 5 parameters:
1. starting point - in practice, it is often a random initialisation
2. gradient function - has to be specified beforehand
3. learning rate - scaling factor for step sizes
4. maximum number of iterations
5. tolerance - to conditionally stop the algorithm (a default value is 0.01)
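A minimal Python version of the procedure above, taking exactly the five parameters listed (the one-dimensional quadratic in the usage comment is only an example):

def gradient_descent(start, gradient_fn, learning_rate,
                     max_iterations, tolerance=0.01):
    # Step 1: choose a starting point (initialization)
    point = start
    # Step 4: repeat at most max_iterations times
    for _ in range(max_iterations):
        # Step 2: calculate the gradient at this point
        step = learning_rate * gradient_fn(point)
        # Step 3: make a scaled step in the opposite direction to the gradient
        point = point - step
        # Stop early when the step size is smaller than the tolerance
        if abs(step) < tolerance:
            break
    return point

# Example: minimize f(x) = (x - 2)**2, whose gradient is 2 * (x - 2)
# gradient_descent(start=10.0, gradient_fn=lambda x: 2 * (x - 2),
#                  learning_rate=0.1, max_iterations=100)   # returns ~2.05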
