
Neural Networks

Artificial Neural Networks

• Vaguely inspired by the biological neural networks that constitute animal brains

• The human brain is made of close to 100 billion neurons interconnected by synapses

• A neuron processes and transmits information through electrical and chemical signals that are carried via the synapses

• Neurons can connect to each other to form neural networks – each neuron can be connected to about 5,000 other neurons

Artificial NNs vs. Biological NNs

                    ANN                             BNN
Size                10–1000 neurons,                86 billion neurons,
                    1000s of synapses               >100 trillion synapses
Network topology    Usually feed-forward,           Complex network,
                    computed layer by layer         computed asynchronously
Calculation speed   Nanoseconds                     Milliseconds
Power               ~100 watts                      ~20 watts
Others              Not fault tolerant;             Fault tolerant;
                    learning                        learning?

• Size: our brain contains about 86 billion neurons and more than 100 trillion (or, according to some estimates, 1,000 trillion) synapses (connections). The number of “neurons” in artificial networks is much smaller than that (usually in the ballpark of 10–1000), but comparing their numbers this way is misleading.

Applications

• Neural nets have done exceptionally well at tasks like:

  • Image recognition, character recognition, face recognition

  • Feature extraction, fingerprint processing, signature matching

  • Speech recognition

• Other, more modest successes in: stock market prediction, combinatorial optimization, medicine, etc.

The Perceptron
• Perceptron: the main building block

[Figure: a single perceptron – inputs are weighted, summed together with a bias term, and passed through an activation function]
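As a concrete illustration, here is a minimal sketch of a perceptron's forward computation in NumPy. The weights, bias, and step activation are arbitrary example values, not taken from the slides.

    import numpy as np

    def step(z):
        # Heaviside step activation: 1 if z >= 0, else 0
        return np.where(z >= 0, 1, 0)

    def perceptron(x, w, b):
        # Weighted sum of the inputs plus a bias, passed through the activation
        return step(np.dot(w, x) + b)

    # Example with made-up weights: this perceptron behaves like an AND gate
    w = np.array([1.0, 1.0])
    b = -1.5
    print(perceptron(np.array([1, 1]), w, b))  # 1
    print(perceptron(np.array([1, 0]), w, b))  # 0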

The Artificial Neural Net

• Number of layers

  • Single vs. multi-layer

• Number of nodes in each layer

• Weights/connections

• Activation or transfer function
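To make these pieces concrete, here is a minimal sketch of a small two-layer feed-forward network in NumPy; the layer sizes, random weights, and activations are illustrative assumptions, not the course's architecture.

    import numpy as np

    rng = np.random.default_rng(0)

    # Architecture choices (hyperparameters): 3 inputs, 4 hidden nodes, 1 output
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # weights/connections, layer 1
    W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # weights/connections, layer 2

    def forward(x):
        h = np.tanh(W1 @ x + b1)                    # hidden layer with tanh activation
        return 1 / (1 + np.exp(-(W2 @ h + b2)))     # sigmoid output layer

    print(forward(np.array([0.5, -1.0, 2.0])))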

Examples of Activation functions

• ReLU (with Softmax/Linear output)

• Sigmoid (Logistic)

• Hyperbolic Tangent (tanh)

• Step function (Heaviside)

• Softmax (Generalized Logistic)

• Linear

• Which one do we use?

  • There is no set procedure or rule

  • ReLU has become very popular
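The activations listed above can each be written in a line or two of NumPy; this is a generic sketch, not code from the course.

    import numpy as np

    def relu(z):    return np.maximum(0, z)
    def sigmoid(z): return 1 / (1 + np.exp(-z))
    def tanh(z):    return np.tanh(z)
    def step(z):    return np.where(z >= 0, 1, 0)
    def linear(z):  return z
    def softmax(z):
        # Subtract the max for numerical stability before exponentiating
        e = np.exp(z - np.max(z))
        return e / e.sum()

    z = np.array([-2.0, 0.0, 3.0])
    print(relu(z), sigmoid(z), softmax(z))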

An Example

[email protected]
3DVB5QRZ69
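As an illustrative stand-in for the slide's example (all numbers here are assumed), consider one forward pass through a single sigmoid neuron with two inputs:

    inputs x = (0.5, 0.8), weights w = (0.4, 0.6), bias b = -0.3
    weighted sum: z = 0.4·0.5 + 0.6·0.8 − 0.3 = 0.38
    output: sigmoid(z) = 1 / (1 + e^(−0.38)) ≈ 0.594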

Learning

• How do we determine the weights?

  • Start with guess values for the weights

  • Calculate outputs from the inputs

  • Compare the outputs to the desired outputs: calculate the errors

  • Training algorithms update the weights in a way that minimizes the errors (cost function)

• Cost (loss) functions measure how close an output is to the desired output. Preferably they have:

  • Non-negativity

  • Global continuity and differentiability
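One cost function with these properties is the mean squared error; a minimal sketch (example values assumed):

    import numpy as np

    def mse(y_pred, y_true):
        # Mean squared error: non-negative, continuous, and differentiable
        return np.mean((y_pred - y_true) ** 2)

    print(mse(np.array([0.9, 0.2]), np.array([1.0, 0.0])))  # 0.025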

Training: Backpropagation

• Backpropagation of errors is a common algorithm for training artificial neural networks, used in conjunction with an optimization method such as gradient descent.

• The method calculates the gradient of a loss function with respect to all the weights in the network.

• The gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the loss function.
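A minimal sketch of the idea (a single sigmoid neuron with a squared-error loss; all values are illustrative assumptions): the chain rule is applied from the loss back to each weight.

    import numpy as np

    x, y_true = np.array([0.5, 0.8]), 1.0
    w, b = np.array([0.4, 0.6]), -0.3

    # Forward pass
    z = w @ x + b
    y = 1 / (1 + np.exp(-z))          # sigmoid output
    loss = (y - y_true) ** 2

    # Backward pass: chain rule from the loss back to the weights
    dloss_dy = 2 * (y - y_true)
    dy_dz = y * (1 - y)               # derivative of the sigmoid
    grad_w = dloss_dy * dy_dz * x     # gradient w.r.t. the weights
    grad_b = dloss_dy * dy_dz         # gradient w.r.t. the bias
    print(grad_w, grad_b)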

Gradient Descent

• A first-order optimization algorithm.

• Essentially equivalent to sliding down a slope to eventually find the minimum (the lowest point in the valley).

• To find a minimum (valley) of a function, take a small step along the steepest descent direction, and keep iterating.

• To find maxima, we would instead do gradient ascent.
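A minimal gradient-descent sketch for a one-variable function (the function and learning rate are arbitrary examples):

    # Minimize f(x) = (x - 3)^2 by repeatedly stepping against the gradient
    def grad(x):
        return 2 * (x - 3)

    x, lr = 0.0, 0.1
    for _ in range(100):
        x -= lr * grad(x)      # small step along the steepest descent direction
    print(x)                   # close to 3, the minimum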

Learning rate

• Choosing the learning rate:

  • Too small, and we will need too many iterations to converge

  • Too large, and we may skip over the optimal solution

• Adaptive learning rate: start with a high learning rate and gradually reduce it with each iteration

• Can also be chosen by trial and error – try a range of learning rates (e.g. 1, 0.1, 0.001, 0.0001) and use the results as a guide
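One simple adaptive scheme is exponential decay of the learning rate; a sketch with made-up constants:

    initial_lr, decay = 0.5, 0.99
    for step in range(1000):
        lr = initial_lr * (decay ** step)   # gradually reduce the learning rate
        # w -= lr * grad(w)                 # the weight update would use the current lr
    print(lr)                               # much smaller than the initial 0.5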

Epochs, Batch size, Iterations

• When the dataset is too large, passing all of the data through the neural net before making a weight update is computationally expensive.

• Instead, we create batches of data with a smaller batch size.

• After each batch is passed through and the weights are updated, we count it as one iteration.

• When the entire dataset has been passed forward and backward (with weight updates) through the neural network, we count it as one epoch.

• Too few epochs: underfitting. Too many: overfitting.

• Batch training: all of the training samples pass through the neural net before the weights are updated.

• Sequential training: the weights are updated after each training vector is passed through the neural net.
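For example (all numbers assumed): with 10,000 training samples and a batch size of 100, one epoch consists of 10,000 / 100 = 100 iterations; training for 20 epochs therefore performs 2,000 weight updates.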
Scaling
• Scaling the variables

  • The non-linearities in the activation function and numerical rounding errors make input scaling quite important

  • Scaling can accelerate learning and improve performance
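A common choice is to standardize each input feature to zero mean and unit variance; a NumPy sketch with made-up data:

    import numpy as np

    X = np.array([[180.0, 0.002],
                  [160.0, 0.010],
                  [175.0, 0.004]])                    # features on very different scales

    X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance per column
    print(X_scaled)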

Overfitting

• Neural network models are susceptible to overfitting

  • Large numbers of weights and biases

  • Excessive learning (too many epochs) on the training data

• Ways to avoid overfitting

  • Increase the sample size

  • Early stopping

  • Reduce the network size

  • Regularization
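Early stopping amounts to monitoring a validation loss and halting once it stops improving. A sketch of the logic, using a made-up sequence of validation losses and an assumed patience value:

    # Hypothetical validation losses per epoch (made-up numbers)
    val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58, 0.60, 0.61, 0.62]

    best_val, patience, wait, stop_epoch = float("inf"), 3, 0, None
    for epoch, val in enumerate(val_losses):
        if val < best_val:
            best_val, wait = val, 0          # validation loss improved; keep training
        else:
            wait += 1
            if wait >= patience:             # no improvement for `patience` epochs
                stop_epoch = epoch
                break                        # stop early to avoid overfitting
    print(stop_epoch, best_val)              # stops at epoch 6 with best loss 0.55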

Regularization (weight decay)

• Regularization is a technique used to avoid this overfitting problem.

• The idea behind regularization is that models that overfit the data are complex models that have, for example, too many parameters.

• Regularization penalizes the usual loss function by adding a complexity term that gives a bigger loss for more complex models.

• Types of regularization:

  • LASSO (L1)

  • Ridge (L2)

• The optimal value of λ, the decay rate or penalty coefficient, is determined through cross-validation.
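A sketch of a ridge (L2) penalized loss; the λ value and weights below are assumed for illustration:

    import numpy as np

    def ridge_loss(y_pred, y_true, weights, lam=0.01):
        mse = np.mean((y_pred - y_true) ** 2)     # usual data-fit loss
        penalty = lam * np.sum(weights ** 2)      # complexity term: large weights cost more
        return mse + penalty

    w = np.array([0.5, -1.2, 2.0])
    print(ridge_loss(np.array([0.9]), np.array([1.0]), w))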
Hyperparameters and tuning

• Hyperparameters are the variables that determine the network structure and the variables that determine how the network is trained:

  • Number of hidden layers

  • Number of neurons in each hidden layer

  • Decay factor

  • Number of epochs

  • Learning rate

  • The activation function

• Tuning these to ensure that you don’t overfit is an art.
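One common (if brute-force) way to tune is a small grid search over a few hyperparameter values, keeping the setting with the best validation score. A sketch with assumed candidate values and a placeholder scoring function:

    from itertools import product

    # Candidate values to try (illustrative choices, not recommendations)
    grid = {
        "hidden_layers": [1, 2],
        "neurons": [16, 64],
        "learning_rate": [0.1, 0.01],
    }

    def validation_score(hidden_layers, neurons, learning_rate):
        # Placeholder standing in for "train the network and evaluate on validation data"
        return -(abs(neurons - 64) + abs(learning_rate - 0.01) + hidden_layers)

    best = max(
        product(*grid.values()),
        key=lambda combo: validation_score(*combo),
    )
    print(dict(zip(grid.keys(), best)))   # the combination with the best validation score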
