Lect 12 - Deep Feed Forward NN - Review

The document provides a comprehensive review of deep feedforward neural networks, covering topics such as linear and non-linear classifiers, optimization techniques, and the importance of activation functions. It discusses the structure and function of perceptrons, various types of neural networks, and methods for loss optimization and regularization to prevent overfitting. Additionally, it highlights practical aspects of training neural networks, including gradient descent algorithms and adaptive learning rates.

1

DEEP FEED FORWARD NEURAL NETWORKS - A COMPLETE REVIEW

Umarani Jayaraman
Outline
2

• Linear classifier
• Perceptron
• Non-linear classifiers
• MLP, Neural Networks
• Optimization Techniques
• Loss Optimization – Gradient Descent
• Batch Optimization – Batch, Stochastic and Mini-Batch
• Overfitting – Dropout, Early Stopping
3
Linear Classifier - The Perceptron
The structural building block of deep learning
The Perceptron: Forward Propagation
4
The Perceptron: Forward Propagation
5
The Perceptron: Forward Propagation
6
The Perceptron: Forward Propagation
7
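To make the forward pass concrete, here is a minimal NumPy sketch of a single perceptron, assuming a sigmoid activation and illustrative weights; the slide diagrams define the exact notation, so treat the variable names below as placeholders.

import numpy as np

def sigmoid(z):
    # Non-linear activation g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_forward(x, w, b):
    # Weighted sum of inputs plus bias, passed through the activation:
    # y_hat = g(b + x . w)
    z = b + np.dot(x, w)
    return sigmoid(z)

# Example with assumed (illustrative) values
x = np.array([2.0, 3.0])   # two input features
w = np.array([0.5, -1.0])  # weights (placeholder values)
b = 1.0                    # bias
print(perceptron_forward(x, w, b))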
Common activation function
8
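The common choices referenced on this slide (sigmoid, tanh, ReLU) can be written in a few lines; this is a generic sketch, not tied to any particular framework.

import numpy as np

def sigmoid(z):
    # Squashes input to (0, 1); useful for probabilities
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes input to (-1, 1), zero-centered
    return np.tanh(z)

def relu(z):
    # Passes positive values through, zeros out negatives
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))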
Importance of Activation Functions
9

• What is the purpose of an activation function?
• The purpose of activation functions is to introduce non-linearity into the network
• What if we wanted to build a neural network to distinguish green points from red points?
Importance of Activation Functions
10

• The purpose of activation functions is to introduce non-linearity into the network
• Linear activation functions can only produce a linear decision boundary, even when trained to minimal error
Importance of Activation Functions
11

• The purpose of activation functions is to introduce non-linearity into the network
• Non-linearities allow us to approximate arbitrarily complex functions
The perceptron: Example
12
The perceptron: Example
13
The perceptron: Example
14
The perceptron: Example
15
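The example slides work through a perceptron with fixed weights; the exact numbers are in the figures, but the computation has the following shape (the weights below are assumed for illustration only).

import numpy as np

# Illustrative weights only; the slide figures define the actual example values
b = 1.0                     # bias weight w0
w = np.array([3.0, -2.0])   # weights w1, w2

def decision(x):
    # z = w0 + w1*x1 + w2*x2; the sign of z picks the side of the decision boundary
    z = b + np.dot(w, x)
    y_hat = 1.0 / (1.0 + np.exp(-z))   # sigmoid output in (0, 1)
    return z, y_hat

print(decision(np.array([-1.0, 2.0])))   # z = 1 - 3 - 4 = -6, so y_hat is roughly 0.002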
16 Neural Networks
Building Neural Networks with Perceptrons
The perceptron: Simplified
17
The perceptron: Simplified
18

• This is also called a Single Layer, Single Output network
Neural Network: Single Layer Multiple Output or Multi-output Perceptron
19

• Because all inputs are densely connected to all outputs
• These layers are called Dense layers, or sometimes fully connected layers
Dense layer from scratch in TensorFlow
20
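Slide 20 shows a dense layer written from scratch in TensorFlow; the original code is in the figure, so the version below is a minimal reconstruction of the same idea (class and variable names are my own).

import tensorflow as tf

class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        # Trainable parameters: weight matrix and bias vector
        self.W = self.add_weight(shape=(input_dim, output_dim), initializer="random_normal")
        self.b = self.add_weight(shape=(1, output_dim), initializer="zeros")

    def call(self, inputs):
        # Forward propagation: z = x W + b, then a non-linear activation
        z = tf.matmul(inputs, self.W) + self.b
        return tf.math.sigmoid(z)

layer = MyDenseLayer(input_dim=2, output_dim=3)
print(layer(tf.constant([[1.0, 2.0]])))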
Neural Network: Single Layer Multiple Output or Multi-output Perceptron
21
Neural Network: Multiple Layer Multiple Output
22

• Multiple Layer Neural Networks
Multiple Layer Neural Network
23
Multi Output Perceptron (Multiple Layer Perceptron)
24
Deep Neural Networks
25
Deep Neural Networks
26
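Since a deep feedforward network is just dense layers stacked in sequence, a hedged Keras sketch (layer widths and activations chosen arbitrarily here, not taken from the slides) is:

import tensorflow as tf

# A deep feedforward network: inputs -> hidden layers -> output
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability output
])

x = tf.random.normal((4, 2))   # batch of 4 samples, 2 features each
print(model(x).shape)          # (4, 1)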
Summary
27

• The perceptron
• Activation Functions
• Neural Networks
• Types of Neural Networks
• Deep Neural Networks
28 Neural Networks
Applying Neural Networks
Example Problem
29

• How do we gain expertise in the field of deep learning?
• Let's start with a simple two-feature model
  • x1 = Number of lectures you attend
  • x2 = Hours spent on each topic and the final project
Example Problem
30–33

• How do we gain expertise in the field of deep learning?
Quantifying loss
34

• The loss of our network measures the cost incurred from incorrect predictions (misclassified samples)
Empirical Loss
35

• The empirical loss (mean loss) measures the total loss over our entire dataset; say there are n samples
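In symbols, using the standard form of the mean loss (the notation follows the usual convention rather than the slide figure):

J(W) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\left( f\big(x^{(i)}; W\big),\; y^{(i)} \right)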
i) Binary cross entropy loss
36

• Cross entropy loss can be used with models that output a probability between 0 and 1
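A hedged sketch of this loss using TensorFlow's built-in binary cross entropy (labels and predictions below are made-up values):

import tensorflow as tf

y_true = tf.constant([1.0, 0.0, 1.0])   # ground-truth labels
y_pred = tf.constant([0.9, 0.2, 0.6])   # predicted probabilities in (0, 1)

# Built-in binary cross entropy, averaged over the samples
bce = tf.keras.losses.BinaryCrossentropy()
print(bce(y_true, y_pred).numpy())

# Equivalent hand-written form: -(1/n) * sum( y*log(p) + (1-y)*log(1-p) )
manual = -tf.reduce_mean(y_true * tf.math.log(y_pred) + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
print(manual.numpy())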
ii) Mean Squared Error Loss
37

• Mean squared error loss can be used with regression models that output continuous real numbers
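Similarly for regression, a minimal sketch with made-up values:

import tensorflow as tf

y_true = tf.constant([2.5, 0.0, 1.0])   # continuous targets
y_pred = tf.constant([2.0, 0.5, 1.0])   # model outputs

# Mean squared error: (1/n) * sum( (y - y_hat)^2 )
mse = tf.keras.losses.MeanSquaredError()
print(mse(y_true, y_pred).numpy())
print(tf.reduce_mean(tf.square(y_true - y_pred)).numpy())   # same value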
38 Neural Networks
Training Neural Networks
Training Neural Networks
39
Loss optimization
40

• The goal is to find the weights that achieve the lowest error
Loss optimization
41

• The goal is to find the weights that achieve the lowest error
Loss optimization
42
Loss optimization
43

• Randomly pick an initial weight (w0, w1)
Loss optimization
44

• Compute the gradient
Loss Optimization
45

• Take a small step in the opposite direction of the gradient
Gradient Descent
46

• Repeat until convergence
Gradient Descent
47
Gradient Descent
48
Loss optimization for linear regression
49
Gradient Descent Algorithm
50
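The algorithm on slide 50 follows the usual loop: initialize the weights, compute the gradient of the loss, step against it, and repeat until convergence. A minimal sketch, assuming TensorFlow's automatic differentiation and an arbitrary toy loss chosen only for illustration:

import tensorflow as tf

w = tf.Variable([0.0, 0.0])   # initial weights (w0, w1)
lr = 0.1                      # learning rate (eta)

def loss_fn(w):
    # Toy convex loss purely for illustration
    return tf.reduce_sum((w - tf.constant([3.0, -2.0])) ** 2)

for step in range(100):
    with tf.GradientTape() as tape:
        loss = loss_fn(w)
    grad = tape.gradient(loss, w)   # dJ/dw
    w.assign_sub(lr * grad)         # step in the opposite direction of the gradient

print(w.numpy())   # converges toward [3, -2]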
Gradient Descent
51
Computing Gradients (Step 3): Backpropagation
52

• How does a small change in one weight (w2) affect the final loss J(W)?
Computing Gradients: Backpropagation
53

• How does a small change in one weight (w2) affect the final loss J(W)?
• Let's use the chain rule
Computing Gradients: Backpropagation
54

• How does a small change in one weight (w2) affect the final loss J(W)?
• Let's use the chain rule
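Written out, the chain rule for a weight w2 in the last layer is (using y-hat for the network output; the slide figures fix the exact symbols, so this notation is assumed):

\frac{\partial J(W)}{\partial w_2} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_2}

and for an earlier weight w1, applying the chain rule again through the hidden activation z1:

\frac{\partial J(W)}{\partial w_1} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1}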
Computing Gradients: Backpropagation
55
Computing Gradients: Backpropagation
56
Computing Gradients: Backpropagation
57

• Repeat this for every weight in the network, using gradients from later layers
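In practice, frameworks do this repeated chain-rule bookkeeping automatically. A hedged TensorFlow sketch of obtaining per-weight gradients for a small model (shapes and data are arbitrary):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
loss_fn = tf.keras.losses.BinaryCrossentropy()

x = tf.random.normal((16, 2))                            # arbitrary batch
y = tf.cast(tf.random.uniform((16, 1)) > 0.5, tf.float32)

with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x))
# Backpropagation: gradients of the loss w.r.t. every weight in the network
grads = tape.gradient(loss, model.trainable_variables)
print([g.shape for g in grads])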
58 Neural Networks in Practice

Loss Optimization
Batch Optimization
Overfitting
Training Neural Networks is Difficult
59

• "Visualizing the Loss Landscape of Neural Nets", Dec 2017
• Loss functions can be difficult to optimize
Loss Optimization
60

• The loss is extremely non-convex
Loss Optimization
61

• The loss function is optimized through gradient descent
Loss optimization
62

• How can we set the learning rate?
Loss optimization
63

• Small learning rates converge slowly and get stuck in false local minima
• Large learning rates overshoot, become unstable, and diverge
• Stable learning rates converge smoothly and avoid local minima
Loss optimization
64

• How to deal with this?
• Idea 1: Try lots of different learning rates and see what works “right”
• Idea 2: Do something smarter! Design an adaptive learning rate that “adapts” to the landscape
Adaptive Learning Rates
65

• Learning rates are no longer fixed (a small sketch of one adaptive scheme follows this list)
• They can be larger or smaller depending on:
  • How large the gradient is
  • How fast learning is happening
  • The size of particular weights
  • Etc.
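One common way to make the rate adapt in practice is a decay schedule or an adaptive optimizer; a small sketch (hyperparameter values below are arbitrary):

import tensorflow as tf

# Decay the learning rate as training progresses
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=1000, decay_rate=0.9)

# Adam additionally scales each weight's step using running gradient statistics
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)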
Gradient Descent Algorithms
66

Algorithms (a usage sketch is shown below):
• SGD
• Adam
• Adadelta
• Adagrad
• RMSProp
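In TensorFlow these are available as Keras optimizer classes; a hedged sketch of selecting one (class names reflect the tf.keras API, version details may vary):

import tensorflow as tf

optimizers = {
    "SGD":      tf.keras.optimizers.SGD(learning_rate=0.01),
    "Adam":     tf.keras.optimizers.Adam(learning_rate=0.001),
    "Adadelta": tf.keras.optimizers.Adadelta(),
    "Adagrad":  tf.keras.optimizers.Adagrad(),
    "RMSprop":  tf.keras.optimizers.RMSprop(),
}
# Any of these can be passed to model.compile(optimizer=..., loss=...)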
67 Neural Networks in Practice

Loss Optimization
Batch Optimization
Overfitting
Gradient Descent
68
Gradient Descent
69

• It can be computationally intensive to compute
Stochastic Gradient Descent
70

• Easy to compute, but very noisy
71

• Fast to compute, and a much better estimate of the true gradient
Mini-batches while training
72

• More accurate estimation of the gradient
• Smoother convergence
• Allows for larger learning rates
• Mini-batches lead to fast training (see the sketch below)
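A hedged sketch of mini-batch training with Keras, where batch_size controls how many samples each gradient estimate averages over (data and sizes are made up):

import tensorflow as tf

x = tf.random.normal((1000, 2))                             # toy dataset: 1000 samples, 2 features
y = tf.cast(tf.random.uniform((1000, 1)) > 0.5, tf.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Each gradient step averages over a mini-batch of 32 samples
model.fit(x, y, batch_size=32, epochs=5, verbose=0)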
Error minimization with iterations
73
74 Neural Networks in Practice

Loss Optimization
Batch Optimization
Over-fitting
The problem of Over-fitting
75

• It is also known as the problem of generalization
The problem of Over-fitting
76
The problem of Over-fitting
77
Regularization
78

• What is it?
  • A technique that constrains our optimization problem to discourage overly complex models
• Why do we need it?
  • To improve generalization of our model on unseen data
Regularization 1: Dropout
79

• During training, randomly set some activations to 0
Regularization 1: Dropout
80–81

• During training, randomly set some activations to 0
• Typically ‘drop’ 50% of activations in a layer
• Forces the network to not rely on any one node
Regularization 1: Dropout
82
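A hedged Keras sketch of dropout with the 50% rate mentioned above (layer sizes are arbitrary); the Dropout layer is active only during training and is disabled at inference:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # randomly zero 50% of activations during training
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])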
Regularization 2: Early Stopping
83–89

• Stop training before we have a chance to overfit
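In Keras this is usually done with a callback that watches the validation loss; a minimal sketch (the patience value is arbitrary, and x_train/x_val below are placeholders):

import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",            # watch validation loss
    patience=5,                    # stop after 5 epochs with no improvement
    restore_best_weights=True)     # roll back to the best epoch

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])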
Summary
90
Thank you
91
Extra Slides
92
Sources:
93

• Loss Functions
  • https://deeplearningdemystified.com/article/fdl-3
  • https://gombru.github.io/2018/05/23/cross_entropy_loss/
