Lect 12 - Deep Feed Forward NN - Review
Umarani Jayaraman
Outline
Linear classifier: the perceptron
Non-linear classifiers: MLP, neural networks
Optimization techniques
  Loss optimization: gradient descent
  Batch optimization: batch, stochastic and mini-batch
  Overfitting: dropout, early stopping
Linear Classifier: The Perceptron
The structural building block of deep learning
The Perceptron: Forward Propagation
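A minimal sketch of the forward propagation step for a single perceptron: multiply each input by its weight, add the bias, and pass the result through a non-linear activation. The sigmoid activation and the example numbers below are illustrative assumptions, not the exact values from the slides.

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation g(z) = 1 / (1 + e^-z)
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_forward(x, w, w0):
    # Forward propagation: y_hat = g(w0 + sum_i x_i * w_i)
    z = w0 + np.dot(x, w)   # weighted sum of inputs plus bias
    return sigmoid(z)       # apply the non-linear activation

# Example with made-up inputs and weights
x = np.array([2.0, -1.0])
w = np.array([0.5, 1.5])
print(perceptron_forward(x, w, w0=1.0))
```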
Common activation functions
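For reference, three commonly used activation functions sketched in NumPy. The specific set shown on the slide is not reproduced in the text; sigmoid, tanh and ReLU are assumed here as the usual choices.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid: squashes z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Hyperbolic tangent: squashes z into (-1, 1)
    return np.tanh(z)

def relu(z):
    # Rectified linear unit: max(0, z)
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```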
Importance of Activation Functions
What is the purpose of an activation function?
The purpose of activation functions is to introduce non-linearity into the network.
What if we wanted to build a neural network to distinguish green points from red points?
Linear activation functions can only produce a linear decision boundary, even when that boundary is chosen to minimize the error.
Non-linear activation functions allow us to approximate arbitrarily complex functions.
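One way to see why the non-linearity matters: without it, stacking layers buys nothing, because a composition of linear maps is itself a single linear map. A tiny demonstration with made-up weight matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # first "layer" weights
W2 = rng.normal(size=(1, 3))   # second "layer" weights
x = rng.normal(size=(2,))

# Two linear layers with no activation in between...
deep_linear = W2 @ (W1 @ x)
# ...compute exactly one linear layer with weights W2 @ W1.
single_linear = (W2 @ W1) @ x
print(np.allclose(deep_linear, single_linear))  # True
```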
The Perceptron: Example
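A worked version of the kind of example on these slides, with assumed weights w0 = 1 and w = (3, -2), so y_hat = g(1 + 3*x1 - 2*x2). The actual numbers used in the lecture may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed example weights: bias w0 = 1, weights w = (3, -2)
w0, w = 1.0, np.array([3.0, -2.0])

def perceptron(x):
    # y_hat = g(w0 + w1*x1 + w2*x2)
    return sigmoid(w0 + np.dot(w, x))

# For input x = (-1, 2): z = 1 + 3*(-1) - 2*2 = -6, so y_hat = g(-6) ~ 0.002
print(perceptron(np.array([-1.0, 2.0])))
```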
Neural Networks
Building Neural Networks with Perceptrons
The Perceptron: Simplified
The perceptron
Activation Functions
Neural Networks
Types of Neural Networks
Deep Neural Networks
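A minimal sketch of how the pieces above fit together: perceptrons stacked into a dense layer, and dense layers stacked into a deep feed-forward network. The layer sizes and random weights below are made up purely for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, W, b, activation):
    # A dense layer is many perceptrons sharing the same inputs:
    # output j computes activation(b[j] + sum_i x[i] * W[i, j]).
    return activation(x @ W + b)

# Deep feed-forward network: 3 inputs -> 4 hidden -> 4 hidden -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(4, 1)), np.zeros(1)

x = np.array([0.5, -1.2, 2.0])
h1 = dense(x, W1, b1, relu)         # hidden layer 1
h2 = dense(h1, W2, b2, relu)        # hidden layer 2
y_hat = dense(h2, W3, b3, sigmoid)  # output layer
print(y_hat)
```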
Neural Networks
Applying Neural Networks
Example Problem
Feature model:
x1 = number of lectures you attend
x2 = hours spent on each topic and the final project
Compute the gradient.
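To make "compute the gradient" concrete for this two-feature example: with a sigmoid output and binary cross-entropy loss, the gradient of the loss with respect to each weight takes the simple closed form sketched below. The training data here are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up data: x = (lectures attended, hours spent), y = pass (1) / fail (0)
X = np.array([[4.0, 5.0], [2.0, 1.0], [5.0, 8.0], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

def predict(X, w, w0):
    return sigmoid(X @ w + w0)

def gradients(X, y, w, w0):
    # For a sigmoid output with binary cross-entropy loss, the gradient of the
    # loss w.r.t. the pre-activation is (y_hat - y); the chain rule then gives:
    err = predict(X, w, w0) - y
    grad_w = X.T @ err / len(y)   # dJ/dw, averaged over examples
    grad_w0 = err.mean()          # dJ/dw0
    return grad_w, grad_w0

print(gradients(X, y, w=np.zeros(2), w0=0.0))
```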
Loss Optimization
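Loss optimization means finding the network weights that minimize the average loss over the training set. Stated in the usual notation (the exact symbols are an assumption, since the slide equations are not reproduced in the text):

```latex
J(\mathbf{W}) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}\big(f(x^{(i)}; \mathbf{W}),\, y^{(i)}\big),
\qquad
\mathbf{W}^{*} = \arg\min_{\mathbf{W}} J(\mathbf{W})
```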
Training Neural Networks is Difficult
Visualizing the Loss Landscape of Neural Nets, Dec 2017
Loss functions can be difficult to optimize.
Gradient Descent
Computing the gradient over the entire training set can be computationally intensive.
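A minimal sketch of the gradient descent loop: initialize the weights, repeatedly compute the gradient of the loss, and step in the opposite direction. The one-dimensional toy loss below is only for illustration, not the loss used in the lecture.

```python
import numpy as np

def gradient_descent(grad_fn, w_init, lr=0.1, n_steps=100):
    # Repeat: compute the gradient of the loss w.r.t. the weights,
    # then take a small step in the opposite (downhill) direction.
    w = np.array(w_init, dtype=float)
    for _ in range(n_steps):
        w -= lr * grad_fn(w)
    return w

# Toy example: minimize J(w) = (w - 3)^2, whose gradient is 2*(w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3), w_init=[0.0])
print(w_star)  # approaches 3
```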
Stochastic Gradient Descent
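Stochastic gradient descent estimates the gradient from a single example per update, and mini-batch gradient descent from a small batch, instead of using the whole training set. A sketch, where the per-example gradient function grad_i is assumed to be supplied by the caller:

```python
import numpy as np

def sgd(grad_i, X, y, w_init, lr=0.01, epochs=10, batch_size=1):
    # batch_size=1 gives stochastic GD; larger values give mini-batch GD;
    # batch_size=len(X) recovers full-batch gradient descent.
    w = np.array(w_init, dtype=float)
    n = len(X)
    for _ in range(epochs):
        order = np.random.permutation(n)              # reshuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Average the per-example gradients over the batch
            g = np.mean([grad_i(w, X[i], y[i]) for i in idx], axis=0)
            w -= lr * g                               # gradient step
    return w

# Usage on a toy linear-regression problem: per-example gradient of (w.x - y)^2
X_toy = np.array([[1.0], [2.0], [3.0]])
y_toy = np.array([2.0, 4.0, 6.0])
g = lambda w, x, y: 2 * (w @ x - y) * x
print(sgd(g, X_toy, y_toy, w_init=[0.0], lr=0.05, epochs=50))  # approaches [2.]
```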
The Problem of Overfitting
Regularization
What is it? A technique that constrains our optimization problem to discourage overly complex models.
Why do we need it? To improve the generalization of our model to unseen data.
Regularization 1: Dropout
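A minimal sketch of how dropout is applied during training: each activation is randomly zeroed with some probability, forcing the network not to rely on any single unit. The drop probability of 0.5 below is the commonly used value, assumed here.

```python
import numpy as np

def dropout(h, p_drop=0.5, training=True, rng=None):
    # During training, randomly zero each activation with probability p_drop
    # and rescale the survivors by 1/(1 - p_drop) ("inverted dropout"),
    # so the expected activation is unchanged at test time.
    if not training:
        return h
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

h = np.array([0.2, 1.5, -0.7, 3.0])
print(dropout(h))                   # training: roughly half the units are zeroed
print(dropout(h, training=False))   # inference: unchanged
```

The outline also lists early stopping as a regularizer: halt training once the validation loss stops improving. A sketch of that loop, with placeholder train and validation functions and a made-up loss curve:

```python
def early_stopping(train_step, val_loss, max_epochs=100, patience=5):
    # Stop when validation loss has not improved for `patience` epochs.
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step(epoch)
        loss = val_loss(epoch)
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best

# Toy demo: validation loss falls, then rises again as overfitting sets in.
losses = [1.0, 0.7, 0.5, 0.45, 0.44, 0.46, 0.5, 0.55, 0.6, 0.7]
print(early_stopping(train_step=lambda e: None,
                     val_loss=lambda e: losses[min(e, len(losses) - 1)]))
```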
Thank you
Extra Slides
Sources:
Loss Functions
https://deeplearningdemystified.com/article/fdl-3
https://gombru.github.io/2018/05/23/cross_entropy_loss/