The vanishing gradients problem is one example of the unstable behavior you may encounter when training a deep neural network.
It describes the situation where a deep multilayer feed-forward network or a recurrent neural network
is unable to propagate useful gradient information from the output end of the model back to the
layers near the input end of the model.
The result is the general inability of models with many layers to learn on a given dataset, or their premature convergence to a poor solution.
Many fixes and workarounds have been proposed and investigated, such as alternate weight
initialization schemes, unsupervised pre-training, layer-wise training, and variations on gradient
descent. Perhaps the most common change is the use of the rectified linear activation function that
has become the new default, instead of the hyperbolic tangent activation function that was the default
through the late 1990s and 2000s.
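As a minimal sketch of what that change looks like in practice (the layer sizes, the two-input problem, and the use of Keras here are assumptions for illustration, not the tutorial's actual model), the old and new defaults differ only in the activation argument of the hidden layer:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Illustrative only: the sole difference between the "classical" and the
# "modern" configuration is the hidden-layer activation function.
def make_model(activation):
    model = Sequential([
        Dense(5, activation=activation, input_dim=2),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
    return model

tanh_model = make_model('tanh')  # the default through the late 1990s and 2000s
relu_model = make_model('relu')  # the current default
```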
In this tutorial, you will discover how to diagnose a vanishing gradient problem when training a
neural network model and how to fix it using an alternate activation function and weight initialization
scheme.
After completing this tutorial, you will know:

- The vanishing gradients problem limits the development of deep neural networks with classically popular activation functions such as the hyperbolic tangent.
- How to fix a deep Multilayer Perceptron for classification using the ReLU activation function and He weight initialization (a minimal sketch follows this list).
- How to use TensorBoard to diagnose a vanishing gradient problem and confirm the impact of ReLU on improving the flow of gradients through the model.
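As a preview of the second point above (a rough sketch only; the layer widths and the binary classification setup are assumptions, not the tutorial's final model), combining ReLU hidden layers with He weight initialization in Keras looks like this:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Sketch: ReLU hidden layers paired with He weight initialization.
model = Sequential([
    Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'),
    Dense(5, activation='relu', kernel_initializer='he_uniform'),
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
```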
Kick-start your project with my new book Better Deep Learning, including step-by-step
tutorials and the Python source code files for all examples.
Let’s get started.
Tutorial Overview
This tutorial is divided into five parts.
Neural networks are trained using stochastic gradient descent. This involves first calculating the prediction error made by the model and using the error to estimate a gradient used to update each weight in the network so that less error is made next time. This error gradient is propagated backward through the network from the output layer to the input layer.
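To make the update step concrete, here is a minimal sketch of a single stochastic gradient descent step in TensorFlow (the data, model shape, and learning rate are made up for illustration):

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Made-up data and a small model, only to show the three steps:
# predict, measure the error, and backpropagate a gradient to update weights.
X = tf.random.normal((32, 2))
y = tf.cast(tf.random.uniform((32, 1)) > 0.5, tf.float32)

model = Sequential([
    Dense(5, activation='tanh', input_dim=2),
    Dense(1, activation='sigmoid'),
])
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

with tf.GradientTape() as tape:
    error = loss_fn(y, model(X, training=True))           # prediction error
grads = tape.gradient(error, model.trainable_variables)   # error gradient per weight
optimizer.apply_gradients(zip(grads, model.trainable_variables))  # update the weights
```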
It is desirable to train neural networks with many layers, as the addition of more layers increases the
capacity of the network, making it capable of learning a large training dataset and efficiently
representing more complex mapping functions from inputs to outputs.
A problem with training networks with many layers (e.g. deep neural networks) is that the gradient diminishes dramatically as it is propagated backward through the network. The error gradient may be so small by the time it reaches layers close to the input of the model that it has very little effect. As such, this problem is referred to as the “vanishing gradients” problem.
Vanishing gradients make it difficult to know which direction the parameters should move to
improve the cost function …
— Random Walk Initialization for Training Very Deep Feedforward Networks, 2014.
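A small numerical sketch makes this concrete (the network here is a made-up stack of tanh layers with small random weights, not a trained model): because the derivative of tanh is at most 1, the chain of per-layer products shrinks the gradient as it travels back toward the input.

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, width = 20, 10

# Forward pass through a stack of tanh layers with small random weights.
x = rng.normal(size=width)
weights, activations = [], [x]
for _ in range(n_layers):
    W = rng.normal(scale=0.1, size=(width, width))
    weights.append(W)
    activations.append(np.tanh(W @ activations[-1]))

# Backpropagate a unit gradient and watch its magnitude decay layer by layer.
grad = np.ones(width)
for layer in reversed(range(n_layers)):
    # d tanh(z)/dz = 1 - tanh(z)^2 is never greater than 1
    grad = weights[layer].T @ (grad * (1 - activations[layer + 1] ** 2))
    print(f'layer {layer:2d}: gradient norm = {np.linalg.norm(grad):.2e}')
```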
Vanishing gradients are a particular problem with recurrent neural networks, as updating the network involves unrolling it for each input time step, in effect creating a very deep network whose weights all require updates. A modest recurrent neural network may have 200 to 400 input time steps, resulting conceptually in a very deep network.
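For example (the sequence length and layer sizes here are arbitrary choices for illustration), a single-layer Keras SimpleRNN over 400 time steps is, once unrolled for backpropagation through time, effectively a 400-layer tanh network:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# Illustrative only: one recurrent layer over 400 time steps.
# Unrolled for backpropagation through time, gradients must pass through
# 400 repeated applications of the same recurrent weights and tanh activation.
model = Sequential([
    SimpleRNN(32, activation='tanh', input_shape=(400, 1)),
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='sgd')
```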
The vanishing gradients problem may manifest in a Multilayer Perceptron as a slow rate of improvement during training and perhaps premature convergence, e.g. continued training does not result in any further improvement. Inspecting the changes to the weights during training, we would see more change (i.e. more learning) occurring in the layers closer to the output layer and less change occurring in the layers closer to the input layer.
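One hand-rolled way to see this (a sketch on made-up data with a deliberately deep tanh MLP; the tutorial itself uses TensorBoard for the same diagnosis) is to print the average gradient magnitude for each layer's weights:

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Made-up data and a deliberately deep tanh MLP.
X = tf.random.normal((128, 2))
y = tf.cast(tf.random.uniform((128, 1)) > 0.5, tf.float32)

model = Sequential([Dense(5, activation='tanh', input_dim=2)] +
                   [Dense(5, activation='tanh') for _ in range(4)] +
                   [Dense(1, activation='sigmoid')])

with tf.GradientTape() as tape:
    loss = tf.keras.losses.BinaryCrossentropy()(y, model(X, training=True))
grads = tape.gradient(loss, model.trainable_variables)

# Layers near the input should show much smaller average gradients
# than layers near the output when gradients are vanishing.
for var, grad in zip(model.trainable_variables, grads):
    if 'kernel' in var.name:
        print(f'{var.name}: mean |gradient| = {float(tf.reduce_mean(tf.abs(grad))):.2e}')
```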
There are many techniques that can be used to reduce the impact of the vanishing gradients problem
for feed-forward neural networks, most notably alternate weight initialization schemes and use of
alternate activation functions.