How To Choose An Activation Function For Deep Learning
The choice of activation function in the hidden layer will control how well the network model learns the training
dataset. The choice of activation function in the output layer will define the type of predictions the model can make.
As such, a careful choice of activation function must be made for each deep learning neural network project.
In this tutorial, you will discover how to choose activation functions for neural network models.
Tutorial Overview
This tutorial is divided into three parts; they are:
1. Activation Functions
2. Activation for Hidden Layers
3. Activation for Output Layers
Activation Functions
An activation function in a neural network defines how the weighted sum of the input is transformed into an output
from a node or nodes in a layer of the network.
Sometimes the activation function is called a “transfer function.” If the output range of the activation function is
limited, then it may be called a “squashing function.” Many activation functions are nonlinear and may be referred
to as the “nonlinearity” in the layer or the network design.
The choice of activation function has a large impact on the capability and performance of the neural network, and
different activation functions may be used in different parts of the model.
Technically, the activation function is used within or after the internal processing of each node in the network,
although networks are designed to use the same activation function for all nodes in a layer.
A network may have three types of layers: input layers that take raw input from the domain, hidden layers that
take input from another layer and pass output to another layer, and output layers that make a prediction.
All hidden layers typically use the same activation function. The output layer will typically use a different activation
function from the hidden layers and is dependent upon the type of prediction required by the model.
Activation functions are also typically differentiable, meaning the first-order derivative can be calculated for a given
input value. This is required given that neural networks are typically trained using the backpropagation of error
algorithm that requires the derivative of prediction error in order to update the weights of the model.
There are many different types of activation functions used in neural networks, although perhaps only a small
number of functions are used in practice for hidden and output layers.
Let’s take a look at the activation functions used for each type of layer in turn.
Activation for Hidden Layers
A hidden layer does not directly contact input data or produce outputs for a model, at least in general.
Typically, a differentiable nonlinear activation function is used in the hidden layers of a neural network. This allows
the model to learn more complex functions than a network trained using a linear activation function.
In order to get access to a much richer hypothesis space that would benefit from deep representations,
you need a non-linearity, or activation function.
There are perhaps three activation functions you may want to consider for use in hidden layers; they are:
Rectified Linear Activation (ReLU)
Logistic (Sigmoid)
Hyperbolic Tangent (Tanh)
This is not an exhaustive list of activation functions used for hidden layers, but they are the most commonly used.
The rectified linear activation function, or ReLU for short, is perhaps the most common function used for hidden
layers. It is common because it is both simple to implement and effective at overcoming the limitations of other
previously popular activation functions, such as Sigmoid and Tanh. Specifically, it is less susceptible to vanishing
gradients that prevent deep models from being trained, although it can suffer from other problems, like saturated or
“dead” units.
The ReLU function is calculated as follows:
max(0.0, x)
This means that if the input value (x) is negative, then a value 0.0 is returned; otherwise, the value is returned unchanged.
You can learn more about the details of the ReLU activation function in this tutorial:
We can get an intuition for the shape of this function with the worked example below.
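A minimal sketch of such a worked example might look like this (assuming matplotlib is available for plotting):

```python
# plot inputs and outputs for the rectified linear activation function
from matplotlib import pyplot

# rectified linear activation function
def rectified(x):
	return max(0.0, x)

# define a range of input values
inputs = [x for x in range(-10, 11)]
# calculate the output for each input
outputs = [rectified(x) for x in inputs]
# plot inputs vs outputs
pyplot.plot(inputs, outputs)
pyplot.show()
```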
Running the example calculates the outputs for a range of values and creates a plot of inputs versus outputs.
We can see the familiar kink shape of the ReLU activation function.
Plot of Inputs vs. Outputs for the ReLU Activation Function.
When using the ReLU function for hidden layers, it is a good practice to use a “He Normal” or “He Uniform” weight
initialization and scale input data to the range 0-1 (normalize) prior to training.
The sigmoid (logistic) activation function is calculated as follows:
1.0 / (1.0 + e^-x)
The function takes any real value as input and outputs values in the range 0 to 1. The larger the input (more
positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the
output will be to 0.0.
We can get an intuition for the shape of this function with the worked example below.
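A minimal sketch of such a worked example might look like this (assuming matplotlib is available for plotting):

```python
# plot inputs and outputs for the sigmoid activation function
from math import exp
from matplotlib import pyplot

# sigmoid activation function: 1.0 / (1.0 + e^-x)
def sigmoid(x):
	return 1.0 / (1.0 + exp(-x))

# define a range of input values
inputs = [x for x in range(-10, 11)]
# calculate the output for each input
outputs = [sigmoid(x) for x in inputs]
# plot inputs vs outputs
pyplot.plot(inputs, outputs)
pyplot.show()
```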
Running the example calculates the outputs for a range of values and creates a plot of inputs versus outputs.
When using the Sigmoid function for hidden layers, it is a good practice to use a “Xavier Normal” or “Xavier
Uniform” weight initialization (also referred to as Glorot initialization, named for Xavier Glorot) and scale input data
to the range 0-1 (e.g. the range of the activation function) prior to training.
The hyperbolic tangent activation function, or Tanh for short, is very similar to the sigmoid activation function and
even has the same S-shape. It is calculated as follows:
(e^x - e^-x) / (e^x + e^-x)
The function takes any real value as input and outputs values in the range -1 to 1. The larger the input (more
positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the
output will be to -1.0.
We can get an intuition for the shape of this function with the worked example below.
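A minimal sketch of such a worked example might look like this (assuming matplotlib is available for plotting; Python's built-in math.tanh is used rather than implementing the formula directly):

```python
# plot inputs and outputs for the tanh activation function
from math import tanh
from matplotlib import pyplot

# define a range of input values
inputs = [x for x in range(-10, 11)]
# calculate the output for each input
outputs = [tanh(x) for x in inputs]
# plot inputs vs outputs
pyplot.plot(inputs, outputs)
pyplot.show()
```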
Running the example calculates the outputs for a range of values and creates a plot of inputs versus outputs.
When using the TanH function for hidden layers, it is a good practice to use a “Xavier Normal” or “Xavier Uniform”
weight initialization (also referred to as Glorot initialization, named for Xavier Glorot) and scale input data to the
range -1 to 1 (e.g. the range of the activation function) prior to training.
Traditionally, the sigmoid activation function was the default activation function in the 1990s. Perhaps from the mid
to late 1990s into the 2010s, the Tanh function was the default activation function for hidden layers.
… the hyperbolic tangent activation function typically performs better than the logistic sigmoid.
— Page 195, Deep Learning, 2016.
Both the sigmoid and Tanh functions can make the model more susceptible to problems during training, via the so-
called vanishing gradients problem.
Modern neural network models with common architectures, such as MLP and CNN, will make use of the ReLU
activation function, or extensions.
In modern neural networks, the default recommendation is to use the rectified linear unit or ReLU …
— Page 174, Deep Learning, 2016.
Recurrent networks still commonly use Tanh or sigmoid activation functions, or even both. For example, the LSTM
commonly uses the Sigmoid activation for recurrent connections and the Tanh activation for output.
If you’re unsure which activation function to use for your network, try a few and compare the results.
The figure below summarizes how to choose an activation function for the hidden layers of your neural network
model.
Activation for Output Layers
There are perhaps three activation functions you may want to consider for use in the output layer; they are:
Linear
Logistic (Sigmoid)
Softmax
This is not an exhaustive list of activation functions used for output layers, but they are the most commonly used.
The linear activation function is also called “identity” or “no activation.” This is because the linear activation
function does not change the weighted sum of the input in any way and instead returns the value directly.
We can get an intuition for the shape of this function with the worked example below.
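A minimal sketch of such a worked example might look like this (assuming matplotlib is available for plotting):

```python
# plot inputs and outputs for the linear activation function
from matplotlib import pyplot

# linear (identity) activation function
def linear(x):
	return x

# define a range of input values
inputs = [x for x in range(-10, 11)]
# calculate the output for each input
outputs = [linear(x) for x in inputs]
# plot inputs vs outputs
pyplot.plot(inputs, outputs)
pyplot.show()
```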
Running the example calculates the outputs for a range of values and creates a plot of inputs versus outputs.
We can see a diagonal line shape where inputs are plotted against identical outputs.
Plot of Inputs vs. Outputs for the Linear Activation Function
Target values used to train a model with a linear activation function in the output layer are typically scaled prior to
modeling using normalization or standardization transforms.
Nevertheless, to add some symmetry, we can review the shape of this function with the worked example below.
Running the example calculates the outputs for a range of values and creates a plot of inputs versus outputs.
We can see the familiar S-shape of the sigmoid activation function.
Target labels used to train a model with a sigmoid activation function in the output layer will have the values 0 or 1.
The softmax function is related to the argmax function, which outputs a 0 for all options and a 1 for the chosen
option. Softmax is a “softer” version of argmax that allows a probability-like output of a winner-take-all function.
As such, the input to the function is a vector of real values and the output is a vector of the same length with values
that sum to 1.0 like probabilities.
e^x / sum(e^x)
Where x is a vector of outputs and e is a mathematical constant that is the base of the natural logarithm.
You can learn more about the details of the Softmax function in this tutorial:
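A worked example of the softmax calculation, as a minimal pure-Python sketch with an illustrative input vector, might look like this:

```python
# calculate the softmax of a vector of raw scores
from math import exp

# softmax activation function: e^x / sum(e^x)
def softmax(vector):
	e = [exp(v) for v in vector]
	total = sum(e)
	return [v / total for v in e]

# define an example input vector (values chosen for illustration)
inputs = [1.0, 3.0, 2.0]
# calculate the softmax output
outputs = softmax(inputs)
print(outputs)
# confirm the outputs sum to 1.0
print(sum(outputs))
```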
Running the example calculates the softmax output for the input vector.
We then confirm that the sum of the outputs of the softmax indeed sums to the value 1.0.
Target labels used to train a model with the softmax activation function in the output layer will be vectors with 1 for
the target class and 0 for all other classes.
For example, you may divide prediction problems into two main groups, predicting a categorical variable
(classification) and predicting a numerical variable (regression).
If your problem is a regression problem, you should use a linear activation function.
If your problem is a classification problem, then there are three main types of classification problems and each may
use a different activation function.
Predicting a probability is not a regression problem; it is classification. In all cases of classification, your model will
predict the probability of class membership (e.g. probability that an example belongs to each class) that you can
convert to a crisp class label by rounding (for sigmoid) or argmax (for softmax).
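As a small illustration of that conversion (the probability values here are hypothetical), rounding and argmax might look like:

```python
# convert predicted probabilities to crisp class labels

# binary classification: a single sigmoid output, converted by rounding
probability = 0.73  # hypothetical sigmoid output
label = int(round(probability))

# multiclass classification: a softmax output vector, converted by argmax
probabilities = [0.1, 0.7, 0.2]  # hypothetical softmax output
label_multiclass = max(range(len(probabilities)), key=lambda i: probabilities[i])
```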
If there are two mutually exclusive classes (binary classification), then your output layer will have one node and a
sigmoid activation function should be used. If there are more than two mutually exclusive classes (multiclass
classification), then your output layer will have one node per class and a softmax activation should be used. If there
are two or more mutually inclusive classes (multilabel classification), then your output layer will have one node for
each class and a sigmoid activation function is used.
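These rules can be summarized in a small helper sketch (the function name and return format are illustrative, not part of the original tutorial):

```python
# map a prediction problem type to a suggested (nodes, activation) output-layer configuration
def output_layer_config(problem, n_classes=None):
	if problem == 'regression':
		# one node, linear activation
		return (1, 'linear')
	if problem == 'binary':
		# one node, sigmoid activation
		return (1, 'sigmoid')
	if problem == 'multiclass':
		# one node per class, softmax activation
		return (n_classes, 'softmax')
	if problem == 'multilabel':
		# one node per class, sigmoid activation
		return (n_classes, 'sigmoid')
	raise ValueError('unknown problem type: %s' % problem)

print(output_layer_config('multiclass', n_classes=3))
```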
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Tutorials
A Gentle Introduction to the Rectified Linear Unit (ReLU)
Softmax Activation Function with Python
4 Types of Classification Tasks in Machine Learning
How to Fix the Vanishing Gradients Problem Using the ReLU
Books
Deep Learning, 2016.
Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999.
Neural Networks for Pattern Recognition, 1996.
Deep Learning with Python, 2017.
Articles
Activation function, Wikipedia.
Summary
In this tutorial, you discovered how to choose activation functions for neural network models.
Specifically, you learned:
Activation functions are a key part of neural network design.
The modern default activation function for hidden layers is the ReLU function.
The activation function for output layers depends on the type of prediction problem.
Ask your questions in the comments below and I will do my best to answer.