3. Activation Function


Activation Functions

What is an Activation Function?

• An activation function decides whether a neuron should be activated
or not.
• It does this by applying a mathematical operation to the neuron's
input, deciding whether that input is important to the network's
prediction.
• When building a neural network, one of the choices you make is
which activation function to use in the hidden layers as well as at
the output layer of the network.
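As a rough illustration of the idea above, here is a minimal sketch in plain
NumPy (the inputs, weights, and bias are made-up values, not taken from these
slides): a neuron computes the weighted sum of its inputs and then passes it
through an activation function, in this case the sigmoid.

    import numpy as np

    x = np.array([0.5, -1.2, 3.0])   # inputs to the neuron (illustrative values)
    w = np.array([0.8, 0.1, -0.4])   # weights (illustrative values)
    b = 0.2                          # bias (illustrative value)

    z = np.dot(w, x) + b             # weighted sum of the inputs
    a = 1.0 / (1.0 + np.exp(-z))     # sigmoid activation: squashes z into (0, 1)

    print(z, a)                      # the activation decides what the neuron outputs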
Types of Activation Functions

1. Binary step function/Threshold function
2. Linear
3. Sigmoid
4. Hyperbolic Tangent Function(Tanh)
5. ReLU
6. Leaky ReLU
7. Parametric ReLU (PReLU)
8. Exponential Linear Unit
9. Swish
10. Softmax
• Binary step function/Threshold function: The binary step function is a
threshold-based activation function: the neuron is activated when its
input is above a certain threshold and deactivated when it is below it.
The threshold is commonly zero. (A NumPy sketch of this and the other
functions is given after this list.)
• Linear Activation Function: The linear activation function, also
known as "no activation" or the "identity function" (the input is simply
multiplied by 1.0), is one where the activation is proportional to the
input. The function does not transform the weighted sum of the inputs;
it simply returns the value it was given.
• Sigmoid Activation Function: The sigmoid activation function is also
called the logistic function. It is the same function used in the logistic
regression classification algorithm. The function takes any real value
as input and outputs a value in the range 0 to 1. Also, because the
sigmoid is a non-linear function, the output of this unit is a
non-linear function of the weighted sum of its inputs.
• Hyperbolic Tangent Function (Tanh): The biggest advantage of the
tanh function is that it produces a zero-centered output in the range
(-1, 1), which helps the backpropagation process. The tanh function has
mostly been used in recurrent neural networks for natural language
processing and speech recognition tasks.
• ReLU Activation Function: ReLU stands for Rectified Linear Unit. The
main advantage of the ReLU function over other activation functions
is that it does not activate all the neurons at the same time. ReLU is a
piecewise linear function that outputs the input directly if it is
positive and outputs zero otherwise.
• Leaky ReLU Activation Function: Leaky ReLU is a type of activation
function based on ReLU, but it has a small slope for negative values
instead of a flat slope. A leaky ReLU layer performs a threshold
operation in which any input value less than zero is multiplied by a
small fixed scalar.
• Parametric ReLU: The Parametric Rectified Linear Unit, or PReLU, is an
activation function that generalizes the traditional rectified unit by
learning the slope used for negative values.
• Exponential Linear Unit: The Exponential Linear Unit (ELU) is an
activation function for neural networks. In contrast to ReLUs, ELUs can
take negative values, which allows them to push mean unit activations
closer to zero, much like batch normalization but with lower
computational complexity. An ELU layer performs the identity
operation on positive inputs and an exponential nonlinearity on
negative inputs.
• Swish Activation Function: Swish is the activation function
f(x) = x · sigmoid(βx), where β is a learnable parameter. Nearly all
implementations fix β = 1 rather than learning it, in which case the
function becomes f(x) = x · σ(x) (sometimes called "Swish-1").
• Softmax Activation Function: Softmax extends the idea of the sigmoid
to the multiclass setting. That is, softmax assigns a decimal probability
to each class in a multi-class problem, and those probabilities must
add up to 1.0. This additional constraint helps training converge more
quickly than it otherwise would.
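As referenced in the list above, here is a minimal NumPy sketch of the
element-wise activation functions described so far. The slope and shape
constants (the 0.01 slope for leaky ReLU, alpha for PReLU and ELU) are common
defaults assumed here, not values taken from these slides.

    import numpy as np

    def binary_step(x, threshold=0.0):
        # 1 above the threshold, 0 below it (the threshold is commonly zero)
        return np.where(x >= threshold, 1.0, 0.0)

    def linear(x):
        # identity: returns the weighted sum unchanged
        return x

    def sigmoid(x):
        # squashes any real value into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        # zero-centered output in the range (-1, 1)
        return np.tanh(x)

    def relu(x):
        # positive inputs pass through, negative inputs become zero
        return np.maximum(0.0, x)

    def leaky_relu(x, slope=0.01):
        # small fixed slope for negative inputs instead of a flat zero
        return np.where(x >= 0.0, x, slope * x)

    def prelu(x, alpha):
        # same shape as leaky ReLU, but alpha is learned during training
        return np.where(x >= 0.0, x, alpha * x)

    def elu(x, alpha=1.0):
        # identity for positive inputs, exponential curve for negative inputs
        return np.where(x >= 0.0, x, alpha * (np.exp(x) - 1.0))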
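Swish and softmax can be sketched the same way. The softmax below subtracts
the maximum logit before exponentiating, a standard trick for numerical
stability; the example class scores are made up.

    import numpy as np

    def swish(x, beta=1.0):
        # f(x) = x * sigmoid(beta * x); with beta fixed at 1 this is "Swish-1"
        return x / (1.0 + np.exp(-beta * x))

    def softmax(z):
        # exponentiate shifted logits and normalize so the outputs sum to 1
        e = np.exp(z - np.max(z))
        return e / np.sum(e)

    logits = np.array([2.0, 1.0, 0.1])   # made-up scores for three classes
    probs = softmax(logits)
    print(probs, probs.sum())            # the probabilities add up to 1.0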
Thank You…
Mr Abhay Chougule
