Unit 2_Activation Function_PR
Uses :- The linear activation function is used in just one place, i.e. the output layer.
3. Sigmoid Function-
Sigmoid is an ‘S’-shaped non-linear
mathematical function whose formula is
σ(x) = 1 / (1 + e^(-x))
Pros-
1. Sigmoid Function is continuous and differentiable.
2. It limits the output to the range (0, 1).
3. Gives very clear predictions for binary classification.
Cons-
1. It can cause the vanishing gradient problem.
2. It’s not centered around zero.
3. Computationally expensive, since it involves the exponential function.
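To make these points concrete, below is a minimal NumPy sketch (the function names are illustrative, not from the original notes) of the sigmoid and its derivative; the derivative never exceeds 0.25 and shrinks towards 0 for large |x|, which is why deep stacks of sigmoid layers suffer from vanishing gradients.

import numpy as np

def sigmoid(x):
    # S-shaped squashing function: maps any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x)); peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(x))             # outputs squeezed into (0, 1)
print(sigmoid_derivative(x))  # near-zero gradients at the extremes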
• ReLU :- Stands for Rectified Linear Unit. It is the most widely used
activation function, chiefly implemented in the hidden layers of a neural
network.
• Equation :- A(x) = max(0, x). It gives an output of x if x is positive
and 0 otherwise.
• Value Range :- [0, inf)
• Nature :- non-linear, which means we can easily backpropagate the
errors and have multiple layers of neurons being activated by the ReLU
function.
• Uses :- ReLU is less computationally expensive than tanh and sigmoid
because it involves simpler mathematical operations. At a time only a few
neurons are activated, making the network sparse and hence efficient and
easy to compute.
Pros-
1. Easy to compute.
2. Does not cause the vanishing gradient problem for positive inputs.
3. Since not all neurons are activated, it creates sparsity in
the network, making it fast and efficient.
Cons-
1. Can cause the exploding gradient problem.
2. Not zero-centered.
3. Can kill some neurons forever, since it always gives 0 for
negative values (the “dying ReLU” problem).
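As a quick illustration of the pros and cons above, here is a minimal NumPy sketch (names chosen for illustration) of ReLU and its gradient; the gradient is exactly 1 for positive inputs (so it does not vanish) and exactly 0 for negative inputs, which both creates sparsity and explains how neurons can “die”.

import numpy as np

def relu(x):
    # max(0, x): passes positive values through, zeros out negatives
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 for x > 0 and 0 otherwise: gradients do not shrink for active
    # neurons, but units stuck in the negative region stop learning
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]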
7. Parameterized ReLU-
In Parameterized ReLU (PReLU), instead of fixing a slope for the
negative axis, the slope α is passed as a new trainable parameter
which the network learns on its own to achieve faster convergence:
A(x) = x for x ≥ 0, and A(x) = αx for x < 0.
Pros-
1. The network will learn the most appropriate value
of alpha on its own.
2. Does not cause the vanishing gradient problem.
Cons-
1. Slightly more expensive to compute than ReLU, since the extra parameter α must also be trained.
2. Performance depends upon the problem.
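The following NumPy sketch (illustrative only) shows a PReLU forward pass together with the gradients with respect to the input and to α, which is how the network can update α by backpropagation along with its other weights.

import numpy as np

def prelu(x, alpha):
    # positive inputs pass through; negative inputs are scaled by the
    # trainable parameter alpha instead of being zeroed out
    return np.where(x > 0, x, alpha * x)

def prelu_grads(x, alpha):
    # gradient w.r.t. the input and w.r.t. alpha
    dx = np.where(x > 0, 1.0, alpha)
    dalpha = np.where(x > 0, 0.0, x)
    return dx, dalpha

x = np.array([-2.0, -0.5, 1.0, 3.0])
print(prelu(x, alpha=0.1))        # [-0.2  -0.05  1.    3.  ]
print(prelu_grads(x, alpha=0.1))  # (dx, dalpha)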
Choosing the right Activation Function
1. Sigmoid functions and their combinations generally
work better in the case of classifiers.
2. Sigmoids and tanh functions are sometimes avoided
due to the vanishing gradient problem.
3. ReLU function is a general activation function and is
used in most cases these days.
4. If we encounter a case of dead neurons in our networks, the
leaky ReLU function is the best choice.
5. Always keep in mind that the ReLU function should only be
used in the hidden layers.
6. As a rule of thumb, you can begin with the ReLU function and
then move to other activation functions if ReLU does not provide
optimum results.
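Putting these rules of thumb together, here is a minimal sketch of a binary classifier, assuming TensorFlow/Keras is available and assuming 20 input features purely for illustration: ReLU is used only in the hidden layers, and a sigmoid is used at the output layer to produce a probability.

import tensorflow as tf

# Hidden layers use ReLU; the output layer uses sigmoid because this
# is a binary classifier (the input size of 20 is just an assumption).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()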