Activation Function
• Sigmoid
• ReLU
• Tanh
Sigmoid
• The sigmoid activation function, also known as the logistic function, has historically been
used in neural networks for binary classification problems. It has a range between
0 and 1, which makes it suitable for tasks where the goal is to produce an output
that can be interpreted as a probability.
• Sigmoid is typically used only in the output layer.
• The sigmoid formula is defined as: σ(x) = 1 / (1 + e^(-x)) (a NumPy sketch follows below)
• Graphical representation of sigmoid
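A minimal sketch of the sigmoid function, assuming NumPy is available; the printed values are approximate.

import numpy as np

def sigmoid(x):
    # Logistic (sigmoid) activation: squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(sigmoid(z))  # approx [0.018 0.269 0.5 0.731 0.982]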
What if we don’t use any Activation Function?
• Figure: comparison of the network output without any activation function and with the sigmoid activation function.
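A small sketch (NumPy assumed) of the point above: without an activation function, two stacked linear layers collapse into a single linear layer, while placing a sigmoid between them breaks that collapse.

import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
x = rng.normal(size=3)

# Without an activation, layer2(layer1(x)) equals one linear map, (W2 @ W1) @ x.
no_activation = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(no_activation, collapsed))   # True

# With sigmoid between the layers, the composition is no longer a single linear map.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
with_sigmoid = W2 @ sigmoid(W1 @ x)
print(np.allclose(with_sigmoid, collapsed))    # False (in general)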
• Advantages of Sigmoid
• Binary Classification
• Smooth Gradient
• Disadvantages of Sigmoid
• Vanishing gradient problem (illustrated in the sketch below).
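A short sketch (NumPy assumed) of why this happens: the sigmoid derivative σ'(x) = σ(x)(1 - σ(x)) never exceeds 0.25 and shrinks rapidly for large |x|, so gradients passed back through many sigmoid layers become very small.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_grad(x))
# Gradients shrink from 0.25 at x = 0 to roughly 0.000045 at x = 10,
# so multiplying many such factors together drives gradients toward zero.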
ReLU
• ReLU stands for Rectified Linear Unit, and it is an activation function
commonly used in artificial neural networks. The ReLU activation function is a
piecewise linear function that returns the input for positive values and zero
for negative values.
• It is used in hidden layers.
• The ReLU formula is defined as: ReLU(x) = max(0, x)
• In terms of its output range:
• For Positive Inputs (x>0): The ReLU function returns the input value itself. In
this case, the output range is (0,+∞).
• For Negative Inputs (x≤0): The ReLU function returns zero. In this case, the
output is always 0.
Graphical representation of ReLU
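A minimal sketch of ReLU (NumPy assumed), matching the formula and output ranges described above.

import numpy as np

def relu(x):
    # ReLU(x) = max(0, x): passes positive inputs through, zeroes out negatives.
    return np.maximum(0.0, x)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z))  # [0.  0.  0.  0.5 3. ]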
Activation Function using ReLU
• ADVANTAGE
1. It is a non-linear function.
2. It does not saturate in the positive region.
3. Convergence is faster compared to tanh and sigmoid.
• DISADVANTAGE
• It is not zero-centered.
Tanh
• The hyperbolic tangent function, commonly known as tanh, is a
mathematical function used as an activation function in artificial
neural networks. The tanh function is a scaled and shifted version of the
sigmoid function.
• It is defined mathematically as: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
• It ranges from -1 to 1 (see the sketch below).
Graphical Representation of Tanh
Activation function using Tanh
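A brief sketch (NumPy assumed) using the built-in np.tanh to show the zero-centered (-1, 1) output range; printed values are approximate.

import numpy as np

# tanh squashes inputs into (-1, 1) and is zero-centered,
# unlike sigmoid, whose outputs lie in (0, 1).
z = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(np.tanh(z))  # approx [-0.995 -0.762  0.     0.762  0.995]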
• ADVANTAGE
• Zero-centered
• Squashing effect
• Non-linearity
• DISADVANTAGE
• Vanishing Gradient Problem
• Computationally more expensive than ReLU (it requires exponentials)