
ANN and Activation Function

Dr. Preeti Rai


ANN
• The input layer
– Introduces the input values into the network.
– Applies no activation function or other processing.
• The hidden layer(s)
– Transform the inputs and learn the features used for classification.
– In theory, two hidden layers are enough to approximate almost any function,
but more complex features may call for more layers.
• The output layer
– Functionally just like the hidden layers.
– Its outputs are passed on to the world outside the neural
network.
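A minimal NumPy sketch of this layer structure, under illustrative assumptions (the layer sizes, random weights, and the use of ReLU in the hidden layers are not from the slides): the input layer passes values through unchanged, the hidden layers apply a weighted sum plus an activation, and the output layer hands the result to the outside world.

```python
import numpy as np

def relu(z):
    # Non-linear activation used in the hidden layers
    return np.maximum(0, z)

# Illustrative layer sizes: 3 inputs -> 4 hidden -> 4 hidden -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])      # input layer: values enter unchanged
h1 = relu(W1 @ x + b1)              # hidden layer 1
h2 = relu(W2 @ h1 + b2)             # hidden layer 2
y = W3 @ h2 + b3                    # output layer passes the result outside
print(y)
```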
Single Layer Perceptron

• The perceptron is a single processing unit of a neural network. First proposed by
Frank Rosenblatt in 1958, it is a simple neuron that classifies its input into one of
two categories. The perceptron is a linear classifier used in supervised learning,
and it helps to organize the given input data.
• A perceptron is a neural network unit that performs a specific computation to detect
features in the input data. Because it is mainly used to split the data into two
classes, it is also known as a Linear Binary Classifier.
Perceptron
• The perceptron consists of 4 parts.
• Input values or one input layer: The input layer of the perceptron is made
of artificial input neurons and brings the initial data into the system for
further processing.
• Weights and Bias:
Weight: It represents the strength of the connection between units. If the
weight from node 1 to node 2 is larger, then neuron 1 has a greater
influence on neuron 2.
Bias: It plays the same role as the intercept in a linear equation. It is an
additional parameter whose task is to shift the output along with the
weighted sum of the inputs passed to the next neuron.
• Net sum: It computes the weighted sum of the inputs plus the bias.
• Activation Function: Whether or not a neuron is activated is determined by
the activation function, which is applied to the weighted sum plus the bias
to produce the result.
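A minimal sketch of this four-part structure, assuming a binary step activation and NumPy (the weights, bias, and input values below are illustrative):

```python
import numpy as np

def step(z):
    # Binary step activation: fire (1) if the net sum is non-negative
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    net_sum = np.dot(w, x) + b      # weighted sum of inputs plus bias
    return step(net_sum)            # activation decides whether the neuron fires

# Illustrative values
x = np.array([1.0, 0.5])            # input values
w = np.array([0.7, -0.3])           # weights
b = -0.2                             # bias

print(perceptron(x, w, b))           # -> 1, since 0.7*1.0 - 0.3*0.5 - 0.2 = 0.35 >= 0
```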
Activation function
In an artificial neuron, the inputs and weights are given, the weighted sum of the
inputs is calculated, and the result is passed to an activation function that converts
it into the output.
The input is fed to the input layer, and the neurons perform a linear transformation
on this input using the weights and biases:
x = Σ (weight * input) + bias
An activation function is then applied to this result:
Y = Activation( Σ (weight * input) + bias )
The activation function helps a neural network learn complex relationships and
patterns in the data.
Now the question is: what if we don't use any activation function and allow a neuron
to output the weighted sum of its inputs as it is?
In that case the output is unbounded: the weighted sum of the inputs has no fixed
range and, depending on the input, it can take any value. Hence one important use of
the activation function is to keep the output restricted to a particular range.
Another use of the activation function is to add non-linearity to the model.
In a neural network, the output of the activation function moves to the next hidden
layer and the same process is repeated. This forward movement of information is
known as forward propagation.
Using the output of forward propagation, the error is calculated. Based on this error
value, the weights and biases of the neurons are updated. This process is known as
back-propagation.
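A rough sketch of one forward and one backward step for a single sigmoid neuron; the squared-error loss, learning rate, inputs, and target below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, target = np.array([0.5, -1.0]), 1.0
w, b, lr = np.array([0.1, 0.2]), 0.0, 0.5   # illustrative weights, bias, learning rate

# Forward propagation
y = sigmoid(np.dot(w, x) + b)

# Back-propagation for a squared-error loss E = 0.5 * (y - target)^2
grad_z = (y - target) * y * (1 - y)   # dE/dz via the chain rule
w -= lr * grad_z * x                  # weight update
b -= lr * grad_z                      # bias update
print(y, w, b)
```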
Non-linearity is important in neural networks
because linear activation functions are not enough
to form a universal function approximator.
Most real-world problems are very complex, so we need non-linear activation
functions in a neural network.
A neural network without non-linear activation functions is just a simple linear
regression model: however many layers it has, the composition of linear layers
collapses to a single linear transformation.
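A small sketch (random weights and layer sizes are illustrative) showing that two linear layers with no activation in between collapse into a single linear transformation:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two linear layers with no activation in between...
two_layer = W2 @ (W1 @ x)

# ...are exactly one linear layer with the combined weight matrix
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layer, one_layer))  # True: no extra expressive power
```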
Vanishing gradient problem
In neural networks, during back-propagation each weight receives an update
proportional to the partial derivative of the error function. In some cases this
derivative term is so small that the updates become very small. Especially in deep
networks, the update for an early layer is obtained by multiplying many partial
derivatives together (the chain rule).
If these partial derivatives are very small, the overall update becomes very small
and approaches zero. In such a case the weights are barely able to update, so there
is slow or no convergence. This problem is known as the vanishing gradient problem.
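A rough numerical sketch of the effect (the depths and the per-layer derivative are illustrative): the sigmoid derivative is at most 0.25, so multiplying one such factor per layer drives early-layer gradients toward zero.

```python
# Sigmoid derivative is y * (1 - y), which peaks at 0.25 (when y = 0.5)
per_layer_derivative = 0.25

for depth in (5, 10, 20):
    gradient = per_layer_derivative ** depth   # product of one derivative per layer
    print(depth, gradient)
# 5  -> ~9.8e-04
# 10 -> ~9.5e-07
# 20 -> ~9.1e-13   (early-layer updates effectively vanish)
```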
Exploding gradient problem
Similarly, if the derivative terms are very large,
the updates will also be very large. In such a
case the algorithm overshoots the
minimum and is unable to converge. This
problem is known as the exploding gradient
problem.
There are various methods to avoid these
problems; choosing an appropriate activation
function is one of them.
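The mirror image of the previous sketch, again with an illustrative number: when each layer contributes a derivative greater than 1, the product grows explosively with depth.

```python
per_layer_derivative = 3.0            # illustrative value greater than 1

for depth in (5, 10, 20):
    gradient = per_layer_derivative ** depth
    print(depth, gradient)
# 5  -> 243
# 10 -> 59049
# 20 -> ~3.5e9   (updates overshoot the minimum)
```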
Various Activation Functions
1. Binary Step Function:
This activation function is a threshold-based classifier: whether or not the neuron
should be activated is decided by the value of the linear transformation.
In other words, if the input to the activation function is greater than a threshold
the neuron is activated, else it is deactivated, meaning its output is not passed on
to the next hidden layer. Mathematically:
f(x) = 1, when x >= 0
     = 0, when x < 0
Pros-
1. Easy to compute.
2. Works as a binary / linear classifier.
Cons-
1. The gradient of the function is zero, so the weights and biases don't update.
2. A step function does not allow multi-valued outputs; for example, it cannot
classify the inputs into one of several categories.
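A minimal NumPy sketch of the binary step function applied elementwise (the sample inputs are illustrative):

```python
import numpy as np

def binary_step(x):
    # 1 when x >= 0, otherwise 0
    return np.where(x >= 0, 1, 0)

print(binary_step(np.array([-2.0, -0.1, 0.0, 0.7, 3.0])))  # [0 0 1 1 1]
```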
2. Linear Function
The problem with the binary step function is that its gradient is zero, because no
component of x appears in its output.
Instead of a step function, we can use a linear function, defined as:
f(x) = ax   (where a is a constant)
Pros-
1. Easy to compute.
2. The gradient does not become zero, but it is a constant, so the weights and biases
are updated during back-propagation; however, the updating factor is always the same.
Cons-
The neural network will not really improve the error, since the gradient is the same
for every iteration. The network will not be able to train well and capture complex
patterns in the data.

Uses: The linear activation function is used in just one place, i.e. the output layer.
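A minimal sketch of the linear activation and its gradient (the slope a and the sample inputs are illustrative); note that the gradient is the same constant for every input, which is exactly the weakness described above:

```python
import numpy as np

a = 2.0                      # illustrative constant slope

def linear(x):
    return a * x

def linear_grad(x):
    # Derivative of a*x is a for every input: a constant update factor
    return np.full_like(x, a)

x = np.array([-1.0, 0.0, 3.0])
print(linear(x))        # [-2.  0.  6.]
print(linear_grad(x))   # [2. 2. 2.]
```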
3. Sigmoid Function-
Sigmoid is an 'S'-shaped non-linear mathematical function whose formula is:
f(x) = 1 / (1 + e^(-x))
Pros-
1. The sigmoid function is continuous and differentiable.
2. It limits the output to the range 0 to 1.
3. It gives very clear predictions for binary classification.
Cons-
1. It can cause the vanishing gradient problem.
2. It is not centered around zero.
3. It is computationally expensive (it involves the exponential function).

Uses: Usually used in the output layer of a binary classification, where the result
is either 0 or 1. Since the sigmoid's value lies between 0 and 1 only, the result can
easily be predicted as 1 if the value is greater than 0.5 and 0 otherwise.
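A minimal NumPy sketch of the sigmoid and the 0.5 thresholding rule just described (the sample inputs are illustrative):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(x)
labels = (probs > 0.5).astype(int)   # predict 1 above 0.5, else 0

print(probs)    # approx. [0.018 0.269 0.5   0.731 0.982]
print(labels)   # [0 0 0 1 1]
```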
4. ReLU-
The Rectified Linear Unit, often called just a rectifier or ReLU.

• ReLU :- Stands for Rectified Linear Unit. It is the most widely used activation
function, chiefly implemented in the hidden layers of a neural network.
• Equation :- A(x) = max(0, x). It gives an output of x if x is positive
and 0 otherwise.
• Value Range :- [0, inf)
• Nature :- non-linear, which means we can easily back-propagate the errors and have
multiple layers of neurons activated by the ReLU function.
• Uses :- ReLU is less computationally expensive than tanh and sigmoid because it
involves simpler mathematical operations. At any time only a few neurons are
activated, making the network sparse and therefore efficient and easy to compute.
Pros-
1. Easy to compute.
2. Does not cause the vanishing gradient problem.
3. Since not all neurons are activated, it creates sparsity in the network, which
makes it fast and efficient.
Cons-
1. Can cause the exploding gradient problem.
2. Not zero-centered.
3. Can kill some neurons forever ("dying ReLU"), since it always outputs 0 for
negative values.
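A minimal NumPy sketch of ReLU (the sample inputs are illustrative); negative inputs are zeroed out, which is what produces the sparsity mentioned above:

```python
import numpy as np

def relu(x):
    # max(0, x) applied elementwise
    return np.maximum(0, x)

x = np.array([-3.0, -0.5, 0.0, 2.0, 7.0])
print(relu(x))   # [0. 0. 0. 2. 7.]  negative activations are switched off
```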
7. Parameterized ReLU-
In parameterized ReLU (PReLU), instead of fixing the slope for the negative axis, the
slope is a new trainable parameter that the network learns on its own, to achieve
faster convergence:
f(x) = x,    when x >= 0
     = αx,   when x < 0   (α is learned during training)
Pros-
1. The network learns the most appropriate value of alpha on its own.
2. Does not cause the vanishing gradient problem.
Cons-
1. Harder to compute, since an extra parameter has to be learned.
2. Performance depends upon the problem.
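A minimal NumPy sketch of the PReLU forward pass; the value of alpha below is only illustrative, since in a real network it is learned during training:

```python
import numpy as np

def prelu(x, alpha):
    # x on the positive axis, alpha * x on the negative axis
    return np.where(x >= 0, x, alpha * x)

alpha = 0.1                               # illustrative; learned in a real network
x = np.array([-3.0, -0.5, 0.0, 2.0])
print(prelu(x, alpha))   # [-0.3  -0.05  0.    2.  ]
```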
Choosing the right Activation Function
1. Sigmoid functions and their combinations generally
work better in the case of classifiers.
2. Sigmoid and tanh functions are sometimes avoided
due to the vanishing gradient problem.
3. ReLU is a general-purpose activation function and is
used in most cases these days.
4. If we encounter dead neurons in our network, the
leaky ReLU function is the best choice.
5. Always keep in mind that the ReLU function should only
be used in the hidden layers.
6. As a rule of thumb, you can begin with the ReLU
function and then move to other activation
functions if ReLU doesn't give optimal
results.
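Assuming TensorFlow/Keras is available, a minimal sketch that follows these rules of thumb: ReLU in the hidden layers and a sigmoid output for binary classification (the layer sizes and the 8 input features are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),                       # illustrative number of input features
    tf.keras.layers.Dense(16, activation="relu"),     # hidden layers use ReLU (rules 3 and 5)
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # sigmoid output for a binary classifier (rule 1)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```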
