Unit 3
In Machine Learning and Artificial Intelligence, the Perceptron is one of the most commonly
encountered terms. It is a natural first step in learning Machine Learning and Deep Learning,
and it consists of a set of weights, input values or scores, and a threshold. The Perceptron is a
building block of an Artificial Neural Network. Frank Rosenblatt invented the Perceptron in
the mid-20th century to perform certain calculations on input data. The Perceptron is a linear
Machine Learning algorithm used for supervised learning of binary classifiers. The algorithm
processes training examples one by one, adjusting its weights as it learns. In this tutorial,
"Perceptron in Machine Learning," we will briefly discuss the Perceptron and its basic
functions. Let's start with a basic introduction to the Perceptron.
The Perceptron model is one of the simplest types of Artificial Neural Networks and is a
supervised learning algorithm for binary classification. It can be considered a single-layer
neural network with four main parameters: input values, weights and bias, a net sum, and an
activation function.
In Machine Learning, a binary classifier is a function that decides whether an input,
represented as a vector of numbers, belongs to a specific class.
The Perceptron is a linear binary classifier: in simple terms, it is a classification algorithm
whose prediction is based on a linear predictor function combining a weight vector with the
feature vector.
Frank Rosenblatt's perceptron model is a binary classifier that contains three main
components. These are as follows:
o Input Nodes or Input Layer:
These pass the input values (features) into the model.
o Weights and Bias:
The weight parameter represents the strength of the connection between units and is one of
the most important parameters of the Perceptron. The weight is directly proportional to how
strongly the associated input neuron influences the output. The bias can be thought of as the
intercept term in a linear equation.
o Activation Function:
This is the final and essential component that determines whether the neuron will fire or not.
The activation function can primarily be considered a step function; common choices include:
o Sign function
o Step function, and
o Sigmoid function
The data scientist chooses the activation function based on the problem at hand and the
desired outputs. The activation function used in a perceptron model may differ (e.g., Sign,
Step, or Sigmoid), depending in part on whether the learning process is slow or suffers from
vanishing or exploding gradients.
This step function or activation function plays a vital role in ensuring that the output is
mapped to the required range, such as (0, 1) or (-1, 1). It is important to note that the weight
of an input indicates the strength of that node. Similarly, the bias value shifts the activation
function curve up or down.
Step-1
In the first step, multiply each input value by its corresponding weight and add the products
to determine the weighted sum. A special term called the bias 'b' is added to this weighted
sum to improve the model's performance. Mathematically, the weighted sum is:
∑wi*xi + b
Step-2
In the second step, an activation function is applied to the weighted sum computed above,
which gives an output in binary form or as a continuous value, as follows:
Y = f(∑wi*xi + b)
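As a concrete illustration of these two steps, here is a minimal Python sketch; the input
values, weights, and bias below are arbitrary placeholders chosen for the example, not values
from the text.

def step(z):
    # Step activation: map the weighted sum to a binary output.
    return 1 if z > 0 else 0

def perceptron(inputs, weights, bias):
    # Step 1: weighted sum of inputs plus bias -> sum(wi * xi) + b
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 2: apply the activation function -> Y = f(sum(wi * xi) + b)
    return step(weighted_sum)

# Example with arbitrary illustrative values.
print(perceptron([1, 0, 1], [0.5, -0.6, 0.4], bias=-0.3))   # prints 1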
Based on the layers, Perceptron models are divided into two types. These are as follows:
Single-layer Perceptron Model
This is one of the simplest types of Artificial Neural Networks (ANNs). A single-layer
perceptron model consists of a feed-forward network and includes a threshold transfer
function inside the model. The main objective of the single-layer perceptron model is to
classify linearly separable data into binary outcomes.
A single-layer perceptron model does not start from any previously recorded data, so it
begins with randomly assigned weight parameters. It then sums up all the weighted inputs.
If this total exceeds a pre-determined threshold, the model is activated and outputs the
value +1.
If the outcome matches the desired (threshold) value, the model's performance is considered
satisfactory and the weights are left unchanged. However, when the model misclassifies some
of the inputs, the weights must be adjusted to reach the desired output and minimize errors,
as in the sketch below.
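The following is a minimal Python sketch of that weight-update loop, using the standard
perceptron learning rule; the training data, learning rate, and epoch count are illustrative
assumptions, not values from the text.

import random

def train_perceptron(samples, epochs=10, lr=0.1):
    # samples: list of (inputs, target) pairs with binary targets (0 or 1).
    n_inputs = len(samples[0][0])
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]   # random start
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # Weighted sum followed by the threshold (step) activation.
            output = 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0
            error = target - output
            # Weights change only when the prediction is wrong.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Illustrative use: a linearly separable (AND-like) mapping.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train_perceptron(data))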
Multi-layer Perceptron Model
Like a single-layer perceptron model, a multi-layer perceptron model has the same basic
structure but a greater number of hidden layers.
The multi-layer perceptron model is trained with the Backpropagation algorithm, which
executes in two stages as follows:
o Forward Stage: Activation functions start from the input layer in the forward stage
and terminate on the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified according to
the model's error. The difference between the actual and the desired output is propagated
backward, starting at the output layer and ending at the input layer.
Hence, a multi-layer perceptron model can be considered as an artificial neural network with
multiple layers, in which the activation function is no longer restricted to the linear or
threshold form used in a single-layer perceptron model. Instead, non-linear activation
functions such as Sigmoid, TanH, or ReLU can be used.
A multi-layer perceptron model has greater processing power and can handle both linear and
non-linear patterns. Further, it can also implement logic gates such as AND, OR, XOR,
NAND, NOT, XNOR, and NOR, as sketched below.
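As an illustration of this claim, the following Python sketch computes XOR with a two-layer
perceptron whose weights are chosen by hand for the example (not learned); one hidden unit
behaves like OR and the other like AND.

def step(z):
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    # Hidden layer: an OR-like unit and an AND-like unit (hand-chosen weights).
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Output layer: fires only when OR is on and AND is off.
    return step(h_or - 2 * h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))   # prints 0, 1, 1, 0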
Perceptron Function
The perceptron function 'f(x)' is obtained by multiplying the input 'x' by the learned weight
coefficient 'w', adding the bias 'b', and thresholding the result:
f(x)=1; if w.x+b>0
otherwise, f(x)=0
Characteristics of Perceptron
The perceptron model has the following characteristics.
o The output of a perceptron can only be a binary number (0 or 1) due to the hard limit
transfer function.
o A perceptron can only be used to classify linearly separable sets of input vectors. If the
input vectors are not linearly separable, it cannot classify them correctly.
Future of Perceptron
The future of the Perceptron model is bright and significant, as it helps to interpret data by
building intuitive patterns and applying them to future cases. Machine learning is a rapidly
growing branch of Artificial Intelligence that is continuously evolving; hence, perceptron
technology will continue to support and facilitate analytical behavior in machines, which in
turn adds to the efficiency of computers.
The perceptron model is continuously becoming more advanced and working efficiently on
complex problems with the help of artificial neurons.
Multi-layer Perceptron
A multi-layer perceptron is also known as an MLP. It is made up of fully connected dense
layers that transform an input of any dimension to the desired output dimension. A multi-layer
perceptron is a neural network with multiple layers. To create a neural network, we combine
neurons so that the outputs of some neurons are the inputs of other neurons.
A multi-layer perceptron has one input layer with one neuron (or node) for each input, one
output layer with a single node for each output, and any number of hidden layers, where each
hidden layer can have any number of nodes. Consider, as an example, an MLP with three
inputs, a hidden layer of three nodes, and two outputs.
In this example there are three inputs and thus three input nodes, and the hidden layer has
three nodes. The output layer gives two outputs, therefore there are two output nodes. The
nodes in the input layer take the input and forward it for further processing: each input node
forwards its output to each of the three nodes in the hidden layer, and in the same way the
hidden layer processes the information and passes it to the output layer.
Every node in the multi-layer perceptron uses a sigmoid activation function. The sigmoid
activation function takes a real value as input and converts it to a number between 0 and 1
using the sigmoid formula σ(z) = 1 / (1 + e^(-z)).
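Below is a minimal NumPy sketch of a forward pass through such a network, matching the
3-input, 3-hidden-node, 2-output example above; the weight values are random and purely
illustrative.

import numpy as np

def sigmoid(z):
    # Squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 3))   # input -> hidden weights (illustrative)
b_hidden = np.zeros(3)
W_out = rng.normal(size=(3, 2))      # hidden -> output weights (illustrative)
b_out = np.zeros(2)

x = np.array([0.2, 0.7, 0.1])               # three arbitrary input values
hidden = sigmoid(x @ W_hidden + b_hidden)   # three hidden activations
output = sigmoid(hidden @ W_out + b_out)    # two outputs, each in (0, 1)
print(output)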
Backpropagation
Backpropagation is an algorithm that propagates the errors from the output nodes back to the
input nodes; it is therefore simply referred to as the backward propagation of errors. It is used
in many neural network applications in data mining, such as character recognition and
signature verification.
Neural Network:
Neural networks are an information processing paradigm inspired by the human nervous
system. Just as the human nervous system has biological neurons, neural networks have
artificial neurons, which are mathematical functions modelled on biological neurons. The
human brain is estimated to have about 10 billion neurons, each connected to an average of
10,000 other neurons. Each neuron receives signals through synapses, which control the
effect of the incoming signal on the neuron.
Back propagation:
Back propagation is a widely used algorithm for training feed-forward neural networks. It
computes the gradient of the loss function with respect to the network weights, and it does so
far more efficiently than naively computing the gradient with respect to each weight
separately. This efficiency makes it possible to use gradient methods to train multi-layer
networks and update weights to minimize loss; variants such as gradient descent or stochastic
gradient descent are often used.
The back propagation algorithm works by computing the gradient of the loss function with
respect to each weight via the chain rule, computing the gradient layer by layer, and iterating
backward from the last layer to avoid redundant computation of intermediate terms in the
chain rule.
1. It is a gradient descent method, as used in the simple perceptron network, but with
differentiable units.
2. It differs from other networks in the process by which the weights are calculated during
the learning period of the network.
3. Training is done in three stages:
the feed-forward of the input training pattern
the calculation and backpropagation of the error
the updating of the weights
Working of Backpropagation:
Neural networks use supervised learning to generate output vectors from the input vectors
that the network operates on. The generated output is compared with the desired output, and
an error is computed if the two do not match. The weights are then adjusted according to this
error so that the network produces the desired output.
Back propagation Algorithm:
Parameters:
x = input training vector, x = (x1, x2, …, xn)
t = target vector, t = (t1, t2, …, tm)
δk = error term at output unit k
δj = error term at hidden unit j
α = learning rate
v0j = bias on hidden unit j
w0k = bias on output unit k
Training Algorithm:
Step 1: Initialize the weights to small random values.
Step 2: While the stopping condition is false, do steps 3 to 9.
Step 3: For each training pair, do steps 4 to 9.
Feed-forward:
Step 4: Each input unit receives the input signal xi and transmits it to all units in the hidden
layer.
Step 5: Each hidden unit zj (j=1 to a) sums its weighted input signals to calculate its net
input:
zinj = v0j + Σ xi vij (i=1 to n)
It applies its activation function, zj = f(zinj), and sends this signal to all units in the layer
above (i.e., the output units).
Each output unit yk (k=1 to m) sums its weighted input signals,
yink = w0k + Σ zj wjk (j=1 to a)
and applies its activation function to calculate the output signal:
yk = f(yink)
Backpropagation of error:
Step 6: Each output unit yk (k=1 to m) receives a target pattern corresponding to the input
training pattern, and its error term is calculated as:
δk = (tk – yk) f'(yink)
Step 7: Each hidden unit zj (j=1 to a) sums its delta inputs from all units in the layer above:
δinj = Σ δk wjk (k=1 to m)
Its error term is then calculated as:
δj = δinj f'(zinj)
Updating of weights and biases:
Step 8: Each output unit yk (k=1 to m) updates its bias and weights (j=0 to a). The weight
correction term is given by:
Δwjk = α δk zj
and the bias correction term is given by:
Δw0k = α δk
Therefore, wjk(new) = wjk(old) + Δwjk
w0k(new) = w0k(old) + Δw0k
Each hidden unit zj (j=1 to a) updates its bias and weights (i=0 to n). The weight correction
term is:
Δvij = α δj xi
and the bias correction term is:
Δv0j = α δj
Therefore, vij(new) = vij(old) + Δvij
v0j(new) = v0j(old) + Δv0j
Step 9: Test the stopping condition. The stopping condition can be a sufficiently small error
or a maximum number of epochs.
Disadvantages:
o It is sensitive to noisy data and irregularities; noisy data can lead to inaccurate results.
o Performance is highly dependent on the input data.
o Training can take a long time.
o It relies on a matrix-based approach rather than a mini-batch approach.
Let's see concretely how errors are calculated and weights are updated in backpropagation
networks (BPNs).
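The following is a minimal NumPy sketch of the training algorithm above (Steps 1 to 9) for a
network with one hidden layer of sigmoid units; the XOR training data, learning rate, layer
sizes, and epoch count are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative training pairs: learn XOR with 2 inputs and 1 output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
a, m, alpha = 4, 1, 0.5                                    # hidden units, output units, learning rate
V = rng.normal(scale=0.5, size=(2, a)); v0 = np.zeros(a)   # Step 1: small random weights
W = rng.normal(scale=0.5, size=(a, m)); w0 = np.zeros(m)

for epoch in range(5000):                # Step 2: repeat until the stopping condition
    for x, t in zip(X, T):               # Step 3: for each training pair
        # Feed-forward (Steps 4-5)
        z = sigmoid(v0 + x @ V)          # hidden activations zj = f(zinj)
        y = sigmoid(w0 + z @ W)          # output activations yk = f(yink)
        # Backpropagation of error (Steps 6-7); f'(.) = act * (1 - act) for the sigmoid
        delta_k = (t - y) * y * (1 - y)            # (tk - yk) f'(yink)
        delta_j = (delta_k @ W.T) * z * (1 - z)    # delta_inj f'(zinj)
        # Weight and bias updates (Step 8)
        W += alpha * np.outer(z, delta_k); w0 += alpha * delta_k
        V += alpha * np.outer(x, delta_j); v0 += alpha * delta_j

# After training, the outputs should approach the targets [0, 1, 1, 0].
print(sigmoid(w0 + sigmoid(v0 + X @ V) @ W).round(2))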
Neural Representation of AND, OR, NOT, XOR and XNOR Logic Gates (Perceptron
Algorithm)
While taking the Udacity PyTorch Course by Facebook, I found it difficult to understand how
the Perceptron works with logic gates (AND, OR, NOT, and so on). I decided to check online
resources, but as of the time of writing this, there was really no clear explanation of how to
go about it. So after personal reading, I finally understood how to do it, which is the reason
for this post.
Note: The purpose of this article is NOT to mathematically explain how the neural network
updates the weights, but to explain the logic behind how the values are being changed in
simple terms.
Also, the steps in this method are very similar to how neural networks learn, which is as
follows:
o Forward propagate the inputs through the current weights and bias
o Compare the result against the expected output from the gate's truth table
o Adjust the weights or bias for any row that is wrong, and check again
AND Gate
From our knowledge of logic gates, we know that an AND logic table is given by the diagram
below
AND Gate
The question is, what are the weights and bias for the AND perceptron?
First, we need to understand that the output of an AND gate is 1 only if both inputs (in this
case, x1 and x2) are 1. So, following the steps listed above;
Row 1
We start from the candidate model x1(1)+x2(1)–1, i.e., w1 = 1, w2 = 1, b = –1.
Passing the first row of the AND logic table (x1=0, x2=0), we get;
0+0–1 = –1
From the Perceptron rule, if Wx+b≤0, then y`=0. Therefore, this row is correct, and there is
no need for backpropagation.
Row 2
Passing the second row (x1=0, x2=1), we get;
0+1–1 = 0
From the Perceptron rule, if Wx+b≤0, then y`=0. This row is correct, as the output is 0 for
the AND gate.
Row 3
Passing the third row (x1=1, x2=0), we get;
1+0–1 = 0
From the Perceptron rule, this also gives y`=0, so rows 1, 2 and 3 are all correct.
Row 4
1+1–1 = 1
Again, from the perceptron rule, this is still valid.
Therefore, we can conclude that the model to achieve an AND gate, using the Perceptron
algorithm is;
x1+x2–1
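Here is a quick Python check of the derived AND model (w1 = 1, w2 = 1, b = –1) against the
full truth table, using the perceptron rule Wx+b > 0 -> 1; the helper function name is just for
this sketch.

def perceptron(x1, x2, w1, w2, b):
    # Perceptron rule: output 1 if Wx + b > 0, otherwise 0.
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron(x1, x2, 1, 1, -1))
# Prints 0, 0, 0, 1 -- matching the AND truth table.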
OR Gate
OR Gate
From the diagram, the OR gate is 0 only if both inputs are 0.
Row 1
x1(1)+x2(1)–1
Passing the first row of the OR logic table (x1=0, x2=0), we get;
0+0–1 = –1
From the Perceptron rule, if Wx+b≤0, then y`=0. Therefore, this row is correct.
Row 2
0+1–1 = 0
From the Perceptron rule, if Wx+b <= 0, then y`=0. Therefore, this row is incorrect.
So we want values that will make inputs x1=0 and x2=1 give y` a value of 1. If we
change w2 to 2, we have;
0+2–1 = 1
From the Perceptron rule, this is correct for both rows 1 and 2.
Row 3
1+0–1 = 0
From the Perceptron rule, if Wx+b <= 0, then y`=0. Therefore, this row is incorrect.
Since it is similar to that of row 2, we can just change w1 to 2, we have;
2+0–1 = 1
From the Perceptron rule, this is correct for rows 1, 2 and 3.
Row 4
2+2–1 = 3
Again, from the perceptron rule, this is still valid. Quite Easy!
Therefore, we can conclude that the model to achieve an OR gate, using the Perceptron
algorithm is;
2x1+2x2–1
NOT Gate
NOT Gate
From the diagram, the output of a NOT gate is the inverse of a single input. So, following the
steps listed above;
Row 1
x1(1)–1
Passing the first row of the NOT logic table (x1=0), we get;
0–1 = –1
From the Perceptron rule, if Wx+b≤0, then y`=0. This row is incorrect, as the output is 1
for the NOT gate.
So we want values that will make input x1=0 to give y` a value of 1. If we change b to 1,
we have;
0+1 = 1
From the Perceptron rule, if Wx+b > 0, then y`=1, so row 1 is now correct.
Row 2
Passing the second row of the NOT logic table (x1=1), we get;
1+1 = 2
From the Perceptron rule, if Wx+b > 0, then y`=1. This row is incorrect, as the output is 0
for the NOT gate.
So we want values that will make input x1=1 to give y` a value of 0. If we change w1 to –
1, we have;
–1+1 = 0
From the Perceptron rule, if Wx+b ≤ 0, then y`=0. Therefore, this works (for both row 1
and row 2).
Therefore, we can conclude that the model to achieve a NOT gate, using the Perceptron
algorithm is;
–x1+1
NOR Gate
NOR Gate
From the diagram, the NOR gate is 1 only if both inputs are 0.
Row 1
x1(1)+x2(1)–1
Passing the first row of the NOR logic table (x1=0, x2=0), we get;
0+0–1 = –1
From the Perceptron rule, if Wx+b≤0, then y`=0. This row is incorrect, as the output is 1
for the NOR gate.
So we want values that will make input x1=0 and x2 = 0 to give y` a value of 1. If we
change b to 1, we have;
0+0+1 = 1
Row 2
0+1+1 = 2
From the Perceptron rule, if Wx+b > 0, then y`=1. This row is incorrect, as the output is 0
for the NOR gate.
So we want values that will make input x1=0 and x2 = 1 to give y` a value of 0. If we
change w2 to –1, we have;
0–1+1 = 0
From the Perceptron rule, this is valid for both row 1 and row 2.
Row 3
1+0+1 = 2
From the Perceptron rule, if Wx+b > 0, then y`=1. This row is incorrect, as the output is 0
for the NOR gate.
So we want values that will make inputs x1=1 and x2=0 give y` a value of 0. If we
change w1 to –1, we have;
–1+0+1 = 0
From the Perceptron rule, this is valid for both row 1, 2 and 3.
Row 4
-1-1+1 = -1
From the Perceptron rule, if Wx+b ≤ 0, then y`=0, which is correct, so all four rows now
check out.
Therefore, we can conclude that the model to achieve a NOR gate, using the Perceptron
algorithm is;
-x1-x2+1
NAND Gate
From the diagram, the NAND gate is 0 only if both inputs are 1.
Row 1
x1(1)+x2(1)-1
Passing the first row of the NAND logic table (x1=0, x2=0), we get;
0+0-1 = -1
From the Perceptron rule, if Wx+b≤0, then y`=0. This row is incorrect, as the output is 1
for the NAND gate.
So we want values that will make input x1=0 and x2 = 0 to give y` a value of 1. If we
change b to 1, we have;
0+0+1 = 1
Row 2
0+1+1 = 2
From the Perceptron rule, if Wx+b > 0, then y`=1. This row is also correct (for both row 2
and row 3).
Row 4
1+1+1 = 3
This is not the expected output, as the output is 0 for a NAND combination of x1=1 and
x2=1.
So we want values that will make inputs x1=1 and x2=1 give y` a value of 0. If we change
w1 and w2 to –1 and b to 2, we have;
-1-1+2 = 0
From the Perceptron rule, if Wx+b ≤ 0, then y`=0, so this row is now correct, and rows 1
to 3 remain correct.
Therefore, we can conclude that the model to achieve a NAND gate, using the Perceptron
algorithm is;
-x1-x2+2
XNOR Gate
XNOR Gate
Now that we are done with the necessary basic logic gates, we can combine them to give an
XNOR gate.
x1x2 + x1`x2`
From the expression, we can say that the XNOR gate consists of an AND gate (x1x2), a NOR
gate (x1`x2`), and an OR gate.
AND (x1+x2–1)
NOR (-x1-x2+1)
OR (2x1+2x2–1)
XOR Gate
XOR Gate
x1x`2 + x`1x2
(x1 + x2)(x1x2)`
From the simplified expression, we can say that the XOR gate consists of an OR gate
(x1 + x2), a NAND gate ((x1x2)`), and an AND gate that combines their outputs. Using the
models derived above:
OR (2x1+2x2–1)
NAND (-x1-x2+2)
AND (x1+x2–1)
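Combining the single-gate models derived above, here is a short Python sketch of the XOR
and XNOR compositions (XOR = AND(OR, NAND); XNOR = OR(AND, NOR)); the
function names are just for this sketch.

def unit(x1, x2, w1, w2, b):
    # Single perceptron unit: 1 if Wx + b > 0, else 0.
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def AND(x1, x2):  return unit(x1, x2, 1, 1, -1)      # x1 + x2 - 1
def OR(x1, x2):   return unit(x1, x2, 2, 2, -1)      # 2x1 + 2x2 - 1
def NOR(x1, x2):  return unit(x1, x2, -1, -1, 1)     # -x1 - x2 + 1
def NAND(x1, x2): return unit(x1, x2, -1, -1, 2)     # -x1 - x2 + 2

def XOR(x1, x2):  return AND(OR(x1, x2), NAND(x1, x2))
def XNOR(x1, x2): return OR(AND(x1, x2), NOR(x1, x2))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "XOR:", XOR(a, b), "XNOR:", XNOR(a, b))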