
UNIT - III MULTILAYER PERCEPTRONS 9

The Perceptron - Training a Perceptron - Learning Boolean Functions - Multilayer Perceptrons - MLP as a Universal Approximator - Backpropagation Algorithm
Perceptron in Machine Learning

In Machine Learning and Artificial Intelligence, the Perceptron is one of the most commonly used terms. It is a primary step in learning Machine Learning and Deep Learning technologies and consists of a set of weights, input values or scores, and a threshold. The Perceptron is a building block of an Artificial Neural Network. Frank Rosenblatt invented the Perceptron in the mid-20th century (1957) for performing certain calculations to detect capabilities in input data or business intelligence. The Perceptron is a linear Machine Learning algorithm used for the supervised learning of various binary classifiers. The algorithm enables neurons to learn elements and process them one by one during training. In this section, "Perceptron in Machine Learning," we discuss the Perceptron and its basic functions in brief. Let's start with a basic introduction to the Perceptron.

What is the Perceptron model in Machine Learning?

The Perceptron is a Machine Learning algorithm for the supervised learning of various binary classification tasks. Further, the Perceptron is also understood as an Artificial Neuron, or neural network unit, that helps detect certain computations on input data in business intelligence.

The Perceptron model is also regarded as one of the simplest types of Artificial Neural Networks: a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters, i.e., input values, weights and bias, net sum, and an activation function.

What is Binary classifier in Machine Learning?

In Machine Learning, binary classifiers are defined as functions that decide whether input data, which can be represented as a vector of numbers, belongs to some specific class.

Binary classifiers can be considered linear classifiers. In simple words, we can understand a binary classifier as a classification algorithm whose prediction is based on a linear predictor function combining a set of weights with the feature vector.

Basic Components of Perceptron

Mr. Frank Rosenblatt invented the perceptron model as a binary classifier which contains
three main components. These are as follows:

o Input Nodes or Input Layer:


This is the primary component of Perceptron which accepts the initial data into the system for
further processing. Each input node contains a real numerical value.

o Weight and Bias:

The weight parameter represents the strength of the connection between units. This is another very important parameter of the Perceptron's components. Weight is directly proportional to the strength of the associated input neuron in deciding the output. Further, bias can be considered as the intercept term in a linear equation.

o Activation Function:

This is the final and an important component that helps determine whether the neuron will fire or not. The activation function can be considered primarily as a step function.

Types of Activation functions:

o Sign function
o Step function, and
o Sigmoid function

The data scientist chooses the activation function based on the nature of the problem statement and the desired form of output. The activation function used in perceptron models may differ (e.g., Sign, Step, or Sigmoid) depending, for example, on whether the learning process is slow or suffers from vanishing or exploding gradients.

How does Perceptron work?

In Machine Learning, the Perceptron is considered a single-layer neural network that consists of four main parameters: input values (input nodes), weights and bias, net sum, and an activation function. The perceptron model begins by multiplying all input values by their weights and adding these products together to create the weighted sum. This weighted sum is then passed to the activation function 'f' to obtain the desired output. This activation function is also known as the step function and is represented by 'f'.

This step function or Activation function plays a vital role in ensuring that output is mapped
between required values (0,1) or (-1,1). It is important to note that the weight of input is
indicative of the strength of a node. Similarly, an input's bias value gives the ability to shift
the activation function curve up or down.

Perceptron model works in two important steps as follows:

Step-1
In the first step, multiply all input values by their corresponding weight values and then add them to determine the weighted sum. Mathematically, we can calculate the weighted sum as follows:

∑wi*xi = x1*w1 + x2*w2 + … + xn*wn

Add a special term called bias 'b' to this weighted sum to improve the model's performance.

∑wi*xi + b

Step-2

In the second step, an activation function is applied to the above-mentioned weighted sum, which gives us an output either in binary form or as a continuous value, as follows:

Y = f(∑wi*xi + b)
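To make these two steps concrete, here is a minimal Python sketch (NumPy is assumed to be available); the weight, bias, and input values are purely illustrative and not taken from the text:

import numpy as np

def step(z):
    # Step activation: returns 1 if z > 0, else 0
    return 1 if z > 0 else 0

def perceptron_output(x, w, b):
    # Step 1: weighted sum of inputs plus bias; Step 2: apply activation f
    weighted_sum = np.dot(w, x) + b   # sum(wi*xi) + b
    return step(weighted_sum)

# Illustrative values (assumed, not from the text)
x = np.array([1.0, 0.5])   # input values x1, x2
w = np.array([0.4, -0.2])  # weights w1, w2
b = 0.1                    # bias
print(perceptron_output(x, w, b))  # -> 1, since 0.4 - 0.1 + 0.1 = 0.4 > 0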

Types of Perceptron Models

Based on the layers, Perceptron models are divided into two types. These are as follows:

1. Single-layer Perceptron Model


2. Multi-layer Perceptron model

Single Layer Perceptron Model:

This is one of the simplest types of Artificial Neural Networks (ANNs). A single-layer perceptron model consists of a feed-forward network and also includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to analyze linearly separable objects with binary outcomes.

In a single-layer perceptron model, the algorithm has no prior information, so it begins with randomly allocated values for the weight parameters. It then sums up all the weighted inputs. If the total sum of all inputs is more than a pre-determined value, the model is activated and shows the output value as +1.

If the outcome matches the pre-determined or threshold value, the performance of the model is considered satisfactory, and the weights are not changed. However, the model produces errors when certain weighted input values are fed into it. Hence, to obtain the desired output and minimize errors, some changes to the input weights are necessary.

"Single-layer perceptron can learn only linearly separable patterns."

Multi-Layered Perceptron Model:

Like a single-layer perceptron model, a multi-layer perceptron model also has the same
model structure but has a greater number of hidden layers.
A multi-layer perceptron model is typically trained with the Backpropagation algorithm, which executes in two stages as follows:

o Forward Stage: In the forward stage, activation starts from the input layer and terminates at the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. The error between the actual and desired output is propagated backward, originating at the output layer and ending at the input layer.

Hence, a multi-layer perceptron model can be considered an artificial neural network with multiple layers in which the activation function does not need to remain linear, unlike in a single-layer perceptron model. Instead of a linear function, the activation function can be a sigmoid, TanH, ReLU, etc., for deployment.

A multi-layer perceptron model has greater processing power and can process linear and non-
linear patterns. Further, it can also implement logic gates such as AND, OR, XOR, NAND,
NOT, XNOR, NOR.

Advantages of Multi-Layer Perceptron:

o A multi-layered perceptron model can be used to solve complex non-linear problems.
o It works well with both small and large input data.
o It helps us to obtain quick predictions after training.
o It helps to obtain the same accuracy ratio with large as well as small data.

Disadvantages of Multi-Layer Perceptron:

o In a multi-layer perceptron, computations are difficult and time-consuming.
o In a multi-layer perceptron, it is difficult to predict how much each independent variable affects the dependent variable.
o The model's functioning depends on the quality of the training.

Perceptron Function

The Perceptron function f(x) is obtained as output by multiplying the input vector 'x' with the learned weight coefficients 'w' and adding the bias 'b'.

Mathematically, we can express it as follows:

f(x) = 1 if w·x + b > 0

otherwise, f(x) = 0

o 'w' represents real-valued weights vector


o 'b' represents the bias
o 'x' represents a vector of input x values.

Characteristics of Perceptron
The perceptron model has the following characteristics.

1. Perceptron is a machine learning algorithm for the supervised learning of binary classifiers.
2. In the Perceptron, the weight coefficients are learned automatically.
3. Initially, weights are multiplied with the input features, and a decision is made whether the neuron fires or not.
4. The activation function applies a step rule to check whether the weighted sum is greater than zero.
5. The linear decision boundary is drawn, enabling the distinction between the two
linearly separable classes +1 and -1.
6. If the added sum of all input values is more than the threshold value, it must have an
output signal; otherwise, no output will be shown.

Limitations of Perceptron Model

A perceptron model has limitations as follows:

o The output of a perceptron can only be a binary number (0 or 1) due to the hard limit
transfer function.
o Perceptron can only be used to classify the linearly separable sets of input vectors. If
input vectors are non-linear, it is not easy to classify them properly.

Future of Perceptron

The future of the Perceptron model is bright and significant, as it helps interpret data by building intuitive patterns and applying them in the future. Machine learning is a rapidly growing and continuously evolving branch of Artificial Intelligence; hence, perceptron technology will continue to support and facilitate analytical behavior in machines, which will in turn add to the efficiency of computers.

The perceptron model is continuously becoming more advanced and working efficiently on
complex problems with the help of artificial neurons.

Multi-layer Perceptron
A multi-layer perceptron is also known as an MLP. It consists of fully connected dense layers, which transform any input dimension to the desired dimension. A multi-layer perceptron is a neural network that has multiple layers. To create a neural network, we combine neurons together so that the outputs of some neurons are the inputs of other neurons.
A multi-layer perceptron has one input layer and for each input, there is one neuron(or
node), it has one output layer with a single node for each output and it can have any number
of hidden layers and each hidden layer can have any number of nodes. A schematic diagram
of a Multi-Layer Perceptron (MLP) is depicted below.
In the multi-layer perceptron diagram above, we can see that there are three inputs and thus three input nodes, and the hidden layer has three nodes. The output layer gives two outputs, therefore there are two output nodes. The nodes in the input layer take the input and forward it for further processing: in the diagram, the nodes in the input layer forward their output to each of the three nodes in the hidden layer, and in the same way, the hidden layer processes the information and passes it to the output layer.
Every node in the multi-layer perceptron uses a sigmoid activation function. The sigmoid activation function takes real values as input and converts them to numbers between 0 and 1 using the sigmoid formula.
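As a rough illustration of this forward pass, the sketch below (assuming NumPy; the random weights are illustrative only) propagates three inputs through a hidden layer of three nodes to two output nodes, applying the sigmoid at every node:

import numpy as np

def sigmoid(z):
    # Squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])                    # 3 inputs -> 3 input nodes
W1 = rng.normal(size=(3, 3)); b1 = np.zeros(3)    # input -> hidden (3 nodes)
W2 = rng.normal(size=(2, 3)); b2 = np.zeros(2)    # hidden -> output (2 nodes)

h = sigmoid(W1 @ x + b1)    # hidden layer activations
y = sigmoid(W2 @ h + b2)    # two outputs, each between 0 and 1
print(y)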

Backpropagation


Backpropagation is an algorithm that propagates the errors from the output nodes back to the input nodes. Therefore, it is simply referred to as the backward propagation of errors. It is used in a wide range of neural network applications in data mining, such as character recognition and signature verification.

Neural Network:

Neural networks are an information processing paradigm inspired by the human nervous system. Just as the human nervous system has biological neurons, neural networks have artificial neurons; artificial neurons are mathematical functions derived from biological neurons. The human brain is estimated to have about 10 billion neurons, each connected to an average of 10,000 other neurons. Each neuron receives signals through synapses, which control the effect of the signal on the neuron.
Back propagation:

Back propagation is a widely used algorithm for training feed-forward neural networks. It computes the gradient of the loss function with respect to the network weights. It is much more efficient than naively computing the gradient with respect to each weight individually. This efficiency makes it possible to use gradient methods to train multi-layer networks and update weights to minimize loss; variants such as gradient descent or stochastic gradient descent are often used.
The back propagation algorithm works by computing the gradient of the loss function with
respect to each weight via the chain rule, computing the gradient layer by layer, and iterating
backward from the last layer to avoid redundant computation of intermediate terms in the
chain rule.

Features of Back propagation:

1. It is a gradient descent method, as used in the case of a simple perceptron network with differentiable units.
2. It differs from other networks in the way the weights are calculated during the learning phase of the network.
3. Training is done in three stages:
 the feed-forward of the input training pattern
 the calculation and backpropagation of the error
 the updating of the weights
Working of Backpropagation:
Neural networks use supervised learning to generate output vectors from the input vectors that the network operates on. The network compares the generated output to the desired output and computes an error if the result does not match the target output vector. It then adjusts the weights according to this error to obtain the desired output.
Back propagation Algorithm:

Step 1: Inputs X arrive through the preconnected path.
Step 2: The inputs are modelled using actual weights W. The weights are usually chosen randomly.
Step 3: Calculate the output of each neuron from the input layer, through the hidden layer, to the output layer.
Step 4: Calculate the error in the outputs:
Error = Desired Output – Actual Output
Step 5: From the output layer, go back to the hidden layer and adjust the weights to reduce the error.
Step 6: Repeat the process until the desired output is achieved.

Parameters:
 x = input training vector, x = (x1, x2, …, xn)
 t = target output vector, t = (t1, t2, …, tm)
 δk = error term at output unit k
 δj = error term at hidden unit j
 α = learning rate
 v0j = bias on hidden unit j
Training Algorithm :
Step 1: Initialize the weights to small random values.
Step 2: While the stopping condition is false, do steps 3 to 9.
Step 3: For each training pair, do steps 4 to 8 (feed-forward, backpropagation of error, weight update).
Step 4: Each input unit receives the input signal xi and transmits it to all units in the layer above (the hidden units).
Step 5: Each hidden unit zj (j = 1 to a) sums its weighted input signals to calculate its net input:
zinj = v0j + Σ xi*vij (i = 1 to n)
It applies the activation function, zj = f(zinj), and sends this signal to all units in the layer above, i.e., the output units.
Each output unit yk (k = 1 to m) sums its weighted input signals:
yink = w0k + Σ zj*wjk (j = 1 to a)
and applies its activation function to calculate the output signal:
yk = f(yink)
Backpropagation Error :
Step 6: Each output unit yk (k = 1 to m) receives a target pattern corresponding to the input pattern, and the error information term is calculated as:
δk = (tk – yk) * f'(yink)
Step 7: Each hidden unit zj (j = 1 to a) sums its delta inputs from the units in the layer above:
δinj = Σ δk*wjk (k = 1 to m)
The error information term is calculated as:
δj = δinj * f'(zinj)
Updating of weights and bias:
Step 8: Each output unit yk (k = 1 to m) updates its bias and weights (j = 0 to a). The weight correction term is given by:
Δwjk = α δk zj
and the bias correction term is given by:
Δw0k = α δk
Therefore, wjk(new) = wjk(old) + Δwjk
w0k(new) = w0k(old) + Δw0k
Each hidden unit zj (j = 1 to a) updates its bias and weights (i = 0 to n). The weight correction term is:
Δvij = α δj xi
and the bias correction term is:
Δv0j = α δj
Therefore, vij(new) = vij(old) + Δvij
v0j(new) = v0j(old) + Δv0j
Step 9: Test the stopping condition. The stopping condition can be the minimization of the error or reaching a set number of epochs.
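To make these steps concrete, here is a minimal Python/NumPy sketch of steps 1 to 9 for one hidden layer with binary sigmoid units. The XOR toy dataset, the layer sizes, the learning rate, and the epoch limit are illustrative assumptions, not part of the algorithm as stated above.

import numpy as np

def f(x):            # binary sigmoid activation
    return 1.0 / (1.0 + np.exp(-x))

def f_prime(x):      # f'(x) = f(x)[1 - f(x)]
    fx = f(x)
    return fx * (1.0 - fx)

# Illustrative data: XOR truth table (inputs X, targets T)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
n, a, m = 2, 4, 1                                       # input, hidden, output unit counts
V = rng.uniform(-0.5, 0.5, (n, a)); v0 = np.zeros(a)    # Step 1: small random weights
W = rng.uniform(-0.5, 0.5, (a, m)); w0 = np.zeros(m)
alpha = 0.5                                             # learning rate

for epoch in range(10000):          # Step 2: stopping condition = max epochs
    for x, t in zip(X, T):          # Step 3: for each training pair
        z_in = v0 + x @ V           # Step 5: hidden net input and activation
        z = f(z_in)
        y_in = w0 + z @ W
        y = f(y_in)                 # output signal
        delta_k = (t - y) * f_prime(y_in)       # Step 6: output error term
        delta_in_j = delta_k @ W.T              # Step 7: backpropagated delta sum
        delta_j = delta_in_j * f_prime(z_in)    # hidden error term
        W += alpha * np.outer(z, delta_k)       # Step 8: weight and bias updates
        w0 += alpha * delta_k
        V += alpha * np.outer(x, delta_j)
        v0 += alpha * delta_j

# After training, outputs should move toward 0, 1, 1, 0 if training succeeds
print(f(w0 + f(v0 + X @ V) @ W).round(2))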

Need for Back propagation:

Backpropagation means the "backward propagation of errors" and is very useful for training neural networks. It is fast, easy to implement, and simple. Backpropagation does not require any parameters to be set, apart from the number of inputs. Backpropagation is a flexible method because no prior knowledge of the network is required.

Types of Back propagation

There are two types of backpropagation networks.


 Static backpropagation: Static backpropagation is a network designed to map static inputs to static outputs. These types of networks can solve static classification problems such as OCR (Optical Character Recognition).
 Recurrent backpropagation: Recurrent backpropagation is another network, used for fixed-point learning. Activation in recurrent backpropagation is fed forward until a fixed value is reached. Static backpropagation provides an instant mapping, while recurrent backpropagation does not.
Advantages:

 It is simple, fast, and easy to program.
 Only the number of inputs needs to be tuned, not any other parameter.
 It is flexible and efficient.
 There is no need for users to learn any special functions.

Disadvantages:

 It is sensitive to noisy data and irregularities; noisy data can lead to inaccurate results.
 Performance is highly dependent on the input data.
 Training can consume a lot of time.
 A matrix-based approach is preferred over a mini-batch approach.
Let's understand how errors are calculated and weights are updated in backpropagation networks (BPNs).

The network considered here is a simple multi-layer feed-forward network, or backpropagation network. It contains three layers: the input layer with two neurons x1 and x2, the hidden layer with two neurons z1 and z2, and the output layer with one neuron yin.
Now let's write down the weights and bias vectors for each neuron.
Note: The weights are taken randomly.
Input layer: i/p – [x1 x2] = [0 1]
Here, since it is the input layer, only the input values are present.
Hidden layer: z1 – [v11 v21 v01] = [0.6 -0.1 0.3]
Here v11 refers to the weight of the first input x1 on z1, v21 refers to the weight of the second input x2 on z1, and v01 refers to the bias value on z1.
z2 – [v12 v22 v02] = [-0.3 0.4 0.5]
Here v12 refers to the weight of the first input x1 on z2, v22 refers to the weight of the second input x2 on z2, and v02 refers to the bias value on z2.
Output layer: yin – [w11 w21 w01] = [0.4 0.1 -0.2]
Here w11 refers to the weight of the first hidden neuron z1 on yin, w21 refers to the weight of the second hidden neuron z2 on yin, and w01 refers to the bias value on yin.
Let's consider three index variables: 'k', which refers to the neurons in the output layer, 'j', which refers to the neurons in the hidden layer, and 'i', which refers to the neurons in the input layer.
Therefore,
k=1
j = 1, 2(meaning first neuron and second neuron in hidden layer)
i = 1, 2(meaning first and second neuron in the input layer)
Below are some conditions to be followed in BPNs.
Conditions/Constraints:
1. In BPN, the activation function used should be differentiable.
2. The input for bias is always 1.
To proceed with the problem, let:
Target value, t = 1
Learning rate, α = 0.25
Activation function = Binary sigmoid function
Binary sigmoid function, f(x) = 1 / (1 + e^(-x)) eq. (1)
And, f'(x) = f(x)[1-f(x)] eq. (2)
There are three steps to solve the problem:
1. Computing the output, y.
2. Backpropagation of errors, i.e., between output and hidden layer, hidden and input
layer.
3. Updating weights.

Step 1:

The value y is calculated by finding yin and applying the activation function.

yin is calculated as:
yin = w01 + z1*w11 + z2*w21 eq. (3)
Here, z1 and z2 are the values from the hidden layer, calculated by finding zin1 and zin2 and applying the activation function to them.
zin1 and zin2 are calculated as:
zin1 = v01 + x1*v11 + x2*v21 eq. (4)
zin2 = v02 + x1*v12 + x2*v22 eq. (5)
From (4)
zin1 = 0.3 + 0*0.6 + 1*(-0.1)
zin1 = 0.2
z1 = f(zin1) = 1 / (1 + e^(-0.2)) From (1)
z1 = 0.5498
From (5)
zin2 = 0.5 + 0*(-0.3) + 1*0.4
zin2 = 0.9
z2 = f(zin2) = 1 / (1 + e^(-0.9)) From (1)
z2 = 0.7109
From (3)
yin = (-0.2) + 0.5498*0.4 + 0.7109*0.1
yin = 0.0910
y = f(yin) = 1 / (1 + e^(-0.0910)) From (1)
y = 0.5227
Here, y is not equal to the target ‘t’, which is 1. And we proceed to calculate the errors and
then update weights from them in order to achieve the target value.

Step 2:

(a) Calculating the error between output and hidden layer


The error between the output and hidden layer is represented as δk, where k represents the neurons in the output layer as mentioned above. The error is calculated as:
δk = (tk – yk) * f'(yink) eq. (6)
where, f'(yink) = f(yink)[1 – f(yink)] From (2)
Since k = 1 (Assumed above),
δ = (t – y) f'(yin) eq. (7)
where, f'(yin) = f(yin)[1 – f(yin)]
f'(yin) = 0.5227[1 – 0.5227]
f'(yin) = 0.2495
Therefore,
δ = (1 – 0.5227) * 0.2495 From (7)
δ = 0.1191, is the error
Note: (Target – Output) i.e., (t – y) is the error in the output not in the layer. Error in a
layer is contributed by different factors like weights and bias.
(b) Calculating the error between hidden and input layer
The error between the hidden and input layer is represented as δj, where j represents the neurons in the hidden layer as mentioned above. The error is calculated as:
δj = δinj * f'(zinj) eq. (8)
where,
δinj = ∑k=1 to n (δk * wjk) eq. (9)
f'(zinj) = f(zinj)[1 – f(zinj)] eq. (10)
Since k = 1(Assumed above) eq. (9) becomes:
δinj = δ * wj1 eq. (11)
As j = 1, 2, we will have one error value for each hidden neuron, for a total of 2 error values.
δ1 = δin1 * f'(zin1) eq. (12), From (8)
δin1 = δ * w11 From (11)
δin1 = 0.1191 * 0.4 From weights vectors
δin1 = 0.04764
f'(zin1) = f(zin1)[1 – f(zin1)]
f'(zin1) = 0.5498[1 – 0.5498] As f(zin1) = z1
f'(zin1) = 0.2475
Substituting in (12)
δ1 = 0.04764 * 0.2475 = 0.0118
δ2 = δin2 * f'(zin2) eq. (13), From (8)
δin2 = δ * w21 From (11)
δin2 = 0.1191 * 0.1 From weights vectors
δin2 = 0.0119
f'(zin2) = f(zin2)[1 – f(zin2)]
f'(zin2) = 0.7109[1 – 0.7109] As f(zin2) = z2
f'(zin2) = 0.2055
Substituting in (13)
δ2 = 0.0119 * 0.2055 = 0.00245
Now that the errors have been calculated, the weights have to be updated using these error values.

Step 3:

The formula for updating the weights of the output layer is:

wjk(new) = wjk(old) + Δwjk eq. (14)
where, Δwjk = α * δk * zj eq. (15)
Since k = 1, (15) becomes:
Δwj1 = α * δ * zj eq. (16)
The formula for updating the weights of the hidden layer is:
vij(new) = vij(old) + Δvij eq. (17)
where, Δvij = α * δj * xi eq. (18)
From (14) and (16)
w11(new) = w11(old) + Δw11 = 0.4 + α * δ * z1 = 0.4 + 0.25 * 0.1191 * 0.5498 = 0.4164
w21(new) = w21(old) + Δw21 = 0.1 + α * δ * z2 = 0.1 + 0.25 * 0.1191 * 0.7109 = 0.1212
w01(new) = w01(old) + Δw01 = (-0.2) + α * δ * bias = (-0.2) + 0.25 * 0.1191 * 1 = -0.1702;
kindly note that the 1 taken here is the input considered for the bias, as per the conditions.
These are the updated weights of the output layer.
From (17) and (18)
v11(new) = v11(old) + Δv11 = 0.6 + α * δ1 * x1 = 0.6 + 0.25 * 0.0118 * 0 = 0.6
v21(new) = v21(old) + Δv21 = (-0.1) + α * δ1 * x2 = (-0.1) + 0.25 * 0.0118 * 1 = -0.0971
v01(new) = v01(old) + Δv01 = 0.3 + α * δ1 * bias = 0.3 + 0.25 * 0.0118 * 1 = 0.3030; kindly
note that the 1 taken here is the input considered for the bias, as per the conditions.
v12(new) = v12(old) + Δv12 = (-0.3) + α * δ2 * x1 = (-0.3) + 0.25 * 0.00245 * 0 = -0.3
v22(new) = v22(old) + Δv22 = 0.4 + α * δ2 * x2 = 0.4 + 0.25 * 0.00245 * 1 = 0.4006
v02(new) = v02(old) + Δv02 = 0.5 + α * δ2 * bias = 0.5 + 0.25 * 0.00245 * 1 = 0.5006; kindly
note that the 1 taken here is the input considered for the bias, as per the conditions.
These are all the updated weights of the hidden layer.
These three steps are repeated until the output 'y' is sufficiently close to the target 't'.
This is how BPNs work. The term backpropagation in BPN refers to the fact that the error in the
present layer is used to update the weights between the present and previous layers by
propagating the error values backward.
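The single iteration worked out above can be reproduced with the short Python script below (assuming NumPy is available); the values printed should match the hand calculation up to rounding.

import numpy as np

f = lambda x: 1.0 / (1.0 + np.exp(-x))      # binary sigmoid, eq. (1)
f_prime = lambda x: f(x) * (1.0 - f(x))     # eq. (2)

x1, x2, t, alpha = 0.0, 1.0, 1.0, 0.25
v11, v21, v01 = 0.6, -0.1, 0.3              # weights/bias into z1
v12, v22, v02 = -0.3, 0.4, 0.5              # weights/bias into z2
w11, w21, w01 = 0.4, 0.1, -0.2              # weights/bias into y

# Step 1: forward pass
z_in1 = v01 + x1*v11 + x2*v21; z1 = f(z_in1)   # 0.2   -> 0.5498
z_in2 = v02 + x1*v12 + x2*v22; z2 = f(z_in2)   # 0.9   -> 0.7109
y_in = w01 + z1*w11 + z2*w21;  y = f(y_in)     # 0.0910 -> 0.5227

# Step 2: error terms (computed with the old weights)
delta = (t - y) * f_prime(y_in)                # 0.1191
delta1 = delta * w11 * f_prime(z_in1)          # 0.0118
delta2 = delta * w21 * f_prime(z_in2)          # 0.00245

# Step 3: weight updates
w11 += alpha * delta * z1                      # 0.4164
w21 += alpha * delta * z2                      # 0.1212
w01 += alpha * delta * 1                       # -0.1702
v11 += alpha * delta1 * x1; v21 += alpha * delta1 * x2   # 0.6, -0.0971
v01 += alpha * delta1 * 1                      # 0.3030
v12 += alpha * delta2 * x1; v22 += alpha * delta2 * x2   # -0.3, 0.4006
v02 += alpha * delta2 * 1                      # 0.5006

print(round(y, 4), round(w11, 4), round(v21, 4))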

Neural Representation of AND, OR, NOT, XOR and XNOR Logic Gates (Perceptron
Algorithm)

While taking the Udacity PyTorch Course by Facebook, I found it difficult to understand how the Perceptron works with logic gates (AND, OR, NOT, and so on). I decided to check online resources, but at the time of writing there was really no clear explanation of how to go about it. So, after personal reading, I finally understood how to go about it, which is the reason for this post.

Note: The purpose of this article is NOT to mathematically explain how the neural network
updates the weights, but to explain the logic behind how the values are being changed in
simple terms.

First, we need to know that the Perceptron algorithm states that:

Prediction (y`) = 1 if Wx+b > 0 and 0 if Wx+b ≤ 0

Also, the steps in this method are very similar to how Neural Networks learn, which is as
follows;

 Initialize weight values and bias

 Forward Propagate

 Check the error

 Backpropagate and Adjust weights and bias

 Repeat for all training examples

Now that we know the steps, let’s get up and running:

AND Gate

From our knowledge of logic gates, we know that an AND logic table is given by the diagram
below
AND Gate

The question is, what are the weights and bias for the AND perceptron?

First, we need to understand that the output of an AND gate is 1 only if both inputs (in this
case, x1 and x2) are 1. So, following the steps listed above;

Row 1

 From w1*x1+w2*x2+b, initializing w1, w2, as 1 and b as –1, we get;

x1(1)+x2(1)–1

 Passing the first row of the AND logic table (x1=0, x2=0), we get;

0+0–1 = –1

 From the Perceptron rule, if Wx+b≤0, then y`=0. Therefore, this row is correct, and no
need for Backpropagation.

Row 2

 Passing (x1=0 and x2=1), we get;

0+1–1 = 0

 From the Perceptron rule, if Wx+b≤0, then y`=0. This row is correct, as the output is 0 for
the AND gate.

 Row 3 (x1=1 and x2=0) behaves the same way, so from the Perceptron rule, this works for rows 1, 2 and 3.

Row 4

 Passing (x1=1 and x2=1), we get;

1+1–1 = 1
 Again, from the perceptron rule, this is still valid.

Therefore, we can conclude that the model to achieve an AND gate, using the Perceptron
algorithm is;

x1+x2–1
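A quick way to verify this model is to run the perceptron prediction rule over the truth table. The short Python sketch below does exactly that; the helper name predict is my own and purely illustrative.

def predict(weights, bias, *inputs):
    # Perceptron rule: y` = 1 if Wx + b > 0, else 0
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# AND model: w1 = 1, w2 = 1, b = -1  (i.e., x1 + x2 - 1)
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, predict((1, 1), -1, x1, x2))   # prints 1 only for (1, 1)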

OR Gate

OR Gate
From the diagram, the OR gate is 0 only if both inputs are 0.

Row 1

 From w1x1+w2x2+b, initializing w1, w2, as 1 and b as –1, we get;

x1(1)+x2(1)–1

 Passing the first row of the OR logic table (x1=0, x2=0), we get;

0+0–1 = –1

 From the Perceptron rule, if Wx+b≤0, then y`=0. Therefore, this row is correct.

Row 2

 Passing (x1=0 and x2=1), we get;

0+1–1 = 0

 From the Perceptron rule, if Wx+b <= 0, then y`=0. Therefore, this row is incorrect.

 So we want values that will make inputs x1=0 and x2=1 give y` a value of 1. If we
change w2 to 2, we have;

0+2–1 = 1

 From the Perceptron rule, this is correct for both the row 1 and 2.

Row 3

 Passing (x1=1 and x2=0), we get;

1+0–1 = 0

 From the Perceptron rule, if Wx+b <= 0, then y`=0. Therefore, this row is incorrect.
 Since it is similar to that of row 2, we can just change w1 to 2, we have;

2+0–1 = 1

 From the Perceptron rule, this is correct for both the row 1, 2 and 3.

Row 4

 Passing (x1=1 and x2=1), we get;

2+2–1 = 3

 Again, from the perceptron rule, this is still valid. Quite Easy!

Therefore, we can conclude that the model to achieve an OR gate, using the Perceptron
algorithm is;

2x1+2x2–1
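The same kind of truth-table check confirms the OR model (the predict helper is the same illustrative one used in the AND sketch, repeated here so the snippet runs on its own):

def predict(weights, bias, *inputs):
    # Perceptron rule: y` = 1 if Wx + b > 0, else 0
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# OR model: w1 = 2, w2 = 2, b = -1  (i.e., 2*x1 + 2*x2 - 1)
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, predict((2, 2), -1, x1, x2))   # prints 0 only for (0, 0)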
NOT Gate

NOT Gate

From the diagram, the output of a NOT gate is the inverse of a single input. So, following the
steps listed above;

Row 1

 From w1x1+b, initializing w1 as 1 (since single input), and b as –1, we get;

x1(1)–1

 Passing the first row of the NOT logic table (x1=0), we get;

0–1 = –1

 From the Perceptron rule, if Wx+b≤0, then y`=0. This row is incorrect, as the output is 1
for the NOT gate.

 So we want values that will make input x1=0 to give y` a value of 1. If we change b to 1,
we have;

0+1 = 1

 From the Perceptron rule, this works.


Row 2

 Passing (x1=1), we get;

1+1 = 2

 From the Perceptron rule, if Wx+b > 0, then y`=1. This row is so incorrect, as the output
is 0 for the NOT gate.

 So we want values that will make input x1=1 to give y` a value of 0. If we change w1 to –
1, we have;

–1+1 = 0

 From the Perceptron rule, if Wx+b ≤ 0, then y`=0. Therefore, this works (for both row 1
and row 2).

Therefore, we can conclude that the model to achieve a NOT gate, using the Perceptron
algorithm is;

–x1+1
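Again, a quick check of this single-input model with the same illustrative predict helper:

def predict(weights, bias, *inputs):
    # Perceptron rule: y` = 1 if Wx + b > 0, else 0
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# NOT model: w1 = -1, b = 1  (i.e., -x1 + 1)
for x1 in (0, 1):
    print(x1, predict((-1,), 1, x1))   # prints the inverse of the input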

NOR Gate
NOR Gate

From the diagram, the NOR gate is 1 only if both inputs are 0.

Row 1

 From w1x1+w2x2+b, initializing w1 and w2 as 1, and b as –1, we get;

x1(1)+x2(1)–1

 Passing the first row of the NOR logic table (x1=0, x2=0), we get;

0+0–1 = –1

 From the Perceptron rule, if Wx+b≤0, then y`=0. This row is incorrect, as the output is 1
for the NOR gate.

 So we want values that will make input x1=0 and x2 = 0 to give y` a value of 1. If we
change b to 1, we have;

0+0+1 = 1

 From the Perceptron rule, this works.

Row 2

 Passing (x1=0, x2=1), we get;


0+1+1 = 2

 From the Perceptron rule, if Wx+b > 0, then y`=1. This row is incorrect, as the output is 0
for the NOR gate.

 So we want values that will make input x1=0 and x2 = 1 to give y` a value of 0. If we
change w2 to –1, we have;

0–1+1 = 0

 From the Perceptron rule, this is valid for both row 1 and row 2.

Row 3

 Passing (x1=1, x2=0), we get;

1+0+1 = 2

 From the Perceptron rule, if Wx+b > 0, then y`=1. This row is incorrect, as the output is 0
for the NOR gate.

 So we want values that will make input x1=0 and x2 = 1 to give y` a value of 0. If we
change w1 to –1, we have;

–1+0+1 = 0

 From the Perceptron rule, this is valid for both row 1, 2 and 3.

Row 4

 Passing (x1=1, x2=1), we get;

-1-1+1 = -1

 From the Perceptron rule, this still works.

Therefore, we can conclude that the model to achieve a NOR gate, using the Perceptron
algorithm is;
-x1-x2+1
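A quick truth-table check of the NOR model, using the same illustrative predict helper:

def predict(weights, bias, *inputs):
    # Perceptron rule: y` = 1 if Wx + b > 0, else 0
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# NOR model: w1 = -1, w2 = -1, b = 1  (i.e., -x1 - x2 + 1)
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, predict((-1, -1), 1, x1, x2))   # prints 1 only for (0, 0)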

NAND Gate
From the diagram, the NAND gate is 0 only if both inputs are 1.

Row 1

 From w1x1+w2x2+b, initializing w1 and w2 as 1, and b as -1, we get;

x1(1)+x2(1)-1

 Passing the first row of the NAND logic table (x1=0, x2=0), we get;

0+0-1 = -1

 From the Perceptron rule, if Wx+b≤0, then y`=0. This row is incorrect, as the output is 1
for the NAND gate.

 So we want values that will make input x1=0 and x2 = 0 to give y` a value of 1. If we
change b to 1, we have;

0+0+1 = 1

 From the Perceptron rule, this works.

Row 2

 Passing (x1=0, x2=1), we get;

0+1+1 = 2

 From the Perceptron rule, if Wx+b > 0, then y`=1. This row is correct, as the output is 1 for
the NAND gate (and the same holds for row 3, x1=1 and x2=0).

Row 4

 Passing (x1=1, x2=1), we get;

1+1+1 = 3
 This is not the expected output, as the output is 0 for a NAND combination of x1=1 and
x2=1.

 Changing values of w1 and w2 to -1, and value of b to 2, we get;

-1-1+2 = 0

 It works for all rows.

Therefore, we can conclude that the model to achieve a NAND gate, using the Perceptron
algorithm is;

-x1-x2+2
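And a quick truth-table check of the NAND model, again with the illustrative predict helper:

def predict(weights, bias, *inputs):
    # Perceptron rule: y` = 1 if Wx + b > 0, else 0
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# NAND model: w1 = -1, w2 = -1, b = 2  (i.e., -x1 - x2 + 2)
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, predict((-1, -1), 2, x1, x2))   # prints 0 only for (1, 1)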

XNOR Gate
XNOR Gate

Now that we are done with the necessary basic logic gates, we can combine them to give an
XNOR gate.

The boolean representation of an XNOR gate is;

x1x2 + x1`x2`

Where ‘`' means inverse.

From the expression, we can say that the XNOR gate consists of an AND gate (x1x2), a NOR
gate (x1`x2`), and an OR gate.

This means we will have to combine 3 perceptrons, as sketched after the list below:

 AND (x1+x2–1)

 NOR (-x1-x2+1)
 OR (2x1+2x2–1)
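A minimal Python sketch of this combination (the predict helper and the xnor function name are my own illustrative choices); the AND and NOR perceptrons feed the OR perceptron, following x1x2 + x1`x2`:

def predict(weights, bias, *inputs):
    # Perceptron rule: y` = 1 if Wx + b > 0, else 0
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def xnor(x1, x2):
    and_out = predict((1, 1), -1, x1, x2)         # AND:  x1 + x2 - 1
    nor_out = predict((-1, -1), 1, x1, x2)        # NOR: -x1 - x2 + 1
    return predict((2, 2), -1, and_out, nor_out)  # OR:  2a + 2b - 1

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xnor(x1, x2))   # prints 1 exactly when x1 == x2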

XOR Gate

XOR Gate

The boolean representation of an XOR gate is;

x1x`2 + x`1x2

We first simplify the boolean expression


x`1x2 + x1x`2 + x`1x1 + x`2x2

x1(x`1 + x`2) + x2(x`1 + x`2)

(x1 + x2)(x`1 + x`2)

(x1 + x2)(x1x2)`

From the simplified expression, we can say that the XOR gate consists of an OR gate (x1 + x2) and a NAND gate ((x1x2)`), whose outputs are combined by an AND gate.

This means we will have to combine 3 perceptrons, as sketched after the list below:

 OR (2x1+2x2–1)

 NAND (-x1-x2+2)

 AND (x1+x2–1)
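A minimal Python sketch of this combination (the predict helper and the xor function name are my own illustrative choices); the OR and NAND perceptrons feed the AND perceptron:

def predict(weights, bias, *inputs):
    # Perceptron rule: y` = 1 if Wx + b > 0, else 0
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def xor(x1, x2):
    or_out = predict((2, 2), -1, x1, x2)           # OR:    2*x1 + 2*x2 - 1
    nand_out = predict((-1, -1), 2, x1, x2)        # NAND: -x1 - x2 + 2
    return predict((1, 1), -1, or_out, nand_out)   # AND:   a + b - 1

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor(x1, x2))   # prints 1 exactly when x1 != x2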
