Fundamentals of Neural Networks-2
One. Introduction to Neural Networks
Neural networks have become one of the most influential technologies in the field of artificial intelligence, with applications spanning computer vision, natural language processing, and autonomous systems. Their ability to model complex, non-linear relationships in data has made them indispensable in solving problems that were previously difficult to manage. We shall explore the foundational concepts of neural networks, beginning with
a discussion of the motivation behind their development, the historical evolution of neural
network research, and the underlying biological and artificial neuron models. We will also
introduce key notations and the neuron equation, which are essential for understanding how neural
networks operate.
Neural networks are inspired by the human brain's ability to learn from experience and adapt. Conventional, explicitly programmed algorithms often struggle with tasks that require pattern recognition, such as distinguishing between images of cats and dogs, translating languages, or predicting stock prices. These tasks involve complex relationships and patterns that are not easily captured by conventional algorithms, especially when the underlying rules are difficult to specify by hand.
Neural networks address these challenges by learning directly from the data. They are designed to automatically identify patterns and relationships in the input data without the need for explicit programming. This capability makes them highly effective in tasks like classification, regression, and clustering. For example, in image recognition, a neural network can be trained to recognize and classify
images.
The development of neural networks has a rich history, beginning in the early 1940s. The concept
of a computational model based on the human brain was first proposed by Warren McCulloch
and Walter Pitts in 1943, who introduced the idea of a neuron as a binary threshold unit: a simple model that activates (outputs 1) if the weighted sum of its inputs exceeds a certain threshold and otherwise stays inactive (outputs 0). This early model laid
the foundation for subsequent research into artificial intelligence and machine learning.
In the 1950s and 1960s, Frank Rosenblatt developed the perceptron, an early type of neural
network capable of learning from data. Despite its initial success, the perceptron was limited by
its inability to solve non-linearly separable problems (non-linearly separable problems are those
where data points cannot be separated into distinct classes by a straight line or a simple linear
boundary), as highlighted by Marvin Minsky and Seymour Papert in their 1969 book, Perceptrons.
The field experienced a resurgence in the 1980s with the development of backpropagation, a
method for training multi-layer neural networks, introduced by David E. Rumelhart, Geoffrey E.
Hinton, and Ronald J. Williams. Backpropagation addressed the limitations of earlier models by
enabling the training of deeper networks, leading to significant improvements in performance. This
period marked the beginning of the modern era of neural networks, with continuous advancements in algorithms, computing power, and available data following in the decades since.
The structure and function of neural networks are inspired by the biological neurons found in the
human brain. A biological neuron consists of three main components: the cell body (soma),
dendrites, and an axon. The dendrites receive electrical signals (inputs) from other neurons, which
are then processed in the cell body. If the combined signal exceeds a certain threshold, the neuron
fires, sending an electrical impulse down the axon to communicate with other neurons.
Summation of Inputs: Neurons receive multiple inputs, which are summed to determine whether the neuron will fire.
Activation Function: The firing of the neuron is dependent on whether the summed inputs exceed
a certain threshold.
Propagation of Signal: Once a neuron fires, the signal is transmitted to other neurons, forming
complex networks.
This biological model serves as the inspiration for artificial neurons, which aim to replicate these processes mathematically.
Figure: Structure of a neuron
A biological neuron is a specialized cell that serves as the fundamental building block of the
nervous system. It is designed to receive, process, and transmit information through electrical and
chemical signals. Understanding the different components of a biological neuron is essential for
comprehending how the brain and nervous system function. Below is a detailed description of the main components of a biological neuron.
a. Cell Body (Soma)
Structure: The cell body, or soma, is the central part of the neuron and contains the nucleus and cytoplasm.
Functions:
Nucleus: The nucleus houses the neuron's genetic material (DNA) and is responsible for regulating
the cell's activities, including growth, metabolism, and protein synthesis. The nucleus plays a central role in maintaining the neuron's function and survival.
Cytoplasm: The cytoplasm within the soma contains various organelles such as mitochondria (for
energy production), ribosomes (for protein synthesis), and the endoplasmic reticulum (for protein
and lipid synthesis). These organelles work together to sustain the neuron's metabolic needs.
Integration Center: The soma integrates incoming signals from the dendrites and determines
whether the neuron will generate an action potential (a nerve impulse). It acts as a decision-making center for the neuron.
b. Dendrites
Structure: Dendrites are branched, tree-like extensions from the cell body. They are typically short
and highly branched, providing a large surface area for receiving signals from other neurons.
Functions:
Signal Reception: Dendrites are the primary sites for receiving chemical signals (neurotransmitters)
from the axon terminals of other neurons. These signals are received at specialized structures called
synapses, where the dendrites form connections with the presynaptic neuron.
Graded Potentials: When neurotransmitters bind to receptors on the dendrites, they cause changes
in the membrane potential of the dendrite. These changes are known as graded potentials, which
can be either excitatory (increasing the likelihood of an action potential) or inhibitory (decreasing that likelihood).
Signal Transmission: The graded potentials generated at the dendrites are transmitted toward the
soma. If the cumulative effect of these potentials reaches a certain threshold at the axon hillock, an action potential is generated.
c. Axon Hillock
Structure: The axon hillock is a specialized region located at the junction between the soma and
the axon. It is often cone-shaped and has a high concentration of voltage-gated ion channels.
Functions:
Action Potential Initiation: The axon hillock is the site where the decision to initiate an action
potential is made. It is highly sensitive to changes in the membrane potential due to its abundance of voltage-gated ion channels.
Trigger Zone: If the membrane potential at the axon hillock reaches the threshold level, these
channels open, initiating an action potential. The axon hillock is thus referred to as the "trigger zone" of the neuron.
d. Axon
Structure: The axon is a long, slender projection that extends from the axon hillock to the axon
terminals. Axons can vary in length, from a few millimeters to over a meter in some neurons. The
axon is often covered by a myelin sheath, which is interrupted at intervals by nodes of Ranvier.
Functions:
Signal Conduction: The primary function of the axon is to conduct the action potential away from
the soma and toward the axon terminals. The action potential is a rapid, all-or-nothing electrical signal that travels along the axon without losing strength.
Myelination: Many axons are wrapped in a myelin sheath, a fatty layer produced by glial cells
(oligodendrocytes in the central nervous system and Schwann cells in the peripheral nervous
system). Myelin insulates the axon, increasing the speed and efficiency of electrical signal
transmission.
Nodes of Ranvier: These are gaps in the myelin sheath along the axon. The action potential "jumps"
from one node to the next in a process called saltatory conduction, which significantly speeds up
signal transmission.
e. Axon Terminals
Structure: The axon terminals, also known as synaptic terminals or terminal boutons, are the distal ends of the axon. They are often branched, forming multiple connections with target cells, and contain synaptic vesicles filled with neurotransmitters.
Functions:
Signal Transmission: When the action potential reaches the axon terminals, it triggers the opening
of voltage-gated calcium channels. The influx of calcium ions causes synaptic vesicles to fuse with
the presynaptic membrane and release their neurotransmitters into the synaptic cleft.
Neurotransmitter Release: The neurotransmitters diffuse across the synaptic cleft and bind to
receptors on the postsynaptic membrane (usually the dendrites of the next neuron). This binding either excites or inhibits the postsynaptic neuron.
Synaptic Communication: The axon terminals are critical for synaptic communication between
neurons, allowing the transmission of information across the nervous system. The precise control
of neurotransmitter release and receptor activation ensures accurate signal transmission and
processing.
f. Myelin Sheath
Structure: The myelin sheath is a multilayered, lipid-rich covering that surrounds the axon of many
neurons. It is produced by glial cells—Schwann cells in the peripheral nervous system (PNS) and oligodendrocytes in the central nervous system (CNS).
Functions:
Insulation: The myelin sheath acts as an electrical insulator, preventing the loss of electrical current
from the axon as the action potential travels along it. This insulation is essential for the rapid and efficient transmission of nerve impulses.
Increased Conduction Speed: The myelin sheath allows for saltatory conduction, where the action
potential jumps between the nodes of Ranvier. This significantly increases the speed at which nerve impulses travel along the axon.
Support and Protection: Myelin not only enhances the speed of conduction but also provides structural support and protection for the axon.
g. Nodes of Ranvier
Structure: The nodes of Ranvier are small, unmyelinated gaps in the myelin sheath that occur at regular intervals along the axon.
Functions:
Saltatory Conduction: The nodes of Ranvier are crucial for saltatory conduction, where the action
potential "jumps" from one node to the next. This process allows the action potential to propagate
Ion Exchange: The nodes contain a high density of voltage-gated sodium and potassium channels,
which are essential for the regeneration of the action potential as it travels down the axon.
h. Synapse
Structure: A synapse is the junction between two neurons, where the axon terminal of one neuron
(presynaptic neuron) meets the dendrite or soma of another neuron (postsynaptic neuron). The
synapse includes the presynaptic membrane, the synaptic cleft (the gap between the neurons), and the postsynaptic membrane.
Functions:
Signal Transmission: The synapse is the site where electrical signals are converted into chemical
signals (neurotransmitters) and then back into electrical signals in the next neuron. This process allows information to pass from one neuron to the next.
Neurotransmitter Release: When an action potential reaches the presynaptic terminal, it triggers
the release of neurotransmitters into the synaptic cleft. These neurotransmitters bind to receptors on the postsynaptic membrane.
Synaptic Plasticity: Synapses are dynamic structures that can strengthen or weaken over time, a
process known as synaptic plasticity. This ability to change is critical for learning, memory, and
adaptation.
i. Neurotransmitters
Structure: Neurotransmitters are chemical messengers stored in synaptic vesicles within the axon
terminals. They are released into the synaptic cleft in response to an action potential.
Functions:
Signal Transmission: Neurotransmitters are essential for transmitting signals across synapses.
They bind to specific receptors on the postsynaptic membrane, leading to either excitation or inhibition of the postsynaptic neuron.
Diversity of Function: Different neurotransmitters have different effects. For example, glutamate is typically excitatory, while GABA is typically inhibitory.
Reuptake and Degradation: After their release and action, neurotransmitters are typically removed
from the synaptic cleft by reuptake into the presynaptic neuron or degradation by enzymes. This terminates the signal and prepares the synapse for the next transmission.
j. Glial Cells
Structure: Glial cells are non-neuronal cells that provide support, insulation, and protection to
neurons. They include astrocytes, oligodendrocytes, Schwann cells, microglia, and ependymal
cells.
Functions:
Support: Astrocytes provide structural support to neurons, maintain the blood-brain barrier, and regulate the chemical environment around neurons.
Myelination: Oligodendrocytes in the CNS and Schwann cells in the PNS produce the myelin sheath that insulates axons.
Immune Defense: Microglia act as the immune cells of the CNS, removing debris and protecting against pathogens.
Cerebrospinal Fluid Production: Ependymal cells line the ventricles of the brain and produce cerebrospinal fluid.
The information flow in a neuron, whether biological or artificial, involves a series of steps that
transform input signals into output signals. This process is crucial for the functioning of neural
networks, where multiple neurons work together to process information and make decisions.
Below is a detailed description of how information flows through a neuron, highlighting both the biological and the artificial case.
A biological neuron is a specialized cell in the nervous system that transmits information through
electrical and chemical signals. The flow of information in a biological neuron can be broken down into the following stages.
Dendrites are tree-like extensions from the neuron's cell body (soma) that receive signals from
other neurons. These signals arrive at the synapses, which are junctions where the axons of other neurons connect to the dendrites.
The incoming signals are typically in the form of neurotransmitters, chemical messengers released
by the presynaptic neuron (the neuron sending the signal). These neurotransmitters bind to receptors on the dendrite's membrane.
The binding of neurotransmitters causes changes in the electrical potential across the dendrite's
membrane, leading to the generation of graded potentials. These are small changes in the membrane potential that can be excitatory or inhibitory.
The cell body integrates the graded potentials received from the dendrites. If the cumulative signal
strength from all inputs exceeds a certain threshold, the neuron will "fire" an action potential. The
soma acts as the decision-making center, determining whether the incoming signals are strong enough to trigger an action potential.
The signals that reach the soma are summed (spatial summation) or integrated over time (temporal
summation). If the combined signal (the depolarization of the cell membrane) reaches the threshold
level, it initiates an action potential. If it does not reach the threshold, the signal dissipates, and no action potential is generated.
The axon hillock, located at the junction of the soma and the axon, is the critical region where the
action potential is initiated. If the membrane potential at the axon hillock reaches the threshold,
voltage-gated ion channels open, allowing ions to flow across the membrane.
The action potential is an all-or-nothing electrical signal that propagates along the axon. Once
initiated, the action potential travels down the length of the axon without losing strength.
The action potential travels along the axon, a long, slender projection of the neuron, to the axon
terminals. The movement is facilitated by the opening and closing of voltage-gated sodium and
potassium channels along the axon membrane, which create a wave of electrical depolarization.
Many axons are covered by a myelin sheath, a fatty layer that insulates the axon and speeds up
signal transmission. The action potential jumps from one node of Ranvier (gaps in the myelin
sheath) to the next, in a process called saltatory conduction, making the transmission more efficient.
When the action potential reaches the axon terminals, it triggers the release of neurotransmitters
into the synaptic cleft (the small gap between the axon terminal and the dendrite of the next neuron).
This is done through the process of exocytosis, where synaptic vesicles containing
neurotransmitters fuse with the presynaptic membrane and release their contents.
The neurotransmitters cross the synaptic cleft and bind to receptors on the postsynaptic neuron's dendrites, continuing the flow of information to the next neuron.
An artificial neuron, a fundamental unit of artificial neural networks, models the basic information
processing function of a biological neuron. The flow of information in an artificial neuron involves the following steps.
Inputs: The artificial neuron receives multiple inputs, each representing a feature or signal from
external data. These inputs are often represented as numerical values and are analogous to the signals received by the dendrites of a biological neuron.
Weights: Each input (𝑥𝑖 ) is associated with a weight (𝑤𝑖 ), which determines the importance or
influence of that input on the neuron’s output. These weights are adjustable parameters learned
during the training process. These will be studied in detail in subsequent lectures.
Weighted Sum: The artificial neuron computes a weighted sum of all inputs, similar to the
integration of inputs in the biological neuron’s soma. The weighted sum is calculated as:
$$z = \sum_{i=1}^{n} w_i x_i + b$$
where:
$x_i$ are the inputs, $w_i$ are the corresponding weights, $n$ is the number of inputs, and $b$ is the bias term, which allows the neuron to shift the activation function and provides additional flexibility.
Non-linearity: The weighted sum ( 𝑧 ) is passed through an activation function (ϕ(𝑧)) , which
introduces non-linearity into the model. The activation function determines whether the neuron "fires" and how strongly it responds.
Common Activation Functions include the Threshold (Step) Function, the Sigmoid Function, the Hyperbolic Tangent (tanh), and the Rectified Linear Unit (ReLU).
Output Generation: The result of the activation function (𝑦 = ϕ(𝑧)) is the output of the neuron.
This output can be passed to other neurons in subsequent layers of the network or can represent the final prediction of the network.
Propagation: In a multi-layer network, the output of one neuron becomes the input to neurons in
the next layer, continuing the flow of information through the network until the final output layer produces the network's prediction.
Training: During the learning process, the network adjusts the weights and biases based on the
error between the predicted output and the actual target (in supervised learning). This adjustment is typically performed with an optimization algorithm such as gradient descent.
Backpropagation: For multi-layer networks, the error is propagated backward through the network
using a method called backpropagation, which updates the weights to minimize the overall error.
An artificial neuron, also known as a perceptron in its simplest form, is a mathematical model
that mimics the behavior of a biological neuron. It is the basic unit of a neural network, designed to receive inputs, process them, and produce an output.
Inputs: Analogous to the dendrites in biological neurons, artificial neurons receive multiple inputs, typically numerical values representing features of the data.
Weights: Each input is associated with a weight, which represents the strength of the input's
influence on the neuron. These weights are learned during the training process.
Summation Function: The inputs are combined by calculating the weighted sum, which is then passed to the activation function.
Activation Function: This function determines whether the neuron will "fire" (produce an output).
Common activation functions include the sigmoid, hyperbolic tangent (tanh), and rectified linear
unit (ReLU).
Output: The result of the activation function is the neuron's output, which can be fed into other neurons or serve as the final output of the network.
Figure: The McCulloch-Pitts neuron
1.5 Notations
Scalar variables are written ( 𝑥𝑖 ), where ( 𝑖 ) is an index representing each scalar in the series. These variables can be listed as:
$x_1, x_2, x_3, \ldots, x_n$
Here, ( x1 , x2 , x3 , … , xn ) are individual scalar values, and ( 𝑛 ) is the total number of scalars in
the series. The sum of these scalar variables can be expressed as:
$$\text{Sum} = \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$$
This sum is a scalar number that results from adding all the (𝑥𝑖 ) values together.
For vectors, a similar notation is used.
𝑏 represents the bias term, a constant added to the weighted sum of inputs to allow for shifting the
activation function.
Also, the following is a series of row vector variables, where each vector is represented as an ordered list of components:
$\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]$
Here:
Each row vector (𝑥𝑖 ) has ( 𝑛 ) components, denoted as (𝑥𝑖1 , 𝑥𝑖2 , … , 𝑥𝑖𝑛 ), where ( 𝑖 ) indicates the
vector number and ( 𝑛 ) represents the number of elements in each row vector.
These row vectors form an ordered set of related numbers and can be used in various mathematical operations, such as dot products and matrix multiplication.
Neuron Equation
The output of an artificial neuron can be expressed mathematically by the following equation:
$$y = \phi\left(\sum_{i=1}^{n} w_i x_i + b\right)$$
Here:
$x_i$ are the inputs, $w_i$ are the weights, $b$ is the bias, and $\phi(\cdot)$ is the activation function, which introduces non-linearity into the model, enabling the network to learn complex relationships.
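To make the neuron equation concrete, here is a minimal Python sketch of a single artificial neuron. The particular inputs, weights, bias, and choice of sigmoid activation are illustrative assumptions (they happen to match the sigmoid worked example later in these notes), not prescribed values.

    import math

    def sigmoid(z):
        """Sigmoid activation: maps any real z into the interval (0, 1)."""
        return 1.0 / (1.0 + math.exp(-z))

    def neuron_output(inputs, weights, bias, activation=sigmoid):
        """Compute y = phi(sum_i w_i * x_i + b) for a single artificial neuron."""
        z = sum(w * x for w, x in zip(weights, inputs)) + bias
        return activation(z)

    # Illustrative values: x = (0.8, 0.4), w = (0.5, -1.2), b = 0.2.
    print(round(neuron_output([0.8, 0.4], [0.5, -1.2], 0.2), 2))  # 0.53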
Two. Model of an Artificial Neuron
Artificial neurons are the fundamental building blocks of neural networks, which are
computational models inspired by the biological neural networks in the human brain.
Understanding the model of an artificial neuron is crucial for comprehending how neural networks
learn from data, make decisions, and perform complex tasks. We will explore the basic elements
of an artificial neuron, focusing on its core components, including the activation function. We will
also examine various types of activation functions, such as the threshold function, piecewise linear
function, and sigmoidal function, and provide practical examples to illustrate their use.
An artificial neuron is a mathematical abstraction that simulates the behavior of a biological neuron. It consists of several key elements, described below.
An artificial neuron receives multiple inputs, each representing a feature of the data. These inputs
are denoted as (𝑥1 , 𝑥2 , … , 𝑥𝑛 ), where ( 𝑛 ) is the number of inputs. Each input is associated with a
weight (𝑤1 , 𝑤2 , … , 𝑤𝑛 ) , which determines the significance of the corresponding input. The
weights are learnable parameters that are adjusted during the training process to optimize the
neuron's performance.
The neuron computes a weighted sum of the inputs, which can be expressed as:
$$z = \sum_{i=1}^{n} w_i x_i + b$$
where ( 𝑧 ) is the weighted sum, and ( 𝑏 ) is the bias term. The bias allows the model to shift the activation function, providing additional flexibility.
The weighted sum ( 𝑧 ) is passed through an activation function (ϕ(𝑧)) , which determines
whether the neuron will "fire" or produce an output. The activation function introduces non-
linearity into the model, enabling the network to learn complex relationships. The output of the
neuron is thus:
[𝑦 = ϕ(𝑧)]
The activation function is a critical component of an artificial neuron. It defines how the weighted
sum of inputs is transformed into an output that can be passed to the next layer in the network or
used as the final prediction. Various activation functions serve different purposes and have unique
properties that make them suitable for different types of tasks. In this section, we will discuss three
common activation functions: the threshold function, the piecewise linear function, and the
sigmoidal function.
The threshold function, also known as the step function, is one of the simplest activation functions.
It outputs a binary value based on whether the input exceeds a certain threshold.
Definition
$$\varphi(z) = \begin{cases} 1 & \text{if } z \ge \theta \\ 0 & \text{if } z < \theta \end{cases}$$
The threshold function is used in binary classification tasks where the objective is to classify inputs
into one of two categories. For example, in a simple binary classification problem such as email
spam detection, the neuron might output 1 (spam) if the input features (e.g., frequency of certain keywords) push the weighted sum above the threshold, and 0 (not spam) otherwise.
However, the threshold function has limitations. It is not differentiable, making it unsuitable for gradient-based training methods such as backpropagation. Additionally, it cannot capture the subtleties in data where small changes in inputs should lead to correspondingly small changes in outputs.
The piecewise linear function, also known as the rectified linear unit (ReLU) when applied in its
most common form, is another popular activation function. It addresses some of the limitations of
the threshold function by allowing the output to vary continuously with the input.
Definition
$$\varphi(z) = \begin{cases} z & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$
This can be visualized as a function that is linear for positive inputs and zero for negative inputs.
ReLU is widely used in deep learning because it is simple and computationally efficient. It
introduces non-linearity while retaining differentiability for positive inputs, making it compatible
with gradient-based optimization techniques. Additionally, ReLU helps mitigate the vanishing
gradient problem, a common issue with other activation functions, particularly in deep networks.
For instance, in an image classification task, ReLU can be applied to the output of convolutional
layers, allowing the network to learn more complex features by activating neurons only when the weighted input is positive.
The sigmoidal function, or sigmoid function, is a smooth, S-shaped function that maps any input
to a value between 0 and 1. This makes it particularly useful for tasks where outputs need to be
interpreted as probabilities.
Definition
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
The sigmoid function is commonly used in binary classification tasks where the output represents
the probability of a particular class. For example, in a medical diagnosis system, a neuron with a
sigmoid activation function might output the probability that a patient has a certain disease based on input features such as symptoms and test results.
One of the key advantages of the sigmoid function is its differentiability, which allows for the use of gradient-based optimization. However, it suffers from the vanishing gradient problem, particularly when the input values are very large or very small, causing the gradients to become very small and slow down the learning process.
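The three activation functions discussed above can be summarized in a short Python sketch; this is an illustrative implementation with arbitrarily chosen test values, not code from the lecture.

    import math

    def threshold(z, theta=0.0):
        """Threshold (step) function: 1 if z >= theta, else 0."""
        return 1 if z >= theta else 0

    def relu(z):
        """Rectified linear unit: z for non-negative inputs, 0 otherwise."""
        return z if z >= 0 else 0.0

    def sigmoid(z):
        """Sigmoid: smooth S-shaped mapping of z into (0, 1)."""
        return 1.0 / (1.0 + math.exp(-z))

    for z in (-1.0, 0.12, 4.3):
        print(z, threshold(z), relu(z), round(sigmoid(z), 4))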
Question: Consider a simple neuron with two inputs (𝑥1 ) and (𝑥2 ), weights (𝑤1 ) and (𝑤2 ), and a bias ( 𝑏 ). Suppose the activation function is the sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$. Write the expression for the neuron's output and evaluate it for the values given below.
Solution:
[𝑦 = 𝜎(𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏)]
If (𝑤1 = 0.5), (𝑤2 = −1.2), (𝑥1 = 0.8), (𝑥2 = 0.4), 𝑎𝑛𝑑(𝑏 = 0.2), then:
$z = (0.5)(0.8) + (-1.2)(0.4) + 0.2 = 0.12$
$$y = \frac{1}{1 + e^{-0.12}} \approx 0.53$$
This output ( 𝑦 ) can then be used as an input to other neurons in a more complex network or serve as the final prediction.
Question: Given a simple neuron with inputs (𝑥1 = 2) and (𝑥2 = 3) , weights (𝑤1 = 0.5) and
(𝑤2 = −1), and a bias ( 𝑏 = 1 ), calculate the output of the neuron using the ReLU activation
function.
Solution:
[𝑧 = (0.5 ⋅ 2) + (−1 ⋅ 3) + 1]
[ 𝑧 = 1 − 3 + 1 ]
[ 𝑧 = −1 ]
Since ( 𝑧 = −1 ) is less than 0, the ReLU activation gives an output of ( 𝑦 = 0 ).
Question: Consider a simple neuron with two inputs (𝑥1 ) and (𝑥2 ), weights (𝑤1 ) and (𝑤2 ), and a bias ( 𝑏 ). The activation function for the neuron is the ReLU (Rectified Linear Unit) function. Calculate the output of the neuron given:
(𝑥1 = 1.5)
(𝑥2 = −2.0)
(𝑤1 = 0.8)
(𝑤2 = −1.3)
( 𝑏 = 0.5 )
Solution:
$z = w_1 x_1 + w_2 x_2 + b = (0.8)(1.5) + (-1.3)(-2.0) + 0.5 = 1.2 + 2.6 + 0.5 = 4.3$
Since ( 𝑧 = 4.3 ) is greater than 0, the ReLU activation returns ( 𝑧 ) itself. The output of the neuron after applying the ReLU activation function is:
$\text{Output} = 4.3$
Question: Consider a simple neuron with two inputs (𝑥1 ) and (𝑥2 ), weights (𝑤1 ) and (𝑤2 ), and
a bias ( 𝑏 ). The activation function for the neuron is a threshold function with threshold 0:
$$y = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$
Calculate the output of the neuron given:
(𝑥1 = 1.5)
(𝑥2 = −2.0)
(𝑤1 = 0.8)
(𝑤2 = −1.3)
( 𝑏 = 0.5 )
Solution:
$z = w_1 x_1 + w_2 x_2 + b = (0.8)(1.5) + (-1.3)(-2.0) + 0.5 = 4.3$
$$y = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$
Since the calculated value of ( 𝑧 = 4.3 ) is greater than 0, we apply the threshold function:
[𝑦 = 1]
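The three worked examples above can be checked with a few lines of Python; the helper functions below are assumptions of this sketch rather than code from the notes.

    def weighted_sum(x, w, b):
        """z = sum_i w_i * x_i + b."""
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    relu = lambda z: max(0.0, z)
    step = lambda z: 1 if z >= 0 else 0

    # ReLU example: x = (2, 3), w = (0.5, -1), b = 1  ->  z = -1, output 0
    print(relu(weighted_sum([2, 3], [0.5, -1], 1)))
    # ReLU example: x = (1.5, -2.0), w = (0.8, -1.3), b = 0.5  ->  z = 4.3, output 4.3
    print(relu(weighted_sum([1.5, -2.0], [0.8, -1.3], 0.5)))
    # Threshold example with the same values  ->  z = 4.3 >= 0, output 1
    print(step(weighted_sum([1.5, -2.0], [0.8, -1.3], 0.5)))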
Three. Neural Network Architectures
Neural network architectures define the structure and connectivity of the artificial neurons within
a network. The architecture of a neural network plays a critical role in determining its capacity to
model complex relationships, its computational efficiency, and its ability to generalize from data.
Understanding different neural network architectures is essential for designing models that are
well-suited to specific tasks. We will explore three fundamental neural network architectures: the
single-layer feed-forward network, the multi-layer feed-forward network, and the recurrent network.
The single-layer feed-forward network, also known as a perceptron, is the most basic type of neural
network architecture. It consists of a single layer of output neurons directly connected to a layer
of input neurons. Each input neuron is connected to every output neuron through a weighted link,
and there is no hidden layer between the input and output layers.
▪ The input layer receives the raw input data, which is then passed directly to the output layer through weighted connections.
▪ The output layer consists of one or more neurons, each computing a weighted sum of the
inputs, followed by the application of an activation function to produce the final output.
Example
Consider a simple binary classification problem where the task is to classify points in a two-
dimensional space as belonging to one of two classes (e.g., red or blue). A single-layer feed-
forward network with two input neurons (representing the coordinates of the points) and one output
neuron (representing the class label) can be used for this task.
Given inputs (𝑥1 ) and (𝑥2 ), the output ( 𝑦 ) of the network is:
[𝑦 = ϕ(𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏)]
where (𝑤1 ) and (𝑤2 ) are the weights, ( 𝑏 ) is the bias, and (ϕ(⋅)) is the activation function, such as a threshold (step) function.
Example:
You are tasked with classifying points in a two-dimensional space into one of two classes: red
(Class 0) or blue (Class 1). The classification is done using a single-layer feed-forward neural network with two inputs, one output neuron, and a threshold activation.
Given:
Point 1: (2, 3)
Point 2: (1, 1)
Point 3: (4, 5)
Point 4: (3, 2)
Weights: 𝑤1 = 0.2, 𝑤2 = 0.4; Bias: 𝑏 = 0.1
Threshold activation:
$$\text{output} = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$
Question:
For each point, calculate the weighted sum 𝑧 and determine the predicted class (0 for red, 1 for blue). Compare the predicted classes with the actual class labels and determine whether the network classifies each point correctly.
Step-by-Step Solution:
Point 1: (2, 3)
Weighted Sum: z = (0.2⋅2) + (0.4⋅3) + 0.1 = 0.4 + 1.2 + 0.1 = 1.7
Prediction: z ≥ 0, so the predicted class is 1 (blue).
Point 2: (1, 1)
Weighted Sum: z = (0.2⋅1) + (0.4⋅1) + 0.1 = 0.2 + 0.4 + 0.1 = 0.7
Prediction: z ≥ 0, so the predicted class is 1 (blue).
Point 3: (4, 5)
Weighted Sum: z = (0.2⋅4) + (0.4⋅5) + 0.1 = 0.8 + 2.0 + 0.1 = 2.9
Prediction: z ≥ 0, so the predicted class is 1 (blue).
Point 4: (3, 2)
Weighted Sum: z = (0.2⋅3) + (0.4⋅2) + 0.1 = 0.6 + 0.8 + 0.1 = 1.5
Prediction: z ≥ 0, so the predicted class is 1 (blue).
Therefore: with these weights the network assigns all four points to Class 1 (blue). Any points whose actual label is Class 0 (red) are misclassified, indicating that the weights and bias need adjustment.
To improve classification, the weights and bias can be adjusted using a learning algorithm such as
gradient descent based on the error between the predicted and actual outputs.
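A minimal Python sketch of this single-layer classifier, using the weights and bias assumed above (w1 = 0.2, w2 = 0.4, b = 0.1), is given below; the class labels it prints would still need to be compared against the true labels of the dataset.

    def predict(point, w, b):
        """Single-layer feed-forward prediction with a threshold activation."""
        z = w[0] * point[0] + w[1] * point[1] + b
        return 1 if z >= 0 else 0  # 1 = blue, 0 = red

    w, b = (0.2, 0.4), 0.1
    for p in [(2, 3), (1, 1), (4, 5), (3, 2)]:
        print(p, "->", predict(p, w, b))  # all four points are assigned class 1 here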
While single-layer feed-forward networks are simple and computationally efficient, they are
limited in their capacity to solve complex problems. Specifically, they can only solve linearly
separable problems—problems where a straight line (or hyperplane in higher dimensions) can
separate the classes. For example, a single-layer network would struggle to classify points that are
not linearly separable, such as those arranged in concentric circles or an XOR pattern.
A multi-layer feed-forward network, also known as a multi-layer perceptron (MLP), extends the
single-layer architecture by introducing one or more hidden layers between the input and output
layers. These hidden layers allow the network to model more complex, non-linear relationships in
the data.
▪ The input layer receives the raw data, which is passed to the first hidden layer through
weighted connections.
▪ Each neuron in the hidden layer computes a weighted sum of its inputs, applies an activation function, and passes the result to the next layer.
▪ The process is repeated across all hidden layers, with the final layer (output layer) producing the network's prediction.
The introduction of hidden layers enables the network to learn hierarchical representations of the data.
Example
Consider the task of recognizing handwritten digits (0–9) from images. A typical multi-layer feed-forward network for this task might consist of:
▪ An input layer with 784 neurons, one for each pixel of a 28 × 28 grayscale image.
▪ One or more hidden layers, each containing a few hundred neurons, responsible for extracting increasingly abstract features from the image.
▪ An output layer with 10 neurons, each representing one of the digits (0–9).
The network is trained using a labeled dataset, such as the Modified National Institute of Standards
and Technology (MNIST) dataset, where each image is labeled with the correct digit. During
training, the network adjusts its weights through backpropagation, minimizing the difference between its predictions and the correct labels.
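As an illustration of the layered computation just described, here is a small NumPy sketch of a single forward pass through a multi-layer feed-forward network; the layer sizes, random weights, and random input are assumptions for demonstration and do not represent a trained MNIST model.

    import numpy as np

    rng = np.random.default_rng(0)

    def relu(z):
        return np.maximum(0, z)

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Assumed layer sizes: 784 input pixels -> 128 hidden units -> 10 digit classes.
    W1, b1 = rng.normal(0, 0.01, (128, 784)), np.zeros(128)
    W2, b2 = rng.normal(0, 0.01, (10, 128)), np.zeros(10)

    def forward(x):
        """Propagate one flattened image through the hidden and output layers."""
        h = relu(W1 @ x + b1)         # hidden-layer activations
        return softmax(W2 @ h + b2)   # scores for the 10 digit classes

    x = rng.random(784)               # stand-in for a flattened 28 x 28 image
    print(forward(x).round(3))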
Multi-layer feed-forward networks are powerful because they can solve both linearly and non-
linearly separable problems. The hidden layers enable the network to capture complex patterns and
interactions in the data. This makes MLPs suitable for a wide range of applications, including
image and speech recognition, natural language processing, and financial forecasting.
However, training multi-layer networks can be computationally intensive and requires careful
tuning of hyperparameters, such as the number of layers, the number of neurons per layer, the learning rate, and the choice of activation function.
Recurrent networks (RNNs) are a class of neural network architectures designed to model
sequential data, where the order of the data points is important. Unlike feed-forward networks,
which process each input independently, RNNs have connections that allow them to retain information from previous inputs, giving them a form of memory.
In an RNN:
▪ Each neuron in the network has a connection not only to the neurons in the next layer but
also to itself or neurons in the previous layers, allowing information to be passed from one time step to the next.
▪ The network processes input sequences one element at a time, with the hidden state of the
network updating at each time step to incorporate information from both the current input and the previous hidden state.
This structure makes RNNs particularly well-suited for tasks where context or memory is essential, such as language modeling, speech recognition, and time series prediction.
Example
Consider the task of language modeling, where the goal is to predict the next word in a sentence
based on the preceding words. An RNN can be used to achieve this by processing each word in
the sentence sequentially and updating its hidden state to reflect the context provided by the
previous words.
For instance, given the input sequence "The cat sits on the," the RNN would process each word in
turn, updating its hidden state after each word. By the time it reaches the word "the," the hidden
state would contain information about the entire preceding context, enabling the RNN to predict a likely next word (for example, "mat").
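The recurrent update described here can be written compactly. Below is a minimal sketch of a single "vanilla" RNN step; the dimensions, random weights, and random token vectors are invented purely for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    hidden_size, embed_size = 8, 4          # assumed sizes

    W_xh = rng.normal(0, 0.1, (hidden_size, embed_size))
    W_hh = rng.normal(0, 0.1, (hidden_size, hidden_size))
    b_h = np.zeros(hidden_size)

    def rnn_step(x_t, h_prev):
        """h_t = tanh(W_xh x_t + W_hh h_prev + b_h): mix the current input with memory."""
        return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

    h = np.zeros(hidden_size)
    for token_vec in rng.random((5, embed_size)):   # stand-ins for word embeddings
        h = rnn_step(token_vec, h)                  # the hidden state carries the context
    print(h.round(3))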
While simple RNNs can capture short-term dependencies, they struggle with long-term
dependencies due to issues like vanishing and exploding gradients during training (gradients that shrink toward zero or grow uncontrollably large). To address these limitations, several advanced RNN architectures have been developed:
Long Short-Term Memory (LSTM): LSTMs introduce memory cells that can store information
over long periods, along with gating mechanisms (input, output, and forget gates) to control the
flow of information. This allows LSTMs to capture long-term dependencies more effectively than
standard RNNs.
Gated Recurrent Unit (GRU): GRUs are a simplified version of LSTMs, combining the forget and
input gates into a single update gate. They are computationally more efficient while still addressing the vanishing gradient problem.
Applications
RNNs, particularly LSTMs and GRUs, are widely used in applications where sequence data is central:
Natural Language Processing (NLP): RNNs are used for tasks like machine translation, text generation, and sentiment analysis.
Time Series Analysis: RNNs can model and predict time-dependent phenomena, such as stock prices or weather patterns.
Music Generation: RNNs can be used to generate sequences of musical notes, creating new compositions.
Four. Learning Methods in Neural Networks
Learning methods in neural networks are fundamental to how these models adapt and improve
their performance based on data. The learning process involves adjusting the parameters of the
network—primarily the weights and biases—so that the network can accurately predict or classify
unseen data. Different learning algorithms are used depending on the nature of the data and the
task at hand. We will discuss three primary categories of learning methods in neural networks:
unsupervised learning, supervised learning, and reinforced learning. Each category encompasses specific algorithms, which are discussed below.
Unsupervised learning is a type of machine learning where the model is trained on a dataset without
explicit labels or target outputs (i.e. the desired outcomes or predictions). The objective is for the
network to discover underlying patterns, structures, or features in the data. This method is
particularly useful in exploratory data analysis, clustering, and dimensionality reduction. Two
prominent unsupervised learning algorithms in neural networks are Hebbian learning and
competitive learning.
Hebbian Learning
Hebbian learning is one of the oldest and most fundamental learning rules, proposed by Donald
Hebb in 1949. It is based on the principle that "neurons that fire together, wire together." This
means that the connection between two neurons is strengthened if they are activated
simultaneously. Hebbian learning is often used in the context of associative learning, where the network learns to associate patterns that frequently occur together.
Characteristics:
Synaptic Weight Update: The weight between two neurons increases if both neurons are activated simultaneously:
[Δ𝑤𝑖𝑗 = η ⋅ 𝑥𝑖 ⋅ 𝑥𝑗 ]
where (Δ𝑤𝑖𝑗 ) is the change in the weight between neurons ( 𝑖 ) and ( 𝑗 ), ( η) is the learning rate,
and (𝑥𝑖 ) and (𝑥𝑗 ) are the activations of the respective neurons.
Learning Process: Over time, Hebbian learning causes the network to strengthen connections
between frequently co-occurring patterns, leading to the formation of memory traces in the
network.
Example: Consider a simple neural network designed to recognize visual patterns, such as faces.
If a particular configuration of inputs (e.g., certain edges or features) frequently occurs together
when a face is presented, Hebbian learning will strengthen the connections between the neurons
responsible for detecting these features. Over time, the network becomes more sensitive to faces, improving its ability to recognize them.
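As a minimal illustration of the Hebbian update Δw_ij = η · x_i · x_j, here is a short Python sketch; the toy activation patterns and learning rate are invented for demonstration.

    import numpy as np

    eta = 0.1                           # learning rate (assumed)
    patterns = np.array([               # toy binary activation patterns (assumed)
        [1, 0, 1, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
    ])

    n = patterns.shape[1]
    W = np.zeros((n, n))                # weight w_ij between units i and j

    for x in patterns:
        W += eta * np.outer(x, x)       # strengthen weights between co-active units
    np.fill_diagonal(W, 0)              # ignore self-connections

    print(W)                            # units that fire together end up strongly linked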
Competitive Learning
Competitive learning is an unsupervised learning method in which the neurons of a network compete to respond to input data. Typically, only one neuron—or a small subset of neurons—
"wins" and is allowed to update its weights. This mechanism leads to a process known as "winner-
takes-all."
Characteristics:
Competition: Neurons in the network compete with each other based on their response to the input.
Weight Update: The weights of the winning neuron are adjusted to be more similar to the input
vector, reinforcing the neuron's specialization in recognizing similar inputs in the future.
[Δ𝑤𝑗 = η ⋅ (𝑥 − 𝑤𝑗 )]
where (𝑤𝑗 ) represents the weight vector of the winning neuron, and ( 𝑥 ) is the input vector.
Example: Competitive learning is often used in clustering tasks, such as self-organizing maps
(SOMs). For instance, if a network is trained on color data (e.g., RGB values), competitive learning
will cause different neurons to specialize in different colors. Eventually, the network might form
clusters corresponding to various color regions, effectively organizing the color space.
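A small sketch of the winner-takes-all update Δw_j = η(x − w_j) is shown below; the random color data, the number of competing units, and the learning rate are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(2)
    eta, n_units = 0.2, 3

    weights = rng.random((n_units, 3))     # each unit's weight vector in RGB space
    colors = rng.random((200, 3))          # toy RGB training data (assumed)

    for x in colors:
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))   # closest unit wins
        weights[winner] += eta * (x - weights[winner])             # move the winner toward the input

    print(weights.round(2))                # units specialize in different regions of color space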
Supervised learning is a type of learning where the model is trained on a labeled dataset, meaning
that each input is associated with a corresponding target output. The network learns to map inputs
to outputs by minimizing the difference between its predictions and the actual labels. Two common
supervised learning algorithms are stochastic learning and gradient descent learning.
Stochastic Learning
Stochastic learning, also known as online learning, is a variant of gradient descent where the model
parameters are updated after each training example rather than after processing the entire dataset.
This approach introduces randomness into the learning process, which can help the network escape local minima.
Characteristics:
Incremental Updates: Weights are updated incrementally after each training example, making the learning process more responsive to new data.
Faster Convergence: Stochastic learning can converge faster than batch gradient descent, especially on large datasets.
Noise and Stability: The randomness introduced by updating the weights after each example can
introduce noise, but it can also help in exploring the solution space more effectively.
Example: Consider a neural network trained to predict house prices based on features such as
square footage, number of bedrooms, and location. Using stochastic learning, the network updates
its weights after each individual example (e.g., a specific house sale). This allows the network to
quickly adapt to new patterns as it processes more data, potentially leading to faster convergence.
Gradient descent is one of the most widely used optimization algorithms in supervised learning. It
works by iteratively adjusting the network's weights in the direction that minimizes a loss function.
Characteristics:
Loss Function: The network computes a loss function that measures the difference between its
predictions and the actual labels. The goal is to minimize this loss.
Gradient Calculation: The gradient of the loss function with respect to each weight is calculated, indicating how the loss changes as that weight changes.
Weight Update: Weights are updated in the opposite direction of the gradient to move the network toward lower loss:
[𝑤 = 𝑤 − η ⋅ ∇𝐿(𝑤)]
where (∇𝐿(𝑤)) is the gradient of the loss function with respect to the weights, and ( η) is the
learning rate.
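The update rule w = w − η∇L(w) can be demonstrated on a tiny least-squares problem; the synthetic data, learning rate, and iteration count below are assumptions for this sketch.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.random((50, 2))                    # toy inputs (assumed)
    t = X @ np.array([2.0, -1.0]) + 0.5        # toy targets built from known weights and bias

    w, b, eta = np.zeros(2), 0.0, 0.3
    for _ in range(1000):
        y = X @ w + b                          # current predictions
        grad_w = 2 * X.T @ (y - t) / len(X)    # dL/dw for the mean squared error
        grad_b = 2 * np.mean(y - t)            # dL/db
        w -= eta * grad_w                      # step against the gradient
        b -= eta * grad_b

    print(w.round(2), round(b, 2))             # approaches [2.0, -1.0] and 0.5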
Example: In a network trained to classify images of animals, gradient descent learning would
involve calculating the gradients of the loss function with respect to the network's weights after
processing a batch of images. The weights are then updated to reduce the loss, gradually improving the network's classification accuracy.
Reinforced learning, or reinforcement learning, is a type of learning where the model learns to
make decisions by interacting with an environment. The model, often called an agent, receives
feedback in the form of rewards or punishments based on the actions it takes. The goal is to learn a strategy that maximizes the cumulative reward over time.
Characteristics:
Agent and Environment: The agent interacts with an environment, making decisions (actions)
based on the current state. The environment responds by providing a new state and a reward signal.
Reward Signal: The reward signal indicates the success of the action taken. Positive rewards
encourage the agent to repeat certain actions, while negative rewards discourage them.
Policy Learning: The agent learns a policy, which is a strategy for selecting actions based on states, with the aim of maximizing cumulative reward.
Example: A classic example of reinforced learning is training a model to play a game, such as
chess or Go. The agent (player) makes moves in the game (actions) based on the current board
configuration (state). After each move, the agent receives feedback (reward), such as gaining
points or capturing an opponent's piece. Over time, the agent learns to make better decisions that increase its chances of winning.
A central challenge in reinforced learning is the trade-off between exploration (trying new actions
to discover their effects) and exploitation (choosing actions known to yield high rewards).
Real-World Application:
Reinforced learning is widely used in autonomous systems, such as self-driving cars. The car
(agent) learns to navigate roads by interacting with its environment (traffic, pedestrians, obstacles).
It receives rewards for safe driving and penalties for risky behaviors. Over time, the car learns to drive safely and efficiently.
Five. Taxonomy of Neural Network Systems
Neural networks have evolved into a diverse and powerful set of tools used in numerous fields,
including computer vision, natural language processing, and autonomous systems. The diversity
of neural network systems stems from variations in their architectures, learning methods, and
applications. To effectively navigate this complex field, it is essential to understand the taxonomy
of neural network systems. Therefore, we will explore popular neural network systems and classify
them based on their learning methods and architecture types. This taxonomy provides a structured
approach to understanding the wide range of neural networks and their appropriate use cases.
Taxonomy in the context of neural networks refers to the systematic categorization of these
systems based on certain characteristics such as architecture, learning methods, and applications.
This classification aids in the identification and selection of the appropriate neural network for a
given problem.
Over the years, several neural network systems have gained prominence due to their success in
solving complex tasks. Below are some of the most popular neural network systems:
Convolutional Neural Networks (CNNs) are specifically designed for processing structured grid
data, such as images. They have become the standard architecture for tasks involving image and
video processing.
Key Characteristics:
Convolutional Layers: These layers apply filters (convolutions) to the input image, extracting local features such as edges, textures, and shapes.
Pooling Layers: These layers reduce the spatial dimensions of the data, making the network more computationally efficient and more robust to small shifts in the input.
Fully Connected Layers: After feature extraction, fully connected layers are used to perform the final classification or regression.
Example: CNNs are widely used in facial recognition systems, where they learn to identify unique
features of faces from images, enabling them to accurately recognize individuals in various
conditions.
Recurrent Neural Networks (RNNs) are designed for sequence data, where the order of inputs is
crucial. They are widely used in tasks involving time series, language processing, and sequential
data analysis.
Key Characteristics:
Recurrent Connections: RNNs have connections that allow information to be passed from one time
step to the next, enabling the network to retain memory of previous inputs.
Sequence Prediction: RNNs are adept at tasks where the output at each time step depends on both the current input and previous inputs.
Example: RNNs are commonly used in language models for tasks such as text generation, where
the model generates text one word at a time based on the previous words, maintaining the context
of the sentence.
Generative Adversarial Networks (GANs) are a class of neural networks used for generating new
data that is similar to a given dataset. GANs consist of two networks: a generator that creates data and a discriminator that judges whether data is real or generated.
Key Characteristics:
Adversarial Process: The generator and discriminator are trained simultaneously in a game-
theoretic manner, where the generator tries to produce data indistinguishable from real data, and the discriminator tries to tell the two apart.
Data Generation: GANs are capable of generating highly realistic images, text, and even video.
Example: GANs have been used to generate realistic images of non-existent people, as seen in
projects like "This Person Does Not Exist," where the generated faces appear indistinguishable
Transformer Networks
Transformer networks rely on attention mechanisms rather than recurrence to process sequences.
Key Characteristics:
Self-Attention Mechanism: Self-attention weighs the importance of the different words in a sequence, allowing the model to focus on relevant parts of the input for making
predictions.
Parallel Processing: Unlike RNNs, Transformers process all tokens in a sequence simultaneously,
making them more computationally efficient and capable of handling longer sequences.
Example: Transformers are the foundation of state-of-the-art language models like GPT-3 and
BERT, which excel in tasks such as language translation, summarization, and question-answering.
5.1.2 Classification of Neural Network Systems with Respect to Learning Methods and
Architecture Types
Neural network systems can be classified based on the learning methods they employ and their
architectural design. This classification helps in understanding how different networks are trained and the kinds of problems they are best suited to solve.
Supervised Learning:
Characteristics: In supervised learning, the network is trained on labeled data, meaning each input
is paired with the correct output. The network learns to map inputs to outputs by minimizing a loss
function that measures the difference between the predicted and actual outputs.
Use Cases: Supervised learning is widely used in tasks like image classification, object detection, and speech recognition.
Unsupervised Learning:
Characteristics: In unsupervised learning, the network is trained on data without explicit labels.
The goal is to uncover hidden structures or patterns within the data. This method is often used for clustering, dimensionality reduction, and anomaly detection.
Reinforcement Learning:
Characteristics: The network (agent) interacts with an environment and learns to make decisions by receiving rewards or penalties based on its actions. The goal is to maximize the cumulative reward over time.
Use Cases: Reinforcement learning is used in robotics, game playing (e.g., AlphaGo), and
autonomous vehicles.
Feed-Forward Networks:
Examples: Single-Layer and Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs)
Characteristics: In feed-forward networks, information flows in one direction from the input layer
to the output layer, passing through any hidden layers without looping back. These networks are
typically used for tasks where the data has no inherent sequence or temporal dependency.
Applications: Feed-forward networks are used in tasks such as image recognition, speech recognition, and general classification or regression problems.
Recurrent Networks:
Examples: Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs)
Characteristics: Recurrent networks include connections that loop back, allowing the network to
maintain a memory of previous inputs. This makes them well-suited for sequential data where the order and context of the inputs matter.
Applications: RNNs are applied in time series analysis, language modeling, and music
composition.
Convolutional Networks:
Characteristics: Convolutional networks are specifically designed to process grid-like data, such
as images. They use convolutional layers to automatically detect and learn features from the data.
Applications: CNNs are extensively used in computer vision tasks like image classification, object detection, and image segmentation.
Generative Networks:
Characteristics: Generative networks are designed to generate new data samples that resemble the
training data. They learn to capture the underlying distribution of the data, which allows them to produce new, realistic samples.
Applications: These networks are used in tasks such as image synthesis, text generation, and data
augmentation.
Transformers:
Characteristics: Transformers use self-attention mechanisms that allow them to process and learn from long-range dependencies in sequences. They are particularly effective for language-related tasks.
Applications: Transformers are used in machine translation, text summarization, and language
generation.
Six. Single-Layer Neural Network Systems
A single-layer neural network (NN) system is the simplest form of neural network architecture. It
consists of one layer of neurons that directly connect the input layer to the output layer without
any hidden layers. Despite its simplicity, the single-layer perceptron and variants such as the
ADAptive LInear NEuron (ADALINE) have played foundational roles in the development of more
complex neural network architectures. We will explore the structure and learning algorithm of the perceptron, discuss its limitations (in particular, its inability to solve non-linearly separable problems like the XOR problem), and introduce ADALINE, which addresses some of the
perceptron’s shortcomings.
Input Layer: The input layer consists of input units (features), each representing a different attribute of the input data.
Weights: Each input is connected to the output via a weight, which determines the importance of
that input.
Summation Function: The perceptron computes the weighted sum of the inputs.
Activation Function: A threshold activation function is applied to the weighted sum to produce a binary output:
$$y = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right)$$
where:
$$\varphi(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$
The goal of the perceptron learning algorithm is to adjust the weights and bias in such a way that
the perceptron correctly classifies input data. The algorithm works by iteratively updating the
weights based on the error between the predicted output and the true label.
1. Initialization: Initialize the weights (𝑤1 , 𝑤2 , … , 𝑤𝑛 ) and bias ( 𝑏 ) to small random values.
2. For each training example, compute the output:
$y = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right)$
Update the weights and bias if there is an error (i.e., if ( 𝑦 ≠ 𝑡 ), where ( 𝑡 ) is the target output):
$w_i = w_i + \eta (t - y)\, x_i$
$b = b + \eta (t - y)$
Here, ( η) is the learning rate, a small positive constant that controls the size of the weight
updates.
3. Repeat this process for each training example until all examples are classified correctly or a maximum number of iterations is reached.
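A compact Python sketch of this perceptron learning loop is given below; the training data (the logical AND function, a linearly separable task) and the learning rate are assumptions chosen for illustration.

    # Perceptron learning rule: w_i += eta * (t - y) * x_i and b += eta * (t - y)
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # AND function (assumed example)
    w, b, eta = [0.0, 0.0], 0.0, 0.1

    def predict(x, w, b):
        z = w[0] * x[0] + w[1] * x[1] + b
        return 1 if z >= 0 else 0

    for epoch in range(100):                   # repeat until an epoch has no errors
        errors = 0
        for x, t in data:
            y = predict(x, w, b)
            if y != t:                         # update only on a misclassification
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]
                b += eta * (t - y)
                errors += 1
        if errors == 0:
            break

    print(w, b, [predict(x, w, b) for x, _ in data])   # the perceptron learns AND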
The perceptron is capable of solving linearly separable tasks, where the data points belonging to
different classes can be separated by a straight line (in two dimensions), a plane (in three
Example: Consider a binary classification problem with two classes, ( 𝐴 ) and ( 𝐵 ), where class ( 𝐴 ) consists of the points (0, 0) and (0, 1), and class ( 𝐵 ) consists of the points (1, 0) and (1, 1). These points are linearly separable, since a straight line (for example, the vertical line 𝑥1 = 0.5) separates the two classes.
Here is a diagram illustrating the linear separability of two classes, ( 𝐴 ) and ( 𝐵 ). The blue circles represent the points of Class ( 𝐴 ), (0, 0) and (0, 1), and the red circles represent the points of Class ( 𝐵 ), (1, 0) and (1, 1). The straight line in the diagram clearly separates the two classes, showing that the data is linearly separable.
Limitations:
The perceptron performs well on linearly separable problems, but it fails on non-linearly separable
problems, such as the XOR problem, which requires more complex decision boundaries.
The XOR problem (exclusive OR) is a classic example of a non-linearly separable problem. It
involves two binary inputs and a binary output, where the output is 1 if the inputs are different and 0 if they are the same:
𝒙𝟏 𝒙𝟐 XOR(𝒙𝟏 , 𝒙𝟐 )
0 0 0
0 1 1
1 0 1
1 1 0
Non-Linearly Separable:
If you plot the XOR inputs ((0,0), (0,1), (1,0), (1,1)) on a graph, you will notice that there is no
straight line that can separate the two classes (output 1 and output 0). This is because the points
((0,1)) and ((1,0)) are mixed with points ((0,0)) and ((1,1)). Here is a diagram illustrating the
XOR problem. The blue circles represent points for Class 0((0,0)) and ((1,1)), while the purple
diamonds represent points for Class 1 ((0,1)) and ((1,0)). As shown, no straight line can separate
the two classes, demonstrating that the data is not linearly separable.
Perceptron Failure: Since the perceptron can only solve linearly separable problems, it cannot
correctly classify XOR data. To solve the XOR problem, a more complex architecture, such as a multi-layer network with at least one hidden layer, is required.
ADALINE (ADAptive LInear NEuron) is a variation of the perceptron that uses a different
learning rule and activation function. The key difference lies in how ADALINE computes and
updates its weights. While the perceptron uses a binary threshold activation function, ADALINE uses a linear output during training and applies a threshold only when producing the final classification.
Key Components:
Inputs and Weights: Like the perceptron, ADALINE takes multiple inputs (𝑥1 , 𝑥2 , … , 𝑥𝑛 ), each associated with a weight (𝑤𝑖 ), and computes the weighted sum:
$$z = \sum_{i=1}^{n} w_i x_i + b$$
Output: Unlike the perceptron, ADALINE does not apply a threshold activation function during
training. Instead, it uses the linear output ( 𝑧 ) for weight updates, applying a threshold only at prediction time to produce the final class label.
The training algorithm for ADALINE is based on minimizing the mean squared error (MSE)
between the actual output and the target output. This is a key difference from the perceptron, which updates its weights only when a classification error occurs.
1. Initialize the weights and bias to small values.
2. For each training example, compute the weighted sum, the error ( 𝑒 = 𝑡 − 𝑧 ), and update the parameters:
$z = \sum_{i=1}^{n} w_i x_i + b$
$w_i = w_i + \eta \cdot e \cdot x_i$
$b = b + \eta \cdot e$
where ( 𝑒 ) is the difference between the target ( 𝑡 ) and the actual output ( 𝑧 ), and ( η) is the
learning rate.
3. Repeat this process until the error falls below a specified threshold or the maximum number of
iterations is reached.
Error Minimization: Unlike the perceptron, which updates weights only when there is a
classification error, ADALINE updates its weights based on minimizing the error between the
predicted and target outputs using a linear activation function. This makes ADALINE more stable
during learning.
Like the perceptron, a single ADALINE unit can only form a linear decision boundary, so it is best suited to problems that are linearly separable, but it can also handle more complex decision boundaries with appropriate modifications (for example, non-linear input features or additional layers).
Consider a binary classification problem with inputs ((1,0), (0,1), (1,1), (0,0)) and corresponding targets (1, 1, 0, 0). ADALINE computes the weighted sum ( 𝑧 ) for each input and adjusts the weights based on the error. Note that this target pattern is the XOR function, which is not linearly separable, so the weights converge to the values that minimize the mean squared error rather than to a boundary that classifies every point correctly; the example is intended to illustrate the mechanics of the weight updates.
Solution:
Inputs = {(1, 0), (0, 1), (1, 1), (0, 0)}, Targets = {1, 1, 0, 0}
The task is to use the ADALINE (ADAptive LInear NEuron) algorithm to compute the weighted sum ( 𝑧 ) for each input, adjust the weights based on the error, and track how the weights evolve over the first few updates.
1. Initialize Parameters
ADALINE will use weights (𝑤1 ), (𝑤2 ), and a bias term ( 𝑏 ). These parameters are initialized to zero.
Let us initialize:
[𝑤1 = 0, 𝑤2 = 0, 𝑏 = 0]
Also, we define a learning rate ( η = 0.1 ) (this controls the size of the weight updates).
2. Apply the Update Rule
The ADALINE algorithm uses the mean squared error (MSE) as a criterion for adjusting the weights. For each training example:
a. Compute the weighted sum:
$z = w_1 x_1 + w_2 x_2 + b$
b. Compute the error ( 𝑒 = 𝑡 − 𝑧 ) between the target and the weighted sum.
c. Update the weights and bias:
$w_i = w_i + \eta \cdot e \cdot x_i$
$b = b + \eta \cdot e$
Now, let us apply these steps to each input point iteratively. We will go through multiple epochs (complete passes through the four training points).
Epoch 1
Input (1, 0), target 1:
Compute the weighted sum:
$z = w_1 \cdot 1 + w_2 \cdot 0 + b = 0 \cdot 1 + 0 \cdot 0 + 0 = 0$
Compute the error:
$e = 1 - z = 1 - 0 = 1$
Update the weights and bias:
$w_1 = 0 + 0.1 \cdot 1 \cdot 1 = 0.1$
$w_2 = 0 + 0.1 \cdot 1 \cdot 0 = 0$
$b = 0 + 0.1 \cdot 1 = 0.1$
Input (0, 1), target 1:
Compute the weighted sum:
$z = 0.1 \cdot 0 + 0 \cdot 1 + 0.1 = 0.1$
Compute the error:
$e = 1 - z = 1 - 0.1 = 0.9$
Update the weights and bias:
$w_1 = 0.1 + 0.1 \cdot 0.9 \cdot 0 = 0.1$
$w_2 = 0 + 0.1 \cdot 0.9 \cdot 1 = 0.09$
$b = 0.1 + 0.1 \cdot 0.9 = 0.19$
Input (1, 1), target 0:
Compute the weighted sum:
$z = 0.1 \cdot 1 + 0.09 \cdot 1 + 0.19 = 0.38$
Compute the error:
$e = 0 - z = 0 - 0.38 = -0.38$
Update the weights and bias:
$w_1 = 0.1 + 0.1 \cdot (-0.38) \cdot 1 = 0.062$
$w_2 = 0.09 + 0.1 \cdot (-0.38) \cdot 1 = 0.052$
$b = 0.19 + 0.1 \cdot (-0.38) = 0.152$
Input (0, 0), target 0:
Compute the weighted sum:
$z = 0.062 \cdot 0 + 0.052 \cdot 0 + 0.152 = 0.152$
Compute the error:
$e = 0 - z = 0 - 0.152 = -0.152$
Update the weights and bias (the inputs are zero, so only the bias changes):
$w_1 = 0.062, \quad w_2 = 0.052, \quad b = 0.152 + 0.1 \cdot (-0.152) = 0.1368$
After the first epoch, we repeat the same steps, re-evaluating each input and adjusting the weights
accordingly. Typically, several epochs are required for the ADALINE to converge, but for
illustration purposes, we can stop after a few iterations once the changes in weights become small.
Epoch 2
Input (1, 0), target 1:
Compute the weighted sum:
$z = 0.062 \cdot 1 + 0.052 \cdot 0 + 0.1368 = 0.1988$
Compute the error:
$e = 1 - z = 1 - 0.1988 = 0.8012$
Update the weights and bias:
$w_1 = 0.062 + 0.1 \cdot 0.8012 \cdot 1 = 0.14212$
$w_2 = 0.052$
$b = 0.1368 + 0.1 \cdot 0.8012 = 0.21692$
Input (0, 1), target 1:
Compute the weighted sum:
$z = 0.14212 \cdot 0 + 0.052 \cdot 1 + 0.21692 = 0.26892$
Compute the error:
$e = 1 - z = 1 - 0.26892 = 0.73108$
Update the weights and bias:
$w_1 = 0.14212, \quad w_2 = 0.052 + 0.1 \cdot 0.73108 = 0.125108, \quad b = 0.21692 + 0.073108 = 0.290028$
The remaining inputs of epoch 2 are handled in the same way.
The ADALINE algorithm progressively adjusts the weights based on the error between the predicted output and the target values. Through multiple iterations (epochs), the weights converge toward the values that minimize the mean squared error for this dataset. Although ADALINE's learning is more stable than the perceptron's, it may still require many iterations for the weights to converge.
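A short Python sketch of the ADALINE training loop is given below; it applies the same update rule to the same inputs and targets as the worked example, with the number of epochs chosen arbitrarily for illustration.

    # ADALINE (least-mean-squares) training on the example data above.
    inputs = [(1, 0), (0, 1), (1, 1), (0, 0)]
    targets = [1, 1, 0, 0]
    w1 = w2 = b = 0.0
    eta = 0.1

    for epoch in range(10):                    # a few passes over the data
        for (x1, x2), t in zip(inputs, targets):
            z = w1 * x1 + w2 * x2 + b          # linear output (no threshold during training)
            e = t - z                          # error between target and linear output
            w1 += eta * e * x1                 # LMS weight updates
            w2 += eta * e * x2
            b += eta * e

    print(round(w1, 4), round(w2, 4), round(b, 4))
    # A final classification would apply a threshold to z (e.g., predict 1 if z >= 0.5, an assumed cutoff).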