
CSE 526: Fundamentals of Neural Networks Dr Chinenye Ezeh

One. Introduction to Neural Networks

1. Introduction to Neural Networks

Neural networks have become one of the most influential technologies in the field of artificial

intelligence (AI), contributing to major advancements in areas such as image recognition,

natural language processing, and autonomous systems. Their ability to model complex, non-

linear relationships in data has made them indispensable in solving problems that were previously

difficult to manage. We shall explore the foundational concepts of neural networks, beginning with

a discussion of the motivation behind their development, the historical evolution of neural

network research, and the underlying biological and artificial neuron models. We will also

introduce key notations and the neuron equation, which are essential for understanding how neural

networks operate.

1.1 Why Neural Networks?

Neural networks are inspired by the human brain's ability to learn from experience and

generalize from previously encountered situations. Traditional computational methods often

struggle with tasks that require pattern recognition, such as distinguishing between images of cats

and dogs, translating languages, or predicting stock prices. These tasks involve complex

relationships and patterns that are not easily captured by conventional algorithms, especially when

the input data is large and unstructured.

Neural networks address these challenges by learning directly from the data. They are designed

to automatically identify patterns and relationships in the input data without the need for

explicit programming. This capability makes them highly effective in tasks like classification,

regression, and clustering. For example, in image recognition, a neural network can be trained


on thousands of labeled images, allowing it to accurately identify objects in new, unseen

images.

1.2. Research History

The development of neural networks has a rich history, beginning in the early 1940s. The concept

of a computational model based on the human brain was first proposed by Warren McCulloch

and Walter Pitts in 1943, who introduced the idea of a neuron as a binary threshold unit: a simple

model that activates (outputs 1) if the weighted sum of its inputs exceeds a certain threshold, and

otherwise stays inactive (outputs 0). This early model laid

the foundation for subsequent research into artificial intelligence and machine learning.
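The McCulloch-Pitts unit can be sketched in a few lines of Python. The AND-gate weights and threshold below are illustrative choices for this sketch, not part of the original 1943 formulation:

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: output 1 if the weighted sum of the
    inputs reaches the threshold, otherwise output 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# An AND gate expressed as an MCP unit (illustrative parameters):
print(mcp_neuron([1, 1], [1, 1], threshold=2))  # 1 (fires)
print(mcp_neuron([1, 0], [1, 1], threshold=2))  # 0 (stays inactive)
```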

In the 1950s and 1960s, Frank Rosenblatt developed the perceptron, an early type of neural

network capable of learning from data. Despite its initial success, the perceptron was limited by

its inability to solve non-linearly separable problems (non-linearly separable problems are those

where data points cannot be separated into distinct classes by a straight line or a simple linear

boundary), as highlighted by Marvin Minsky and Seymour Papert in their 1969 book, Perceptrons.

This limitation led to a temporary decline in interest in neural networks.

The field experienced a resurgence in the 1980s with the development of backpropagation, a

method for training multi-layer neural networks, introduced by David E. Rumelhart, Geoffrey E.

Hinton, and Ronald J. Williams. Backpropagation addressed the limitations of earlier models by

enabling the training of deeper networks, leading to significant improvements in performance. This

period marked the beginning of the modern era of neural networks, with continuous advancements

in algorithms, architectures, and hardware accelerating their adoption in various domains.


1.3 Biological Neuron Model

The structure and function of neural networks are inspired by the biological neurons found in the

human brain. A biological neuron consists of three main components: the cell body (soma),

dendrites, and an axon. The dendrites receive electrical signals (inputs) from other neurons, which

are then processed in the cell body. If the combined signal exceeds a certain threshold, the neuron

fires, sending an electrical impulse down the axon to communicate with other neurons.

The key features of biological neurons include:

Summation of Inputs: Neurons receive multiple inputs, which are summed to determine whether

the neuron will fire.

Activation Function: The firing of the neuron is dependent on whether the summed inputs exceed

a certain threshold.

Propagation of Signal: Once a neuron fires, the signal is transmitted to other neurons, forming

complex networks.

This biological model serves as the inspiration for artificial neurons, which aim to replicate these

fundamental processes in a simplified form.

Structure of a neuron


A biological neuron is a specialized cell that serves as the fundamental building block of the

nervous system. It is designed to receive, process, and transmit information through electrical and

chemical signals. Understanding the different components of a biological neuron is essential for

comprehending how the brain and nervous system function. Below is a detailed description of the

various parts and components of a biological neuron:

a. Cell Body (Soma)

Structure: The cell body, or soma, is the central part of the neuron and contains the nucleus and

other essential organelles. It is typically spherical or pyramidal in shape.

Functions:

Nucleus: The nucleus houses the neuron's genetic material (DNA) and is responsible for regulating

the cell's activities, including growth, metabolism, and protein synthesis. The nucleus plays a

critical role in maintaining the health and functionality of the neuron.

Cytoplasm: The cytoplasm within the soma contains various organelles such as mitochondria (for

energy production), ribosomes (for protein synthesis), and the endoplasmic reticulum (for protein

and lipid synthesis). These organelles work together to sustain the neuron's metabolic needs.

Integration Center: The soma integrates incoming signals from the dendrites and determines

whether the neuron will generate an action potential (a nerve impulse). It acts as a decision-making

center by summing up excitatory and inhibitory inputs.

b. Dendrites

Structure: Dendrites are branched, tree-like extensions from the cell body. They are typically short

and highly branched, providing a large surface area for receiving signals from other neurons.


Functions:

Signal Reception: Dendrites are the primary sites for receiving chemical signals (neurotransmitters)

from the axon terminals of other neurons. These signals are received at specialized structures called

synapses, where the dendrites form connections with the presynaptic neuron.

Graded Potentials: When neurotransmitters bind to receptors on the dendrites, they cause changes

in the membrane potential of the dendrite. These changes are known as graded potentials, which

can be either excitatory (increasing the likelihood of an action potential) or inhibitory (decreasing

the likelihood of an action potential).

Signal Transmission: The graded potentials generated at the dendrites are transmitted toward the

soma. If the cumulative effect of these potentials reaches a certain threshold at the axon hillock,

an action potential will be triggered.

c. Axon Hillock

Structure: The axon hillock is a specialized region located at the junction between the soma and

the axon. It is often cone-shaped and has a high concentration of voltage-gated ion channels.

Functions:

Action Potential Initiation: The axon hillock is the site where the decision to initiate an action

potential is made. It is highly sensitive to changes in the membrane potential due to its abundance

of voltage-gated sodium channels.

Trigger Zone: If the membrane potential at the axon hillock reaches the threshold level, these

channels open, initiating an action potential. The axon hillock is thus referred to as the "trigger

zone" of the neuron.


d. Axon

Structure: The axon is a long, slender projection that extends from the axon hillock to the axon

terminals. Axons can vary in length, from a few millimeters to over a meter in some neurons. The

axon is often covered by a myelin sheath, which is interrupted at intervals by nodes of Ranvier.

Functions:

Signal Conduction: The primary function of the axon is to conduct the action potential away from

the soma and toward the axon terminals. The action potential is a rapid, all-or-nothing electrical

signal that travels along the axon without decreasing in strength.

Myelination: Many axons are wrapped in a myelin sheath, a fatty layer produced by glial cells

(oligodendrocytes in the central nervous system and Schwann cells in the peripheral nervous

system). Myelin insulates the axon, increasing the speed and efficiency of electrical signal

transmission.

Nodes of Ranvier: These are gaps in the myelin sheath along the axon. The action potential "jumps"

from one node to the next in a process called saltatory conduction, which significantly speeds up

signal transmission.

e. Axon Terminals (Synaptic Terminals)

Structure: The axon terminals, also known as synaptic terminals or terminal boutons, are the distal

ends of the axon. They are often branched, forming multiple connections with target cells, and

contain synaptic vesicles filled with neurotransmitters.


Functions:

Signal Transmission: When the action potential reaches the axon terminals, it triggers the opening

of voltage-gated calcium channels. The influx of calcium ions causes synaptic vesicles to fuse with

the presynaptic membrane and release their neurotransmitters into the synaptic cleft.

Neurotransmitter Release: The neurotransmitters diffuse across the synaptic cleft and bind to

receptors on the postsynaptic membrane (usually the dendrites of the next neuron). This binding

triggers a response in the postsynaptic neuron, either exciting or inhibiting it.

Synaptic Communication: The axon terminals are critical for synaptic communication between

neurons, allowing the transmission of information across the nervous system. The precise control

of neurotransmitter release and receptor activation ensures accurate signal transmission and

processing.

f. Myelin Sheath

Structure: The myelin sheath is a multilayered, lipid-rich covering that surrounds the axon of many

neurons. It is produced by glial cells—Schwann cells in the peripheral nervous system (PNS) and

oligodendrocytes in the central nervous system (CNS).

Functions:

Insulation: The myelin sheath acts as an electrical insulator, preventing the loss of electrical current

from the axon as the action potential travels along it. This insulation is essential for the rapid and

efficient transmission of signals.


Increased Conduction Speed: The myelin sheath allows for saltatory conduction, where the action

potential jumps between the nodes of Ranvier. This significantly increases the speed at which

electrical impulses are conducted along the axon.

Support and Protection: Myelin not only enhances the speed of conduction but also provides

structural support and protection to the axon.

g. Nodes of Ranvier

Structure: The nodes of Ranvier are small, unmyelinated gaps in the myelin sheath that occur at

regular intervals along the axon.

Functions:

Saltatory Conduction: The nodes of Ranvier are crucial for saltatory conduction, where the action

potential "jumps" from one node to the next. This process allows the action potential to propagate

rapidly along the axon.

Ion Exchange: The nodes contain a high density of voltage-gated sodium and potassium channels,

which are essential for the regeneration of the action potential as it travels down the axon.

h. Synapse

Structure: A synapse is the junction between two neurons, where the axon terminal of one neuron

(presynaptic neuron) meets the dendrite or soma of another neuron (postsynaptic neuron). The

synapse includes the presynaptic membrane, the synaptic cleft (the gap between the neurons), and

the postsynaptic membrane.


Functions:

Signal Transmission: The synapse is the site where electrical signals are converted into chemical

signals (neurotransmitters) and then back into electrical signals in the next neuron. This process

allows the transmission of information across the nervous system.

Neurotransmitter Release: When an action potential reaches the presynaptic terminal, it triggers

the release of neurotransmitters into the synaptic cleft. These neurotransmitters bind to receptors

on the postsynaptic membrane, initiating a response in the postsynaptic neuron.

Synaptic Plasticity: Synapses are dynamic structures that can strengthen or weaken over time, a

process known as synaptic plasticity. This ability to change is critical for learning, memory, and

adaptation.

i. Neurotransmitters

Structure: Neurotransmitters are chemical messengers stored in synaptic vesicles within the axon

terminals. They are released into the synaptic cleft in response to an action potential.

Functions:

Signal Transmission: Neurotransmitters are essential for transmitting signals across synapses.

They bind to specific receptors on the postsynaptic membrane, leading to either excitation or

inhibition of the postsynaptic neuron.

Diversity of Function: Different neurotransmitters have different effects. For example, glutamate

is an excitatory neurotransmitter that promotes action potentials, while gamma-aminobutyric acid

(GABA) is inhibitory and reduces the likelihood of action potential generation.


Reuptake and Degradation: After their release and action, neurotransmitters are typically removed

from the synaptic cleft by reuptake into the presynaptic neuron or degradation by enzymes. This

ensures that the signal is terminated appropriately.

j. Glial Cells

Structure: Glial cells are non-neuronal cells that provide support, insulation, and protection to

neurons. They include astrocytes, oligodendrocytes, Schwann cells, microglia, and ependymal

cells.

Functions:

Support: Astrocytes provide structural support to neurons, maintain the blood-brain barrier, and

regulate the chemical environment around neurons.

Myelination: Oligodendrocytes in the CNS and Schwann cells in the PNS produce the myelin

sheath that insulates axons.

Immune Defense: Microglia act as the immune cells of the CNS, removing debris and protecting

neurons from infection.

Cerebrospinal Fluid Production: Ependymal cells line the ventricles of the brain and produce

cerebrospinal fluid, which cushions the brain and spinal cord.

Information Flow in a Neuron: A Detailed Description

The information flow in a neuron, whether biological or artificial, involves a series of steps that

transform input signals into output signals. This process is crucial for the functioning of neural


networks, where multiple neurons work together to process information and make decisions.

Below is a detailed description of how information flows through a neuron, highlighting both the

biological and artificial perspectives.

Biological Neuron Information Flow

A biological neuron is a specialized cell in the nervous system that transmits information through

electrical and chemical signals. The flow of information in a biological neuron can be broken down

into several stages:

Dendrites: Receiving Signals

Dendrites are tree-like extensions from the neuron's cell body (soma) that receive signals from

other neurons. These signals arrive at the synapses, which are junctions where the axons of other

neurons connect to the dendrites.

The incoming signals are typically in the form of neurotransmitters, chemical messengers released

by the presynaptic neuron (the neuron sending the signal). These neurotransmitters bind to receptor

sites on the postsynaptic neuron’s dendrites, causing ion channels to open.

The binding of neurotransmitters causes changes in the electrical potential across the dendrite's

membrane, leading to the generation of graded potentials. These are small changes in the

membrane potential that occur in response to synaptic inputs.


Cell Body (Soma): Processing Signals

The cell body integrates the graded potentials received from the dendrites. If the cumulative signal

strength from all inputs exceeds a certain threshold, the neuron will "fire" an action potential. The

soma acts as the decision-making center, determining whether the incoming signals are strong

enough to trigger an output.

The signals that reach the soma are summed (spatial summation) or integrated over time (temporal

summation). If the combined signal (the depolarization of the cell membrane) reaches the threshold

level, it initiates an action potential. If it does not reach the threshold, the signal dissipates, and no

action potential is generated.

Axon Hillock: Initiating the Action Potential

The axon hillock, located at the junction of the soma and the axon, is the critical region where the

action potential is initiated. If the membrane potential at the axon hillock reaches the threshold,

voltage-gated ion channels open, allowing ions to flow across the membrane.

The action potential is an all-or-nothing electrical signal that propagates along the axon. Once

initiated, the action potential travels down the length of the axon without losing strength.

Axon: Transmitting the Signal

The action potential travels along the axon, a long, slender projection of the neuron, to the axon

terminals. The movement is facilitated by the opening and closing of voltage-gated sodium and

potassium channels along the axon membrane, which create a wave of electrical depolarization.


Many axons are covered by a myelin sheath, a fatty layer that insulates the axon and speeds up

signal transmission. The action potential jumps from one node of Ranvier (gaps in the myelin

sheath) to the next, in a process called saltatory conduction, making the transmission more efficient.

Axon Terminals: Sending the Signal

When the action potential reaches the axon terminals, it triggers the release of neurotransmitters

into the synaptic cleft (the small gap between the axon terminal and the dendrite of the next neuron).

This is done through the process of exocytosis, where synaptic vesicles containing

neurotransmitters fuse with the presynaptic membrane and release their contents.

The neurotransmitters cross the synaptic cleft and bind to receptors on the postsynaptic neuron’s

dendrites, thus continuing the cycle of information flow.

Artificial Neuron Information Flow

An artificial neuron, a fundamental unit of artificial neural networks, models the basic information

processing function of a biological neuron. The flow of information in an artificial neuron involves

several key steps:

Input Layer: Receiving Signals

Inputs: The artificial neuron receives multiple inputs, each representing a feature or signal from

external data. These inputs are often represented as numerical values and are analogous to the

signals received by the dendrites in a biological neuron.


Weights: Each input (𝑥𝑖 ) is associated with a weight (𝑤𝑖 ), which determines the importance or

influence of that input on the neuron’s output. These weights are adjustable parameters learned

during the training process. These will be studied in detail in subsequent lectures.

Summation: Aggregating Inputs

Weighted Sum: The artificial neuron computes a weighted sum of all inputs, similar to the

integration of inputs in the biological neuron’s soma. The weighted sum is calculated as:

z = ∑ᵢ₌₁ⁿ wᵢ xᵢ + b

where:

z is the weighted sum,

wᵢ is the weight of the i-th input,

xᵢ is the value of the i-th input, and

( 𝑏 ) is the bias term, which allows the neuron to shift the activation function and provides

additional flexibility.
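Under these definitions, the weighted sum can be computed in a few lines of Python. The input, weight, and bias values here are purely illustrative:

```python
# z = sum_i w_i * x_i + b, the neuron's pre-activation.
def weighted_sum(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

z = weighted_sum([0.8, 0.4], [0.5, -1.2], 0.2)
print(round(z, 2))  # 0.12
```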


Activation Function: Processing the Signal

Non-linearity: The weighted sum ( 𝑧 ) is passed through an activation function (ϕ(𝑧)) , which

introduces non-linearity into the model. The activation function determines whether the neuron

will activate (produce a significant output) or not.

Common activation functions include the threshold (step) function, the sigmoid function, ReLU

(Rectified Linear Unit), and the tanh function.
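As a quick reference, these common activation functions can be written directly in Python:

```python
import math

def step(z, theta=0.0):
    """Threshold/step function: 1 if z >= theta, else 0."""
    return 1 if z >= theta else 0

def sigmoid(z):
    """Sigmoid: squashes z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """ReLU: passes positive z through, clips negatives to 0."""
    return max(0.0, z)

def tanh(z):
    """Tanh: squashes z into the interval (-1, 1)."""
    return math.tanh(z)

print(step(0.3), relu(-2.0), round(sigmoid(0.0), 2))  # 1 0.0 0.5
```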

Output: Sending the Signal

Output Generation: The result of the activation function (𝑦 = ϕ(𝑧)) is the output of the neuron.

This output can be passed to other neurons in subsequent layers of the network or can represent

the final prediction or classification in simpler networks.

Propagation: In a multi-layer network, the output of one neuron becomes the input to neurons in

the next layer, continuing the flow of information through the network until the final output layer

is reached. All of these will be studied in detail in subsequent lectures.

Learning and Adjustment

Training: During the learning process, the network adjusts the weights and biases based on the

error between the predicted output and the actual target (in supervised learning). This adjustment

is typically done using optimization algorithms like gradient descent.


Backpropagation: For multi-layer networks, the error is propagated backward through the network

using a method called backpropagation, which updates the weights to minimize the overall error

in the network’s predictions.
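A minimal sketch of this learning loop for a single sigmoid neuron with one input, assuming a squared-error loss and an illustrative learning rate of 1.0 (the toy dataset below is invented for demonstration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy supervised task: learn to output ~1 for x = 1 and ~0 for x = 0.
w, b, lr = 0.0, 0.0, 1.0
data = [(1.0, 1.0), (0.0, 0.0)]

for _ in range(2000):
    for x, target in data:
        y = sigmoid(w * x + b)
        # Squared error L = (y - target)^2; by the chain rule,
        # dL/dw = 2*(y - target) * y*(1 - y) * x, and dL/db drops the x.
        delta = 2 * (y - target) * y * (1 - y)
        w -= lr * delta * x
        b -= lr * delta

# After training, the neuron's outputs approach the targets.
print(round(sigmoid(w * 1.0 + b), 2), round(sigmoid(b), 2))
```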

1.4 Artificial Neuron Model

An artificial neuron, also known as a perceptron in its simplest form, is a mathematical model

that mimics the behavior of a biological neuron. It is the basic unit of a neural network, designed

to process inputs and produce an output based on a set of learned parameters.

Components of an Artificial Neuron:

Inputs: Analogous to the dendrites in biological neurons, artificial neurons receive multiple inputs,

each representing a feature of the data.

Weights: Each input is associated with a weight, which represents the strength of the input's

influence on the neuron. These weights are learned during the training process.

Summation Function: The inputs are combined by calculating the weighted sum, which is then

passed to the activation function.

Activation Function: This function determines whether the neuron will "fire" (produce an output).

Common activation functions include the sigmoid, hyperbolic tangent (tanh), and rectified linear

unit (ReLU).

Output: The result of the activation function is the neuron's output, which can be fed into other

neurons in the network or serve as the final prediction.

The mathematical representation of an artificial neuron is illustrated in the neuron equation.


McCulloch-Pitts Neuron

1.5 Notations

To formalize the discussion of neural networks, we introduce the following notations:

Scalar variables ( 𝑥𝑖 ), where ( 𝑖 ) is an index representing each scalar in the series. These variables

can be added together to give a final scalar sum:

[𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑛]

Here, ( x1 , x2 , x3 , … , xn ) are individual scalar values, and ( 𝑛 ) is the total number of scalars in

the series. The sum of these scalar variables can be expressed as:

Sum = ∑ᵢ₌₁ⁿ xᵢ = x₁ + x₂ + x₃ + ⋯ + xₙ

This sum is a scalar number that results from adding all the (𝑥𝑖 ) values together.
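In code, this summation notation corresponds directly to adding the elements of a list (the values below are arbitrary examples):

```python
# Summation notation in code: the sum of x_1 ... x_n.
x = [2.0, 5.0, 1.5, 3.5]   # n = 4 arbitrary scalar values
total = sum(x)             # x[0] + x[1] + x[2] + x[3]
print(total)  # 12.0
```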

For vectors,

Let 𝑥 represent the input vector, where (𝑥 = [𝑥1 , 𝑥2 , … , 𝑥𝑛 ]).

𝑤 denotes the weight vector, where (𝑤 = [𝑤1 , 𝑤2 , … , 𝑤𝑛 ]).


𝑏 represents the bias term, a constant added to the weighted sum of inputs to allow for shifting the

activation function.

The output of the neuron is denoted by 𝑦.

Also, the following is a series of row vector variables, where each vector is represented as a

( 1 × 𝑛 ) row vector. These vectors are ordered sets of related numbers:

𝑥1 = [𝑥11 𝑥12 𝑥13 ⋯ 𝑥1𝑛 ]

𝑥2 = [𝑥21 𝑥22 𝑥23 ⋯ 𝑥2𝑛 ]

𝑥3 = [𝑥31 𝑥32 𝑥33 ⋯ 𝑥3𝑛 ]

⋮

𝑥𝑚 = [𝑥𝑚1 𝑥𝑚2 𝑥𝑚3 ⋯ 𝑥𝑚𝑛 ]

Here:

(𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑚 ) are the row vectors.

Each row vector (𝑥𝑖 ) has ( 𝑛 ) components, denoted as (𝑥𝑖1 , 𝑥𝑖2 , … , 𝑥𝑖𝑛 ), where ( 𝑖 ) indicates the

vector number and ( 𝑛 ) represents the number of elements in each row vector.

( 𝑚 ) is the total number of row vectors in the series.

These row vectors form an ordered set of related numbers and can be used in various mathematical

operations, such as matrix multiplication or vector addition.
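For example (with made-up numbers), stacking the row vectors into an m × n array and applying one weight per component computes a weighted sum for every vector at once:

```python
# m row vectors, each with n components, kept as plain lists.
x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 5.0, 6.0]
X = [x1, x2]                      # m = 2 rows, n = 3 columns

w = [0.5, -1.0, 0.25]             # one weight per component

# Matrix-vector product X w: a weighted sum for every row at once.
z = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
print(z)  # [-0.75, -1.5]
```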


Neuron Equation

The output of an artificial neuron can be expressed mathematically by the following equation:

y = φ(∑ᵢ₌₁ⁿ wᵢ xᵢ + b)

Here:

∑ᵢ₌₁ⁿ wᵢ xᵢ represents the weighted sum of the inputs.

b is the bias, which allows for greater flexibility in the model.

(𝜙(⋅)) is the activation function, which introduces non-linearity into the model, enabling the

network to capture complex patterns.

Two. Model of an Artificial Neuron

2. Model of an Artificial Neuron

Artificial neurons are the fundamental building blocks of neural networks, which are

computational models inspired by the biological neural networks in the human brain.

Understanding the model of an artificial neuron is crucial for comprehending how neural networks

learn from data, make decisions, and perform complex tasks. We will explore the basic elements

of an artificial neuron, focusing on its core components, including the activation function. We will

also examine various types of activation functions, such as the threshold function, piecewise linear

function, and sigmoidal function, and provide practical examples to illustrate their use.

2.1 Basic Elements of an Artificial Neuron

An artificial neuron, often referred to as a perceptron in its simplest form, is a mathematical

abstraction that simulates the behavior of a biological neuron. It consists of several key elements

that work together to process inputs and produce an output.

2.1.1 Inputs and Weights

An artificial neuron receives multiple inputs, each representing a feature of the data. These inputs

are denoted as (𝑥1 , 𝑥2 , … , 𝑥𝑛 ), where ( 𝑛 ) is the number of inputs. Each input is associated with a

weight (𝑤1 , 𝑤2 , … , 𝑤𝑛 ) , which determines the significance of the corresponding input. The

weights are learnable parameters that are adjusted during the training process to optimize the

neuron's performance.

2.1.2 Weighted Sum

The neuron computes a weighted sum of the inputs, which can be expressed as:


z = ∑ᵢ₌₁ⁿ wᵢ xᵢ + b

where ( 𝑧 ) is the weighted sum, and ( 𝑏 ) is the bias term. The bias allows the model to shift the

activation function, providing greater flexibility in capturing patterns in the data.

2.1.3 Activation Function

The weighted sum ( 𝑧 ) is passed through an activation function (ϕ(𝑧)) , which determines

whether the neuron will "fire" or produce an output. The activation function introduces non-

linearity into the model, enabling the network to learn complex relationships. The output of the

neuron is thus:

[𝑦 = ϕ(𝑧)]

where ( 𝑦 ) is the final output of the neuron.

2.2 Activation Functions

The activation function is a critical component of an artificial neuron. It defines how the weighted

sum of inputs is transformed into an output that can be passed to the next layer in the network or

used as the final prediction. Various activation functions serve different purposes and have unique

properties that make them suitable for different types of tasks. In this section, we will discuss three

common activation functions: the threshold function, the piecewise linear function, and the

sigmoidal function.


2.2.1 Threshold Function

The threshold function, also known as the step function, is one of the simplest activation functions.

It outputs a binary value based on whether the input exceeds a certain threshold.

Definition

The threshold function (ϕ(𝑧)) is defined as:

φ(z) = { 1  if z ≥ θ
       { 0  if z < θ

where ( θ) is the threshold value.

Characteristics and Example

The threshold function is used in binary classification tasks where the objective is to classify inputs

into one of two categories. For example, in a simple binary classification problem such as email

spam detection, the neuron might output 1 (spam) if the input features (e.g., frequency of certain

words) exceed a certain threshold and 0 (not spam) otherwise.
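A toy sketch of such a spam detector; the feature counts, weights, and threshold below are hypothetical values chosen for illustration, not learned parameters:

```python
def classify_email(features, weights, theta):
    """Threshold-activation neuron: 1 = spam, 0 = not spam."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1 if z >= theta else 0

features = [3, 0, 1]            # e.g. counts of "free", "winner", "urgent"
weights = [0.8, 1.5, 0.6]       # illustrative per-word importance
print(classify_email(features, weights, theta=2.0))  # weighted sum ~3.0 -> 1
```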

However, the threshold function has limitations. It is not differentiable, making it unsuitable for

training neural networks using gradient-based optimization methods like backpropagation.

Additionally, it cannot capture the subtleties in data where small changes in inputs should lead to

small changes in output.

2.2.2 Piecewise Linear Function

The piecewise linear function, also known as the rectified linear unit (ReLU) when applied in its

most common form, is another popular activation function. It addresses some of the limitations of

the threshold function by allowing the output to vary continuously with the input.

Definition

The ReLU function (ϕ(𝑧)) is defined as:

φ(z) = { z  if z ≥ 0
       { 0  if z < 0

This can be visualized as a function that is linear for positive inputs and zero for negative inputs.

Characteristics and Example

ReLU is widely used in deep learning because it is simple and computationally efficient. It

introduces non-linearity while retaining differentiability for positive inputs, making it compatible

with gradient-based optimization techniques. Additionally, ReLU helps mitigate the vanishing

gradient problem, a common issue with other activation functions, particularly in deep networks.

For instance, in an image classification task, ReLU can be applied to the output of convolutional

layers, allowing the network to learn more complex features by activating neurons only when the

weighted input sum is positive.
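The ReLU definition above translates directly into one line of NumPy; the sample inputs are arbitrary:

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))   # negative inputs map to 0; positive inputs pass through unchanged
```

Because positive inputs pass through with slope 1, the gradient does not shrink as it flows backward through many ReLU layers, which is the informal reason ReLU mitigates vanishing gradients.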

2.2.3 Sigmoidal Function

The sigmoidal function, or sigmoid function, is a smooth, S-shaped function that maps any input

to a value between 0 and 1. This makes it particularly useful for tasks where outputs need to be

interpreted as probabilities.

Definition

The sigmoid function (σ(𝑧)) is defined as:

[σ(𝑧) = 1 / (1 + 𝑒^(−𝑧))]


where ( 𝑒 ) is the base of the natural logarithm.

Characteristics and Example

The sigmoid function is commonly used in binary classification tasks where the output represents

the probability of a particular class. For example, in a medical diagnosis system, a neuron with a

sigmoid activation function might output the probability that a patient has a certain disease based

on input features such as age, symptoms, and test results.

One of the key advantages of the sigmoid function is its differentiability, which allows for the

application of gradient-based optimization methods. However, it is susceptible to the vanishing

gradient problem, particularly when the input values are very large or very small, causing the

gradients to become very small and slow down the learning process.
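A short NumPy sketch shows both the probability-like output range and the saturation that causes vanishing gradients; the test inputs are arbitrary:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])
print(np.round(sigmoid(z), 4))   # approximately [0.0067, 0.5, 0.9933]
# For large |z| the curve flattens, so the derivative
# sigmoid(z) * (1 - sigmoid(z)) approaches 0 -- the vanishing-gradient problem.
```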

2.3 Example of Artificial Neuron with Different Activation Functions

Question: Consider a simple neuron with two inputs (𝑥1 ) and (𝑥2 ), weights (𝑤1 ) and (𝑤2 ), and

a bias ( 𝑏 ). Suppose the activation function is the sigmoid function (σ(𝑧) = 1 / (1 + 𝑒^(−𝑧))).

Solution:

The output of the neuron is calculated as follows:

[𝑦 = 𝜎(𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏)]

If (𝑤1 = 0.5), (𝑤2 = −1.2), (𝑥1 = 0.8), (𝑥2 = 0.4), 𝑎𝑛𝑑(𝑏 = 0.2), then:

[𝑦 = 𝜎((0.5 × 0.8) + (−1.2 × 0.4) + 0.2) = 𝜎(0.4 − 0.48 + 0.2) = 𝜎(0.12)]


Finally, applying the sigmoid function:

[𝑦 = 1 / (1 + 𝑒^(−0.12)) ≈ 0.53]

This output ( 𝑦 ) can then be used as an input to other neurons in a more complex network or serve

as the final prediction in a simple model.
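The arithmetic above can be checked with a few lines of Python (standard library only):

```python
import math

# Values from the worked example above.
w1, w2, b = 0.5, -1.2, 0.2
x1, x2 = 0.8, 0.4

z = w1 * x1 + w2 * x2 + b          # 0.4 - 0.48 + 0.2 = 0.12
y = 1.0 / (1.0 + math.exp(-z))     # sigmoid activation
print(round(z, 2), round(y, 2))    # 0.12 0.53
```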

Question: Given a simple neuron with inputs (𝑥1 = 2) and (𝑥2 = 3) , weights (𝑤1 = 0.5) and

(𝑤2 = −1), and a bias ( 𝑏 = 1 ), calculate the output of the neuron using the ReLU activation

function.

Solution:

1. Calculate the weighted sum (z): [𝑧 = 𝑤1 ⋅ 𝑥1 + 𝑤2 ⋅ 𝑥2 + 𝑏]

Substituting the given values:

[𝑧 = (0.5 ⋅ 2) + (−1 ⋅ 3) + 1]

[ 𝑧 = 1 − 3 + 1 ]

[ 𝑧 = −1 ]

2. Apply the ReLU activation function:

The ReLU (Rectified Linear Unit) function is defined as:

[ReLU(𝑧) = max(0, 𝑧)]


Since ( 𝑧 = −1 ):

[ReLU(−1) = max(0, −1) = 0]

The output of the neuron is ( 0 ).

Question: Consider a simple neuron with two inputs (𝑥1 ) and (𝑥2 ), weights (𝑤1 ) and (𝑤2 ), and

a bias ( 𝑏 ). The activation function for the neuron is the ReLU (Rectified Linear Unit) function,

which is defined as:

[ReLU(𝑧) = max(0, 𝑧)]

Given the following values:

(𝑥1 = 1.5)

(𝑥2 = −2.0)

(𝑤1 = 0.8)

(𝑤2 = −1.3)

( 𝑏 = 0.5 )

Calculate the output of the neuron.

Solution:

To calculate the output of the neuron, we follow these steps:

Step 1: Compute the Weighted Sum ( 𝑧 )


The weighted sum ( 𝑧 ) is calculated using the formula:

[𝑧 = 𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏]

Substituting the given values:

[𝑧 = (0.8 × 1.5) + (−1.3 × −2.0) + 0.5]

Calculate each term:

[0.8 × 1.5 = 1.2]

[−1.3 × −2.0 = 2.6]

So, the weighted sum ( 𝑧 ) becomes:

[𝑧 = 1.2 + 2.6 + 0.5 = 4.3]

Step 2: Apply the ReLU Activation Function

The ReLU activation function is defined as:

[ReLU(𝑧) = max(0, 𝑧)]

Substitute the calculated value of ( 𝑧 = 4.3 ):

[ReLU(4.3) = max(0,4.3) = 4.3]

Step 3: Output of the Neuron

The output of the neuron after applying the ReLU activation function is:

[Output = 4.3]

The output of the neuron is ( 4.3 ).


Question: Consider a simple neuron with two inputs (𝑥1 ) and (𝑥2 ), weights (𝑤1 ) and (𝑤2 ), and

a bias ( 𝑏 ). The activation function for the neuron is a Threshold Function where the threshold

( θ) is 0. The Threshold Function is defined as:

𝑦 = { 1 if 𝑧 ≥ 0
    { 0 if 𝑧 < 0

Given the following values:

(𝑥1 = 1.5)

(𝑥2 = −2.0)

(𝑤1 = 0.8)

(𝑤2 = −1.3)

( 𝑏 = 0.5 )

Calculate the output of the neuron.

Solution:

To calculate the output of the neuron, we follow these steps:

Step 1: Compute the Weighted Sum ( 𝑧 )

The weighted sum ( 𝑧 ) is calculated using the formula:

[𝑧 = 𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏]

Substitute the given values into the equation:


[𝑧 = (0.8 × 1.5) + (−1.3 × −2.0) + 0.5]

Calculate each term:

[0.8 × 1.5 = 1.2]

[−1.3 × −2.0 = 2.6]

So, the weighted sum ( 𝑧 ) becomes:

[𝑧 = 1.2 + 2.6 + 0.5 = 4.3]

Step 2: Apply the Threshold Activation Function

The Threshold Function is defined as:

𝑦 = { 1 if 𝑧 ≥ 0
    { 0 if 𝑧 < 0

Since the calculated value of ( 𝑧 = 4.3 ) is greater than 0, we apply the threshold function:

[𝑦 = 1]

Step 3: Output of the Neuron

The output of the neuron is ( 𝑦 = 1 ).
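The worked examples above share the same weighted-sum step and differ only in the activation applied to it, which a small Python sketch makes explicit:

```python
def neuron(x1, x2, w1, w2, b, activation):
    """Weighted sum z = w1*x1 + w2*x2 + b, then an activation function."""
    z = w1 * x1 + w2 * x2 + b
    return z, activation(z)

def relu(z):
    return max(0.0, z)

def step(z):
    return 1 if z >= 0 else 0

z, y = neuron(2, 3, 0.5, -1, 1, relu)
print(z, y)                        # z = -1, so ReLU clips the output to 0
z, y = neuron(1.5, -2.0, 0.8, -1.3, 0.5, relu)
print(round(z, 1), round(y, 1))    # z = 4.3, so ReLU passes it through
z, y = neuron(1.5, -2.0, 0.8, -1.3, 0.5, step)
print(round(z, 1), y)              # same z = 4.3, but the threshold outputs the label 1
```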


3. Neural Network Architectures

Neural network architectures define the structure and connectivity of the artificial neurons within

a network. The architecture of a neural network plays a critical role in determining its capacity to

model complex relationships, its computational efficiency, and its ability to generalize from data.

Understanding different neural network architectures is essential for designing models that are

well-suited to specific tasks. We will explore three fundamental neural network architectures: the

single-layer feed-forward network, the multi-layer feed-forward network, and the recurrent

network. We will discuss their structures, functionality, and practical applications.

3.1 Single Layer Feed-Forward Network

Structure and Functionality

The single-layer feed-forward network, also known as a perceptron, is the most basic type of neural

network architecture. It consists of a single layer of output neurons directly connected to a layer

of input neurons. Each input neuron is connected to every output neuron through a weighted link,

and there is no hidden layer between the input and output layers.

Figure: Architecture of a single-layer perceptron


In a single-layer feed-forward network:

▪ The input layer receives the raw input data, which is then passed directly to the output layer

through weighted connections.

▪ The output layer consists of one or more neurons, each computing a weighted sum of the

inputs, followed by the application of an activation function to produce the final output.

Example

Consider a simple binary classification problem where the task is to classify points in a two-

dimensional space as belonging to one of two classes (e.g., red or blue). A single-layer feed-

forward network with two input neurons (representing the coordinates of the points) and one output

neuron (representing the class label) can be used for this task.

Given inputs (𝑥1 ) and (𝑥2 ), the output ( 𝑦 ) of the network is:

[𝑦 = ϕ(𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏)]

where (𝑤1 ) and (𝑤2 ) are the weights, ( 𝑏 ) is the bias, and (ϕ(⋅)) is the activation function,

typically a threshold function or a sigmoid function.

Example:

You are tasked with classifying points in a two-dimensional space into one of two classes: red

(Class 0) or blue (Class 1). The classification is done using a single-layer feed-forward neural

network with two input neurons and one output neuron.


Given:

Input Points (coordinates):

Point 1: (2,3)

Point 2: (1,1)

Point 3: (4,5)

Point 4: (3,2)

Class Labels (target output for each point):

Point 1: Class 1 (blue)

Point 2: Class 0 (red)

Point 3: Class 1 (blue)

Point 4: Class 0 (red)

Initial Weights and Bias:

w1 =0.2 (weight for the first input)

w2 =0.4 (weight for the second input)

b=0.1 (bias term)

Activation Function: We will use a step function to determine the output:

output = { 1 if z ≥ 0
         { 0 if z < 0

where 𝑧 = 𝑤1𝑥1 + 𝑤2𝑥2 + 𝑏


Question:

For each point, calculate the weighted sum 𝑧 and determine the predicted class (0 for red, 1 for

blue) using the step function.

Compare the predicted classes with the actual class labels and determine if the network correctly

classified each point.

Step-by-Step Solution:

Point 1: (2,3), Class 1 (blue)

Weighted Sum: 𝑧=(0.2⋅2)+(0.4⋅3)+0.1=0.4+1.2+0.1=1.7

Prediction:

Since z=1.7≥0, predicted class =1

Comparison: Actual class = 1, so the classification is correct.

Point 2: (1,1), Class 0 (red)

Weighted Sum: z=(0.2⋅1)+(0.4⋅1)+0.1=0.2+0.4+0.1=0.7

Prediction:

Since z=0.7≥0, predicted class =1

Comparison: Actual class = 0, so the classification is incorrect.

Point 3: (4,5), Class 1 (blue)

Weighted Sum:


z=(0.2⋅4)+(0.4⋅5)+0.1=0.8+2.0+0.1=2.9

Prediction:

Since z=2.9≥0, predicted class =1

Comparison: Actual class = 1, so the classification is correct.

Point 4: (3,2), Class 0 (red)

Weighted Sum:

z=(0.2⋅3)+(0.4⋅2)+0.1=0.6+0.8+0.1=1.5

Prediction:

Since z=1.5≥0, predicted class =1

Comparison: Actual class = 0, so the classification is incorrect.

Therefore:

The network correctly classified 2 points (Point 1 and Point 3).

The network incorrectly classified 2 points (Point 2 and Point 4).

To improve classification, the weights and bias can be adjusted using a learning algorithm such as

gradient descent based on the error between the predicted and actual outputs.
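The whole walkthrough can be reproduced in a few lines of Python:

```python
points = [(2, 3), (1, 1), (4, 5), (3, 2)]
labels = [1, 0, 1, 0]                # 1 = blue, 0 = red
w1, w2, b = 0.2, 0.4, 0.1            # initial weights and bias from the example

predictions = []
for x1, x2 in points:
    z = w1 * x1 + w2 * x2 + b        # weighted sum
    predictions.append(1 if z >= 0 else 0)

print(predictions)                                    # [1, 1, 1, 1]
print([p == t for p, t in zip(predictions, labels)])  # [True, False, True, False]
```

With these initial weights every weighted sum is positive, so every point is labeled blue, matching the hand calculation: two correct, two incorrect, and the errors are what a learning rule would use to adjust the weights.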


Limitations of Single-Layer Feed-Forward Networks

While single-layer feed-forward networks are simple and computationally efficient, they are

limited in their capacity to solve complex problems. Specifically, they can only solve linearly

separable problems—problems where a straight line (or hyperplane in higher dimensions) can

separate the classes. For example, a single-layer network would struggle to classify points that are

not linearly separable, such as those arranged in concentric circles or an XOR pattern.

To overcome these limitations, more complex architectures, such as multi-layer feed-forward

networks, are employed.

3.2 Multi-Layer Feed-Forward Network

Structure and Functionality

A multi-layer feed-forward network, also known as a multi-layer perceptron (MLP), extends the

single-layer architecture by introducing one or more hidden layers between the input and output

layers. These hidden layers allow the network to model more complex, non-linear relationships in

the data.

In a multi-layer feed-forward network:

▪ The input layer receives the raw data, which is passed to the first hidden layer through

weighted connections.

▪ Each neuron in the hidden layer computes a weighted sum of its inputs, applies an

activation function, and passes the result to the next layer.


▪ The process is repeated across all hidden layers, with the final layer (output layer)

producing the network's prediction.

The introduction of hidden layers enables the network to learn hierarchical representations of the

data, where each successive layer captures increasingly abstract features.

Figure: Architecture of a multi-layer perceptron with a hidden layer

Example

Consider the task of recognizing handwritten digits (0–9) from images. A typical multi-layer feed-

forward network for this task might have:

▪ An input layer with 784 neurons, each corresponding to a pixel in a 28 x 28 grayscale

image.

▪ One or more hidden layers, each containing a few hundred neurons, responsible for

learning features such as edges, shapes, and digit contours.

▪ An output layer with 10 neurons, each representing one of the digits (0–9).


The network is trained using a labeled dataset, such as the Modified National Institute of Standards

and Technology (MNIST) dataset, where each image is labeled with the correct digit. During

training, the network adjusts its weights through backpropagation, minimizing the difference

between its predicted outputs and the actual labels.
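A forward pass through such a network can be sketched in NumPy. The hidden-layer size of 128, the random weights, and the fake input image below are all illustrative; a trained network would obtain its weights from MNIST via backpropagation rather than random initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes for the digit-recognition network described above:
# 784 input pixels -> 128 hidden units -> 10 output classes.
W1, b1 = rng.normal(0, 0.01, (128, 784)), np.zeros(128)
W2, b2 = rng.normal(0, 0.01, (10, 128)), np.zeros(10)

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)       # hidden layer with ReLU activation
    logits = W2 @ h + b2                   # output layer
    e = np.exp(logits - logits.max())      # softmax turns logits into probabilities
    return e / e.sum()

x = rng.random(784)                        # stand-in for a flattened 28 x 28 image
p = forward(x)
print(p.shape, round(float(p.sum()), 6))   # 10 class probabilities summing to 1
```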

Advantages and Applications

Multi-layer feed-forward networks are powerful because they can solve both linearly and non-

linearly separable problems. The hidden layers enable the network to capture complex patterns and

interactions in the data. This makes MLPs suitable for a wide range of applications, including

image and speech recognition, natural language processing, and financial forecasting.

However, training multi-layer networks can be computationally intensive and requires careful

tuning of hyperparameters, such as the number of layers, number of neurons per layer, learning

rate, and choice of activation function.

3.3 Recurrent Network

Structure and Functionality

Recurrent networks (RNNs) are a class of neural network architectures designed to model

sequential data, where the order of the data points is important. Unlike feed-forward networks,

which process each input independently, RNNs have connections that allow them to retain

information from previous inputs, enabling them to capture temporal dependencies.


In an RNN:

▪ Each neuron in the network has a connection not only to the neurons in the next layer but

also to itself or neurons in the previous layers, allowing information to be passed from one

time step to the next.

▪ The network processes input sequences one element at a time, with the hidden state of the

network updating at each time step to incorporate information from both the current input

and the previous hidden state.

This structure makes RNNs particularly well-suited for tasks where context or memory is essential,

such as language modeling, time series prediction, and sequence generation.

Example

Consider the task of language modeling, where the goal is to predict the next word in a sentence

based on the preceding words. An RNN can be used to achieve this by processing each word in

the sentence sequentially and updating its hidden state to reflect the context provided by the

previous words.

For instance, given the input sequence "The cat sits on the," the RNN would process each word in

turn, updating its hidden state after each word. By the time it reaches the word "the," the hidden

state would contain information about the entire preceding context, enabling the RNN to predict

the next word, such as "mat."
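The hidden-state update at the heart of a simple RNN can be sketched in a few lines of NumPy. The vocabulary of 5 words, the hidden size of 8, and the random weights are all toy choices for illustration; a real language model would learn these weights from text.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden, vocab = 8, 5                       # toy sizes for illustration

Wx = rng.normal(0, 0.1, (hidden, vocab))   # input-to-hidden weights
Wh = rng.normal(0, 0.1, (hidden, hidden))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden)

def rnn_step(x, h_prev):
    """One time step: the new hidden state mixes the current input
    with the previous hidden state."""
    return np.tanh(Wx @ x + Wh @ h_prev + b)

# Process a 3-word "sentence", each word one-hot encoded over the vocabulary.
sentence = [0, 3, 1]
h = np.zeros(hidden)
for word_id in sentence:
    x = np.eye(vocab)[word_id]
    h = rnn_step(x, h)

print(h.shape)   # a fixed-size summary of the whole sequence seen so far
```

After the loop, `h` depends on every word processed so far, which is exactly the "context" a language model uses to predict the next word.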


Variants of Recurrent Networks

While simple RNNs can capture short-term dependencies, they struggle with long-term

dependencies because of vanishing and exploding gradients during training: as gradients are

propagated back through many time steps, they can shrink toward zero or grow without bound. To

address these limitations, several advanced RNN architectures have been developed:

Long Short-Term Memory (LSTM): LSTMs introduce memory cells that can store information

over long periods, along with gating mechanisms (input, output, and forget gates) to control the

flow of information. This allows LSTMs to capture long-term dependencies more effectively than

standard RNNs.

Gated Recurrent Unit (GRU): GRUs are a simplified version of LSTMs, combining the forget and

input gates into a single update gate. They are computationally more efficient while still addressing

the long-term dependency problem.

Applications

RNNs, particularly LSTMs and GRUs, are widely used in applications where sequence data is

prevalent. Some examples include:

Natural Language Processing (NLP): RNNs are used for tasks like machine translation, text

generation, sentiment analysis, and speech recognition.

Time Series Analysis: RNNs can model and predict time-dependent phenomena, such as stock

prices, weather patterns, and sensor data.


Music Generation: RNNs can be used to generate sequences of musical notes, creating new

compositions that follow the structure of existing music.


4. Learning Methods in Neural Networks

Learning methods in neural networks are fundamental to how these models adapt and improve

their performance based on data. The learning process involves adjusting the parameters of the

network—primarily the weights and biases—so that the network can accurately predict or classify

unseen data. Different learning algorithms are used depending on the nature of the data and the

task at hand. We will discuss three primary categories of learning methods in neural networks:

unsupervised learning, supervised learning, and reinforced learning. Each category encompasses

distinct algorithms that are tailored to specific types of learning scenarios.

4.1 Learning Algorithms

4.1.1 Unsupervised Learning

Unsupervised learning is a type of machine learning where the model is trained on a dataset without

explicit labels or target outputs (i.e. the desired outcomes or predictions). The objective is for the

network to discover underlying patterns, structures, or features in the data. This method is

particularly useful in exploratory data analysis, clustering, and dimensionality reduction. Two

prominent unsupervised learning algorithms in neural networks are Hebbian learning and

competitive learning.

Hebbian Learning

Hebbian learning is one of the oldest and most fundamental learning rules, proposed by Donald

Hebb in 1949. It is based on the principle that "neurons that fire together, wire together." This

means that the connection between two neurons is strengthened if they are activated


simultaneously. Hebbian learning is often used in the context of associative learning, where the

goal is to form associations between different inputs.

Characteristics:

Synaptic Weight Update: The weight between two neurons increases if both neurons are activated

at the same time. Mathematically, this can be expressed as:

[Δ𝑤𝑖𝑗 = η ⋅ 𝑥𝑖 ⋅ 𝑥𝑗 ]

where (Δ𝑤𝑖𝑗 ) is the change in the weight between neurons ( 𝑖 ) and ( 𝑗 ), ( η) is the learning rate,

and (𝑥𝑖 ) and (𝑥𝑗 ) are the activations of the respective neurons.

Learning Process: Over time, Hebbian learning causes the network to strengthen connections

between frequently co-occurring patterns, leading to the formation of memory traces in the

network.

Example: Consider a simple neural network designed to recognize visual patterns, such as faces.

If a particular configuration of inputs (e.g., certain edges or features) frequently occurs together

when a face is presented, Hebbian learning will strengthen the connections between the neurons

responsible for detecting these features. Over time, the network becomes more sensitive to faces,

even if the exact combination of features varies slightly.
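The update rule Δ𝑤 = η · 𝑥ᵢ · 𝑥ⱼ can be sketched directly. Here the three input activations, the post-synaptic activation, and the learning rate are invented for illustration:

```python
import numpy as np

eta = 0.1                        # learning rate
x = np.array([1.0, 0.0, 1.0])    # pre-synaptic activations (inputs 1 and 3 are active)
y = 1.0                          # post-synaptic activation (the neuron fires)
w = np.zeros(3)

# Hebbian rule: weights grow only where pre- and post-synaptic
# neurons are active at the same time.
for _ in range(5):
    w += eta * x * y

print(np.round(w, 2))   # only the co-active connections have been strengthened
```

The middle weight stays at zero because its input was never active together with the output neuron, while the two co-active connections grow with each presentation.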

Competitive Learning

Competitive learning is another unsupervised learning algorithm where neurons in a network

compete to respond to input data. Typically, only one neuron—or a small subset of neurons—


"wins" and is allowed to update its weights. This mechanism leads to a process known as "winner-

takes-all."

Characteristics:

Competition: Neurons in the network compete with each other based on their response to the input.

The neuron with the highest response is selected as the winner.

Weight Update: The weights of the winning neuron are adjusted to be more similar to the input

vector, reinforcing the neuron's specialization in recognizing similar inputs in the future.

[Δ𝑤𝑗 = η ⋅ (𝑥 − 𝑤𝑗 )]

where (𝑤𝑗 ) represents the weight vector of the winning neuron, and ( 𝑥 ) is the input vector.

Example: Competitive learning is often used in clustering tasks, such as self-organizing maps

(SOMs). For instance, if a network is trained on color data (e.g., RGB values), competitive learning

will cause different neurons to specialize in different colors. Eventually, the network might form

clusters corresponding to various color regions, effectively organizing the color space.
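The winner-takes-all update Δ𝑤ⱼ = η · (𝑥 − 𝑤ⱼ) can be sketched for the color example. The three neurons, their random initial weights, and the learning rate are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
weights = rng.random((3, 3))     # 3 neurons, each with an RGB weight vector
eta = 0.5

def train_step(x):
    """Winner-takes-all: only the neuron closest to the input moves toward it."""
    distances = np.linalg.norm(weights - x, axis=1)
    winner = int(np.argmin(distances))
    weights[winner] += eta * (x - weights[winner])
    return winner

red = np.array([1.0, 0.0, 0.0])
before = np.linalg.norm(weights - red, axis=1).min()
for _ in range(10):
    w_idx = train_step(red)
after = np.linalg.norm(weights[w_idx] - red)

print(after < before)   # the winning neuron has specialized in "red"
```

Repeatedly presenting red inputs pulls one neuron's weight vector onto the red corner of the color cube, while the other neurons remain free to specialize in other colors.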

4.1.2 Supervised Learning

Supervised learning is a type of learning where the model is trained on a labeled dataset, meaning

that each input is associated with a corresponding target output. The network learns to map inputs

to outputs by minimizing the difference between its predictions and the actual labels. Two common

supervised learning algorithms are stochastic learning and gradient descent learning.


Stochastic Learning

Stochastic learning, also known as online learning, is a variant of gradient descent where the model

parameters are updated after each training example rather than after processing the entire dataset.

This approach introduces randomness into the learning process, which can help the network escape

local minima and converge more quickly.

Characteristics:

Incremental Updates: Weights are updated incrementally after each training example, making the

learning process more dynamic and adaptable.

Faster Convergence: Stochastic learning can converge faster than batch gradient descent,

especially in large datasets, because it updates the weights more frequently.

Noise and Stability: The randomness introduced by updating the weights after each example can

introduce noise, but it can also help in exploring the solution space more effectively.

Example: Consider a neural network trained to predict house prices based on features such as

square footage, number of bedrooms, and location. Using stochastic learning, the network updates

its weights after each individual example (e.g., a specific house sale). This allows the network to

quickly adapt to new patterns as it processes more data, potentially leading to faster convergence.
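Per-example updating can be sketched on a one-feature version of the house-price example. The data here is synthetic (price = 2 × size plus small noise), and the single-weight, no-bias model is a deliberate simplification:

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented data: price = 2 * size + noise (arbitrary units).
sizes = rng.random(100)
prices = 2.0 * sizes + rng.normal(0.0, 0.01, 100)

w, eta = 0.0, 0.1                    # one weight, no bias, for simplicity
for epoch in range(20):
    for x, t in zip(sizes, prices):
        y = w * x                    # prediction for this single example
        w += eta * (t - y) * x       # immediate update from this example's error

print(round(w, 1))   # the weight approaches the true slope of 2.0
```

Because the weight changes after every example, early mistakes are corrected quickly, which is the faster, noisier convergence described above.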


Gradient Descent Learning

Gradient descent is one of the most widely used optimization algorithms in supervised learning. It

works by iteratively adjusting the network's weights in the direction that minimizes a loss function,

typically the mean squared error or cross-entropy loss.

Characteristics:

Loss Function: The network computes a loss function that measures the difference between its

predictions and the actual labels. The goal is to minimize this loss.

Gradient Calculation: The gradient of the loss function with respect to each weight is calculated,

indicating the direction and magnitude of the steepest ascent.

Weight Update: Weights are updated in the opposite direction of the gradient to move the network

closer to the minimum loss:

[𝑤 = 𝑤 − η ⋅ ∇𝐿(𝑤)]

where (∇𝐿(𝑤)) is the gradient of the loss function with respect to the weights, and ( η) is the

learning rate.

Example: In a network trained to classify images of animals, gradient descent learning would

involve calculating the gradients of the loss function with respect to the network's weights after

processing a batch of images. The weights are then updated to reduce the loss, improving the

network's accuracy in classifying new images.
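The update rule w = w − η · ∇L(w) is easiest to see on a one-dimensional loss. This sketch minimizes the invented loss L(w) = (w − 3)², whose gradient is 2(w − 3):

```python
# Gradient descent on L(w) = (w - 3)^2, which has its minimum at w = 3.
w, eta = 0.0, 0.1

for step in range(100):
    grad = 2.0 * (w - 3.0)   # dL/dw
    w = w - eta * grad       # move against the gradient

print(round(w, 4))   # 3.0 -- the weight settles at the minimum of the loss
```

Each step moves the weight a fraction η of the way down the slope; in a neural network the same rule is applied simultaneously to every weight, with the gradients supplied by backpropagation.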


4.1.3 Reinforced Learning

Reinforced learning, or reinforcement learning, is a type of learning where the model learns to

make decisions by interacting with an environment. The model, often called an agent, receives

feedback in the form of rewards or punishments based on the actions it takes. The goal is to learn

a policy that maximizes cumulative rewards over time.

Characteristics:

Agent and Environment: The agent interacts with an environment, making decisions (actions)

based on the current state. The environment responds by providing a new state and a reward signal.

Reward Signal: The reward signal indicates the success of the action taken. Positive rewards

encourage the agent to repeat certain actions, while negative rewards discourage them.

Policy Learning: The agent learns a policy, which is a strategy for selecting actions based on states,

to maximize long-term rewards.

Example: A classic example of reinforced learning is training a model to play a game, such as

chess or Go. The agent (player) makes moves in the game (actions) based on the current board

configuration (state). After each move, the agent receives feedback (reward), such as gaining

points or capturing an opponent's piece. Over time, the agent learns to make better decisions that

maximize its chances of winning the game.

Exploration vs. Exploitation

A central challenge in reinforced learning is the trade-off between exploration (trying new actions

to discover their effects) and exploitation (choosing actions known to yield high rewards).

Balancing these two aspects is crucial for effective learning.


Real-World Application:

Reinforced learning is widely used in autonomous systems, such as self-driving cars. The car

(agent) learns to navigate roads by interacting with its environment (traffic, pedestrians, obstacles).

It receives rewards for safe driving and penalties for risky behaviors. Over time, the car learns to

drive efficiently and safely by maximizing its cumulative rewards.
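The agent-environment loop, reward-driven updates, and the exploration-exploitation trade-off can all be seen in a tiny tabular Q-learning sketch. The 5-cell corridor environment, the reward of +1 at the goal, and the hyperparameters are all invented for illustration:

```python
import random

random.seed(0)

# A 5-cell corridor: cells 0..4; reaching cell 4 (the goal) yields reward +1.
N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]; 0 = left, 1 = right
alpha, gamma, epsilon = 0.5, 0.9, 0.1       # learning rate, discount, exploration

def choose(s):
    """Epsilon-greedy with random tie-breaking: mostly exploit, sometimes explore."""
    if random.random() < epsilon:
        return random.randrange(2)
    best = max(Q[s])
    return random.choice([a for a in (0, 1) if Q[s][a] == best])

for episode in range(1000):
    s = random.randrange(GOAL)              # start in a random non-goal cell
    for _ in range(100):                    # cap the episode length
        a = choose(s)
        s2 = min(max(s + (1 if a == 1 else -1), 0), GOAL)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update: move Q(s, a) toward reward + discounted future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if s == GOAL:
            break

policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)   # the learned policy moves right from every non-goal cell
```

The reward signal only arrives at the goal, yet the discounted update propagates its value backward through the corridor, so the agent learns the right action everywhere.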


5. Taxonomy of Neural Network Systems

Neural networks have evolved into a diverse and powerful set of tools used in numerous fields,

including computer vision, natural language processing, and autonomous systems. The diversity

of neural network systems stems from variations in their architectures, learning methods, and

applications. To effectively navigate this complex field, it is essential to understand the taxonomy

of neural network systems. Therefore, we will explore popular neural network systems and classify

them based on their learning methods and architecture types. This taxonomy provides a structured

approach to understanding the wide range of neural networks and their appropriate use cases.

5.1 Taxonomy of Neural Network Systems

Taxonomy in the context of neural networks refers to the systematic categorization of these

systems based on certain characteristics such as architecture, learning methods, and applications.

This classification aids in the identification and selection of the appropriate neural network for a

given problem.

5.1.1 Popular Neural Network Systems

Over the years, several neural network systems have gained prominence due to their success in

solving complex tasks. Below are some of the most popular neural network systems:

Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are specifically designed for processing structured grid

data, such as images. They have become the standard architecture for tasks involving image and

video processing.


Key Characteristics:

Convolutional Layers: These layers apply filters (convolutions) to the input image, extracting

features like edges, textures, and patterns.

Pooling Layers: These layers reduce the spatial dimensions of the data, making the network more

computationally efficient and less sensitive to minor variations in the input.

Fully Connected Layers: After feature extraction, fully connected layers are used to perform

classification or regression tasks.

Example: CNNs are widely used in facial recognition systems, where they learn to identify unique

features of faces from images, enabling them to accurately recognize individuals in various

conditions.
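The feature-extracting convolution at the core of a CNN can be sketched in NumPy. This is a "valid" convolution (strictly speaking a cross-correlation, as implemented in most deep-learning libraries); the tiny 4 x 4 image and the Sobel-style vertical-edge kernel are standard illustrative choices rather than learned filters:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image,
    taking an element-wise product and sum at each position."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny image that is dark on the left and bright on the right;
# the vertical-edge filter responds wherever it straddles the edge.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
response = conv2d(image, sobel_x)
print(response)
```

In a trained CNN the kernel values are learned, and pooling layers then shrink the resulting feature maps.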

Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) are designed for sequence data, where the order of inputs is

crucial. They are widely used in tasks involving time series, language processing, and sequential

data analysis.

Key Characteristics:

Recurrent Connections: RNNs have connections that allow information to be passed from one time

step to the next, enabling the network to retain memory of previous inputs.

Sequence Prediction: RNNs are adept at tasks where the output at each time step depends on both

the current input and the preceding inputs.


Example: RNNs are commonly used in language models for tasks such as text generation, where

the model generates text one word at a time based on the previous words, maintaining the context

of the sentence.

Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) are a class of neural networks used for generating new

data that is similar to a given dataset. GANs consist of two networks: a generator that creates data

and a discriminator that evaluates the authenticity of the generated data.

Key Characteristics:

Adversarial Process: The generator and discriminator are trained simultaneously in a game-

theoretic manner, where the generator tries to produce data indistinguishable from real data, and

the discriminator tries to correctly identify real from generated data.

Data Generation: GANs are capable of generating highly realistic images, text, and even video.

Example: GANs have been used to generate realistic images of non-existent people, as seen in

projects like "This Person Does Not Exist," where the generated faces appear indistinguishable

from real human faces.

Transformer Networks

Transformer Networks have revolutionized natural language processing by enabling efficient

processing of long-range dependencies in sequences without relying on recurrent connections.


Key Characteristics:

Self-Attention Mechanism: Transformers use self-attention to weigh the importance of different

words in a sequence, allowing the model to focus on relevant parts of the input for making

predictions.

Parallel Processing: Unlike RNNs, Transformers process all tokens in a sequence simultaneously,

making them more computationally efficient and capable of handling longer sequences.

Example: Transformers are the foundation of state-of-the-art language models like GPT-3 and

BERT, which excel in tasks such as language translation, summarization, and question-answering.
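The self-attention mechanism can be reduced to a few lines of NumPy. Real Transformers apply learned query, key, and value projection matrices and use multiple attention heads; this sketch strips those away (queries, keys, and values are the token embeddings themselves) to expose the core computation, and the token count and embedding size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)        # pairwise similarity between tokens
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ X, weights          # each output is a weighted mix of all tokens

X = rng.normal(size=(6, 4))              # 6 tokens, 4-dimensional embeddings
out, attn = self_attention(X)
print(out.shape, attn.shape)             # one output per token; a 6 x 6 weight matrix
```

Because every token attends to every other token in one matrix product, the whole sequence is processed in parallel, which is the efficiency advantage over RNNs noted above.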

5.1.2 Classification of Neural Network Systems with Respect to Learning Methods and

Architecture Types

Neural network systems can be classified based on the learning methods they employ and their

architectural design. This classification helps in understanding how different networks are trained

and how their structures influence their performance on specific tasks.

Classification by Learning Methods

Supervised Learning:

Architecture Examples: Multi-layer Perceptron (MLP), Convolutional Neural Networks (CNNs)


Characteristics: In supervised learning, the network is trained on labeled data, meaning each input

is paired with the correct output. The network learns to map inputs to outputs by minimizing a loss

function that measures the difference between the predicted and actual outputs.

Use Cases: Supervised learning is widely used in tasks like image classification, object detection,

and sentiment analysis.

Unsupervised Learning:

Architecture Examples: Autoencoders, Self-Organizing Maps (SOMs)

Characteristics: In unsupervised learning, the network is trained on data without explicit labels.

The goal is to uncover hidden structures or patterns within the data. This method is often used for

clustering, dimensionality reduction, and anomaly detection.

Use Cases: Unsupervised learning is commonly applied in customer segmentation, gene

expression analysis, and feature extraction.

Reinforcement Learning:

Architecture Examples: Deep Q-Networks (DQN), Policy Gradient Methods

Characteristics: In reinforcement learning, the network, or agent, interacts with an environment

and learns to make decisions by receiving rewards or penalties based on its actions. The goal is to

learn a policy that maximizes cumulative rewards over time.


Use Cases: Reinforcement learning is used in robotics, game playing (e.g., AlphaGo), and

autonomous vehicles.

Classification by Architecture Types

Feed-Forward Networks:

Examples: Single-layer Perceptron, Multi-layer Perceptron (MLP), Convolutional Neural

Networks (CNNs)

Characteristics: In feed-forward networks, information flows in one direction from the input layer

to the output layer, passing through any hidden layers without looping back. These networks are

typically used for tasks where the data has no inherent sequence or temporal dependency.

Applications: Feed-forward networks are used in tasks such as image recognition, speech

recognition, and financial forecasting.

Recurrent Networks:

Examples: Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Gated

Recurrent Unit (GRU)

Characteristics: Recurrent networks include connections that loop back, allowing the network to

maintain a memory of previous inputs. This makes them well-suited for sequential data where

context and order matter.


Applications: RNNs are applied in time series analysis, language modeling, and music

composition.

Convolutional Networks:

Examples: Convolutional Neural Networks (CNNs)

Characteristics: Convolutional networks are specifically designed to process grid-like data, such

as images. They use convolutional layers to automatically detect and learn features from the data.

Applications: CNNs are extensively used in computer vision tasks like image classification, object

detection, and medical image analysis.

Generative Networks:

Examples: Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs)

Characteristics: Generative networks are designed to generate new data samples that resemble the

training data. They learn to capture the underlying distribution of the data, which allows them to

create new instances of data that are similar to the original.

Applications: These networks are used in tasks such as image synthesis, text generation, and data

augmentation.


Transformers:

Examples: BERT, GPT, T5

Characteristics: Transformers are characterized by their use of self-attention mechanisms, which

allow them to process and learn from long-range dependencies in sequences. They are particularly

effective in natural language processing tasks.

Applications: Transformers are used in machine translation, text summarization, and language

generation.


6. Single-Layer Neural Network Systems

A single-layer neural network (NN) system is the simplest form of neural network architecture. It

consists of one layer of neurons that directly connect the input layer to the output layer without

any hidden layers. Despite its simplicity, the single-layer perceptron and variants such as the

ADAptive LInear NEuron (ADALINE) have played foundational roles in the development of more

complex neural network architectures. We will explore the structure and learning algorithms of the

single-layer perceptron, examine its limitations (especially in solving non-linearly separable

problems like the XOR problem), and introduce ADALINE, which addresses some of the

perceptron’s shortcomings.

6.1 Single-Layer Perceptron

Architecture of the Perceptron

A single-layer perceptron is composed of:

Input Layer: The input layer consists of input units (features), each representing a different

dimension of the input data.

Weights: Each input is connected to the output via a weight, which determines the importance of

that input.

Summation Function: The perceptron computes the weighted sum of the inputs.

Activation Function: A threshold activation function is applied to the weighted sum to produce a

binary output (1 for activation, 0 for no activation).


Mathematically, the output ( 𝑦 ) of the perceptron is given by:

[𝑦 = ϕ(∑𝑛𝑖=1 𝑤𝑖 𝑥𝑖 + 𝑏)]

where:

(𝑥𝑖 ) is the (𝑖 𝑡ℎ ) input,

(𝑤𝑖 ) is the corresponding weight,

( 𝑏 ) is the bias term,

(ϕ(𝑧)) is the activation function, typically a step function such as:

𝜑(𝑧) = 1 𝑖𝑓 𝑧 ≥ 0, and 𝜑(𝑧) = 0 𝑖𝑓 𝑧 < 0

6.1.1 Learning Algorithm for Training the Perceptron

The goal of the perceptron learning algorithm is to adjust the weights and bias in such a way that

the perceptron correctly classifies input data. The algorithm works by iteratively updating the

weights based on the error between the predicted output and the true label.

Steps of the Perceptron Learning Algorithm:

1. Initialization: Initialize the weights (𝑤1 , 𝑤2 , … , 𝑤𝑛 ) and bias ( 𝑏 ) to small random values.

2. For each training example ((𝑥𝑖 , 𝑡)):

Compute the output ( 𝑦 ) using the current weights and bias:


[𝑦 = ϕ(∑𝑛𝑖=1 𝑤𝑖 𝑥𝑖 + 𝑏)]

Update the weights and bias if there is an error (i.e., if ( 𝑦 ≠ 𝑡 ), where ( 𝑡 ) is the target output):

[𝑤𝑖 = 𝑤𝑖 + η(𝑡 − 𝑦)𝑥𝑖 ]

[𝑏 = 𝑏 + η(𝑡 − 𝑦)]

Here, ( η) is the learning rate, a small positive constant that controls the size of the weight

updates.

3. Repeat this process for each training example until all examples are classified correctly or the

maximum number of iterations is reached.
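The steps above can be sketched in code. The following is a minimal illustration using the logical AND function as training data; the zero initialization and the learning rate of 0.1 are arbitrary choices made for reproducibility (the algorithm as stated uses small random values):

```python
def step(z):
    # Threshold activation: 1 if z >= 0, else 0.
    return 1 if z >= 0 else 0

def train_perceptron(data, eta=0.1, max_epochs=100):
    """Perceptron learning rule on (inputs, target) pairs with two features."""
    w1 = w2 = b = 0.0               # fixed start instead of random, for reproducibility
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), t in data:
            y = step(w1 * x1 + w2 * x2 + b)
            if y != t:              # update only on a misclassification
                w1 += eta * (t - y) * x1
                w2 += eta * (t - y) * x2
                b  += eta * (t - y)
                errors += 1
        if errors == 0:             # converged: every example classified correctly
            break
    return w1, w2, b

# Logical AND is linearly separable, so the perceptron converges.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1, w2, b = train_perceptron(and_data)
print([step(w1 * x1 + w2 * x2 + b) for (x1, x2), _ in and_data])
```

After a handful of epochs the learned weights classify all four AND examples correctly, illustrating the convergence guarantee for linearly separable data.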

6.1.2 Linearly Separable Tasks

The perceptron is capable of solving linearly separable tasks, where the data points belonging to

different classes can be separated by a straight line (in two dimensions), a plane (in three

dimensions), or a hyperplane (in higher dimensions).

Example: Consider a binary classification problem with two classes, ( 𝐴 ) and ( 𝐵 ), where class ( 𝐴 ) consists of points ((0,0)) and ((0,1)), and class ( 𝐵 ) consists of points ((1,0)) and ((1,1)). These points are linearly separable, since the vertical line (𝑥1 = 0.5) separates the two classes.

Here is a diagram illustrating the linear separability of two classes, ( 𝐴 ) and ( 𝐵 ). The blue circles represent points from Class ( 𝐴 ), ((0,0)) and ((0,1)), and the red circles represent points from Class ( 𝐵 ), ((1,0)) and ((1,1)). The straight line in the diagram clearly separates the two classes, showing that the data is linearly separable. (By contrast, the configuration ((0,0)), ((1,1)) versus ((0,1)), ((1,0)) is not separable by any straight line; this is the XOR problem discussed below.)


Limitations:

The perceptron performs well on linearly separable problems, but it fails on non-linearly separable

problems, such as the XOR problem, which requires more complex decision boundaries.

6.1.3 The XOR Problem

The XOR problem (exclusive OR) is a classic example of a non-linearly separable problem. It

involves two binary inputs and a binary output, where the output is 1 if the inputs are different,

and 0 if the inputs are the same.


The XOR truth table is as follows:

𝒙𝟏 𝒙𝟐 XOR(𝒙𝟏 , 𝒙𝟐 )
0 0 0
0 1 1
1 0 1
1 1 0

Non-Linearly Separable:

If you plot the XOR inputs ((0,0), (0,1), (1,0), (1,1)) on a graph, you will notice that there is no straight line that can separate the two classes (output 1 and output 0): the points ((0,1)) and ((1,0)) lie on one diagonal, while ((0,0)) and ((1,1)) lie on the other. Here is a diagram illustrating the XOR problem. The blue circles represent points for Class 0, ((0,0)) and ((1,1)), while the purple diamonds represent points for Class 1, ((0,1)) and ((1,0)). As shown, no straight line can separate

Perceptron Failure: Since the perceptron can only solve linearly separable problems, it cannot

correctly classify XOR data. To solve the XOR problem, a more complex architecture, such as a

multi-layer neural network, is required.
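This failure is easy to observe empirically. The sketch below (a minimal perceptron with arbitrary illustrative settings: zero initialization, learning rate 0.1) runs the learning rule on the XOR data for many epochs; because no weight setting can classify all four points correctly, the final epoch still contains misclassifications:

```python
def step(z):
    # Threshold activation: 1 if z >= 0, else 0.
    return 1 if z >= 0 else 0

def epoch_errors(data, eta=0.1, epochs=1000):
    """Run the perceptron rule and return the misclassification count in the final epoch."""
    w1 = w2 = b = 0.0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), t in data:
            y = step(w1 * x1 + w2 * x2 + b)
            if y != t:
                w1 += eta * (t - y) * x1
                w2 += eta * (t - y) * x2
                b  += eta * (t - y)
                errors += 1
    return errors

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
# The error count never reaches 0, no matter how many epochs are run:
# XOR is not linearly separable.
print(epoch_errors(xor_data))
```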


6.2 ADAptive LInear NEuron (ADALINE)

6.2.1 Architecture of ADALINE

ADALINE (ADAptive LInear NEuron) is a variation of the perceptron that uses a different

learning rule and activation function. The key difference lies in how ADALINE computes and

updates its weights. While the perceptron uses a binary threshold activation function, ADALINE

uses a linear activation function during the learning process.

Key Components:

Inputs and Weights: Like the perceptron, ADALINE takes multiple inputs (𝑥1 , 𝑥2 , … , 𝑥𝑛 ), each

associated with a weight (𝑤1 , 𝑤2 , … , 𝑤𝑛 ).

Weighted Sum: ADALINE computes the weighted sum of the inputs:

[𝑧 = ∑𝑛𝑖=1 𝑤𝑖 𝑥𝑖 + 𝑏]

Output: Unlike the perceptron, ADALINE does not apply a threshold activation function during

training. Instead, it uses the linear output ( 𝑧 ) for weight updates, but applies a threshold at the

final decision stage.

6.2.2 Training Algorithm for ADALINE

The training algorithm for ADALINE is based on minimizing the mean squared error (MSE)

between the actual output and the target output. This is a key difference from the perceptron, which

only considers whether the output is correct or incorrect.


Steps of the ADALINE Learning Algorithm:

1. Initialization: Initialize the weights and bias to small random values.

2. For each training example ((𝑥𝑖 , 𝑡)):

Compute the weighted sum ( 𝑧 ):

[𝑧 = ∑𝑛𝑖=1 𝑤𝑖 𝑥𝑖 + 𝑏]

Update the weights and bias based on the error ( 𝑒 = 𝑡 − 𝑧 ):

[𝑤𝑖 = 𝑤𝑖 + η ⋅ 𝑒 ⋅ 𝑥𝑖 ]

[𝑏 = 𝑏 + η ⋅ 𝑒]

where ( 𝑒 ) is the difference between the target ( 𝑡 ) and the actual output ( 𝑧 ), and ( η) is the

learning rate.

3. Repeat this process until the error falls below a specified threshold or the maximum number of

iterations is reached.

6.2.3 Comparison with Perceptron

Error Minimization: Unlike the perceptron, which updates weights only when there is a

classification error, ADALINE updates its weights based on minimizing the error between the

predicted and target outputs using a linear activation function. This makes ADALINE more stable

during learning.


Convergence: With a sufficiently small learning rate, ADALINE's updates converge toward the weights that minimize the mean squared error, even when the data is not perfectly separable. It can also handle more complex decision boundaries with appropriate modifications (e.g., by introducing non-linear transformations in multi-layer networks).

Example: ADALINE on Linearly Separable Data

Consider a binary classification problem with inputs ((1,0), (0,1), (1,1), (0,0)) and corresponding

targets (1, 1, 0, 0). ADALINE would compute the weighted sum ( 𝑧 ) for each input, adjust the

weights based on the error, and eventually learn a linear decision boundary that separates the two

classes.

Solution:

[Inputs = {(1,0), (0,1), (1,1), (0,0)}]

[Targets = {1,1,0,0}]

The task is to use the ADALINE (ADAptive LInear NEuron) algorithm to compute the weighted

sum ( 𝑧 ), adjust the weights based on the error, and eventually find a linear decision boundary

that separates the two classes.

1. Initialize Parameters

ADALINE will use weights (𝑤1 ), (𝑤2 ), and a bias term ( 𝑏 ). These parameters are initialized to

small random values, but for simplicity, we will initialize them to 0.

Let us initialize:

[𝑤1 = 0, 𝑤2 = 0, 𝑏 = 0]


Also, we define a learning rate ( η = 0.1 ) (this controls the size of the weight updates).

2. ADALINE Learning Algorithm

The ADALINE algorithm uses the mean squared error (MSE) as a criterion for adjusting the

weights. The steps are as follows:

a. Compute the weighted sum ( 𝑧 ) for each input.

b. Compute the error ( 𝑒 ) between the target and the weighted sum.

c. Update the weights and bias based on the error.

The weighted sum is calculated as:

[𝑧 = 𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑏]

The weight update rule is:

[𝑤𝑖 = 𝑤𝑖 + η ⋅ 𝑒 ⋅ 𝑥𝑖 ]

[𝑏 = 𝑏 + η ⋅ 𝑒]

Where (𝑒 = target − 𝑧).

Now, let us apply these steps to each input point iteratively. We will go through multiple epochs

(iterations over the entire dataset) to allow the ADALINE to learn.


3. Epoch 1: First Iteration over the Data

Step 1: Input (1,0), Target = 1

Compute weighted sum:

[𝑧 = 𝑤1 ⋅ 1 + 𝑤2 ⋅ 0 + 𝑏 = 0 ⋅ 1 + 0 ⋅ 0 + 0 = 0]

Compute error:

[𝑒 = 1 − 𝑧 = 1 − 0 = 1]

Update weights and bias:

[𝑤1 = 0 + 0.1 ⋅ 1 ⋅ 1 = 0.1]

[𝑤2 = 0 + 0.1 ⋅ 1 ⋅ 0 = 0]

[𝑏 = 0 + 0.1 ⋅ 1 = 0.1]

Step 2: Input (0,1), Target = 1

Compute weighted sum:

[𝑧 = 𝑤1 ⋅ 0 + 𝑤2 ⋅ 1 + 𝑏 = 0.1 ⋅ 0 + 0 ⋅ 1 + 0.1 = 0.1]

Compute error:

[𝑒 = 1 − 𝑧 = 1 − 0.1 = 0.9]

Update weights and bias:

[𝑤1 = 0.1 + 0.1 ⋅ 0.9 ⋅ 0 = 0.1]


[𝑤2 = 0 + 0.1 ⋅ 0.9 ⋅ 1 = 0.09]

[𝑏 = 0.1 + 0.1 ⋅ 0.9 = 0.19]

Step 3: Input (1,1), Target = 0

Compute weighted sum:

[𝑧 = 𝑤1 ⋅ 1 + 𝑤2 ⋅ 1 + 𝑏 = 0.1 ⋅ 1 + 0.09 ⋅ 1 + 0.19 = 0.38]

Compute error:

[𝑒 = 0 − 𝑧 = 0 − 0.38 = −0.38]

Update weights and bias:

[𝑤1 = 0.1 + 0.1 ⋅ (−0.38) ⋅ 1 = 0.1 − 0.038 = 0.062]

[𝑤2 = 0.09 + 0.1 ⋅ (−0.38) ⋅ 1 = 0.09 − 0.038 = 0.052]

[𝑏 = 0.19 + 0.1 ⋅ (−0.38) = 0.19 − 0.038 = 0.152]

Step 4: Input (0,0), Target = 0

Compute weighted sum:

[𝑧 = 𝑤1 ⋅ 0 + 𝑤2 ⋅ 0 + 𝑏 = 0.062 ⋅ 0 + 0.052 ⋅ 0 + 0.152 = 0.152]

Compute error:

[𝑒 = 0 − 𝑧 = 0 − 0.152 = −0.152]


Update weights and bias:

[𝑤1 = 0.062 + 0.1 ⋅ (−0.152) ⋅ 0 = 0.062]

[𝑤2 = 0.052 + 0.1 ⋅ (−0.152) ⋅ 0 = 0.052]

[𝑏 = 0.152 + 0.1 ⋅ (−0.152) = 0.152 − 0.0152 = 0.1368]

4. Epoch 2: Second Iteration over the Data

After the first epoch, we repeat the same steps, re-evaluating each input and adjusting the weights

accordingly. Typically, several epochs are required for the ADALINE to converge, but for

illustration purposes, we can stop after a few iterations once the changes in weights become small.

Let us go through one more iteration for better clarity.

Step 1: Input (1,0), Target = 1

Compute weighted sum:

[𝑧 = 0.062 ⋅ 1 + 0.052 ⋅ 0 + 0.1368 = 0.1988]

Compute error:

[𝑒 = 1 − 𝑧 = 1 − 0.1988 = 0.8012]

Update weights and bias:

[𝑤1 = 0.062 + 0.1 ⋅ 0.8012 ⋅ 1 = 0.062 + 0.08012 = 0.14212]

[𝑤2 = 0.052 + 0.1 ⋅ 0.8012 ⋅ 0 = 0.052]


[𝑏 = 0.1368 + 0.1 ⋅ 0.8012 = 0.1368 + 0.08012 = 0.21692]

Step 2: Input (0,1), Target = 1

Compute weighted sum:

[𝑧 = 0.14212 ⋅ 0 + 0.052 ⋅ 1 + 0.21692 = 0.26892]

Compute error:

[𝑒 = 1 − 𝑧 = 1 − 0.26892 = 0.73108]

Update weights and bias:

[𝑤1 = 0.14212 + 0.1 ⋅ 0.73108 ⋅ 0 = 0.14212]

[𝑤2 = 0.052 + 0.1 ⋅ 0.73108 ⋅ 1 = 0.052 + 0.073108 = 0.125108]

[𝑏 = 0.21692 + 0.1 ⋅ 0.73108 = 0.21692 + 0.073108 = 0.290028]

Steps 3 and 4 proceed in the same manner.

The ADALINE algorithm progressively adjusts the weights based on the error between the

predicted output and the target values. Through multiple iterations (epochs), the weights converge

to values that separate the two classes with a linear decision boundary. Although ADALINE's

learning is more stable than the perceptron’s, it may still require multiple iterations for the weights

to fully converge, depending on the learning rate and initial conditions.
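The hand computation above can be checked in code. This sketch implements one epoch of the update rule exactly as in the worked steps (zero initialization, learning rate η = 0.1) and reproduces the epoch-1 weights:

```python
def adaline_epoch(data, w1, w2, b, eta=0.1):
    """One pass of the ADALINE (LMS) rule over the dataset, as in the steps above."""
    for (x1, x2), t in data:
        z = w1 * x1 + w2 * x2 + b   # linear output: no threshold during training
        e = t - z                   # error against the raw weighted sum
        w1 += eta * e * x1
        w2 += eta * e * x2
        b  += eta * e
    return w1, w2, b

# Same inputs and targets as the worked example, in the same presentation order.
data = [((1, 0), 1), ((0, 1), 1), ((1, 1), 0), ((0, 0), 0)]
w1, w2, b = adaline_epoch(data, 0.0, 0.0, 0.0)
print(round(w1, 4), round(w2, 4), round(b, 4))  # 0.062 0.052 0.1368, matching epoch 1
```

Calling `adaline_epoch` again on the returned weights reproduces the epoch-2 updates, and repeated epochs drive the weights toward the minimum-MSE solution.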

