
Neural Network (NN)

Lương Thái Lê
Outline

1. NN Introduction

2. Components of a NN

3. ANN architecture

4. Perceptron

5. Gradient descent

6. Multi-layer ANN & Back propagation


Artificial Neural Network – Introduction (1)
• Artificial Neural Network (ANN)
• A simulation of biological neural systems (human brains)
• A structure made of a number of interconnected neurons
• Each neuron:
• has its own Input/Output characteristics
• performs a local computation
• The output value of a neuron is determined by:
• its Input/Output characteristics
• its connections with other neurons
• additional inputs (if any)
Artificial Neural Network – Introduction (2)
• An ANN can be viewed as a structure that processes information in a distributed and highly parallel manner.
• An ANN has the ability to learn, recall, and generalize from training data by assigning and adjusting (adapting) the weight values of the connections between neurons.
• The objective function of an ANN is determined by:
• The architecture (topology) of the network
• The Input/Output characteristics of each neuron
• The training strategy
• The training data
ANN – Typical applications
• Image processing and computer vision
• Matching, preprocessing, image segmentation and analysis, image compression, processing and understanding images that change over time
• Signal processing
• Seismic signal and morphological analysis, earthquakes
• Pattern recognition
• Feature extraction, speech recognition and understanding, fingerprint recognition, human face recognition
• Medicine
• Analyzing and understanding electrocardiographic signals, diagnosing diseases, and processing medical images
• … and many more
Structure and operation of a neuron
• Input signals:
• Each input signal xi has a corresponding weight wi
• w0 is an adjustable bias weight (with x0 = 1)
• Net input:
• a function Net(w,x) that integrates the input signals
• Activation function:
• computes the output value of the neuron
Overall input and adjustment
• The net input is usually computed by a linear function:

Net = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_m x_m = \sum_{i=0}^{m} w_i x_i \quad (x_0 = 1)
• Meaning of the bias:
• Without the bias, the family of separating functions Net = \sum_{i=1}^{m} w_i x_i cannot separate some sets of examples into two classes
• With the bias, Net = \sum_{i=1}^{m} w_i x_i + w_0 can, as illustrated below
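A tiny Python illustration of the net-input computation and of the role of the bias w_0 (the numbers are made up for illustration only):

```python
import numpy as np

def net_input(w, x):
    """Net = w_0 + w_1*x_1 + ... + w_m*x_m, implemented as a dot product with x_0 = 1."""
    return float(np.dot(w, np.concatenate(([1.0], x))))

# Without the bias, the separating surface must pass through the origin;
# the bias w_0 shifts it, which is what makes some two-class splits possible.
w = np.array([-0.5, 1.0, 1.0])                # w_0 = -0.5 (bias), w_1 = w_2 = 1.0
print(net_input(w, np.array([0.2, 0.1])))     # -0.2 -> one side of the boundary
print(net_input(w, np.array([0.4, 0.3])))     #  0.2 -> the other side
```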
Activate function: Hard limiter
• Also called the threshold function
• θ is the threshold value
• Drawback: not continuous, and its derivative is not continuous either
Activate function: Threshold logic

• α determines the slope of the linear interval
• Drawback: continuous, but its derivative is not
Activate function: Sigmoidal

• The most commonly used activation function
• α determines the slope
• Output is in (0, 1)
• Advantage: continuous, and so is its derivative
Activate function: Hyperbolic tangent

• Commonly used
• α determines the slope
• Output is in (-1, 1)
• Advantage: continuous, and so is its derivative
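The exact formulas for these activation functions appear as figures in the original slides; the sketch below uses one common parameterization (an assumption, with slope parameter alpha) in Python:

```python
import numpy as np

def hard_limiter(net, theta=0.0):
    """Threshold function: 1 if net >= theta, else -1 (one common convention)."""
    return 1.0 if net >= theta else -1.0

def threshold_logic(net, alpha=1.0):
    """Piecewise-linear (saturating) function: slope alpha, clipped to [0, 1]."""
    return float(np.clip(alpha * net, 0.0, 1.0))

def sigmoid(net, alpha=1.0):
    """Logistic sigmoid with slope alpha; output in (0, 1), smooth derivative."""
    return 1.0 / (1.0 + np.exp(-alpha * net))

def tanh_act(net, alpha=1.0):
    """Hyperbolic tangent with slope alpha; output in (-1, 1), smooth derivative."""
    return np.tanh(alpha * net)

# The sigmoid's derivative can be written in terms of its output,
# which back-propagation exploits: f'(net) = alpha * f(net) * (1 - f(net))
def sigmoid_deriv(net, alpha=1.0):
    out = sigmoid(net, alpha)
    return alpha * out * (1.0 - out)
```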
ANN – network architecture (1)
• The architecture of an ANN is determined by:
• The number of input and output signals
• The number of layers
• The number of neurons in each layer
• The number of weights of each neuron
• How the neurons (within a layer, or between layers) are connected to each other
• Which neurons receive error-correction signals
• An ANN must have:
• An input layer
• An output layer
• Zero, one, or more hidden layers
ANN – network architecture (2)
• An ANN is said to be fully connected if every output from one layer is
connected to every neuron of the next layer
• An ANN is called a feedforward network if no output of one node is
the input of another node of the same layer (or of a previous layer).
• When the outputs of a node link back to the inputs of a node of the
same layer (or of a previous layer), it is a feedback network.
• Feedback networks that have closed loops are called recurrent
networks
ANN – network architecture (3)
ANN – Learning rules
• Two types of learning in ANN:
• Parameter learning
The goal is to adaptively change the weights of the links in the neural network
• Structure learning
The goal is to adaptively change the network structure, including the number
of neurons and the connection patterns between them
• These two types of learning can be done simultaneously or separately
• Most learning methods in ANNs are parameter learning
=> We will consider only parameter learning
Weight learning rules
• At learning step t, the adjustment of the weight vector w is proportional to the product of the learning signal r(t) and the input x(t):

\Delta \mathbf{w}(t) = \eta \, r(t) \, \mathbf{x}(t)

where η > 0 is the learning rate
• The learning signal r is a function of w, x, and the desired output value d:

r = g(\mathbf{w}, \mathbf{x}, d)

• Generalized weight learning rule:

\Delta \mathbf{w}(t) = \eta \, g(\mathbf{w}(t), \mathbf{x}(t), d(t)) \, \mathbf{x}(t)

Note: x_j can be an input signal or an output value from another neuron
Perceptron
• A perceptron is the simplest ANN (it contains only one neuron)
• It uses the hard-limiter activation function:

Out = sign(Net(\mathbf{w}, \mathbf{x})) = sign\left(\sum_{j=0}^{m} w_j x_j\right)

• For an input x, the output of the perceptron is:
• 1 if Net(w, x) > 0
• -1 otherwise
Perceptron – Learning Algorithm
• Given a training set D = {(x, d)}
• x is an input vector
• d is the desired output (-1 or 1)
• The perceptron's learning process aims to determine a weight vector that allows the perceptron to produce the correct output value (-1 or 1) for every training example
• For an example x that is classified correctly by the perceptron, the weight vector w does not change
• If d = 1 but the perceptron produces -1 (Out = -1), then w must be changed so that Net(w, x) increases
• If d = -1 but the perceptron produces 1 (Out = 1), then w must be changed so that Net(w, x) decreases
Perceptron – Incremental Algorithm
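The original slide shows this algorithm as a figure; below is a minimal Python sketch of the incremental (online) perceptron rule described above, assuming the standard update Δw = η(d − Out)x applied example by example (a common form, not necessarily the slide's exact notation):

```python
import numpy as np

def perceptron_train_incremental(D, eta=0.1, max_epochs=100):
    """Incremental perceptron learning.
    D is a list of (x, d) pairs, where x is a feature vector (without bias)
    and d is the desired output, -1 or +1."""
    m = len(D[0][0])
    w = np.zeros(m + 1)                              # w[0] is the bias weight (x0 = 1)
    for _ in range(max_epochs):
        errors = 0
        for x, d in D:
            x_ext = np.concatenate(([1.0], x))       # prepend x0 = 1
            out = 1 if np.dot(w, x_ext) > 0 else -1  # hard-limiter output
            if out != d:
                w += eta * (d - out) * x_ext         # move Net(w, x) toward d
                errors += 1
        if errors == 0:                              # converged: all examples correct
            break
    return w

# Usage: a small linearly separable set (AND-like, labels in {-1, +1})
data = [(np.array([0., 0.]), -1), (np.array([0., 1.]), -1),
        (np.array([1., 0.]), -1), (np.array([1., 1.]), 1)]
w = perceptron_train_incremental(data)
```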
Perceptron – Batch Algorithm
Perceptron – Limitation
• The perceptron learning algorithm is proven to converge if:
• The training examples are linearly separable
• A sufficiently small learning rate η is used
• The perceptron learning algorithm may not converge if the training examples are not linearly separable
• In that case, apply the delta rule
• It ensures convergence to the best possible approximation of the objective function
• The delta rule uses a gradient-descent strategy to search the hypothesis space (of weight vectors) for a weight vector that best fits the training examples
Error evaluation function
• Consider an ANN with n output neurons
• For an example (x, d), the error produced by the current weight vector w is:

E_x(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{n} (d_i - Out_i)^2

• The error produced by the current weight vector w over the whole training set D is:

E_D(\mathbf{w}) = \frac{1}{|D|} \sum_{x \in D} E_x(\mathbf{w})
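A short sketch of these two error measures, assuming the desired outputs and actual network outputs are already available as arrays (the numbers are illustrative):

```python
import numpy as np

def example_error(d, out):
    """E_x(w) = 1/2 * sum_i (d_i - Out_i)^2 for one example."""
    d, out = np.asarray(d), np.asarray(out)
    return 0.5 * np.sum((d - out) ** 2)

def dataset_error(targets, outputs):
    """E_D(w) = (1/|D|) * sum over all examples of E_x(w)."""
    return sum(example_error(d, o) for d, o in zip(targets, outputs)) / len(targets)

# Usage: two examples, each with n = 2 output neurons
print(dataset_error([[1, 0], [0, 1]], [[0.8, 0.3], [0.2, 0.6]]))
```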
Gradient descent
• The gradient of E, ∇E, is a vector that:
• points in the uphill direction (steepest ascent)
• has a length proportional to the slope
• The gradient ∇E determines the direction that causes the steepest increase in the error value E:

\nabla E(\mathbf{w}) = \left[ \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \dots, \frac{\partial E}{\partial w_N} \right]

where N is the number of weights in the network

• Therefore, the direction causing the steepest decrease is the negative of the gradient of E:

\Delta \mathbf{w} = -\eta \, \nabla E(\mathbf{w}); \quad \Delta w_i = -\eta \, \frac{\partial E}{\partial w_i}, \; \forall i = 1, \dots, N

=> Requirement: the activation functions used in the network must be continuous functions of the weights and have continuous derivatives.
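A minimal numerical sketch of one gradient-descent weight update, using a finite-difference estimate of ∂E/∂w_i so it works for any differentiable error function E(w); the error surface used here is just an illustrative quadratic, an assumption for the example:

```python
import numpy as np

def numerical_gradient(E, w, eps=1e-6):
    """Estimate dE/dw_i by central differences for each weight."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (E(w_plus) - E(w_minus)) / (2 * eps)
    return grad

def gradient_descent_step(E, w, eta=0.1):
    """Delta w = -eta * grad E(w): move against the gradient to reduce E."""
    return w - eta * numerical_gradient(E, w)

# Usage: minimize an illustrative error surface E(w) = (w1 - 3)^2 + (w2 + 1)^2
E = lambda w: (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2
w = np.array([0.0, 0.0])
for epoch in range(50):          # stopping criterion: fixed number of epochs
    w = gradient_descent_step(E, w)
print(w)                         # approaches [3, -1]
```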
Gradient descent - visualization
Stopping criteria: number of learning cycles (epochs), error threshold, …
Multi-layer ANN and back propagation
algorithm
• A perceptron can only represent a linearly separable function
• A multi-layer neural network learned by the back-propagation (BP) algorithm can represent highly non-linear separating functions
• The BP learning algorithm is used to learn the weights of a multi-layer neural network
• The network structure is fixed (the neurons and the links between them do not change)
• For each neuron, the activation function must have a continuous derivative
• The BP algorithm applies the gradient-descent strategy in the weight-update rule to minimize the error
Back propagation algorithm
• The backpropagation learning algorithm searches for a vector of weights that
minimizes the overall error of the system for the learning set
• BP includes 2 steps:
1. Signal forward step:
• The input signals are forward propagated from the input layer to the output layer
(passing through hidden layers).
2. Error backward step:
• Based on the desired output value of the input vector, the system calculates the error
value
• From the output layer, the error value is propagated back through the network, layer by layer, until it reaches the input layer
• Error back-propagation is performed by (recursively) calculating the local gradient value of each neuron
BP-algorithm network structure
• Consider a 3-layer NN:
• m input signals x_j
• l hidden-layer neurons z_q
• n output neurons y_i
• w_qj is the weight of the link from input signal x_j to hidden neuron z_q
• w_iq is the weight of the link from hidden neuron z_q to output neuron y_i
• Out_q is the (local) output value of hidden neuron z_q
• Out_i is the output value of the network corresponding to output neuron y_i
BP-algorithm: forward propagation (1)
• For each example x:
• The input vector x is propagated from the input layer to the output layer
• The network produces an actual output value Out (a vector of Out_i values)
• For each input vector x:
• a neuron z_q in the hidden layer receives a net input of:

Net_q = \sum_{j=1}^{m} w_{qj} x_j

• and produces a local output of:

Out_q = f(Net_q) = f\left(\sum_{j=1}^{m} w_{qj} x_j\right)

where f is an activation function
BP-algorithm: forward propagation (2)
• The net input of neuron y_i at the output layer is:

Net_i = \sum_{q=1}^{l} w_{iq} Out_q = \sum_{q=1}^{l} w_{iq} f\left(\sum_{j=1}^{m} w_{qj} x_j\right)

• Neuron y_i produces the output value Out_i:

Out_i = f(Net_i) = f\left(\sum_{q=1}^{l} w_{iq} f\left(\sum_{j=1}^{m} w_{qj} x_j\right)\right)

• The vector of Out_i values, i = 1…n, is the output of the network for the input vector x
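A minimal numpy sketch of this two-step forward pass for the 3-layer network above, assuming the sigmoid activation and no bias terms (matching the slide formulas; the weight values are random placeholders):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, W_hidden, W_output):
    """Forward pass of a 3-layer network.
    x:        input vector, shape (m,)
    W_hidden: weights w_qj, shape (l, m) -- row q holds the weights of hidden neuron z_q
    W_output: weights w_iq, shape (n, l) -- row i holds the weights of output neuron y_i
    Returns (Out_hidden, Out_output)."""
    net_hidden = W_hidden @ x             # Net_q = sum_j w_qj * x_j
    out_hidden = sigmoid(net_hidden)      # Out_q = f(Net_q)
    net_output = W_output @ out_hidden    # Net_i = sum_q w_iq * Out_q
    out_output = sigmoid(net_output)      # Out_i = f(Net_i)
    return out_hidden, out_output

# Usage: m = 3 inputs, l = 4 hidden neurons, n = 2 outputs
rng = np.random.default_rng(0)
W_hidden = rng.uniform(-0.5, 0.5, (4, 3))
W_output = rng.uniform(-0.5, 0.5, (2, 4))
_, out = forward(np.array([0.2, -0.1, 0.7]), W_hidden, W_output)
print(out)
```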
BP-algorithm: backward propagate (1)
• For each example x
• Error signals due to the difference between the desired output value d and the
actual output value Out are calculated
• These error signals are back-propagated from the output layer to the earlier layers in order to update the weights.
• To consider the error signals and their back-propagation, it is necessary to define an error evaluation function:

E(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{n} (d_i - Out_i)^2 = \frac{1}{2} \sum_{i=1}^{n} \bigl(d_i - f(Net_i)\bigr)^2 = \frac{1}{2} \sum_{i=1}^{n} \left(d_i - f\left(\sum_{q=1}^{l} w_{iq} Out_q\right)\right)^2
BP-algorithm: backward propagate (2)
• Applying gradient descent, the weights from the hidden layer to the output layer are updated by:

\Delta w_{iq} = -\eta \, \frac{\partial E}{\partial w_{iq}}

• Applying the chain rule of derivatives, we have:

\Delta w_{iq} = \eta \, \delta_i \, Out_q

• where δ_i is the error signal of neuron y_i at the output layer:

\delta_i = (d_i - Out_i) \, f'(Net_i)
BP-algorithm: backward propagate (3)
• To update the weights of the links from the input layer to the hidden layer, we again apply gradient descent and the chain rule of derivatives:

\Delta w_{qj} = -\eta \, \frac{\partial E}{\partial w_{qj}} = \eta \, \delta_q \, x_j

• where δ_q is the error signal of neuron z_q at the hidden layer:

\delta_q = f'(Net_q) \sum_{i=1}^{n} \delta_i \, w_{iq}

• According to the formulas above for the error signals δ_i and δ_q, the error signal of a hidden-layer neuron is computed differently from the error signal of an output-layer neuron.
• Because of this difference, the weight-update procedure of the BP algorithm is called the generalized delta learning rule.
BP-algorithm: backward propagate (4)
• The error signal δ_q of neuron z_q in the hidden layer is determined by:
• the error signals δ_i of the neurons y_i in the output layer (to which neuron z_q is linked), and
• the weights w_iq of those links
• An important feature of the BP algorithm: the weight-update rule is local
• To update the weight of a link, the system only needs the values at the two ends of that link
• The general form of the weight-update rule in the BP algorithm is:

\Delta w_{ab} = \eta \, \delta_a \, x_b
Back_propagation_incremental Alg (1)
Back_propagation_incremental(D, η)
The neural network has Q layers, q = 1, 2, …, Q
Net_i^q and Out_i^q are the net input and the output of neuron i at layer q
The network has m input signals and n output neurons
w_{ij}^q is the weight of the link from neuron j of layer (q-1) to neuron i of layer q

Step 0 (Initialization)
Choose an error threshold E_th
Initialize the weights with small random values
Set E = 0

Step 1 (Start a learning iteration)
Apply the input vector of training example k to the input layer (q = 1):

Out_i^q = Out_i^1 = x_i^{(k)}, \; \forall i

Step 2 (Forward propagation)
Propagate the input signals forward through the network until the network output values Out_i^Q are obtained at the output layer
Back_propagation_incremental Alg (2)
Step 3 (Output error computation)
Compute the output error of the network and the error signal δ_i^Q of each neuron in the output layer:

E = E + \frac{1}{2} \sum_{i=1}^{n} \bigl(d_i^{(k)} - Out_i^Q\bigr)^2

\delta_i^Q = \bigl(d_i^{(k)} - Out_i^Q\bigr) \, f'(Net_i^Q)

Step 4 (Error back-propagation)
Back-propagate the error to update the weights and compute the error signals δ_i^{q-1} for the preceding layers:

\Delta w_{ij}^q = \eta \, \delta_i^q \, Out_j^{q-1}; \quad w_{ij}^q = w_{ij}^q + \Delta w_{ij}^q

\delta_i^{q-1} = f'(Net_i^{q-1}) \sum_j w_{ji}^q \, \delta_j^q

Step 5 (Check the end of a learning cycle – epoch)
Check whether the entire training set has been used (i.e., a learning cycle – an epoch – has been completed)
If the entire training set has been used, go to Step 6; otherwise, go to Step 1

Step 6 (Check the overall error)
If the overall error E is less than the acceptable error threshold (E < E_th), the learning process ends and the learned weights are returned;
Otherwise, reset E = 0 and start a new learning cycle (return to Step 1)
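A compact Python sketch of this incremental BP procedure for a network with one hidden layer, assuming sigmoid activations (so f'(Net) = Out(1 - Out)) and a constant x_0 = 1 input on each layer to provide the bias weight discussed earlier; the function names and the XOR usage example are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def bp_incremental(D, l, eta=0.5, E_th=0.01, max_epochs=20000, seed=1):
    """Incremental back-propagation for a network with one hidden layer of l neurons.
    D is a list of (x, d) pairs; a constant 1 is prepended to each layer's input as bias."""
    m = len(D[0][0])                 # number of input signals
    n = len(D[0][1])                 # number of output neurons
    rng = np.random.default_rng(seed)
    W_h = rng.uniform(-0.5, 0.5, (l, m + 1))   # hidden weights w_qj (column 0 = bias)
    W_o = rng.uniform(-0.5, 0.5, (n, l + 1))   # output weights w_iq (column 0 = bias)
    for epoch in range(max_epochs):
        E = 0.0                                 # overall error for this epoch
        for x, d in D:                          # Step 1: present one training example
            x_ext = np.concatenate(([1.0], x))
            out_h = sigmoid(W_h @ x_ext)        # Step 2: forward pass, hidden layer
            h_ext = np.concatenate(([1.0], out_h))
            out_o = sigmoid(W_o @ h_ext)        #         forward pass, output layer
            E += 0.5 * np.sum((d - out_o) ** 2)                 # Step 3: accumulate error
            delta_o = (d - out_o) * out_o * (1 - out_o)         # (d_i - Out_i) f'(Net_i)
            delta_h = out_h * (1 - out_h) * (W_o[:, 1:].T @ delta_o)  # f'(Net_q) sum_i delta_i w_iq
            W_o += eta * np.outer(delta_o, h_ext)   # Step 4: delta w_iq = eta * delta_i * Out_q
            W_h += eta * np.outer(delta_h, x_ext)   #         delta w_qj = eta * delta_q * x_j
        if E < E_th:                 # Step 6: stop when the overall error is small enough
            return W_h, W_o
    return W_h, W_o

# Usage: XOR, a classic function a single perceptron cannot represent
data = [(np.array([0., 0.]), np.array([0.])), (np.array([0., 1.]), np.array([1.])),
        (np.array([1., 0.]), np.array([1.])), (np.array([1., 1.]), np.array([0.]))]
W_h, W_o = bp_incremental(data, l=4)
```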
Forward propagation visualization (1)
Forward propagation visualization (2)
Forward propagation visualization (3)
Forward propagation visualization (4)
Forward propagation visualization (5)
Forward propagation visualization (6)
Forward propagation visualization (7)
Error Computation
Back propagation (1)
Back propagation (2)
Back propagation (3)
Back propagation (4)
Back propagation (5)
Weights updating (1)
Weights updating (2)
Weights updating (3)
Weights updating (4)
Weights updating (5)
Weights updating (6)
Weights Initialization
• Usually, the weights are initialized with small random values
• If the weights have large initial values:
• The sigmoid functions reach saturation early
• The system gets stuck at a local minimum or at a very flat plateau near the starting point
• Recommended ranges for the initial weight w_{ab}^0 (link from neuron b to neuron a):
• Let n_a be the number of neurons in the same layer as neuron a:

w_{ab}^0 \in \left[ -\frac{1}{n_a}, \frac{1}{n_a} \right]

• Let k_a be the number of neurons with forward connections to neuron a (= the number of input connections of neuron a):

w_{ab}^0 \in \left[ -\frac{3}{k_a}, \frac{3}{k_a} \right]
Learning rate
• The learning rate has an important influence on the efficiency and convergence of the BP learning algorithm
• A large value of η can accelerate convergence, but may cause the system to miss the global optimum or fall into a local optimum
• A small value of η can make the learning process take a very long time
• It is usually chosen experimentally for each problem
• Learning-rate values that are good at the beginning may not be good later on
• An adaptive (dynamic) learning rate should be used:
• After updating the weights, check whether the update reduces the error value (a small sketch follows)
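One simple adaptive scheme built on that check, sketched below; the specific increase/decrease factors and the helper name are assumptions, not from the slides. If a step reduces the error, it is accepted and η grows slightly; otherwise the step is rejected and η shrinks.

```python
import numpy as np

def adaptive_eta_step(w, grad, E, eta, up=1.05, down=0.5):
    """One gradient-descent step with a simple adaptive learning rate.
    w, grad: current weights and gradient (numpy arrays); E: error function of w.
    Returns (new_w, new_eta)."""
    e_old = E(w)
    w_new = w - eta * grad
    if E(w_new) < e_old:            # the update reduced the error: accept it, grow eta
        return w_new, eta * up
    else:                           # the update increased the error: reject it, shrink eta
        return w, eta * down

# Usage with the illustrative error surface from the gradient-descent sketch above
E = lambda w: (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2
grad_E = lambda w: np.array([2 * (w[0] - 3.0), 2 * (w[1] + 1.0)])
w, eta = np.array([0.0, 0.0]), 0.1
for _ in range(100):
    w, eta = adaptive_eta_step(w, grad_E(w), E, eta)
print(w, eta)
```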
