NN-BNU2
Networks
Lecture (2) _ ANN Architectures and Learning Schemes
Learning Problem
We represent the learning problem in terms of the minimization of a loss
index (f). Here, “f” is the function that measures the performance of a Neural
Network on a given dataset.
Generally, the loss index consists of an error term and a regularization term.
While the error term evaluates how a Neural Network fits a dataset, the
regularization term helps prevent the overfitting issue by controlling the
effective complexity of the Neural Network.
Loss index
The loss index plays a vital role in the use of neural networks. It defines the task the neural network is required to perform and provides a measure of the quality of the representation to be learned. The choice of a suitable loss index depends on the application.
When setting a loss index, two different terms must be chosen: an error term and a
regularization term.
loss_index = error_term + regularization_term
Error term
The error is the most important term in the loss expression. It measures how well the neural network fits the dataset.
These errors can be measured over different subsets of the data: the training error is measured on the training samples, the selection error on the selection samples, and the testing error on the testing samples.
Next, we describe the most important errors used in the field of neural
networks:
▪ Mean squared error.
▪ Normalized squared error.
▪ Weighted squared error.
▪ Cross entropy error.
▪ Minkowski error.
▪ Mean squared error.
MSE = (1/n) · Σ(yᵢ − ŷᵢ)²
yᵢ : observed values
ŷᵢ : predicted values
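As a quick sanity check, the MSE can be computed directly from the definition above; the function name and sample values here are just for illustration:

```python
def mse(y, y_hat):
    """Mean squared error: (1/n) * sum of squared residuals."""
    n = len(y)
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / n

# Residuals are 0, 0.5 and -1, so the MSE is (0 + 0.25 + 1) / 3.
print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```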
▪ Normalized squared error.
NMSE = (1/n) · Σ(yᵢ − ŷᵢ)² / Σ ŷᵢ²
yᵢ : observed values
ŷᵢ : predicted values
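A literal transcription of this formula (with the denominator normalizing by the sum of squared predictions, as written above) might look like:

```python
def nmse(y, y_hat):
    """Normalized squared error: (1/n) * sum((y - y_hat)^2) / sum(y_hat^2)."""
    n = len(y)
    numerator = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    denominator = sum(yhi ** 2 for yhi in y_hat)
    return numerator / (n * denominator)
```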
▪ Cross entropy error. In its binary form it is
CE = −(1/n) · Σ[ yᵢ · ln(ŷᵢ) + (1 − yᵢ) · ln(1 − ŷᵢ) ]
yᵢ : observed values
ŷᵢ : predicted values
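Assuming the binary form of the cross entropy, a minimal sketch is:

```python
import math

def cross_entropy(y, y_hat):
    """Binary cross-entropy error, averaged over the n samples."""
    n = len(y)
    return -sum(yi * math.log(yhi) + (1 - yi) * math.log(1 - yhi)
                for yi, yhi in zip(y, y_hat)) / n

# A maximally uncertain prediction (0.5) costs ln(2) per sample.
print(cross_entropy([1, 0], [0.5, 0.5]))
```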
▪ The Minkowski error is a loss index that is less sensitive to outliers than the standard mean squared error.
What is an outlier?
Outliers are data points that lie far away from most of the other data points; in other words, they are rare or distinct points.
Here is a simple example:
Say we have a set of 10 numbers: {45, 47, 56, 3, 54, 42, 50, 99, 48, 55}. We observe that most of the numbers lie between 40 and 60, but two of them, 3 and 99, are far away from the rest. These numbers would be called outliers.
▪ Minkowski error.
Minkowski Error = (1/n) · Σ|yᵢ − ŷᵢ|ᵖ
yᵢ : observed values
ŷᵢ : predicted values
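The exponent p (typically 1 ≤ p < 2) is what makes this error less sensitive to outliers than MSE, and taking the absolute value of the residual keeps the power well-defined for non-integer p. A minimal sketch, with an illustrative default p:

```python
def minkowski_error(y, y_hat, p=1.5):
    """Minkowski error: (1/n) * sum(|y - y_hat|^p)."""
    n = len(y)
    return sum(abs(yi - yhi) ** p for yi, yhi in zip(y, y_hat)) / n

# With p = 1 an outlier residual of 9 contributes 9 to the sum, while with
# p = 2 (the MSE case) it would contribute 81 and dominate the loss.
print(minkowski_error([0, 0], [1, 9], p=1))
```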
Regularization term
A solution is regular when small changes in the input variables lead to small changes in the outputs. An approach for non-regular problems is to control the effective complexity of the neural network. We can achieve this by including a regularization term in the loss index.
Regularization terms usually measure the values of the parameters in the neural network. Adding that term to the error causes the neural network to have smaller weights and biases, which forces its response to be smoother.
The most commonly used types of regularization are the following:
• L1 regularization.
• L2 regularization.
L1 regularization
The L1 regularization method consists of the sum of the absolute values of all the parameters in the neural network.
l1_regularization = regularization_weight ⋅ ∑|parameters|
L2 regularization
The L2 regularization method consists of the sum of the squared values of all the parameters in the neural network.
l2_regularization = regularization_weight ⋅ ∑ parameters²
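Both terms are straightforward to compute; regularization_weight here is the tunable coefficient from the expressions above, and the sample values are only illustrative:

```python
def l1_regularization(parameters, regularization_weight):
    """Sum of absolute parameter values, scaled by the regularization weight."""
    return regularization_weight * sum(abs(p) for p in parameters)

def l2_regularization(parameters, regularization_weight):
    """Sum of squared parameter values, scaled by the regularization weight."""
    return regularization_weight * sum(p ** 2 for p in parameters)

weights = [0.5, -1.5, 2.0]
print(l1_regularization(weights, 0.01))  # 0.01 * (0.5 + 1.5 + 2.0)
print(l2_regularization(weights, 0.01))  # 0.01 * (0.25 + 2.25 + 4.0)
```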
Training Algorithms
The following chart depicts the computational speed and the memory requirements of the training algorithms.
As we can see, the slowest training algorithm is usually gradient descent, but it is the one requiring the least memory.
On the contrary, the fastest one might be the Levenberg-Marquardt algorithm, but it usually requires a great deal of memory.
A good compromise might be the quasi-Newton method.
Training Algorithms
1. Gradient descent (GD)
𝑖 𝑖+1
and, until a stopping criterion is satisfied, moves from 𝑤 to 𝑤 in the training
𝑖 𝑖
direction 𝑑 = −𝑔 Therefore, the gradient descent method iterates in the following
way:
𝑖+1 𝑖 𝑖 𝑖
𝑤 =𝑤 − 𝑔 𝜂 , 𝑓𝑜𝑟 𝑖 = 0, 1, …
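The update rule above can be sketched in a few lines; the quadratic loss and its gradient used here are just an illustrative stand-in for a network's loss:

```python
def gradient_descent(w, grad, eta=0.1, steps=100):
    """Iterate w_{i+1} = w_i - eta * g_i for a fixed step budget."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Minimize f(w) = w1^2 + w2^2, whose gradient is (2*w1, 2*w2);
# the iterates shrink geometrically toward the minimum at (0, 0).
w = gradient_descent([3.0, -2.0], lambda w: [2 * wi for wi in w])
print(w)
```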
The gradient vector groups the first derivatives; the Hessian matrix groups the second derivatives.
Backpropagation
Perceptron
Developed by Frank Rosenblatt in 1957, building on the McCulloch-Pitts model, the perceptron is the basic operational unit of artificial neural networks. It employs a supervised learning rule and is able to classify data into two classes (a "binary classifier").
Operational characteristics of the perceptron: it consists of a single neuron with an arbitrary number of inputs along with adjustable weights, but the output of the neuron is 1 or 0 depending on a threshold. It also has a bias whose input is always 1.
The following figure gives a schematic representation of the perceptron.
Perceptron Learning Algorithm
Training Algorithm for Single Output Unit
Step 1 − Initialize the following to start the training:
• Weights
• Bias
• Learning rate α
For easy calculation and simplicity, the weights and bias can be set equal to 0 and the learning rate set equal to 1.
Step 2 − Continue steps 3-8 while the stopping condition is not true.
Step 3 − Continue steps 4-6 for every training vector x.
Step 4 − Activate each input unit as follows:
xᵢ = sᵢ (i = 1 → n)
Step 5 − Now obtain the net input with the following relation:
yᵢₙ = b + Σᵢ₌₁ⁿ xᵢ · wᵢ
Here ‘b’ is the bias, and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output:
y = 1 if yᵢₙ > θ, and y = 0 otherwise, where θ is the threshold.
Step 7 − Adjust the weights and bias when the output does not match the target: if y ≠ t, set wᵢ(new) = wᵢ(old) + α · (t − y) · xᵢ and b(new) = b(old) + α · (t − y); otherwise leave them unchanged. Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which is satisfied when there is no change in the weights.
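Steps 5-7 can be sketched for a single training vector; the (t − y) update used here is one common form of the perceptron rule (the function name and defaults are illustrative):

```python
def perceptron_step(x, t, w, b, alpha=1.0, theta=0.0):
    """One pass of steps 5-7 for one training vector x with target t."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # step 5: net input
    y = 1 if y_in > theta else 0                     # step 6: step activation
    if y != t:                                       # step 7: update on error only
        w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, x)]
        b = b + alpha * (t - y)
    return w, b

# Target is 1 but the unit outputs 0, so each weight and the bias grow by alpha.
w, b = perceptron_step([1, 1], 1, [0.0, 0.0], 0.0)
```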
Example 1
We will train the perceptron on the logical AND function, with threshold 0.5, and set the weights randomly. Let's say that w1 = 0.9 and w2 = 0.9.
Round 1
We will apply the 1st instance to the perceptron: x1 = 0 and x2 = 0.
The sum unit will be 0, as calculated below:
Σ = x1 * w1 + x2 * w2 = 0 * 0.9 + 0 * 0.9 = 0
The activation unit checks whether the sum is greater than the threshold, 0.5. If this rule is satisfied, the unit fires and returns 1; otherwise it returns 0. Here the sum is 0, so the unit returns 0, which matches the target. BTW, modern neural network architectures do not use this kind of step function as activation.
Now apply the 2nd instance: x1 = 0 and x2 = 1.
Σ = x1 * w1 + x2 * w2 = 0 * 0.9 + 1 * 0.9 = 0.9
What about errors?
The activation unit returns 1 for the 2nd instance (x1 = 0, x2 = 1), because its sum, 0 * 0.9 + 1 * 0.9 = 0.9, is greater than 0.5. However, the output of this instance should be 0, so it is not predicted correctly. That is why we will update the weights based on the error:
ε = actual − prediction = 0 − 1 = −1
We will add the error times the learning rate to the weights. The learning rate here is 0.5. BTW, we mostly set the learning rate between 0 and 1.
w1 = w1 + α * ε = 0.9 + 0.5 * (−1) = 0.9 − 0.5 = 0.4
w2 = w2 + α * ε = 0.9 + 0.5 * (−1) = 0.9 − 0.5 = 0.4
Now focus on the 3rd instance: x1 = 1 and x2 = 0.
Sum unit: Σ = x1 * w1 + x2 * w2 = 1 * 0.4 + 0 * 0.4 = 0.4
The activation unit will return 0 this time, because the output of the sum unit is 0.4, less than the threshold 0.5. The target is also 0, so we will not update the weights.
Now consider the 4th instance: x1 = 1 and x2 = 1.
Sum unit: Σ = x1 * w1 + x2 * w2 = 1 * 0.4 + 1 * 0.4 = 0.8
The activation unit will return 1, because the output of the sum unit is 0.8, greater than the threshold 0.5. Its actual value should be 1 as well, which means the 4th instance is predicted correctly. We will not update anything.
Round 2
In the previous round, the 1st instance was evaluated with the old weight values and classified correctly. Let's apply a feed-forward pass with the new weight values.
Remember the 1st instance: x1 = 0 and x2 = 0.
Sum unit: Σ = x1 * w1 + x2 * w2 = 0 * 0.4 + 0 * 0.4 = 0
The activation unit will return 0, because the sum is 0, less than the threshold 0.5. The output of the 1st instance should be 0 as well, so the instance is classified correctly and we will not update the weights.
Feed forward for the 2nd instance: x1 = 0 and x2 = 1.
Sum unit: Σ = x1 * w1 + x2 * w2 = 0 * 0.4 + 1 * 0.4 = 0.4
The activation unit will return 0, because the sum is less than the threshold 0.5. Its output should be 0 as well, which means it is classified correctly, and we will not update the weights.
We have already applied the feed-forward calculation for the 3rd and 4th instances with the current weight values in the previous round; they were classified correctly, so training is complete.
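The whole worked example can be replayed in code. Note that the update rule here follows the slides exactly: every weight gets α * ε added on a miss, without multiplying by the input; the four target outputs correspond to the logical AND function. The names and structure are illustrative:

```python
def train_perceptron(data, w, alpha=0.5, threshold=0.5, rounds=2):
    """Replay the example: step activation, w_j += alpha * error on a miss."""
    for _ in range(rounds):
        for x, target in data:
            s = sum(xi * wi for xi, wi in zip(x, w))  # sum unit
            prediction = 1 if s > threshold else 0    # activation unit
            error = target - prediction
            if error != 0:
                w = [wi + alpha * error for wi in w]
    return w

# The four instances from the example (logical AND targets).
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(data, [0.9, 0.9])  # converges to w1 = w2 = 0.4
print(w)
```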
OR Function Using A Perceptron