Perceptron Networks and Applications

M. Ali Akcayol
Gazi University
Department of Computer Engineering
Content
 Perceptrons
 Linear separability
 Perceptron training algorithm
 Termination criterion
 Choice of learning rate
 Non-numeric inputs
 Adalines
 Multiclass discrimination

2
Perceptrons
 In supervised learning algorithms, the desired result is known
for samples in the training data.
 The learning algorithms are simpler for networks consisting of
only one node in one layer.
 The modification of the weights is very simple.
 Perceptrons have a simple description but limited
capabilities.
 A perceptron is defined to be a machine that learns using
examples.
 A perceptron can also be defined as a stochastic gradient-descent
algorithm that linearly separates a set of samples in n-dimensional space.

3
Perceptrons
 A perceptron has a single output whose value determines which
of the two classes each input pattern belongs to.
 A perceptron can be represented by a single node.
 The perceptron applies a step function to the net weighted
sum of its inputs.
 The input pattern is considered to belong to one class or the
other.
 The output class is decided depending on whether the node
output is 0 or 1.

4
Perceptrons
Example
 Consider two-dimensional samples (0,0), (0,1), (1,0), (-1,-1)
that belong to one class, and samples (2.1,0), (0, -2.5), (1.6,
-1.6) that belong to another class.
 These classes are linearly separable.
 The node function is a step function.
 The output of the node is 1 if the net weighted input is greater
than 2, and 0 otherwise.

The separating line is x1 - x2 = 2: samples with x1 - x2 ≤ 2 produce
output 0, and samples with x1 - x2 > 2 produce output 1 (a small
sketch follows below).
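A minimal sketch of this step-function node in Python (the weights w1 = 1, w2 = -1 and the threshold 2 are assumed here; they reproduce the separation x1 - x2 > 2 described above):

import numpy as np

# Step-function perceptron: output 1 if the net weighted input exceeds the threshold.
w = np.array([1.0, -1.0])      # assumed weights (w1, w2)
threshold = 2.0                # assumed threshold

class_a = [(0, 0), (0, 1), (1, 0), (-1, -1)]   # expected output 0
class_b = [(2.1, 0), (0, -2.5), (1.6, -1.6)]   # expected output 1

for x in class_a + class_b:
    net = np.dot(w, x)
    output = 1 if net > threshold else 0
    print(x, "->", output)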
5
Content
 Perceptrons
 Linear separability
 Perceptron training algorithm
 Termination criterion
 Choice of learning rate
 Non-numeric inputs
 Adalines
 Multiclass discrimination

6
Linear separability
 If there exists a line that separates all samples of one class
from the other class, such classification problems are said to
be ‘linearly separable’.
 The line’s equation is

w0 + w1 x1 + w2 x2 = 0

 If a perceptron has weights w0 , w1 , w2 for the connections
from the inputs 1, x1 , x2 , then the perceptron can separate
samples of the two classes.
 If the samples are NOT linearly separable, i.e., no straight line
can possibly separate samples belonging to two classes, then
there cannot be any simple perceptron that achieves this task.
 This is the fundamental limitation of simple perceptrons.

7
Linear separability
 Examples of linearly nonseparable classes are:

 Most real-life classification problems are linearly nonseparable.

8
Linear separability
 If there is only one input dimension x, then the two-class
problem can be solved using a perceptron if and only if there
is some value x0 of x such that all samples of one class occur
for x > x0 , and all samples of the other class occur for x < x0.

9
Linear separability
 If there are three input dimensions, a two-class problem can
be solved using a perceptron if and only if there is a plane that
separates samples of different classes.
 As in the two-dimensional case, coefficients of terms
correspond to the weights of the perceptron.
 A generic perceptron for n-dimensional space.

 For this perceptron, the separating hyperplane is
w0 + w1 x1 + w2 x2 + … + wn xn = 0

10
Linear separability
 For higher-dimensional input spaces, the geometric
representation needs to be extended.
 Hyperplanes can separate samples of different classes in n-
dimensional space.
 Each hyperplane in n dimensions is defined by the equation

w0 + w1 x1 + w2 x2 + … + wn xn = 0

 Each hyperplane divides the n-dimensional space into two
regions:
1- w0 + w1 x1 + … + wn xn > 0
2- w0 + w1 x1 + … + wn xn < 0
 Training algorithms are used to obtain the weights of a suitable
perceptron (a sketch of this region test follows below).
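A small sketch (with hypothetical weight values) of how the sign of w0 + w1 x1 + … + wn xn decides which of the two regions a sample falls into:

import numpy as np

# Hypothetical perceptron for 3-dimensional inputs; w0 is the bias weight.
w0 = -1.0
w = np.array([0.5, -2.0, 1.5])

def region(x):
    # Returns 1 for the region w0 + w.x > 0, and 0 for the region w0 + w.x < 0.
    return 1 if w0 + np.dot(w, x) > 0 else 0

print(region(np.array([1.0, 0.0, 1.0])))   # -1.0 + 0.5 + 1.5 = 1.0 > 0 -> region 1
print(region(np.array([0.0, 1.0, 0.0])))   # -1.0 - 2.0 = -3.0 < 0 -> region 0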

11
Content
 Perceptrons
 Linear separability
 Perceptron training algorithm
 Termination criterion
 Choice of learning rate
 Non-numeric inputs
 Adalines
 Multiclass discrimination

12
Perceptron training algorithm
 The perceptron training algorithm can be used to obtain
appropriate weights of a perceptron that separates two
classes.
 Using the weight values, the equation of the hyperplane that
divides the input space can be derived.
 The developed perceptron can be used to classify new
samples.
 The dot product or scalar product of two vectors, w and x, is
defined as follows,

w.x = w1 x1 + w2 x2 + … + wn xn

 The Euclidean length ǁvǁ of a vector v is defined as,

ǁvǁ = √(v1² + v2² + … + vn²)

13
Perceptron training algorithm
 The presentation of the learning algorithm is simplified by using
perceptron output values ∈ {-1, 1} instead of {0, 1}.
 Weight values are randomly chosen between 0 and 1.
 It is assumed that the perceptron with weight vector w has
output 1 if w.x > 0, and output -1 otherwise.
 If the network output differs from the desired output, the
weights must be changed; otherwise they are left unchanged.
 If a sample (i) belongs to class 0, but w.i > 0, then the weight
vector needs to be modified.
 After each modification, the sample would have a better
chance in the following iteration.

14
Perceptron training algorithm
 If i belongs to a class (desired node output is -1) but w.i > 0,
then the weight vector needs to be modified to w + Δw
so that (w + Δw).i < w.i
 Δw = -η.i, where η > 0.
 After modification of the weight, i would have a better chance
of being classified correctly in the following iteration.

15
Perceptron training algorithm
 If i belongs to a class (desired node output is 1) but w.i < 0,
then the weight vector needs to be modified to w + Δw
so that (w + Δw).i > w.i
 Δw = η.i, where η > 0.
 Let i1 , i2 , …, ip denote the training set, containing p input
vectors.
 We define a function that maps each sample to either +1 (C1)
or -1 (C0).
 Samples are presented repeatedly to train the weights.
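A minimal sketch of the training procedure described above, using ±1 desired outputs and the update Δw = ±η.i applied only to misclassified samples; the bias is folded in as a constant input of 1, and all names are illustrative:

import numpy as np

def train_perceptron(samples, labels, eta=0.1, max_epochs=1000):
    # samples: list of input vectors; labels: +1 (class C1) or -1 (class C0).
    # Fold the bias into the weight vector by prepending a constant input of 1.
    X = np.array([[1.0] + list(s) for s in samples])
    w = np.random.uniform(0.0, 1.0, X.shape[1])   # weights randomly chosen between 0 and 1
    for _ in range(max_epochs):
        errors = 0
        for x, d in zip(X, labels):
            out = 1 if np.dot(w, x) > 0 else -1
            if out != d:                  # modify weights only on misclassification
                w += eta * d * x          # Δw = +η.x if d = 1, Δw = -η.x if d = -1
                errors += 1
        if errors == 0:                   # all samples correctly classified
            break
    return w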

16
Perceptron training algorithm
Example
 Let there be 7 one-dimensional input patterns as shown
below.

 The 7 input patterns are linearly separable.


 Samples {0.0, 0.17, 0.33, 0.50} belong to one class (desired
output 0), and samples {0.67, 0.83, 1.0} belong to the other
class (desired output 1).
 For the initial randomly chosen value of w1 = -0.36, and
w0 = -1.0, {0.83, 0.67, 1.0} are misclassified.

17
Perceptron training algorithm
Example – cont.
 For the input value 0.83, the output is (0.83)(-0.36) – 1.0 ≈ -1.3
 The sample is therefore assigned class 0, which is an error (it
should be 1).
 For η = 0.1, the new weights are calculated using Δw = η.i as,

w1 = -0.36 + (0.1)(0.83) = -0.277
w0 = -1.0 + (0.1)(1.0) = -0.9
 For the new weights, some samples are still misclassified.


 The weights are modified iteratively and the final weight
values are,
w1 = 0.3
w0 = -0.2
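A sketch that replays this one-dimensional example with the same initial weights and η = 0.1; since the final weights depend on the order in which samples are presented, they may differ slightly from w1 = 0.3, w0 = -0.2:

# Inputs, desired classes (mapped to -1/+1) and initial weights from the example.
inputs  = [0.0, 0.17, 0.33, 0.50, 0.67, 0.83, 1.0]
desired = [-1, -1, -1, -1, 1, 1, 1]
w1, w0, eta = -0.36, -1.0, 0.1

for epoch in range(1000):
    errors = 0
    for x, d in zip(inputs, desired):
        out = 1 if w1 * x + w0 > 0 else -1
        if out != d:
            w1 += eta * d * x            # Δw1 = ±η.x
            w0 += eta * d                # Δw0 = ±η.1 (bias input is 1)
            errors += 1
    if errors == 0:                      # all 7 samples correctly classified
        break

print("w1 =", round(w1, 2), "w0 =", round(w0, 2))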
18
Perceptron training algorithm
Example – cont.
 The progress of the training process.

 What is the reason for the oscillations in the weight values?


19
Perceptron training algorithm
 There are some important questions:
 How long should we execute this training procedure?
 What is the termination criterion (if the given samples are
not linearly separable)?
 What is the appropriate choice of the learning rate?
 How can the perceptron training algorithm be applied to
problems in which the inputs are non-numeric values (color,
label, name, …)?
 Is there a guarantee that the training algorithm will always
succeed whenever the samples are linearly separable?
 Can the perceptron training algorithm work reasonably well
when samples are not linearly separable?

20
Content
 Perceptrons
 Linear separability
 Perceptron training algorithm
 Termination criterion
 Choice of learning rate
 Non-numeric inputs
 Adalines
 Multiclass discrimination

21
Termination criterion
 For many ANN learning algorithms, the termination criterion is
″stop when the goal is achieved″.
 For any kind of classifier, the goal is the correct classification
of all samples.
 So the perceptron training algorithm runs until all samples are
correctly classified.
 For the perceptron, termination is assured if η is sufficiently small
and the samples are linearly separable.
 If η is not appropriate or the samples are not linearly separable,
the algorithm may run indefinitely.
 How can we detect that this may be the case?

22
Termination criterion
 The amount of progress achieved in the recent past can be
used to terminate the training.
 For a linear classifier, if the number of correct classifications has
not changed in a large number of steps, the samples may not be
linearly separable.
 The same problem may occur with an inappropriate
choice of η.
 Trying different values of η may yield improvement in the
training phase.

23
Termination criterion
 In some problems, two classes overlap (not linearly
separable).
 If the performance requirements allow some amount of
misclassification, we can modify the termination criterion.
 For example, if it is known that at least 6% of the samples
will be misclassified (or the user is satisfied with 6% error), the
termination criterion should be modified accordingly.
 We can then terminate the training algorithm as soon as 94%
of the samples are correctly classified.
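A sketch combining the two termination ideas above (stop once at least 94% of the samples are correctly classified, or once the number of correct classifications has stopped improving for many epochs); the parameter names are illustrative:

import numpy as np

def train_with_termination(X, d, eta=0.1, target_accuracy=0.94,
                           patience=200, max_epochs=10000):
    # X: input vectors with a leading bias input of 1; d: labels in {-1, +1}.
    w = np.random.uniform(0.0, 1.0, X.shape[1])
    best_correct, stalled = 0, 0
    for _ in range(max_epochs):
        for x, t in zip(X, d):
            if (1 if np.dot(w, x) > 0 else -1) != t:
                w += eta * t * x
        correct = sum((1 if np.dot(w, x) > 0 else -1) == t for x, t in zip(X, d))
        if correct / len(d) >= target_accuracy:   # e.g. at least 94% correctly classified
            break
        if correct > best_correct:
            best_correct, stalled = correct, 0
        else:
            stalled += 1
        if stalled >= patience:                   # no recent progress: possibly not separable
            break
    return w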

24
Content
 Perceptrons
 Linear separability
 Perceptron training algorithm
 Termination criterion
 Choice of learning rate
 Non-numeric inputs
 Adalines
 Multiclass discrimination

25
Choice of learning rate
 The examination of extreme cases can help derive a good
choice for η.
 If η is too large (e.g. 1,000,000), then the components of
Δw = ±ηx can have very large magnitudes.
 If η is too large, each weight update swings the perceptron
outputs completely in one direction; as a result, the perceptron
considers all samples to be in the same class.
 The system oscillates between extremes.
 If η = 0, the weights are never modified.
 If η is too small, the change in the weights at each step is
too small, which makes the algorithm exceedingly slow.
26
Choice of learning rate
 If η is too large, the progress will start very fast, but eventually
jump around the optimal solution and will never settle down.
 If η is too small, the training will eventually converge to the best
state, but this will take a long time.
 To find a fairly good learning rate, the network should be trained by
using various learning rates.
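A minimal sketch of this trial-and-error procedure, training the same perceptron with several candidate learning rates on illustrative data and comparing the resulting training accuracy:

import numpy as np

def perceptron_accuracy(X, d, eta, epochs=100):
    # Train with learning rate eta and return the fraction of correctly classified samples.
    # X: inputs with a leading bias input of 1; d: labels in {-1, +1}.
    w = np.random.uniform(0.0, 1.0, X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, d):
            if (1 if np.dot(w, x) > 0 else -1) != t:
                w += eta * t * x
    return np.mean([(1 if np.dot(w, x) > 0 else -1) == t for x, t in zip(X, d)])

# Try several candidate learning rates on a small illustrative data set.
X = np.array([[1.0, 0.0], [1.0, 0.2], [1.0, 0.8], [1.0, 1.0]])
d = np.array([-1, -1, 1, 1])
for eta in [0.001, 0.01, 0.1, 1.0, 10.0]:
    print("eta =", eta, "accuracy =", perceptron_accuracy(X, d, eta))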

27
Choice of learning rate
 What is an appropriate choice for η, which is neither too small
nor too large?
 A common choice is η = 1, leading to the simple weight
change computational rule of Δw = ±x,
so that (w + Δw).x = w.x ± x.x
 If |w.x| > |x.x|, the sample x may not be correctly classified
after a single update.
 In order to ensure that the sample x is correctly classified,
(w + Δw).x and w.x must have opposite signs.

28
Content
 Perceptrons
 Linear separability
 Perceptron training algorithm
 Termination criterion
 Choice of learning rate
 Non-numeric inputs
 Adalines
 Multiclass discrimination

29
Non-numeric inputs
 In some problems, the input dimensions are non-numeric.
 For example, input dimension may be ″color″.
 Its values may range over the set {red, blue, green, yellow}.
 We cannot establish an ordering relationship between colors on
an axis.
 The simplest way is to generate four new dimensions (″red″,
″blue″, ″green″, ″yellow″).
 We can replace each original attribute-value pair by a binary
vector.
 For instance, color = ″green″ is represented by the input
vector (0, 0, 1, 0), ″blue″ is (0, 1, 0, 0).
 The disadvantage of this approach is a drastic increase in the
number of dimensions.
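A minimal sketch of this encoding (the attribute and value names are those used above):

# Replace the non-numeric attribute "color" by four binary input dimensions.
COLORS = ["red", "blue", "green", "yellow"]

def one_hot(color):
    # e.g. "green" -> [0, 0, 1, 0], "blue" -> [0, 1, 0, 0]
    return [1 if color == c else 0 for c in COLORS]

print(one_hot("green"))
print(one_hot("blue"))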

30
Non-numeric inputs
Example
 The day of the week (Sunday/Monday/ . . .) is an important
variable in predicting the amount of electric power consumed
in a city.
 However, there is no obvious way of sequencing weekdays.
 So it is not appropriate to use a single variable whose values
range from 1 to 7.
 Instead, seven different variables should be chosen and each
input sample has a value of 1 for one of these coordinates,
and a value of 0 for others.
 For instance, ″Tuesday″ is represented as (0, 0, 1, 0, 0, 0, 0),
″Monday″ is (0, 1, 0, 0, 0, 0, 0).

31
Content
 Perceptrons
 Linear separability
 Perceptron training algorithm
 Termination criterion
 Choice of learning rate
 Non-numeric inputs
 Adalines
 Multiclass discrimination

32
Adalines
 The fundamental principle underlying the perceptron learning
algorithm is to modify weights to reduce the number of
misclassifications.
 Perfect classification using a linear element may not be
possible for all problems.
 The mean squared error (MSE), instead of the number of
misclassified samples, may be minimized during training.
 An adaptive linear element or Adaline, proposed by Widrow
(1959, 1960), is a simple perceptron-like system.

33
Adalines
 Adaline accomplishes classification by modifying weights in
such a way as to diminish the MSE at each iteration.
 This can be accomplished using gradient descent.
 MSE is a quadratic function whose derivative exists
everywhere.
 Unlike the perceptron, this algorithm implies that weight
changes are made to reduce MSE.
 Even when a sample is correctly classified by the network, the
weights may change.

34
Adalines
 In the training process, when a sample is presented to the
network, the linear weighted net input is computed.
 The computed net value is compared with the desired output.
 The generated error signal is used to modify each weight of the
Adaline.
 The weight change rule uses the partial derivative of the error
with respect to each weight.

35
Adalines
 Let ij be an input vector for which dj is the
desired output value.
 Let netj = w.ij be the net input to the node.
 w is the present value of the weight vector.
 The squared error is

Ej = (dj - w.ij)²

 The weight update rule is

Δw = η (dj - w.ij) ij
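A minimal sketch of gradient-descent (LMS) training with this update rule; note that, unlike the perceptron, the weights may change even for correctly classified samples. All names are illustrative:

import numpy as np

def train_adaline(X, d, eta=0.01, epochs=100):
    # X: input vectors (with a leading bias input of 1); d: desired output values.
    w = np.random.uniform(0.0, 1.0, X.shape[1])
    for _ in range(epochs):
        for x, dj in zip(X, d):
            net = np.dot(w, x)            # linear weighted net input
            w += eta * (dj - net) * x     # Δw = η (dj - w.ij) ij
    return w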

36
Adalines
 Adaline Least-Mean-Squares (LMS) training algorithm

 The weight vector w is changed each time an input vector ij is
presented to the Adaline.
37
Adalines
 A modification on this LMS rule has been made by Widrow and
Hoff.
 The weight change magnitude is made independent of the
magnitude of the input vector.
 The α-LMS (or Widrow-Hoff delta rule) training rule is

Δw = η (dj - w.ij) ij / ǁijǁ²

where dj is the desired output for the jth input ij , and
ǁijǁ denotes the length of the vector ij ,

ǁijǁ² = ij,1² + ij,2² + … + ij,l²

where l is the length (dimension) of the input vector.
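A corresponding single-step sketch of the normalized (Widrow-Hoff delta) update reconstructed above, where the step is divided by the squared input length:

import numpy as np

def widrow_hoff_update(w, x, dj, eta):
    # Single normalized update: the step size is divided by ǁxǁ², so the weight
    # change magnitude does not depend on the magnitude of the input vector.
    return w + eta * (dj - np.dot(w, x)) * x / np.dot(x, x)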


38
Content
 Perceptrons
 Linear separability
 Perceptron training algorithm
 Termination criterion
 Choice of learning rate
 Non-numeric inputs
 Adalines
 Multiclass discrimination

39
Multiclass discrimination
 So far, we have considered dichotomies, or two-class
problems.
 Many important real-life problems require partitioning data
into three or more classes.
 For example, the character recognition problem consists of
distinguishing between samples of 29 different classes (for the
Turkish alphabet).
 A layer of perceptrons or Adalines may be used to solve some
such multiclass problems.
 Four perceptrons can be put together to solve a four-class
classification problem.

40
Multiclass discrimination
 Each weight wi,j indicates the strength of the connection from the
jth input to the ith node.
 A sample is considered to belong to the ith class if and only if
the ith output oi = 1, and every other output ok = 0, for k ≠ i.
 This network is trained in the same way as perceptrons.
 If all outputs are zeroes or if more than one output value
equals 1, the network is considered to have failed in the
classification task.
 If all outputs can have values between 0 and 1, a ‘maximum
selector’ can be used to select the output with the highest value
(a small sketch follows below).
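A small sketch of a one-node-per-class layer combined with a maximum selector; the weight values here are illustrative and would in practice be obtained by training each node as above:

import numpy as np

# One output node per class: row i of W holds the weights from all inputs to node i,
# and b[i] is the bias weight of node i (illustrative values for a 4-class problem).
W = np.random.uniform(0.0, 1.0, (4, 3))
b = np.random.uniform(0.0, 1.0, 4)

def classify(x):
    # Maximum selector: the sample is assigned to the class whose node output is largest.
    outputs = W @ x + b
    return int(np.argmax(outputs))

print(classify(np.array([0.2, -1.0, 0.5])))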

41
Homework

 Prepare a report on the use of artificial neural networks in
speech-to-text and text-to-speech applications.

42
