Unit 2-nn
UNIT II
PERCEPTRON
Contents
Single layer Perceptron: Adaptive Filtering Problem
Unconstrained Optimization Techniques
Linear Least Square Filters
Least Mean Square Algorithm
Learning Curves, Learning Rate Annealing Techniques
Perceptron Convergence Theorem
Relation Between Perceptron and Bayes Classifier for a Gaussian Environment
Multilayer Perceptron: Back Propagation Algorithm
XOR Problem
Heuristics
Output Representation and Decision Rule
Feature Detection
Adaptive Filtering Problem
Dynamic System: the external behavior of the system is described by the data set
T: {x(i), d(i); i = 1, 2, …, n, …}
where x(i) = [x1(i), x2(i), …, xm(i)]^T.
x(i) can arise in one of two ways:
Spatial: x(i) is a snapshot of data.
Temporal: the elements of x(i) are uniformly spaced in time.
[Figure: Signal-flow graph of the adaptive filter]
Filtering Process
The output y(i) is produced in response to x(i), and an error signal is formed:
e(i) = d(i) - y(i)
Adaptive Process
Automatic adjustment of the synaptic weights in accordance with e(i).
y(i) = v(i) = x^T(i) w(i) = Σ_{k=1}^{m} w_k(i) x_k(i)
e(i) = d(i) - y(i)
where w(i) = [w1(i), w2(i), …, wm(i)]^T
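The filtering step above can be sketched in code; `filter_step` is a hypothetical helper name, and the vectors stand in for the slide's x(i) and w(i):

```python
import numpy as np

def filter_step(x, w, d):
    """Return the filter output y(i) = x^T(i) w(i) and error e(i) = d(i) - y(i)."""
    y = x @ w   # y(i) = sum_k w_k(i) x_k(i)
    e = d - y   # error signal driving the adaptive process
    return y, e

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.25])
y, e = filter_step(x, w, d=1.0)  # y = 0.5 - 2.0 + 0.75 = -0.75, e = 1.75
```

The adaptive process (covered later under LMS) then uses e(i) to adjust w.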
Newton's Method
Idea: minimize the quadratic approximation of the cost function C(w) around the current point w(n).
Applying a second-order Taylor series expansion of C(w) around w(n):
ΔC(w(n)) = C(w(n+1)) - C(w(n)) ≈ g^T(n) Δw(n) + (1/2) Δw^T(n) H(n) Δw(n)
where Δw(n) = w(n+1) - w(n), g(n) is the gradient of C evaluated at w(n), and H(n) is the Hessian matrix of second derivatives:
H = [ ∂²C/∂w1²      ∂²C/∂w1∂w2   …  ∂²C/∂w1∂wm
      ∂²C/∂w2∂w1    ∂²C/∂w2²     …  ∂²C/∂w2∂wm
      …
      ∂²C/∂wm∂w1    ∂²C/∂wm∂w2   …  ∂²C/∂wm²   ]
ΔC(w(n)) is minimized when
g(n) + H(n) Δw(n) = 0, i.e. Δw(n) = -H^-1(n) g(n)
so the update rule is
w(n+1) = w(n) + Δw(n) = w(n) - H^-1(n) g(n)
Generally speaking, Newton's method converges quickly.
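A minimal sketch of the Newton update w(n+1) = w(n) - H^-1(n) g(n). The quadratic cost, its minimizer `w_star`, and the matrix `A` are illustrative choices, not from the slides; for a quadratic cost the Hessian is constant and Newton converges in a single step:

```python
import numpy as np

def newton_step(w, grad, hess):
    # Solve H * delta = g rather than forming H^{-1} explicitly.
    return w - np.linalg.solve(hess, grad)

# Quadratic cost C(w) = 1/2 (w - w*)^T A (w - w*): Hessian = A, gradient = A (w - w*).
A = np.array([[4.0, 1.0], [1.0, 3.0]])  # positive-definite Hessian (assumed)
w_star = np.array([1.0, -2.0])          # minimizer of the quadratic cost
w = np.zeros(2)
g = A @ (w - w_star)                    # gradient of C at w
w_next = newton_step(w, g, A)           # lands exactly on w* for a quadratic
```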
Gauss-Newton Method
The cost function is the sum of squared errors, (1/2)||e(n)||². Linearizing the error vector around the current point w(n):
e(n, w) = e(n) + J(n) [w - w(n)]
where J(n) is the Jacobian matrix of e(n) with respect to w. Then
(1/2) ||e(n, w)||² = (1/2) ||e(n)||² + e^T(n) J(n) [w - w(n)] + (1/2) [w - w(n)]^T J^T(n) J(n) [w - w(n)]
(e^T(n) J(n) [w - w(n)] and [w - w(n)]^T J^T(n) e(n) are equal, and both of them are scalars.)
Differentiating this expression with respect to w and setting the result to zero:
J^T(n) e(n) + J^T(n) J(n) [w - w(n)] = 0
w(n+1) = w(n) - [J^T(n) J(n)]^-1 J^T(n) e(n)
To guard against the possibility that J(n) is rank deficient, a small diagonal loading term δI is added:
w(n+1) = w(n) - [J^T(n) J(n) + δI]^-1 J^T(n) e(n)
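The diagonally loaded Gauss-Newton update can be sketched as follows. The example data and the helper name `gauss_newton_step` are illustrative; for a linear model e(w) = Xw - d the Jacobian is J = X, so a single step essentially solves the least-squares problem:

```python
import numpy as np

def gauss_newton_step(w, e, J, delta=1e-6):
    """w(n+1) = w(n) - (J^T J + delta*I)^{-1} J^T e(n)."""
    m = J.shape[1]
    return w - np.linalg.solve(J.T @ J + delta * np.eye(m), J.T @ e)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
d = np.array([1.0, 2.0, 3.0])
w = np.zeros(2)
e = X @ w - d                       # residual at the current iterate
w_next = gauss_newton_step(w, e, X) # one step reaches the LS solution (1, 1)
```

Solving the loaded normal equations with `np.linalg.solve` avoids explicitly inverting J^T J, which is the numerically preferred choice.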
Linear Least-Squares Filter
The error vector is
e(n) = d(n) - X(n) w(n)
where d(n) = [d(1), d(2), …, d(n)]^T and X(n) = [x(1), x(2), …, x(n)]^T.
Differentiating e(n) with respect to w(n) gives the Jacobian J(n) = -X(n).
Substituting this into the equation derived from the Gauss-Newton method:
w(n+1) = w(n) + [X^T(n) X(n)]^-1 X^T(n) [d(n) - X(n) w(n)]
       = [X^T(n) X(n)]^-1 X^T(n) d(n)
Let X^+(n) = [X^T(n) X(n)]^-1 X^T(n) denote the pseudoinverse of X(n). Then
w(n+1) = X^+(n) d(n)
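The pseudoinverse solution w = X^+ d can be sketched directly; the data here is illustrative, and the result is cross-checked against NumPy's own `pinv`:

```python
import numpy as np

def lls_filter(X, d):
    """Linear least-squares filter: solve (X^T X) w = X^T d, i.e. w = X^+ d."""
    return np.linalg.solve(X.T @ X, X.T @ d)

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]])  # rows are x(i)^T
d = np.array([5.0, 4.0, 11.0])                      # consistent with w = (1, 2)
w = lls_filter(X, d)

# Agrees with NumPy's built-in pseudoinverse:
w_pinv = np.linalg.pinv(X) @ d
```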
Wiener Filter
Limiting form of the Linear Least-Squares Filter for an Ergodic Environment
Let w0 denote the Wiener solution to the linear optimum filtering problem. Then
w0 = lim_{n→∞} w(n+1) = lim_{n→∞} [X^T(n) X(n)]^-1 X^T(n) d(n)
   = [lim_{n→∞} (1/n) X^T(n) X(n)]^-1 [lim_{n→∞} (1/n) X^T(n) d(n)]
   = R_x^-1 r_xd
where R_x is the correlation matrix of the input vector and r_xd is the cross-correlation vector between the input vector and the desired response.
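Under ergodicity the ensemble averages R_x and r_xd can be estimated by time averages, which is what the limit above expresses. A sketch with synthetic data (the data and the true weight vector are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 3
X = rng.standard_normal((n, m))      # rows are input snapshots x(i)^T
w_true = np.array([0.5, -1.0, 2.0])
d = X @ w_true                       # noiseless desired response

R_x = (X.T @ X) / n                  # time-average estimate of the correlation matrix
r_xd = (X.T @ d) / n                 # estimate of the cross-correlation vector
w0 = np.linalg.solve(R_x, r_xd)      # Wiener solution estimate: R_x^{-1} r_xd
```

With a noiseless desired response this recovers w_true exactly; with noise the estimate converges to the Wiener solution as n grows.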
Least-Mean-Square (LMS) Algorithm
Virtues
– Simplicity
Limitations
– Slow rate of convergence
– Sensitivity to variations in the eigenstructure of
the input
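The update rule itself is not shown on these slides, but the standard LMS rule is w(n+1) = w(n) + η e(n) x(n) with e(n) = d(n) - w^T(n) x(n); its simplicity (no matrix inversion at all) is visible in a sketch:

```python
import numpy as np

def lms(X, d, eta=0.01):
    """Run LMS over the samples once; rows of X are the inputs x(n)."""
    w = np.zeros(X.shape[1])
    for x_n, d_n in zip(X, d):
        e_n = d_n - w @ x_n        # instantaneous error e(n)
        w = w + eta * e_n * x_n    # stochastic-gradient weight update
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 3))
w_true = np.array([1.0, -0.5, 0.25])
d = X @ w_true
w_hat = lms(X, d, eta=0.05)        # converges gradually toward w_true
```

The slow, step-size-dependent convergence seen here is exactly the limitation listed above.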
Learning Curve
A learning curve is a plot of the mean-square error versus the number of iterations of the algorithm, averaged over an ensemble of independent trials.
Learning Rate Annealing
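The schedules themselves are not recoverable from this slide; as a hedged sketch, the two standard annealing schedules from the literature are the stochastic-approximation schedule η(n) = c/n and the search-then-converge schedule η(n) = η0/(1 + n/τ). The constants c, η0, and τ below are illustrative:

```python
def stochastic_approximation(n, c=1.0):
    """eta(n) = c / n -- the classic stochastic-approximation schedule."""
    return c / n

def search_then_converge(n, eta0=0.1, tau=100.0):
    """eta(n) = eta0 / (1 + n/tau): roughly constant for n << tau
    (search phase), then decaying like 1/n for n >> tau (converge phase)."""
    return eta0 / (1.0 + n / tau)
```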
Perceptron
The perceptron is the simplest form of a neural network used for the classification of patterns that are said to be linearly separable.
[Figure: Signal-flow graph of the perceptron — inputs x1, x2, …, xm, synaptic weights w1, w2, …, wm, bias b, hard limiter φ(·), output y]
Induced local field:
v = Σ_{i=1}^{m} w_i x_i + b
Let x0 = 1 and b = w0. Then
v(n) = Σ_{i=0}^{m} w_i(n) x_i(n) = w^T(n) x(n)
Goal: Classify the set {x(1), x(2), …, x(n)} into one of two classes, C1 or C2.
Decision Rule: Assign x(i) to class C1 if y=+1 and to class C2 if y=-1.
Algorithms:
1. If x(n) is correctly classified, make no correction:
   w(n+1) = w(n) if w^T(n) x(n) > 0 and x(n) belongs to class C1
   w(n+1) = w(n) if w^T(n) x(n) ≤ 0 and x(n) belongs to class C2
2. Otherwise, update the weight vector:
   w(n+1) = w(n) - η(n) x(n) if w^T(n) x(n) > 0 and x(n) belongs to class C2
   w(n+1) = w(n) + η(n) x(n) if w^T(n) x(n) ≤ 0 and x(n) belongs to class C1
Let d(n) = +1 if x(n) belongs to class C1 and d(n) = -1 if x(n) belongs to class C2. Then
w(n+1) = w(n) + η [d(n) - y(n)] x(n)   (error-correction learning rule form)
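The error-correction form of the learning rule can be sketched directly; the toy data set below is an illustrative linearly separable problem, and `train_perceptron` is a hypothetical helper name:

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, epochs=10):
    """w(n+1) = w(n) + eta * (d(n) - y(n)) * x(n), with y(n) from a hard limiter."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # x0 = 1 absorbs the bias b = w0
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_n, d_n in zip(X, d):
            y_n = 1.0 if w @ x_n > 0 else -1.0    # hard limiter output
            w = w + eta * (d_n - y_n) * x_n       # no change when y(n) = d(n)
    return w

# Linearly separable toy problem: class C1 (d = +1) vs class C2 (d = -1).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w = train_perceptron(X, d)
preds = np.where(np.hstack([np.ones((4, 1)), X]) @ w > 0, 1.0, -1.0)
```

Because the data is linearly separable, the convergence theorem of the next slides guarantees the loop stops correcting after finitely many updates.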
Perceptron Convergence Algorithm
Relation between the Perceptron and Bayes Classifier
For a Gaussian environment, the Bayes classifier is derived from the likelihood functions of the two classes and a decision threshold.
Key Points:
C — the covariance matrix, assumed to be non-diagonal and non-singular (so C^-1 exists) and shared by both classes.
Under these assumptions the Bayes classifier reduces to a linear classifier, i.e. it takes the same functional form as the perceptron.
[Comparison: Perceptron vs. Bayes Classifier]
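The linear form of the Bayes classifier under a shared covariance C can be sketched as follows. The means, covariance, and the assumption of equal prior probabilities are illustrative; the discriminant is y(x) = w^T x + b with w = C^-1 (μ1 - μ2):

```python
import numpy as np

def bayes_linear_classifier(mu1, mu2, C):
    """Bayes classifier for Gaussians with shared covariance C and equal priors."""
    w = np.linalg.solve(C, mu1 - mu2)   # w = C^{-1} (mu1 - mu2)
    b = -0.5 * (mu1 + mu2) @ w          # threshold term (log prior ratio = 0)
    return w, b

mu1 = np.array([2.0, 0.0])
mu2 = np.array([0.0, 2.0])
C = np.array([[1.0, 0.2], [0.2, 1.0]])  # non-diagonal, non-singular
w, b = bayes_linear_classifier(mu1, mu2, C)

# Decide class C1 when w^T x + b > 0; the mean of C1 is classified as class 1:
cls = 1 if w @ np.array([2.0, 0.0]) + b > 0 else 2
```

Like the perceptron, the decision boundary is a hyperplane, but here its parameters come from the class statistics rather than from error-correction learning.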