Soft Computing - Neural Networks and Fuzzy Logic
Laboratório de Automação e Robótica - A. Bauchspiess

The McCulloch Neuron (1943)

[Figure: the McCulloch neuron - inputs $p_1, p_2, \dots, p_n$ with weights $w_1, w_2, \dots, w_n$ feed a summing junction with bias $b$, followed by the activation $g$.]

$a = g\left(\sum_{i=1}^{n} w_i p_i - b\right) = g(w^t p - b)$,  $a \in [0;1]$,  $g$ = step function

For $n = 2$ the decision boundary is the line $w_1 p_1 + w_2 p_2 = b$: the Euclidean space $\mathbb{R}^n$ is divided into two regions A and B.

[Figure: the line $w_1 p_1 + w_2 p_2 = b$ in the $(p_1, p_2)$ plane, separating region A from region B.]
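As a minimal numeric sketch of this model (plain MATLAB, not part of the original slides; the weights and threshold below are illustrative values only), the step activation and the weighted sum can be written directly:

% McCulloch neuron: a = g(w'*p - b), with g a step (threshold) function
g = @(s) double(s >= 0);          % step activation: 1 if s >= 0, else 0
w = [1; 1];                       % illustrative weights for 2 inputs
b = 1.5;                          % illustrative threshold (bias)
P = [0 0 1 1;                     % the four binary input patterns as columns
     0 1 0 1];
a = g(w'*P - b)                   % outputs [0 0 0 1]: this (w, b) realizes AND

The separating line here is $p_1 + p_2 = 1.5$, i.e. the plane is split exactly as described above.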
The McCulloch Neuron as a Pattern Classifier

[Figure: two scatter plots of classes o and x in the plane - left: linearly separable collections; right: linearly dependent (non-separable) collections.]

Some Boolean functions of two variables represented in a binary plane.
Linear and Non-Linear Classifiers

There exist $2^m = 2^{2^n}$ possible logical functions connecting $n$ inputs to one binary output (where $m = 2^n$ is the number of binary input patterns).

n | # of binary patterns | # of logical functions | # linearly separable | % linearly separable
1 | 2  | 4           | 4         | 100
2 | 4  | 16          | 14        | 87.5
3 | 8  | 256         | 104       | 40.6
4 | 16 | 65536       | 1,772     | 2.9
5 | 32 | 4.3 x 10^9  | 94,572    | 2.2 x 10^-3
6 | 64 | 1.8 x 10^19 | 5,028,134 | 3.1 x 10^-13

The logical functions of one variable:
$A$, $\bar{A}$, 0, 1

The logical functions of two variables:
$A$, $B$, $\bar{A}$, $\bar{B}$, 0, 1,
$A \wedge B$, $A \vee B$, $\overline{A \wedge B}$, $\overline{A \vee B}$,
$A \oplus B$, $\overline{A \oplus B}$, $A \wedge \bar{B}$, $\bar{A} \wedge B$, $A \vee \bar{B}$, $\bar{A} \vee B$
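The n = 2 row of the table can be checked by brute force. Below is a small plain-MATLAB sketch (not from the slides; the weight/threshold search grid is an arbitrary choice): for each of the 16 Boolean functions of two inputs it tests whether some $(w_1, w_2, b)$ reproduces it with a single step neuron, and it counts 14 separable functions, as in the table.

% Count how many of the 16 Boolean functions of 2 inputs a single
% threshold neuron a = step(w1*p1 + w2*p2 - b) can realize.
P = [0 0 1 1; 0 1 0 1];                 % the 4 input patterns as columns
grid = -2:0.5:2;                        % coarse search grid for w1, w2, b
separable = 0;
for t = 0:15                            % all 16 target functions
    T = bitget(t, 1:4);                 % target output for the 4 patterns
    found = false;
    for w1 = grid, for w2 = grid, for b = grid
        if isequal(double([w1 w2]*P - b >= 0), double(T))
            found = true;
        end
    end, end, end
    separable = separable + found;
end
separable                               % prints 14, matching the table

The two functions that no weight/threshold pair realizes are XOR and XNOR, the non-separable cases.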
Two Step Binary Perceptron

Neuron 6 implements a logical AND function by choosing $b_6 = \sum_{i=3}^{5} w_{i6}$.

For example: $w_{36} = w_{46} = w_{56} = \frac{1}{3}$, $b_6 = 1$  $\Rightarrow$  $a_6 = 1$ if and only if $a_3 = a_4 = a_5 = 1$.
Three Step Binary Perceptron
[Figure: three-step binary perceptron - inputs $p_1$ and $p_2$ feed first-layer neurons 3 to 7; neurons 9 and 10 combine them into the regions A and B; the output neuron 11 computes $a_{11} = A \wedge B$.]
Neurons and Artificial Neural Networks
Micro-structure: characteristics of each neuron in the network.
Meso-structure: organization of the network.
Macro-structure: association of networks, possibly with some analytical processing; an approach for complex problems.

[Figure: neuron with inputs $p_1, \dots, p_n$, weights $w_1, \dots, w_n$, summing junction and bias input $b$.]

Bias: with $p = 0$, an output $\neq 0$ is still possible!
Typical activation functions

Linear (purelin): $f(s) = s$. Used by: Hopfield, BSB.

Signal (hardlims): $f(s) = +1$ if $s \ge 0$, $-1$ if $s < 0$. Used by: Perceptron.

Step (hardlim): $f(s) = +1$ if $s \ge 0$, $0$ if $s < 0$. Used by: Perceptron, BAM.

Hopfield/BAM: $f(s) = +1$ if $s > 0$, $-1$ if $s < 0$, unchanged if $s = 0$. Used by: Hopfield, BAM.

BSB or logical threshold (satlin/satlins): $f(s) = -K$ if $s \le -K$; $s$ if $-K < s < +K$; $+K$ if $s \ge +K$. Used by: BSB.

Logistic (logsig): $f(s) = \dfrac{1}{1 + e^{-s}}$. Used by: Perceptron, Hopfield, BAM, BSB.

Hyperbolic tangent (tansig): $f(s) = \tanh(s) = \dfrac{1 - e^{-2s}}{1 + e^{-2s}}$. Used by: Perceptron, Hopfield, BAM, BSB.

[Figure: graph of each activation function f(s).]
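The names in parentheses (purelin, hardlims, hardlim, satlins, logsig, tansig) correspond to MATLAB Neural Network Toolbox functions. As a plain-MATLAB sketch (illustrative only, with K = 1 chosen for the saturating case), the same curves can be written as anonymous functions:

% Typical activation functions written as plain anonymous functions
purelin  = @(s) s;                                   % linear
hardlims = @(s) 2*(s >= 0) - 1;                      % signal: +1 / -1
hardlim  = @(s) double(s >= 0);                      % step: 1 / 0
satlins  = @(s) max(-1, min(1, s));                  % saturating linear (K = 1)
logsig   = @(s) 1 ./ (1 + exp(-s));                  % logistic
tansig   = @(s) tanh(s);                             % hyperbolic tangent
s = -3:0.1:3;
plot(s, logsig(s), s, tansig(s), s, hardlim(s));     % compare a few of the curves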
Meso-Structure - Network Organization...

- number of neurons per layer
- number of network layers
- connection type (forward, backward, lateral)

1 - Multilayer Feedforward: the Multilayer Perceptron (MLP)
Meso-Structure - Network Organization...

2 - Single Layer, laterally connected (BSB (self-feedback), Hopfield)

3 - Bilayer Feedforward/Feedback
Meso-Structure - Network Organization

4 - Multilayer Cooperative/Comparative Network

5 - Hybrid Network

[Figure: hybrid network composed of Sub-Network 1 and Sub-Network 2.]
Neural Macro-Structure

- number of networks
- connection type
- size of networks
- degree of connectivity

[Figure: association of networks: Network 1, Networks 2a, 2b and 2c, and Network 3.]
Supervised Learning
[Figure: supervised training scheme: input $x$, network output $y$, desired output $d$, error $\delta = d - y$ fed back to adjust the weights.]

Delta Rule (Perceptron): $w \leftarrow w + \eta\,\delta\,x$, with $\delta = d - y$ and $\eta$ the learning rate.

Widrow-Hoff delta rule (LMS) - ADALINE, MADALINE: $w_{ij} \leftarrow w_{ij} + \eta\,\dfrac{\delta_j\,x_{ij}}{\|x_k\|^2}$

Generalized Delta Rule (Multilayer Perceptron).
Delta Rule - Perceptron

Perceptron: Rosenblatt, 1957.

Dynamics:

$s_j = \sum_i w_{ij} p_{ij} + b_j$

$y_j = f(s_j) = \begin{cases} 1 & \text{if } s_j \ge 0 \\ 0 & \text{if } s_j < 0 \end{cases}$

[Figure: perceptron neuron with inputs $p_{1j}, \dots, p_{nj}$, weights $w_{1j}, \dots, w_{nj}$, bias $b_j$ and step output $y_j$, compared with the desired output $d_j$.]

$\delta_j = d_j - y_j$

$w_{ij} \leftarrow w_{ij} + \eta\,\delta_j\,x_{ij}$   (Delta Rule)

$\eta$ - learning rate. If $\delta_j = 0$ the weight is not changed.

Psychological reasoning: positive reinforcement / negative reinforcement.
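A minimal sketch of this training loop in plain MATLAB (the data, learning rate and epoch count below are illustrative, not from the slides):

% Perceptron trained with the delta rule: w <- w + eta*delta*p, b <- b + eta*delta
P = [0 0 1 1; 0 1 0 1];          % input patterns (columns)
d = [0 0 0 1];                   % desired outputs (here: AND, linearly separable)
w = zeros(2,1); b = 0; eta = 0.5;
for epoch = 1:20
    for k = 1:size(P,2)
        y     = double(w'*P(:,k) + b >= 0);   % neuron output f(s_j)
        delta = d(k) - y;                     % error; 0 means the weight is not changed
        w     = w + eta*delta*P(:,k);         % positive / negative reinforcement
        b     = b + eta*delta;
    end
end
y_all = double(w'*P + b >= 0)    % reproduces d for a linearly separable problem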
ADALINE and MADALINE
Widrow & Hoff, 1960: (Multiple) Adaptive Linear Element.

$y_j = \sum_i w_{ij} p_{ij} + b_j$

Training:

$\delta_j = d_j - s_j = d_j - \left(\sum_i w_{ij} p_{ij} + b_j\right)$

$w_{ij} \leftarrow w_{ij} + \eta\,\dfrac{\delta_j\,x_{ij}}{\|x_k\|^2}$   (Widrow-Hoff delta rule)

LMS: Least Mean Squared algorithm. $0.1 < \eta < 1$ trades off stability and convergence speed.

MatLab: NEWLIN, NEWLIND, ADAPT, LEARNWH

Obs.: compare with the delta rule $w_{ij} \leftarrow w_{ij} + \eta\,\delta_j\,x_{ij}$.
LMS Algorithm
Objective: learn a function $f: \mathbb{R}^n \to \mathbb{R}$ from the samples $(x_k, d_k)$.

$\{x_k\}$, $\{d_k\}$ and $\{e_k\}$ are stationary stochastic processes.

$e = d - y$ is the actual stochastic error. Linear neuron: $y = \sum_{i=1}^{n} x_i w_i = x w^t$.

Expected value:

$E[e^2] = E[(d - y)^2] = E[(d - x w^t)^2] = E[d^2] - 2 E[d x]\,w^t + w\,E[x^t x]\,w^t$

Assuming $w$ deterministic, and with $E[x^t x] = R$ (input autocorrelation matrix) and $E[d x] = P$ (cross-correlation vector):

$E[e^2] = E[d^2] - 2 P w^t + w R w^t$

Setting the partial derivatives to zero for the optimal $w^*$: $0 = 2 w^* R - 2P$, giving the optimal analytic solution of the optimization (solvelin.m):

$w^* = P R^{-1}$
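A small numeric sketch of this analytic solution in plain MATLAB (the data-generating model below is an arbitrary illustration): estimate R and P from samples and compare $w^* = P R^{-1}$ with the true weights.

% Optimal linear weights from second-order statistics: w* = P * inv(R)
N = 5000;  n = 3;
w_true = [0.5 -1.0 2.0];                 % illustrative "unknown" weights
X = randn(N, n);                         % input samples x_k (one per row)
d = X*w_true' + 0.05*randn(N,1);         % desired outputs with small noise
R = (X'*X)/N;                            % estimate of E[x'x] (autocorrelation matrix)
P = (d'*X)/N;                            % estimate of E[d x] (cross-correlation vector)
w_star = P/R                             % w* = P*inv(R), close to w_true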
Iterative LMS Algorithm
Objective: adaptively learn a function $f: \mathbb{R}^n \to \mathbb{R}$ from the samples $(x_k, d_k)$.

Knowing $P$ and $R$ (and hence $R^{-1}$), then for some $w$:

$\nabla_w E[e^2] = 2 w R - 2 P$

Post-multiplying by $R^{-1}$:

$\tfrac{1}{2}\,\nabla_w E[e^2]\, R^{-1} = w - P R^{-1} = w - w^*$

$w^* = w - \tfrac{1}{2}\,\nabla_w E[e^2]\, R^{-1}$

$w_{k+1} = w_k - c_k\,\nabla_w E[e^2]\, R^{-1}$    ($c_k = \tfrac{1}{2}$: Newton's method)

How to cautiously find new (better) values for the free parameters $w_i$?

LMS hypothesis: $E[e^2_{k+1} \mid e^2_0, e^2_1, \dots, e^2_k] = e^2_k$
Iterative LMS Algorithm...
Assuming $R = I$, the estimated steepest descent algorithm is:

$w_{k+1} = w_k - c_k\,\nabla_w e^2_k$

Gradient of $e^2_k$ with respect to $w$:

$\nabla_w e^2_k = \left[\dfrac{\partial e^2_k}{\partial w_1}, \dots, \dfrac{\partial e^2_k}{\partial w_n}\right]
= \left[\dfrac{\partial (d_k - y_k)^2}{\partial w_1}, \dots, \dfrac{\partial (d_k - y_k)^2}{\partial w_n}\right]
= \left[-2(d_k - y_k)\dfrac{\partial y_k}{\partial w_1}, \dots, -2(d_k - y_k)\dfrac{\partial y_k}{\partial w_n}\right]$

$= -2 e_k \left[\dfrac{\partial y_k}{\partial w_1}, \dots, \dfrac{\partial y_k}{\partial w_n}\right]
= -2 e_k \left[x^1_k, \dots, x^n_k\right] = -2 e_k x_k$     (since $y_k = x_k w^t$)

The LMS algorithm reduces to the iterative (adaptive) solution

$w_{k+1} = w_k + 2 c_k e_k x_k$

(The optimal solution is never reached!)

Side note (MADALINE, $i$-input, $j$-neuron, with normalization): $w_{ij} \leftarrow w_{ij} + \eta\,\dfrac{\delta_j\,x_{ij}}{\|x_k\|^2}$
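A sketch of the resulting iterative rule in plain MATLAB (signal, dimensions and step size are illustrative): the weights are updated sample by sample with $w_{k+1} = w_k + 2 c\, e_k x_k$, without ever forming R or P.

% Iterative (adaptive) LMS: w <- w + 2*c*e*x, using only the instantaneous error
N = 2000;  n = 3;  c = 0.01;
w_true = [0.5 -1.0 2.0];                % illustrative "unknown" weights
w = zeros(1, n);
for k = 1:N
    x = randn(1, n);                    % current input vector x_k
    d = x*w_true' + 0.05*randn;         % desired output d_k
    e = d - x*w';                       % instantaneous error e_k = d_k - y_k
    w = w + 2*c*e*x;                    % estimated steepest-descent step
end
w                                        % approaches w_true (never exactly reached)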
The Multilayer Perceptron - The Generalized Delta Rule

Rumelhart, Hinton and Williams, PDP/MIT, 1986.

[Figure: two-layer feedforward network; inputs $p_1 = x_1^{(0)}$, $p_2 = x_2^{(0)}$, $p_3 = x_3^{(0)}$; hidden-layer outputs $x_1^{(1)}, x_2^{(1)}, x_3^{(1)}$; outputs $x_1^{(2)} = y_1$, $x_2^{(2)} = y_2$.]

Neuron dynamics, for Processing Element (PE) $j$ in layer $k$, input $i$:

$s_j^{(k)} = w_{0j}^{(k)} + \sum_i w_{ij}^{(k)} x_i^{(k-1)}$

$x_j^{(k)} = f(s_j^{(k)})$

with $f$ (the activation function) continuous and differentiable.

Turning-point question: how to find the error associated with an internal neuron?
The generalized delta rule
Training:

$\varepsilon^2 = \sum_{j=1}^{m} (d_j - y_j)^2$  - quadratic error

$w_j^{(k)} = (w_{0j}^{(k)}, w_{1j}^{(k)}, \dots, w_{mj}^{(k)})$  - weights of PE $j$

$x_j^{(k-1)} = (1, x_{1j}^{(k-1)}, \dots, x_{nj}^{(k-1)})$  - input vector of PE $j$

With $s_j^{(k)} = w_j^{(k)} \cdot x_j^{(k-1)}$:  $\dfrac{\partial s_j^{(k)}}{\partial w_j^{(k)}} = x_j^{(k-1)}$

Instantaneous gradient:

$\nabla_j^{(k)} = \dfrac{\partial \varepsilon^2}{\partial w_j^{(k)}} = \left[\dfrac{\partial \varepsilon^2}{\partial w_{0j}^{(k)}}, \dfrac{\partial \varepsilon^2}{\partial w_{1j}^{(k)}}, \dots, \dfrac{\partial \varepsilon^2}{\partial w_{mj}^{(k)}}\right] = \dfrac{\partial \varepsilon^2}{\partial s_j^{(k)}}\,\dfrac{\partial s_j^{(k)}}{\partial w_j^{(k)}} = \dfrac{\partial \varepsilon^2}{\partial s_j^{(k)}}\, x_j^{(k-1)}$

Defining the quadratic derivative error as  $\delta_j^{(k)} = -\dfrac{1}{2}\,\dfrac{\partial \varepsilon^2}{\partial s_j^{(k)}}$ :

$\nabla_j^{(k)} = \dfrac{\partial \varepsilon^2}{\partial w_j^{(k)}} = -2\,\delta_j^{(k)}\, x_j^{(k-1)}$

The gradient of the error with respect to the weights is a function of the former-layer signals!
The generalized delta rule...
For the output layer, the quadratic derivative error is:

$\delta_j^{(k)} = -\dfrac{1}{2}\,\dfrac{\partial}{\partial s_j^{(k)}} \sum_{i=1}^{N_k} (d_i - y_i)^2 = -\dfrac{1}{2}\,\dfrac{\partial}{\partial s_j^{(k)}} \sum_{i=1}^{N_k} \left(d_i - f(s_i^{(k)})\right)^2$

The partial derivatives are 0 for $i \ne j$:

$\delta_j^{(k)} = -\dfrac{1}{2}\,\dfrac{\partial \left(d_j - f(s_j^{(k)})\right)^2}{\partial s_j^{(k)}} = -\left(d_j - f(s_j^{(k)})\right)\,\dfrac{\partial \left(d_j - f(s_j^{(k)})\right)}{\partial s_j^{(k)}} = \left(d_j - x_j^{(k)}\right) f'(s_j^{(k)})$

The output error associated with PE $j$ in the last layer is

$\varepsilon_j^{(k)} = d_j - x_j^{(k)} = d_j - y_j$

giving:

$\delta_j^{(k)} = \varepsilon_j^{(k)} \cdot f'(s_j^{(k)})$

Remember: the activation function $f$ is continuous and differentiable.
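A quick numeric sanity check of this result in plain MATLAB (logsig as the activation; the net input and desired output are illustrative values): the analytic $\delta_j$ matches a finite-difference estimate of $-\tfrac{1}{2}\,\partial \varepsilon^2 / \partial s_j$.

% Numeric check of the output-layer quadratic derivative error
f  = @(s) 1./(1 + exp(-s));            % logsig activation
df = @(s) f(s).*(1 - f(s));            % its derivative f'(s)
s = 0.7;  d = 1;                       % illustrative net input and desired output
delta    = (d - f(s)) * df(s);         % delta_j = (d_j - x_j) * f'(s_j)
E2 = @(s) (d - f(s)).^2;               % quadratic error as a function of s
h = 1e-6;                              % finite-difference step
delta_fd = -0.5 * (E2(s+h) - E2(s-h)) / (2*h);   % -1/2 * d(eps^2)/ds
[delta delta_fd]                       % the two values agree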
The generalized delta rule...
For a hidden layer $k$, the quadratic derivative error can be calculated using the linear outputs of layer $k+1$ (chain rule):

$\delta_j^{(k)} = -\dfrac{1}{2}\,\dfrac{\partial \varepsilon^2}{\partial s_j^{(k)}} = -\dfrac{1}{2} \sum_{i=1}^{N_{k+1}} \dfrac{\partial \varepsilon^2}{\partial s_i^{(k+1)}}\,\dfrac{\partial s_i^{(k+1)}}{\partial s_j^{(k)}}$

Considering that $-\dfrac{1}{2}\,\dfrac{\partial \varepsilon^2}{\partial s_i^{(k+1)}} = \delta_i^{(k+1)}$:

$\delta_j^{(k)} = \sum_{i=1}^{N_{k+1}} \delta_i^{(k+1)}\,\dfrac{\partial s_i^{(k+1)}}{\partial s_j^{(k)}}$

Taking into account that $s_i^{(k+1)} = w_{0i}^{(k+1)} + \sum_l w_{li}^{(k+1)} f(s_l^{(k)})$, that $\dfrac{\partial f(s_l^{(k)})}{\partial s_j^{(k)}} = 0$ if $l \ne j$, and that $\dfrac{\partial f(s_j^{(k)})}{\partial s_j^{(k)}} = f'(s_j^{(k)})$:

$\dfrac{\partial s_i^{(k+1)}}{\partial s_j^{(k)}} = w_{ji}^{(k+1)}\, f'(s_j^{(k)})$

We have:

$\delta_j^{(k)} = \underbrace{\left(\sum_{i=1}^{N_{k+1}} \delta_i^{(k+1)} w_{ji}^{(k+1)}\right)}_{\varepsilon_j^{(k)}} \cdot f'(s_j^{(k)})$

Finally, the quadratic derivative error for a hidden layer:

$\delta_j^{(k)} = \varepsilon_j^{(k)} \cdot f'(s_j^{(k)})$,  with  $\varepsilon_j^{(k)} = \sum_{i=1}^{N_{k+1}} \delta_i^{(k+1)}\, w_{ji}^{(k+1)}$
The Error Backpropagation algorithm

1. wij( k ) random , initialize the network weigths


m
2. for (x,d), training pair, obtain y. Feedforward propagation: = 2
(d
j =1
j y j )2

3. k last layer
4. for each element j in the layer k do:
Compute (kj ) using (jk ) = d j x (jk ) = d j y j if k is the last layer,
N k +1
(k )
j = i( k +1) w(jik +1) if it is a hidden layer;
i =1

Compute (j k ) = (jk ) . f ( s (jk ) )


5. k k 1 if k > 0 go to step 4, else continue.
6. w (jk ) (n + 1) = w (jk ) (n) + 2 i( k ) x i( k )
7. For the next training pair go to step 2.

Laboratrio de Automao e Robtica - A. Bauchspiess Soft Computing - Neural Networks and Fuzzy Logic 73
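A minimal sketch of these steps for one hidden layer with the logistic activation, written in plain MATLAB (network size, data, epoch count and learning rate are illustrative, not from the slides; the factor 2 of step 6 is absorbed into eta). It learns the XOR problem that a single neuron cannot represent:

% Error backpropagation: 2 inputs, 3 hidden logsig neurons, 1 logsig output (XOR)
P = [0 0 1 1; 0 1 0 1];   D = [0 1 1 0];        % training pairs (x, d)
f  = @(s) 1./(1 + exp(-s));                     % logistic activation
df = @(x) x.*(1 - x);                           % its derivative, written from x = f(s)
W1 = rand(3,3) - 0.5;                           % hidden layer [bias w1 w2], random init (step 1)
W2 = rand(1,4) - 0.5;                           % output layer [bias w1 w2 w3]
eta = 1;                                        % learning rate (absorbs the factor 2)
for epoch = 1:10000
    for k = 1:4
        x0 = [1; P(:,k)];                       % input with bias term (step 2: feedforward)
        x1 = f(W1*x0);
        y  = f(W2*[1; x1]);
        eps2   = D(k) - y;                      % output-layer error (step 4, last layer)
        delta2 = eps2 * df(y);
        eps1   = W2(:,2:end)' * delta2;         % error backpropagated to the hidden layer
        delta1 = eps1 .* df(x1);
        W2 = W2 + eta * delta2 * [1; x1]';      % weight updates (step 6)
        W1 = W1 + eta * delta1 * x0';
    end
end
y_all = f(W2*[ones(1,4); f(W1*[ones(1,4); P])]) % typically close to [0 1 1 0]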
The Backpropagation Algorithm in practice

1 In the standard form BP is very slow.


Ee-2 Energia da rede
2 BP Pathologies: paralysis in regions of small gradient.

3 Initial conditions can lead to local minima. PadroStart


Bad esprio Good Start
Valor Inicial

4 Stop conditions number of epochs, wij <

5 BP variants
- trainbpm (with momentum) Padro recuperado
Optimum
PadresMinima
Local armazenados
- trainbpx (adaptive learning rate)
- ....
wi,j
Estados
- trainlm (Levenberg-Marquard J, Jacobian)
e2(wi,j) - Illustrative quadratic error
W (j k ) = ( J T J + J ) 1 J T e as function of the weights

Obs: the error surface is, normally, unknown.

Steepest descent go in the opposite


direction of the local gradient (downhill).

Laboratrio de Automao e Robtica - A. Bauchspiess Soft Computing - Neural Networks and Fuzzy Logic 74
Computational Tools

SNNS
MatLab
- Neural Network Toolbox
NeuralWorks
Java
C++

Hardware Implementations of ANNs

SNNS - Stuttgart Neural Network Simulator
MatLab
- complete environment:
  - system simulation
  - training
  - control

[Figure: Simulink model-reference neural control example: a Neural Network Controller built from logsig, tansig, purelin and radbas neuron blocks generates the control signal qi for a 4th-order liquid-level plant; the plant output h4 is compared with a reference model; the diagram uses Discrete State-Space, Unit Delay, Matrix Gain, Fcn, Switch, Saturation and Zero-Order Hold blocks.]
Demonstration - Perceptron
% Perceptron
% Training an ANN to learn to classify a non-linear problem
% Input pattern
P=[ 0 0 0 0 1 1 1 1
    0 0 1 1 0 0 1 1
    0 1 0 1 0 1 0 1]
% Target
%T=[1 0 1 1 1 0 1 0] % linearly separable
T=[1 0 0 1 1 0 1 0]  % non-separable

% Try with Rosenblatt's Perceptron
net=newp(P,T,'hardlim')
% train the network
net=train(net,P,T)
Y=sim(net,P)

Results: for the linearly separable target T = [1 0 1 1 1 0 1 0] the trained perceptron reproduces it exactly (Y = [1 0 1 1 1 0 1 0]); for the non-separable target T = [1 0 0 1 1 0 1 0] it cannot (Y = [1 0 1 0 0 0 1 0]).
Demonstration - OCR
[Figure: OCR demonstration: a training vector (character bitmap), the same pattern with 20% noise, and the ANN classifier.]
Demonstration OCR...
Training with 10 x (0, 10, 20, 30, 40, 50)% noise: noisy patterns are used in training (up to the given % of bits flipped).

[Figure: percentage of misclassifications of the neural OCR classifier vs. noise level; * - error without noisy training patterns, * - error using noisy training patterns.]

With some noisy training patterns, the network learns how to treat any noise.
Demonstration LMS, ADALINE, FIR
$y(k) = w_0 u(k) + w_1 u(k-1) + w_2 u(k-2) + \dots + w_n u(k-n)$

$\dfrac{Y(z)}{U(z)} = w_0 + w_1 z^{-1} + w_2 z^{-2} + \dots + w_n z^{-n}$

FIR model (always stable, only zeros). (TDL: Time Delay Line)

Obs.: the IIR model is more compact, but can be unstable!

$g_1 = \dfrac{1}{s^2 + 0.2 s + 1}$ (0 to 79.9 s),   $g_2 = \dfrac{3}{s^2 + 2 s + 1}$ (80 to 150 s)

The system changes at 80 s. Sampling time $T_s = 0.1$ s.

[Figure: system output over 0 to 150 s, showing the change in dynamics at 80 s.]
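A plain-MATLAB sketch of the same idea without the toolbox (filter order, step size and the test system below are illustrative choices): the regressor is a tapped delay line of the input, and the FIR weights are adapted sample by sample with the normalized LMS rule from the earlier slides.

% Adaptive FIR identification: y(k) = w0*u(k) + ... + wn*u(k-n), weights via LMS
n   = 10;  eta = 0.1;                      % FIR order and learning rate
N   = 1500;
u   = sign(randn(N,1));                    % crude binary excitation (PRBS-like)
d   = filter([0 0.05 0.04],[1 -1.6 0.7],u);% "unknown" plant to identify (illustrative)
w   = zeros(n+1, 1);
e   = zeros(N,1);
for k = n+1:N
    x    = u(k:-1:k-n);                    % tapped delay line: u(k), u(k-1), ..., u(k-n)
    y    = w' * x;                         % FIR model output
    e(k) = d(k) - y;
    w    = w + eta * e(k) * x / (x'*x);    % normalized (Widrow-Hoff) LMS update
end
plot(e), xlabel('k'), ylabel('error')      % the error decays as the FIR model adapts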
Demo LMS, ADALINE, FIR...
% ADALINE - Adaptive dynamic system identification
% First sampled system - until 80 sec
g1=tf(1,[1 .2 1]), gd1=c2d(g1,.1)
% System changes dramatically - after 80 sec
g2=tf(3,[1 2 1]), gd2=c2d(g2,.1)

% Pseudo Random Binary Signal - good for identification
u=idinput(120*10,'PRBS',[0 0.01],[-1 1]);

% time vector
...
[y1,t1,x1]=lsim(gd1,u1,t1);
[y2,t2,x2]=lsim(gd2,u2,t2,x1);

% Creates new ADALINE network with delayed inputs (FIR)
% Learning Rate = 0.09
net=newlin(t,y,[1 2 3 4 5 6 7 8 9 10],0.09)
[net,Y,E]=adapt(net,t,y)

% design an average transfer function
netd=newlind(t,y)
Demo LMS, ADALINE, FIR...
Training signal: u=idinput(1500,'PRBS',[0 0.01]), n=10, lr=0.1, RMSE Set 1 = 6.5742.
The ADALINE learns the system AND also the changes in the dynamics!!

[Figure: identification output and error over samples 0 to 1500.]

Verification signal: u=idinput(1200,'PRBS',[0 0.05]), n=10, lr=0.1, RMSE Set 2 = 22.7817.
But in another frequency range it is not so good... (one needs to adjust the TDL, lr, Ts).

[Figure: verification output and error over samples 0 to 1200.]
