Introduction to Neural Networks: John Paxton, Montana State University, Summer 2003

This document provides an introduction and overview of simple neural networks for pattern classification. It discusses the architecture of a basic neural network with one input and one output layer. It also covers representations of input data, interpreting weights, modeling simple problems, linear separability, Hebb's learning rule, perceptrons, activation functions, learning rules, convergence of the perceptron learning rule, Adalines, the delta rule, and Madalines.

Introduction to Neural Networks
John Paxton
Montana State University
Summer 2003
Chapter 2: Simple Neural Networks
for Pattern Classification

Architecture

[Diagram: a bias unit x0 = 1 and inputs x1, …, xn feed the output unit y through weights w0, w1, …, wn; w0 is the bias weight.]

f(yin) = 1 if yin >= 0
f(yin) = 0 otherwise
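A minimal Python sketch of this unit (an illustration, not code from the slides):

import numpy as np

def threshold_unit(x, w):
    # x[0] is the bias input, fixed at 1; w[0] is the bias weight
    y_in = np.dot(w, x)
    return 1 if y_in >= 0 else 0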
Representations
• Binary: 0 no, 1 yes

• Bipolar: -1 no, 0 unknown, 1 yes

• Bipolar is generally superior: with binary values, a 0 input or 0 target makes the Hebb update xi*y vanish, so "off" information cannot affect the weights; bipolar coding avoids this and lets 0 stand for missing data
Interpreting the Weights
• w0 = -1, w1 = 1, w2 = 1
• 0 = -1 + x1 + x2 or x2 = 1 – x1

[Graph: in the (x1, x2) plane, the decision boundary is the line x2 = 1 - x1; points on or above the line are classified YES, points below are classified NO.]
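A quick check of this boundary using the threshold_unit sketch above (the test points are chosen for illustration):

w = np.array([-1.0, 1.0, 1.0])                  # w0, w1, w2 from the slide
print(threshold_unit(np.array([1, 1, 1]), w))   # (x1, x2) = (1, 1) lies above the line -> 1 (YES)
print(threshold_unit(np.array([1, 0, 0]), w))   # (x1, x2) = (0, 0) lies below the line -> 0 (NO)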
Modelling a Simple Problem
• Should I attend this lecture?
• x1 = it’s hot
• x2 = it’s raining

[Diagram: bias unit x0 with weight w0 = 2.5, x1 with weight w1 = -2, x2 with weight w2 = 1, all feeding the output unit y.]
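The slide does not say whether the inputs are binary or bipolar; a sketch assuming bipolar inputs (the representation the deck prefers), reusing threshold_unit from above:

w = np.array([2.5, -2.0, 1.0])                  # bias, "it's hot", "it's raining"
for hot in (-1, 1):
    for raining in (-1, 1):
        x = np.array([1, hot, raining])
        print(hot, raining, threshold_unit(x, w))

With these weights the only "don't attend" (output 0) case is hot = 1, raining = -1: a hot, dry day.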
Linear Separability

[Figure: three 2x2 plots of the corner points (x1, x2) in {0,1}², labelled with the outputs of AND, OR, and XOR. For AND and OR a single straight line separates the 1 outputs from the 0 outputs; for XOR no such line exists.]

AND OR XOR
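A brute-force sketch over a coarse weight grid (illustrative; a grid search can only confirm separability when it exists, and XOR's inseparability is exactly the point of the figure):

import itertools
import numpy as np

def separable(targets):
    # targets are listed for (x1, x2) = (0,0), (0,1), (1,0), (1,1)
    pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
    grid = np.arange(-2, 2.25, 0.25)
    for w0, w1, w2 in itertools.product(grid, repeat=3):
        out = [1 if w0 + w1 * a + w2 * b >= 0 else 0 for a, b in pts]
        if out == targets:
            return True
    return False

print(separable([0, 0, 0, 1]))   # AND -> True
print(separable([0, 1, 1, 1]))   # OR  -> True
print(separable([0, 1, 1, 0]))   # XOR -> False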
Hebb’s Rule
• 1949. Increase the weight between two neurons that are both “on”.
• 1988. Increase the weight between two neurons that are both “off”.
• wi(new) = wi(old) + xi*y


Algorithm
1. set wi = 0 for 0 <= i <= n
2. for each training vector
3. set xi = si for all input units
4. set y = t
5. wi(new) = wi(old) + xi*y
Example: 2-input AND

s0 s1 s2    t
 1  1  1    1
 1  1 -1   -1
 1 -1  1   -1
 1 -1 -1   -1
Training Procedure
(each row shows the weights before the update and the training pattern applied; "(!)" marks a pattern the current weights would misclassify)

w0 w1 w2   x0 x1 x2    y
 0  0  0    1  1  1    1
 1  1  1    1  1 -1   -1  (!)
 0  0  2    1 -1  1   -1  (!)
-1  1  1    1 -1 -1   -1
-2  2  2   (final weights)
Result Interpretation
• -2 + 2x1 + 2x2 = 0, i.e. x2 = -x1 + 1
• This training procedure is order dependent and is not guaranteed to find a separating boundary, even when one exists.
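A minimal Python sketch of Hebb training on this data; it reproduces the weight trace above and ends at (-2, 2, 2):

import numpy as np

patterns = np.array([[1,  1,  1],     # each row is (x0 = 1, x1, x2)
                     [1,  1, -1],
                     [1, -1,  1],
                     [1, -1, -1]])
targets = np.array([1, -1, -1, -1])   # bipolar AND

w = np.zeros(3)
for x, t in zip(patterns, targets):
    w = w + x * t                     # Hebb update: wi += xi * y, with y set to t
    print(w)                          # last line: [-2.  2.  2.]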
Pattern Recognition Exercise
#.#     .#.
.#.     #.#
#.#     .#.

“X”     “O”
Pattern Recognition Exercise
• Architecture?
• Weights?
• Are the original patterns classified
correctly?
• Are the original patterns with 1 piece of
wrong data classified correctly?
• Are the original patterns with 1 piece of
missing data classified correctly?
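One possible way to explore this exercise (an assumed setup: a single Hebb unit with nine bipolar pixel inputs, # = 1, . = -1, target +1 for “X” and -1 for “O”):

import numpy as np

def encode(rows):
    # flatten a 3x3 pattern into a bipolar vector with a leading bias input of 1
    pixels = [1 if c == '#' else -1 for row in rows for c in row]
    return np.array([1] + pixels)

X = encode(["#.#", ".#.", "#.#"])     # target +1
O = encode([".#.", "#.#", ".#."])     # target -1

w = X * 1 + O * (-1)                  # Hebb training on the two patterns
print(np.dot(w, X), np.dot(w, O))     # 18 and -18: positive for "X", negative for "O"

Flipping or zeroing individual pixels in the encoded vectors gives a quick answer to the noisy-data and missing-data questions.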
Perceptrons (1958)
• Very important early neural network
• Guaranteed training procedure under
certain circumstances
[Diagram: the same single-unit layout as before; a bias unit x0 = 1 and inputs x1, …, xn connect to the output unit y through weights w0, w1, …, wn.]
Activation Function
• f(yin) = 1 if yin > θ
  f(yin) = 0 if -θ <= yin <= θ
  f(yin) = -1 otherwise

• Graph interpretation: a step function that is -1 below -θ, 0 between -θ and θ, and 1 above θ.
Learning Rule
• wi(new) = wi(old) + α*t*xi if error
• α is the learning rate
• Typically, 0 < α <= 1
Algorithm
1. set wi = 0 for 0 <= i <= n (can be random)
2. for each training exemplar do
3. xi = si
4. yin = Σ xi*wi
5. y = f(yin)
6. wi(new) = wi(old) + α*t*xi if error
7. if stopping condition not reached, go to 2
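A minimal Python sketch of this loop, set up for the bipolar AND example that follows (θ = 0, α = 1); it reproduces the Epoch 1 rows shown below and stops once a full epoch produces no weight change:

import numpy as np

def f(y_in, theta=0.0):
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

patterns = np.array([[1,  1,  1],
                     [1,  1, -1],
                     [1, -1,  1],
                     [1, -1, -1]])
targets = np.array([1, -1, -1, -1])

w, alpha = np.zeros(3), 1.0
changed = True
while changed:                          # one pass over the data per epoch
    changed = False
    for x, t in zip(patterns, targets):
        if f(np.dot(w, x)) != t:        # update only on error
            w = w + alpha * t * x
            changed = True
print(w)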
Example: AND concept
• bipolar inputs
• bipolar target
• θ = 0
• α = 1
Epoch 1
w0 w1 w2   x0 x1 x2    y    t
 0  0  0    1  1  1    0    1
 1  1  1    1  1 -1    1   -1
 0  0  2    1 -1  1    1   -1
-1  1  1    1 -1 -1   -1   -1
Exercise
• Continue the above example until the
learning algorithm is finished.
Perceptron Learning Rule
Convergence Theorem
• If a weight vector exists that correctly
classifies all of the training examples, then
the perceptron learning rule will converge
to some weight vector that gives the
correct response for all training patterns.
This will happen in a finite number of
steps.
Exercise
• Show perceptron weights for the 2-of-3 concept

x1 x2 x3    y
 1  1  1    1
 1  1 -1    1
 1 -1  1    1
 1 -1 -1   -1
-1  1  1    1
-1  1 -1   -1
-1 -1  1   -1
-1 -1 -1   -1
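A sketch for checking a candidate weight vector against this table, reusing f() from the perceptron sketch above (the candidate below is one illustrative guess, not necessarily the intended answer):

import numpy as np

table = [(( 1,  1,  1),  1), (( 1,  1, -1),  1), (( 1, -1,  1),  1), (( 1, -1, -1), -1),
         ((-1,  1,  1),  1), ((-1,  1, -1), -1), ((-1, -1,  1), -1), ((-1, -1, -1), -1)]

w = np.array([-1.0, 2.0, 2.0, 2.0])    # bias weight first
ok = all(f(np.dot(w, np.concatenate(([1.0], x)))) == t for x, t in table)
print(ok)                              # True if the candidate realizes the 2-of-3 concept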
Adaline (Widrow & Hoff, 1960)
• Adaptive Linear Neuron
• Learning rule minimizes the mean squared error
• Learns on all examples, not just the ones with errors
Architecture

[Diagram: the same single-unit layout as before; a bias unit x0 = 1 and inputs x1, …, xn feed the output unit y through weights w0, w1, …, wn.]
Training Algorithm
1. set wi (small random values typical)
2. set α (0.1 typical)
3. for each training exemplar do
4. xi = si
5. yin = Σ xi*wi
6. wi(new) = wi(old) + α*(t – yin)*xi
7. go to 3 while the largest weight change is still large; stop once all changes fall below a small tolerance
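A minimal Python sketch of this procedure on the bipolar AND data (α = 0.1, small random initial weights). With a fixed learning rate the updates never shrink all the way to zero on this data (the targets cannot be matched exactly), so the sketch simply runs a fixed number of epochs:

import numpy as np

patterns = np.array([[1,  1,  1],
                     [1,  1, -1],
                     [1, -1,  1],
                     [1, -1, -1]], dtype=float)
targets = np.array([1, -1, -1, -1], dtype=float)

rng = np.random.default_rng(0)
w = rng.uniform(-0.1, 0.1, size=3)     # small random initial weights
alpha = 0.1

for epoch in range(100):
    for x, t in zip(patterns, targets):
        y_in = np.dot(w, x)
        w += alpha * (t - y_in) * x    # delta rule update
print(w)                               # hovers near [-0.5, 0.5, 0.5]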
Activation Function
• f(yin) = 1 if yin >= 0
• f(yin) = -1 otherwise
Delta Rule
• squared error E = (t – yin)²
• derivative: ∂E/∂wi = -2(t – yin)xi
• moving opposite the gradient gives the update Δwi = α(t – yin)xi
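A quick numerical check of the derivative (the weights, input, and target below are illustrative values, not from the slides):

import numpy as np

w = np.array([-0.3, 0.2, 0.1])
x = np.array([1.0, 1.0, -1.0])
t = -1.0

E = lambda v: (t - np.dot(v, x)) ** 2
grad = -2 * (t - np.dot(w, x)) * x     # analytic: dE/dwi = -2 (t - yin) xi

eps = 1e-6
numeric = np.array([(E(w + eps * np.eye(3)[i]) - E(w - eps * np.eye(3)[i])) / (2 * eps)
                    for i in range(3)])
print(grad, numeric)                   # the two gradients agree to about 1e-6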
Example: AND concept
• bipolar inputs
• bipolar targets
• w0 = -0.5, w1 = 0.5, w2 = 0.5
• these weights minimize E

x0 x1 x2    yin     t     E
 1  1  1    0.5     1   0.25
 1  1 -1   -0.5    -1   0.25
 1 -1  1   -0.5    -1   0.25
 1 -1 -1   -1.5    -1   0.25
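These weights can be verified with a least-squares solve (a quick check, not part of the original slides):

import numpy as np

X = np.array([[1,  1,  1],
              [1,  1, -1],
              [1, -1,  1],
              [1, -1, -1]], dtype=float)
t = np.array([1, -1, -1, -1], dtype=float)

w, *_ = np.linalg.lstsq(X, t, rcond=None)   # minimizes the summed squared error
print(w)                                    # [-0.5  0.5  0.5]
print((t - X @ w) ** 2)                     # [0.25 0.25 0.25 0.25], matching the E column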
Exercise
• Demonstrate that you understand the
Adaline training procedure.
Madaline
• Many adaptive linear neurons

[Diagram: inputs x1, …, xm feed a hidden layer of Adaline units z1, …, zk, which in turn feed the output unit y; the hidden and output units each have a bias input of 1.]
Madaline
• MRI (Madaline Rule I, 1960) – learns only the weights from the input layer to the hidden layer; the hidden-to-output weights stay fixed
• MRII (Madaline Rule II, 1987) – learns all weights
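A minimal forward-pass sketch of such a network (the weights and the OR-style output unit are illustrative assumptions, not values from the slides). These particular weights happen to compute XOR, the function a single unit cannot represent:

import numpy as np

def bipolar_step(y_in):
    return np.where(y_in >= 0, 1, -1)

def madaline_forward(x, V, w_out):
    xb = np.concatenate(([1.0], x))         # add bias input
    z = bipolar_step(V @ xb)                # hidden Adaline outputs
    zb = np.concatenate(([1.0], z))
    return int(bipolar_step(np.dot(w_out, zb)))

V = np.array([[-0.5,  1.0, -1.0],           # z1 fires only for x1 = 1, x2 = -1
              [-0.5, -1.0,  1.0]])          # z2 fires only for x1 = -1, x2 = 1
w_out = np.array([0.5, 1.0, 1.0])           # output fires if either hidden unit fires (OR)

for x in ([1, 1], [1, -1], [-1, 1], [-1, -1]):
    print(x, madaline_forward(np.array(x, dtype=float), V, w_out))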
