Lecture 9
Algorithm
Yinghao Wu
Department of Systems and Computational Biology
Albert Einstein College of Medicine
Fall 2014
Outline
• Background
• Supervised learning (BPNN)
• Unsupervised learning (SOM)
• Implementation in Matlab
Biological Inspiration
Idea: to make computers more robust and intelligent, and able to learn, let's model our software (and/or hardware) after the brain.
Neurons in the Brain
• Although heterogeneous, at a low level the brain is composed of neurons
– A neuron receives input from other neurons (generally thousands) through its synapses
– Inputs are approximately summed
– When the input exceeds a threshold, the neuron sends an electrical spike that travels from the cell body, down the axon, to the next neuron(s)
Neuron Model
[Figure: a biological neuron with its dendrites, cell body, axon, and synapses labeled]
Supervised Vs Unsupervised
• Supervised learning is based on a labeled training set: each training example carries a label belonging to a known set of classes.
• Unsupervised learning works on unlabeled patterns. Typical tasks:
– Clustering - group patterns based on similarity
– Vector Quantization - fully divide up the input space S into a small set of regions (defined by codebook vectors) that also helps cluster the patterns P
– Feature Extraction - reduce the dimensionality of S by removing unimportant features (i.e. those that do not help in clustering P)
Neural Networks
● An artificial neural network (ANN) is a machine learning approach that models the human brain and consists of a number of artificial neurons.
● Neurons in an ANN tend to have fewer connections than biological neurons.
● Each neuron in an ANN receives a number of inputs.
● An activation function is applied to these inputs, which determines the activation level of the neuron (its output value).
● Knowledge about the learning task is given in the form of
examples called training examples.
Neural Networks (contd.)
● An Artificial Neural Network is specified by:
− neuron model: the information processing unit of the NN,
− an architecture: a set of neurons and links connecting neurons.
Each link has a weight,
− a learning algorithm: used for training the NN by modifying the
weights in order to model a particular learning task correctly on
the training examples.
● The aim is to obtain a NN that is trained and generalizes well: it should behave correctly on new instances of the learning task.
Neuron
● The neuron is the basic information processing unit of a
NN. It consists of:
1. A set of links, describing the neuron inputs, with weights $w_1, w_2, \ldots, w_m$
2. An adder function (linear combiner) computing the weighted sum of the inputs (real numbers): $u = \sum_{j=1}^{m} w_j x_j$
3. An activation function $\varphi$ applied to the biased sum, giving the neuron output: $y = \varphi(u + b)$, where $b$ is the bias
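As an illustration (not from the slides), a minimal MATLAB sketch of this computation for a single neuron, with hypothetical inputs, weights, and bias, and a logistic sigmoid as the activation function:

% Sketch of a single artificial neuron (hypothetical values)
x = [0.5; -1.0; 0.25];         % m = 3 inputs
w = [0.8; -0.2; 0.4];          % weights w1..wm
b = 0.1;                       % bias
u = sum(w .* x);               % adder: weighted sum of the inputs
y = 1 / (1 + exp(-(u + b)));   % activation function (here a logistic sigmoid) applied to u + b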
The Neuron Diagram
[Figure: inputs $x_1, x_2, \ldots, x_m$ are multiplied by the weights $w_1, \ldots, w_m$, summed together with the bias $b$ to give the induced field $v$, and passed through the activation function $\varphi(\cdot)$ to produce the output $y$]
Neuron Models
● The choice of activation function ϕ determines the
neuron model.
Examples:
● step function: $\varphi(v) = \begin{cases} a & \text{if } v < c \\ b & \text{if } v \geq c \end{cases}$
● ramp function: $\varphi(v) = \begin{cases} a & \text{if } v < c \\ b & \text{if } v > d \\ a + (v - c)(b - a)/(d - c) & \text{otherwise} \end{cases}$
● sigmoid function: a smooth, S-shaped curve, e.g. the logistic function $\varphi(v) = \dfrac{1}{1 + e^{-v}}$
[Figure: plots of the step, ramp, and sigmoid functions]
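A hedged MATLAB sketch of these activation functions as anonymous functions, with illustrative constants a, b, c, d (not taken from the slides):

a = 0; b = 1; c = 0; d = 1;                  % illustrative constants
step    = @(v) a*(v < c) + b*(v >= c);
ramp    = @(v) a*(v < c) + b*(v > d) + (a + (v - c)*(b - a)/(d - c)).*(v >= c & v <= d);
sigmoid = @(v) 1 ./ (1 + exp(-v));           % logistic sigmoid
v = -2:0.1:2;
plot(v, step(v), v, ramp(v), v, sigmoid(v)); legend('step', 'ramp', 'sigmoid');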
Network Architectures
● Three different classes of network architectures
− single-layer feed-forward
− multi-layer feed-forward
− recurrent
Perceptron
● The simplest single-layer feed-forward network is the perceptron: a single neuron with bias $b$ and the sign (step) activation function
$\varphi(v) = \begin{cases} +1 & \text{if } v \geq 0 \\ -1 & \text{if } v < 0 \end{cases}$
[Figure: perceptron with inputs $x_1, \ldots, x_n$, weights $w_1, \ldots, w_n$, bias $b$, induced field $v$, and output $y = \varphi(v)$]
Perceptron for Classification
● The perceptron is used for binary classification.
● First train a perceptron for a classification task.
− Find suitable weights in such a way that the training examples are
correctly classified.
− Geometrically try to find a hyper-plane that separates the examples of the
two classes.
● The perceptron can only model linearly separable classes.
● When the two classes are not linearly separable, it may be
desirable to obtain a linear separator that minimizes the mean
squared error.
● Given training examples of classes C1 and C2, train the perceptron in such a way that:
− If the output of the perceptron is +1 then the input is assigned to class C1
− If the output is -1 then the input is assigned to C2
Learning Process for Perceptron
[Figure: a linearly separable Boolean function (OR) plotted over inputs X1 and X2; true at (0,1), (1,0) and (1,1), false at (0,0); a single straight line separates the two classes]
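The slides do not spell out the update rule; the sketch below assumes the standard perceptron rule w ← w + η(t − y)x and uses the OR data from the figure (all names and values are illustrative):

X = [0 0; 0 1; 1 0; 1 1];              % training inputs, one row per example
T = [-1; 1; 1; 1];                     % targets in {-1, +1} (the OR function)
w = zeros(2, 1); b = 0; eta = 0.1;     % weights, bias, learning rate

for epoch = 1:20
    for i = 1:size(X, 1)
        v = X(i, :) * w + b;                   % induced field
        y = 2 * (v >= 0) - 1;                  % sign activation: +1 if v >= 0, else -1
        w = w + eta * (T(i) - y) * X(i, :)';   % adjust weights only when y differs from the target
        b = b + eta * (T(i) - y);
    end
end
disp(2 * ((X * w + b) >= 0) - 1)               % predicted classes; should equal T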
Perceptron: Limitations
● The perceptron can only model linearly separable
functions,
– those functions whose values, plotted in a two-dimensional graph, can be separated into two parts by a single straight line.
● Boolean functions given below are linearly separable:
− AND
− OR
− COMPLEMENT
● It cannot model XOR function as it is non linearly
separable.
− When the two classes are not linearly separable, it may be
desirable to obtain a linear separator that minimizes the mean
squared error.
XOR – Non linearly separable function
[Figure: XOR plotted over inputs X1 and X2; true at (0,1) and (1,0), false at (0,0) and (1,1); no single straight line can separate the true and false points]
Multi layer feed-forward NN (FFNN)
[Figure: a 3-4-2 feed-forward network with an input layer, one hidden layer, and an output layer]
FFNN for XOR
● The ANN for XOR has two hidden nodes that realizes this non-linear
separation and uses the sign (step) activation function.
● Arrows from input nodes to two hidden nodes indicate the directions of
the weight vectors (1,-1) and (-1,1).
● The output node is used to combine the outputs of the two hidden
nodes.
[Figure: input nodes X1, X2, hidden nodes H1, H2, and output node Y; the hidden nodes receive the weight vectors (1, –1) and (–1, 1), and the output node combines H1 and H2 with weights 1, 1 and bias –0.5]

X1  X2 | H1      H2      | Output node | X1 XOR X2
 0   0 |  0       0      |  –0.5 → 0   |     0
 0   1 | –1 → 0   1      |   0.5 → 1   |     1
 1   0 |  1      –1 → 0  |   0.5 → 1   |     1
 1   1 |  0       0      |  –0.5 → 0   |     0

Since we are representing the two states by 0 (false) and 1 (true), we map negative outputs (–1, –0.5) of the hidden and output layers to 0 and positive outputs (0.5, 1) to 1.
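A short plain-MATLAB check of this table, propagating the four input patterns through the network with the weights and bias shown (the thresholding maps positive values to 1 and everything else to 0):

thresh = @(v) double(v > 0);            % map positive outputs to 1, others to 0
X  = [0 0; 0 1; 1 0; 1 1];              % the four input patterns
Wh = [1 -1; -1 1];                      % hidden weight vectors (1,-1) and (-1,1)
wo = [1; 1]; bo = -0.5;                 % output weights and bias
H  = thresh(X * Wh');                   % hidden node outputs H1, H2
Y  = thresh(H * wo + bo);               % network output
disp([X H Y])                           % reproduces the X1 XOR X2 column of the table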
Three-layer networks
[Figure: a three-layer network with inputs x1 … xn, hidden layers, and an output layer]
What does each of the layers do? Each neuron i computes
$y_i = f\left(\sum_{j=1}^{m} w_{ij} x_j + b_i\right)$
where the $x_j$ are its inputs, $w_{ij}$ the connection weights, $b_i$ the bias, and $f$ the activation function.
Properties of architecture
$y_i = f\left(\sum_{j=1}^{m} w_{ij} x_j + b_i\right)$
Forward Propagation of Activity
• Step 1: Initialize weights at random, choose a
learning rate η
• Until network is trained:
• For each training example i.e. input pattern and
target output(s):
• Step 2: Do forward pass through net (with fixed
weights) to produce output(s)
– i.e., in Forward Direction, layer by layer:
• Inputs applied
• Multiplied by weights
• Summed
• ‘Squashed’ by sigmoid activation function
• Output passed to each neuron in next layer
– Repeat above until network output(s) produced
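A minimal sketch of Steps 1-2 in plain MATLAB, assuming a 3-4-2 network with logistic sigmoid ('squashing') activations and hypothetical random weights:

rng(0);                                % Step 1: initialize weights at random
eta = 0.5;                             % chosen learning rate
W1 = randn(4, 3); b1 = randn(4, 1);    % input -> hidden weights and biases
W2 = randn(2, 4); b2 = randn(2, 1);    % hidden -> output weights and biases
sigm = @(v) 1 ./ (1 + exp(-v));        % sigmoid activation

x = [0.1; 0.7; -0.3];                  % one input pattern
h = sigm(W1 * x + b1);                 % Step 2: multiply by weights, sum, squash
y = sigm(W2 * h + b2);                 % network output, produced layer by layer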
Step 3. Back-propagation of error
• Compute error (delta or local gradient) for each
output unit δ k
• Layer-by-layer, compute error (delta or local
gradient) for each hidden unit δ j by backpropagating
errors
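The slides stop at computing the deltas; below is a hedged sketch of the remaining arithmetic for one training example, assuming sigmoid activations and a squared-error loss, and continuing the variables of the forward-pass sketch above (t is a hypothetical target):

t = [1; 0];                                     % hypothetical target output for the pattern x

delta_out = (t - y) .* y .* (1 - y);            % delta (local gradient) for each output unit
delta_hid = (W2' * delta_out) .* h .* (1 - h);  % back-propagate to get each hidden unit's delta

W2 = W2 + eta * delta_out * h';                 % weight update: delta times the unit's input, scaled by eta
b2 = b2 + eta * delta_out;
W1 = W1 + eta * delta_hid * x';
b1 = b1 + eta * delta_hid;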
Unsupervised Learning – Self Organizing
Maps
• Self-organizing maps (SOMs) are a data visualization
technique invented by Professor Teuvo Kohonen
– Also called Kohonen Networks, Competitive Learning,
Winner-Take-All Learning
– Generally reduces the dimensions of data through the use
of self-organizing neural networks
– Useful for data visualization; humans cannot visualize high
dimensional data so this is often a useful technique to
make sense of large data sets
Basic “Winner Take All” Network
• Two layer network
– Input units, output units, each input unit is connected to each
output unit
[Figure: input layer units I1, I2, I3 fully connected to output layer units O1, O2 through weights Wi,j]
Typical Usage: 2D Feature Map
• In typical usage the output nodes form a 2D “map” organized
in a grid-like fashion and we update weights in a
neighborhood around the winner
[Figure: input units I1 … I3 connected to a 5×5 grid of output nodes O11 … O55]
Basic Algorithm
– Initialize Map (randomly assign weights)
– Loop over training examples
• Assign input unit values according to the values in the current
example
• Find the "winner", i.e. the output unit whose weights most closely match the input values, using some distance metric (e.g. Euclidean distance)
• Update the weights of the winning unit (and, in a 2D feature map, its grid neighbors) toward the current input:
$\Delta W_{t+1} = c\,(X_i^t - W_t)$
where c is the learning rate.
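A minimal plain-MATLAB sketch of this loop, assuming Euclidean distance, random 2-D inputs, and a 5×5 map; for brevity it updates only the winner, whereas the 2D feature map above would also update the winner's neighbors:

rng(0);
X = rand(100, 2);                       % 100 training examples with 2 input units
W = rand(25, 2);                        % one weight vector per output unit (5x5 map, randomly initialized)
c = 0.1;                                % learning rate

for t = 1:size(X, 1)                    % loop over training examples
    x = X(t, :);                        % assign input unit values from the current example
    d = sum((W - repmat(x, size(W, 1), 1)).^2, 2);   % squared Euclidean distance to every output unit
    [~, win] = min(d);                  % the "winner": the closest output unit
    W(win, :) = W(win, :) + c * (x - W(win, :));     % Delta W = c (X - W) moves the winner toward the input
end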
Implementation
Redefine the x axis (y is assumed to have been defined in an earlier plotting example):
>> x = [2 4 6 8];
>> plot(x,power(y,2));
Network creation
>> net = newff(PR,[S1 S2...SNl],{TF1 TF2...TFNl},BTF,BLF,PF)
• PR: an R×2 matrix of [min max] values for the R inputs; the number of inputs is decided by PR
• S1 ... SNl: sizes of the layers; S1 is the number of hidden neurons and SNl the number of output neurons
• TF1 ... TFNl: transfer functions of the layers (e.g. logsig)
• BTF, BLF, PF: training function, learning function, and performance function
Network Initialisation
-1 1 neuron 1
>> PR = [-1 1; -1 1; -1 1; -1 1];
-1 1
-1 1
Min Max
-1 1
TF2: logsig
TF1: logsig
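For example, a hedged call using the older toolbox syntax above, assuming a hidden layer of 5 logsig neurons and a single logsig output neuron:

PR  = [-1 1; -1 1; -1 1; -1 1];                            % [min max] range of each of the 4 inputs
net = newff(PR, [5 1], {'logsig', 'logsig'}, 'trainlm');   % 5 hidden neurons, 1 output neuron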
Network Training
Training parameters (fields of net.trainParam) and their default values:
• epochs: 100
• goal: 0
• max_fail: 5
• mem_reduc: 1
• min_grad: 1.0000e-010
• mu: 0.0010
• mu_dec: 0.1000
• mu_inc: 10
• mu_max: 1.0000e+010
• show: 25
• time: Inf
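These defaults can be changed before training; the values below are illustrative, not from the slides:

net.trainParam.epochs = 500;     % maximum number of training epochs
net.trainParam.goal   = 1e-4;    % stop once the performance (error) goal is reached
net.trainParam.show   = 50;      % report progress every 50 iterations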
net.trainFcn options
• net.trainFcn = 'trainlm'; a variant of BP based on a second-order algorithm (Levenberg-Marquardt)
Network Training(cont.)
TRAIN trains a network NET according to NET.trainFcn and NET.trainParam.
>> TRAIN(NET,P,T,Pi,Ai)
• NET - Network.
• P - Network inputs.
• T - Network targets, default = zeros (needed only when the network is trained with target outputs).
• Pi - Initial input delay conditions, default = zeros.
• Ai - Initial layer delay conditions, default = zeros.
>> p = [-0.5 1 -0.5 1; -1 0.5 -1 0.5; 0.5 1 0.5 1; -0.5 -1 -0.5 -1];
Each column of p is one training pattern (the first column is training pattern 1) and each row feeds one input neuron (the first row feeds neuron 1).
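Putting the pieces together, a hedged end-to-end sketch with the input matrix p from the slide and a hypothetical target vector t (one target per training pattern):

p = [-0.5 1 -0.5 1; -1 0.5 -1 0.5; 0.5 1 0.5 1; -0.5 -1 -0.5 -1];   % inputs, one column per pattern
t = [0 1 0 1];                                                      % hypothetical targets, one per pattern
net = train(net, p, t);                                             % trains using net.trainFcn and net.trainParam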
Simulation of the network
>> Y = sim(net, P)
• net : the trained network.
• P : the network inputs, one column per pattern.
• Y : the returned network outputs, in matrix or structure format; each column is the output for one input pattern and each row corresponds to one output neuron.
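Continuing that session, the trained network can be simulated on the training patterns and compared against the targets:

Y    = sim(net, p);            % network outputs, one column per input pattern
err  = t - Y;                  % difference from the targets
perf = mean(err(:).^2)         % mean squared error over the training patterns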
Performance Evaluation
• Example (creating and visualizing a self-organizing map):
>> P = [rand(1,400)*2; rand(1,400)];
>> net = newsom([0 2; 0 1],[3 5]);
>> plotsom(net.layers{1}.positions)
P is a set of 400 random two-dimensional input points; newsom creates a SOM whose two inputs range over [0, 2] and [0, 1], with a 3×5 grid of output neurons; plotsom plots the positions of those neurons.
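A hedged continuation in the same older toolbox syntax: training the SOM on P and plotting the learned weight vectors over the map topology (the plotsom(W, D) form is assumed here):

net = train(net, P);                             % unsupervised training: no targets are needed
plotsom(net.IW{1,1}, net.layers{1}.distances)    % plot the learned weight vectors with the map topology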