UNIT4_Part1 aiml
UNIT - 4
OUTLINE
◻ Introduction
◻ Neural Network representation
◻ Appropriate problems
◻ Perceptrons
◻ Backpropagation algorithm
Biological Motivation
◻ The human information processing system is built around the brain; its basic building block is the neuron, a cell that communicates information to and from various parts of the body.
◻ The study of artificial neural networks (ANNs) has been inspired by the observation that biological learning systems are built of very complex webs of interconnected neurons.
Introduction
Dendrites: Input
Cell body: Processor
Axon: Output
Biological Motivation
◻ Facts of Human Neurobiology
◻Properties of Neural Networks
🞑 Many neuron-like threshold switching units
🞑 Many weighted interconnections among units
🞑 Highly parallel, distributed process
🞑 Emphasis on tuning weights automatically
🞑 Input is high-dimensional, discrete or real-valued (e.g., sensor input)
Neural Networks
◻ Example: (Figure) a network in which input units feed, through weighted connections, into hidden units, whose outputs in turn feed the output units.
Neural Network Representation
How do ANNs work?
◻ Input: the neuron receives m inputs x1, x2, …, xm.
◻ Processing: the inputs are summed, ∑ = x1 + x2 + … + xm = y.
◻ Output: y (see the sketch below).
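A minimal sketch (Python; not from the slides) of the summation step just described: the m inputs are simply added to produce the output y.

# Summation step of the simple neuron described above.
def sum_neuron(inputs):
    """Return y = x1 + x2 + ... + xm."""
    y = 0.0
    for x in inputs:
        y += x
    return y

print(sum_neuron([0.2, 0.5, 0.3]))  # 1.0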
Neuron
Activation Functions
1) Step function
2) Sign function
3) Sigmoid function
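A short sketch (Python; not from the slides) of the three activation functions listed above, applied to a neuron's net input. Using 0 as the threshold for the step and sign functions is an assumption for illustration.

import math

def step(net):
    """Step function: 1 if net >= 0, else 0 (threshold of 0 assumed)."""
    return 1 if net >= 0 else 0

def sign(net):
    """Sign function: +1 if net >= 0, else -1 (threshold of 0 assumed)."""
    return 1 if net >= 0 else -1

def sigmoid(net):
    """Sigmoid function: squashes net into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

for f in (step, sign, sigmoid):
    print(f.__name__, f(-2.0), f(0.5))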
How do we actually use an artificial neuron?
❑ Feedforward network:
◻ The neurons in each layer feed their output forward to the next layer until we get the final output from the neural network (sketched below).
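A hedged sketch (Python; not from the slides) of this feedforward idea: each layer's outputs become the next layer's inputs. The sigmoid activation, the 2-3-1 layer sizes, and the weight values are illustrative assumptions.

import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer_forward(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of the inputs plus a bias
    and passes it through the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(unit_w, inputs)) + b)
            for unit_w, b in zip(weights, biases)]

def feedforward(x, layers):
    """Feed the input forward through each layer until the final output."""
    for weights, biases in layers:
        x = layer_forward(x, weights, biases)
    return x

# Hypothetical 2-3-1 network (weights chosen only for illustration).
layers = [
    ([[0.1, 0.4], [-0.3, 0.2], [0.5, -0.1]], [0.0, 0.1, -0.2]),  # hidden layer
    ([[0.3, -0.2, 0.6]], [0.05]),                                 # output layer
]
print(feedforward([1.0, 0.0], layers))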
◻ Introduction
◻ Neural Network representation
◻ Appropriate problems
◻ Perceptrons
◻ Backpropagation algorithm
Perceptron
◻ Given inputs x1 through xn, the output o(x1, …, xn) computed by the perceptron is
o(x1, …, xn) = 1 if w0 + w1x1 + w2x2 + … + wnxn > 0, and −1 otherwise
◻ where each wi is a real-valued constant (weight) that determines the contribution of input xi to the perceptron output.
Representational Power of Perceptrons
◻ Example: a two-input perceptron with weights w1 = w2 = 0.4 and threshold 0.5 represents the AND function (verified in the sketch below):
🞑 x1 = 1, x2 = 0: 1 * 0.4 + 0 * 0.4 = 0.4 < 0.5, so output = 0 (target 0)
🞑 x1 = 1, x2 = 1: 1 * 0.4 + 1 * 0.4 = 0.8 > 0.5, so output = 1 (target 1)
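A minimal sketch (Python; not from the slides) of the perceptron output computation, using the 0/1 threshold form to match the worked example above (equivalent to the ±1 bias form with w0 = −threshold).

def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold,
    otherwise 0 (matches the worked AND example above)."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net > threshold else 0

# AND with w1 = w2 = 0.4 and threshold 0.5, as in the example above.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron([x1, x2], [0.4, 0.4], 0.5))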
The Perceptron Training Rule
Perceptron Rule
◻ One way to learn an acceptable weight vector is to begin
with random weights, then iteratively apply the
perceptron to each training example, modifying the
perceptron weights whenever it misclassifies an example.
◻ This process is repeated, iterating through the training
examples as many times as needed until the perceptron
classifies all training examples correctly.
◻ Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi according to the rule
wi ← wi + Δwi, where Δwi = η (t − o) xi
◻ t is the target output for the current training example and o is the output generated by the perceptron.
◻ η is a positive constant called the learning rate.
◻ The learning rate moderates the degree to which weights are changed at each step.
◻ It is usually set to some small value (e.g., 0.1) and is sometimes made to decay as the number of weight-tuning iterations increases.
◻ Why should this update rule converge toward successful weight values?
🞑 Suppose the training example is already correctly classified by the perceptron. In this case, (t − o) is zero, making Δwi zero, so no weights are updated.
🞑 Suppose the perceptron outputs −1 when the target output is +1. To make the perceptron output +1 instead of −1 in this case, the weights must be altered to increase the value of the weighted sum w · x. For example, if xi > 0, then Δwi = η (t − o) xi > 0, so wi is increased (a training-loop sketch follows).
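A minimal sketch (Python; not from the slides) of the training rule wi ← wi + η(t − o)xi applied repeatedly until all examples are classified correctly. The learning rate, the AND training data with ±1 targets, and the max_epochs safeguard are illustrative assumptions.

def perceptron_output(x, w):
    """o(x) = +1 if w0 + w1*x1 + ... + wn*xn > 0, else -1 (x0 = 1 for the bias)."""
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net > 0 else -1

def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Perceptron training rule: w_i <- w_i + eta*(t - o)*x_i on every
    misclassified example, repeated until all examples are classified
    correctly (max_epochs is a safeguard added here, not part of the rule)."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)               # w[0] is the bias weight w0
    for _ in range(max_epochs):
        all_correct = True
        for x, t in examples:
            o = perceptron_output(x, w)
            if o != t:
                all_correct = False
                w[0] += eta * (t - o) * 1.0          # x0 = 1
                for i, xi in enumerate(x, start=1):
                    w[i] += eta * (t - o) * xi
        if all_correct:
            break
    return w

# Learn the AND function with targets in {-1, +1}.
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
print(train_perceptron(data))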
Delta Rule
◻ Consider the task of training an unthresholded perceptron, that is, a linear unit for which the output o is given by
o(x) = w · x = w0 + w1x1 + … + wnxn
◻ To derive a weight learning rule for linear units, first specify a measure for the training error of a hypothesis (weight vector) relative to the training examples.
◻ Define the training error as
E(w) = 1/2 Σd∈D (td − od)², where
🞑 D is the set of training examples,
🞑 td is the target output for training example d,
🞑 od is the output of the linear unit for training example d.
DERIVATION OF THE GRADIENT DESCENT RULE
How can we calculate the direction of steepest descent along the error surface?
◻ The gradient of E, ∇E(w) = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn], specifies the direction of steepest increase of E, so the gradient descent training rule is
w ← w + Δw, where Δw = −η ∇E(w)
◻ η is a positive constant called the learning rate, which determines the step size in the gradient descent search.
◻ The negative sign is present because we want to move the weight vector in the direction that decreases E.
◻ In component form, Δwi = −η ∂E/∂wi. Differentiating E gives
∂E/∂wi = ∂/∂wi [ 1/2 Σd∈D (td − od)² ]
= Σd∈D (td − od) ∂/∂wi (td − w · xd)
= Σd∈D (td − od) (−xid)
◻ so the gradient descent weight update rule is
Δwi = η Σd∈D (td − od) xid
◻ where td and od are the target value and the unit output value for training example d, and xid is the ith input component of example d (see the sketch below).
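A minimal sketch (Python; not from the slides) of batch gradient descent for a linear unit, implementing E(w) = 1/2 Σd (td − od)² and Δwi = η Σd (td − od) xid. The training data (generated from t = 1 + 2x), learning rate, and epoch count are illustrative assumptions.

def linear_output(x, w):
    """Unthresholded (linear) unit: o = w0 + w1*x1 + ... + wn*xn."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def training_error(examples, w):
    """E(w) = 1/2 * sum over d of (t_d - o_d)^2."""
    return 0.5 * sum((t - linear_output(x, w)) ** 2 for x, t in examples)

def gradient_descent(examples, eta=0.05, epochs=200):
    """Batch gradient descent: each step uses the full-sample gradient,
    delta_w_i = eta * sum over d of (t_d - o_d) * x_id."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        delta = [0.0] * (n + 1)
        for x, t in examples:
            err = t - linear_output(x, w)
            delta[0] += eta * err * 1.0            # x0 = 1 for the bias weight
            for i, xi in enumerate(x, start=1):
                delta[i] += eta * err * xi
        w = [wi + dwi for wi, dwi in zip(w, delta)]
    return w

# Illustrative data generated by t = 1 + 2*x (values chosen only for the demo).
data = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]
w = gradient_descent(data)
print(w, training_error(data, w))   # w approaches [1, 2], error approaches 0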
Incremental (Stochastic) Gradient Descent
◻ In incremental (stochastic) gradient descent, the weights are updated after examining each individual training example d, using the per-example error Ed(w) = 1/2 (td − od)², giving Δwi = η (td − od) xid.
◻ The key differences between standard gradient descent and stochastic gradient descent are:
🞑 In standard gradient descent, the error is summed over all examples before updating the weights; in stochastic gradient descent, the weights are updated upon examining each training example.
🞑 Summing over multiple examples requires more computation per weight-update step; on the other hand, because it uses the true gradient, standard gradient descent is often used with a larger step size per update.
🞑 If E(w) has multiple local minima, stochastic gradient descent can sometimes avoid falling into them, because it follows the gradient of the varying Ed(w) rather than of E(w).
◻ The two update schemes are contrasted in the sketch below.
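For contrast with the batch sketch above, this incremental (stochastic) version (Python; not from the slides) updates the weights after every example using Δwi = η (td − od) xid; the same illustrative data and learning rate are assumed.

def linear_output(x, w):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def incremental_gradient_descent(examples, eta=0.05, epochs=200):
    """Stochastic (incremental) version: weights are updated after each
    individual example, using delta_w_i = eta * (t_d - o_d) * x_id."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, t in examples:
            err = t - linear_output(x, w)
            w[0] += eta * err * 1.0
            for i, xi in enumerate(x, start=1):
                w[i] += eta * err * xi
    return w

data = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]
print(incremental_gradient_descent(data))  # approaches w0 = 1, w1 = 2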
◻ Introduction
◻ Neural Network representation
◻ Appropriate problems
◻ Perceptrons
◻ Backpropagation algorithm
Multi-layer neural networks
◻ (Figure: a multi-layer network consisting of input nodes, internal (hidden) nodes, and output nodes.)
MULTILAYER NETWORKS
The BACKPROPAGATION algorithm
◻ Backpropagation learns the weights of a multilayer network by minimising the squared error summed over all output units and all training examples:
E(w) = 1/2 Σd∈D Σk∈outputs (tkd − okd)²
◻ where tkd and okd are the target and output values associated with the kth output unit and training example d.
Back propagation Algorithm
◻ (Figure: a multilayer network with input nodes, internal (hidden) nodes, and output nodes; each unit computes the sigmoid function of its weighted inputs for a training example X. A sketch of the weight updates is given below.)
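A hedged sketch (Python; not from the slides) of stochastic backpropagation for one hidden layer of sigmoid units, using the standard error terms δk = ok(1 − ok)(tk − ok) for output units and δh = oh(1 − oh) Σk wkh δk for hidden units, with updates w ← w + η·δ·input. The layer sizes, learning rate, and XOR training data are illustrative assumptions, and whether it reaches a good solution depends on the random initial weights.

import math, random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward(x, w_hidden, w_out):
    """Forward pass: hidden-unit outputs, then output-unit outputs.
    Each weight row starts with the bias weight w0 (input x0 = 1)."""
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_hidden]
    o = [sigmoid(w[0] + sum(wi * hi for wi, hi in zip(w[1:], h))) for w in w_out]
    return h, o

def backprop_step(x, t, w_hidden, w_out, eta):
    """One stochastic backpropagation update for a single example (x, t):
    delta_k = o_k(1-o_k)(t_k-o_k) for output units,
    delta_h = o_h(1-o_h) * sum_k w_kh * delta_k for hidden units,
    then every weight gets  w <- w + eta * delta * input."""
    h, o = forward(x, w_hidden, w_out)
    delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    delta_h = [hj * (1 - hj) * sum(w_out[k][j + 1] * delta_o[k]
                                   for k in range(len(w_out)))
               for j, hj in enumerate(h)]
    for k, w in enumerate(w_out):                 # update hidden -> output weights
        w[0] += eta * delta_o[k]
        for j, hj in enumerate(h):
            w[j + 1] += eta * delta_o[k] * hj
    for j, w in enumerate(w_hidden):              # update input -> hidden weights
        w[0] += eta * delta_h[j]
        for i, xi in enumerate(x):
            w[i + 1] += eta * delta_h[j] * xi

# Illustrative run on XOR with 3 hidden units (sizes and data are assumptions;
# convergence depends on the random initial weights).
random.seed(0)
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
w_out = [[random.uniform(-0.5, 0.5) for _ in range(4)]]
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
for _ in range(20000):
    for x, t in data:
        backprop_step(x, t, w_hidden, w_out, eta=0.5)
for x, t in data:
    print(x, t, [round(o, 2) for o in forward(x, w_hidden, w_out)[1]])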
Number of Hidden Units
THANK YOU