Lecture 13.3 Classification ANN
Neural Network

Contents
Neural Network
Neuron
Network Architecture
Backpropagation
Bayesian Belief Network
Neural Networks
● An artificial neural network (ANN) is a machine learning approach that
models the human brain and consists of a number of artificial neurons.
● Neurons in an ANN tend to have fewer connections than biological
neurons.
● Each neuron in an ANN receives a number of inputs.
● An activation function is applied to these inputs, which determines the
activation level of the neuron (the output value of the neuron).
● Knowledge about the learning task is given in the form of examples
called training examples.
Neural Network (Cont….)

● An Artificial Neural Network is specified by:

− a neuron model: the information processing unit of the NN,
− an architecture: a set of neurons and the links connecting them, where
each link has a weight,
− a learning algorithm: used for training the NN by modifying the weights
so that the network models a particular learning task correctly on the
training examples.
● The aim is to obtain a NN that is trained and generalizes well.
● It should behave correctly on new instances of the learning task.
Neuron

● The neuron is the basic information processing unit of a NN. It consists of:

 A set of links describing the neuron inputs, with weights W1, W2, …, Wm.
 An adder function (linear combiner) for computing the weighted sum of the
inputs (real numbers):
$$\text{net}_j = \sum_{j=1}^{m} w_j x_j$$
 An activation function φ applied to the net input.
 A bias, which acts as a threshold (a small Python sketch of this neuron
model follows below).
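To make the neuron model concrete, here is a minimal Python sketch of a single neuron computing a weighted sum plus bias and passing it through a sigmoid activation (the function name and the particular numbers are illustrative, not taken from the slides):

```python
import math

def neuron_output(inputs, weights, bias):
    """Compute a single neuron's activation: sigmoid(weighted sum + bias)."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net))   # sigmoid activation

# Example: three inputs, three weights, one bias
print(neuron_output([1.0, 0.0, 1.0], [0.2, 0.4, -0.5], -0.4))  # ≈ 0.332
```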
Neuron diagram

Figure: Basic neuron model: inputs and weights feed a weighted sum (with a
bias), which passes through an activation function to produce the output.
Neuron Models
● The choice of activation function φ determines the neuron model.

Examples:

● Step function:
$$\varphi(x) = \begin{cases} 0 & \text{if } x < \theta \\ 1 & \text{if } x \geq \theta \end{cases}$$

● Sigmoid function:
$$\varphi(x) = \frac{1}{1 + \exp(-x)}$$

● Gaussian function:
$$\varphi(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)$$
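As a small illustrative sketch (not from the slides), the three activation functions can be written directly in Python:

```python
import math

def step(x, theta=0.0):
    """Step (threshold) activation: 0 below the threshold, 1 at or above it."""
    return 0.0 if x < theta else 1.0

def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def gaussian(x, mu=0.0, sigma=1.0):
    """Gaussian activation centred at mu with width sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)
```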
Network Architectures

● Three different classes of network architectures

− single-layer feed-forward
− multi-layer feed-forward
− recurrent
Single Layer Feed-forward

Figure: Single-layer feed-forward network, with an input layer of source
nodes connected directly to an output layer of neurons.
Multi layer feed-forward NN (FFNN)
● FFNN is a more general network architecture, where there are hidden layers
between input and output layers.
● Hidden nodes do not directly receive inputs nor send outputs to the external
environment.
● FFNNs overcome the limitation of single-layer NN.
● They can handle non-linearly separable learning tasks.

Figure: A 3-4-2 feed-forward network, with an input layer, one hidden layer,
and an output layer.
Training Algorithm: Backpropagation
● The backpropagation algorithm learns in the same way as a
single perceptron.
● It searches for weight values that minimize the total error of the
network over the set of training examples (training set).
● Backpropagation consists of the repeated application of the
following two passes:
− Forward pass: the network is activated on one example and the
error of (each neuron of) the output layer is computed.
− Backward pass: the network error is used for updating the weights.
The error is propagated backwards from the output layer through the
network, layer by layer.
Backpropagation

● Back-propagation training algorithm:
− Forward step: network activation.
− Backward step: error propagation.

● Backpropagation adjusts the weights of the NN in order to minimize
the network's total mean squared error.
Backpropagation Algorithm
Neural network learning for classification or numeric
prediction, using the backpropagation algorithm.
Input:
D, a data set consisting of the training examples and
their associated target values;
l, the learning rate;
network, a multilayer feed-forward network.
Output: A trained neural network.
Backpropagation Algorithm (Cont…)
Initialize the weights:
The weights in the network are initialized to small random numbers
(e.g., ranging from -1.0 to 1.0, or -0.5 to 0.5). Each unit has a bias
associated with it; the biases are similarly initialized to small random
numbers.
Each training example, X, is processed by the following steps:

 Propagate the inputs forward:

● Consider a network of three layers.
● Let us use i to represent nodes in the input layer and j to represent
nodes in the hidden and output layers.
● wij refers to the weight of the connection from unit i in the previous
layer to unit j.
Backpropagation Algorithm (Cont…)

• Given a unit j in a hidden or output layer, the net input Ij to unit j is

$$I_j = \sum_i w_{ij} O_i + \theta_j$$

where Oi is the output of unit i from the previous layer and θj is the
bias (threshold) of unit j.
• Given the net input Ij to unit j, the output Oj of unit j is computed
with the sigmoid function as

$$O_j = \frac{1}{1 + e^{-I_j}}$$
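A minimal sketch of this forward step in Python (the layer representation and names are illustrative, not from the slides):

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward_layer(prev_outputs, weights, biases):
    """Compute the outputs of one layer.

    weights[j][i] is w_ij (from unit i in the previous layer to unit j),
    biases[j] is theta_j.  Returns the list of O_j values.
    """
    outputs = []
    for w_j, theta_j in zip(weights, biases):
        net_j = sum(w * o for w, o in zip(w_j, prev_outputs)) + theta_j   # I_j
        outputs.append(sigmoid(net_j))                                    # O_j
    return outputs
```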
Backpropagation Algorithm (Cont…)

Backpropagate the error:

The error is propagated backward by updating the weights and biases to
reflect the error of the network's prediction.
For a unit j in the output layer, the error Errj is computed by

$$Err_j = O_j (1 - O_j)(T_j - O_j)$$

where Oj is the actual output of unit j, and Tj is the known target value
of the given training example.
The error of a hidden layer unit j is

$$Err_j = O_j (1 - O_j) \sum_k Err_k \, w_{jk}$$

where wjk is the weight of the connection from unit j to a unit k in the
next higher layer, and Errk is the error of unit k.
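A hedged Python sketch of these two error formulas (the function and argument names are illustrative):

```python
def output_error(o_j, t_j):
    """Err_j for an output unit: O_j * (1 - O_j) * (T_j - O_j)."""
    return o_j * (1.0 - o_j) * (t_j - o_j)

def hidden_error(o_j, downstream_errors, downstream_weights):
    """Err_j for a hidden unit: O_j * (1 - O_j) * sum_k Err_k * w_jk."""
    return o_j * (1.0 - o_j) * sum(
        err_k * w_jk for err_k, w_jk in zip(downstream_errors, downstream_weights)
    )
```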
Backpropagation Algorithm (Cont…)

• The weights and biases are updated to reflect the propagated errors.
Weights are updated by the following equations, where ∆wij is the
change in weight wij:

$$\Delta w_{ij} = (l)\, Err_j\, O_i$$
$$w_{ij} = w_{ij} + \Delta w_{ij}$$

The variable l is the learning rate, a constant typically having a
value between 0.0 and 1.0.
Biases are updated by the following equations, where ∆θj is
the change in bias θj:

$$\Delta \theta_j = (l)\, Err_j$$
$$\theta_j = \theta_j + \Delta \theta_j$$
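As an illustrative sketch, one weight-and-bias update step could look like this in Python (the example numbers anticipate w46 in the worked example below):

```python
def update_weight(w_ij, learning_rate, err_j, o_i):
    """w_ij <- w_ij + l * Err_j * O_i"""
    return w_ij + learning_rate * err_j * o_i

def update_bias(theta_j, learning_rate, err_j):
    """theta_j <- theta_j + l * Err_j"""
    return theta_j + learning_rate * err_j

# e.g. w46 in the worked example: -0.3 + 0.9 * 0.1311 * 0.332 ≈ -0.261
print(update_weight(-0.3, 0.9, 0.1311, 0.332))
```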
Stopping criteria
 Total mean squared error change:
− Backpropagation is considered to have converged when the absolute rate of
change in the average squared error per epoch is sufficiently small
(e.g., in the range [0.01, 0.1]),
 or all ∆wij in the previous epoch were so small as to be below some
specified threshold,
 or a prespecified number of epochs has expired.
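A hedged sketch of how these stopping criteria might be combined in a training loop (the run_epoch callable is a hypothetical placeholder supplied by the caller, not something defined in the slides):

```python
def train(run_epoch, max_epochs=1000, error_change_tol=0.01, weight_change_tol=1e-4):
    """Run training epochs until one of the stopping criteria fires.

    run_epoch() is assumed to perform one backpropagation pass over the
    training set and return (mean squared error, largest |delta w| applied).
    """
    prev_error = float("inf")
    for epoch in range(max_epochs):
        mse, max_delta_w = run_epoch()
        if abs(prev_error - mse) < error_change_tol:   # error-change criterion
            return epoch + 1
        if max_delta_w < weight_change_tol:            # weight-change criterion
            return epoch + 1
        prev_error = mse
    return max_epochs                                   # epoch-count criterion
```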
Example

Figure 1: An example of a multilayer feed-forward neural network.


Assume that the learning rate l is 0.9 and that the first training example is
X = (1, 0, 1), whose class label is 1.
Note: The sigmoid function is applied at the hidden layer and the output layer.
Example
Table 1: Initial input and weight values
x1  x2  x3  w14  w15   w24  w25  w34   w35  w46   w56   θ4    θ5   θ6
-----------------------------------------------------------------------------------
1   0   1   0.2  -0.3  0.4  0.1  -0.5  0.2  -0.3  -0.2  -0.4  0.2  0.1

Table 2: The net input and output calculation

Unit j   Net input Ij                                  Output Oj
-----------------------------------------------------------------------------------
4        0.2 + 0 - 0.5 - 0.4 = -0.7                    1/(1+e^0.7) = 0.332
5        -0.3 + 0 + 0.2 + 0.2 = 0.1                    1/(1+e^-0.1) = 0.525
6        (-0.3)(0.332) - (0.2)(0.525) + 0.1 = -0.105   1/(1+e^0.105) = 0.474

Table 3: Calculation of the error at each node

Unit j   Errj
-----------------------------------------------------------------------------
6        (0.474)(1-0.474)(1-0.474) = 0.1311
5        (0.525)(1-0.525)(0.1311)(-0.2) = -0.0065
4        (0.332)(1-0.332)(0.1311)(-0.3) = -0.0087
Example

Table 4: Calculations for weight and bias updating

Weight or bias   New value
------------------------------------------------------------------------------
w46              -0.3 + (0.9)(0.1311)(0.332) = -0.261
w56              -0.2 + (0.9)(0.1311)(0.525) = -0.138
w14              0.2 + (0.9)(-0.0087)(1) = 0.192
w15              -0.3 + (0.9)(-0.0065)(1) = -0.306
w24              0.4 + (0.9)(-0.0087)(0) = 0.4
w25              0.1 + (0.9)(-0.0065)(0) = 0.1
w34              -0.5 + (0.9)(-0.0087)(1) = -0.508
w35              0.2 + (0.9)(-0.0065)(1) = 0.194
θ6               0.1 + (0.9)(0.1311) = 0.218
θ5               0.2 + (0.9)(-0.0065) = 0.194
θ4               -0.4 + (0.9)(-0.0087) = -0.408
Example
The old error was 0.526 (Targetj - Outputj = 1 - 0.474).

After adjusting the weights, the network's output on the same example rises
to about 0.515, so the error falls to about 0.485.

The error has therefore been reduced.
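The whole worked example can be reproduced with a short, self-contained Python script (a sketch using the values from Tables 1-4; variable names are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial inputs, weights, and biases from Table 1
x1, x2, x3 = 1, 0, 1
w14, w15, w24, w25, w34, w35 = 0.2, -0.3, 0.4, 0.1, -0.5, 0.2
w46, w56 = -0.3, -0.2
t4, t5, t6 = -0.4, 0.2, 0.1          # biases (theta)
l, target = 0.9, 1                    # learning rate and class label

# Forward pass (Table 2)
o4 = sigmoid(w14*x1 + w24*x2 + w34*x3 + t4)       # 0.332
o5 = sigmoid(w15*x1 + w25*x2 + w35*x3 + t5)       # 0.525
o6 = sigmoid(w46*o4 + w56*o5 + t6)                # 0.474

# Backward pass: errors (Table 3), computed with the old weights
err6 = o6 * (1 - o6) * (target - o6)              #  0.1311
err5 = o5 * (1 - o5) * err6 * w56                 # -0.0065
err4 = o4 * (1 - o4) * err6 * w46                 # -0.0087

# Weight and bias updates (a few entries of Table 4)
w46 += l * err6 * o4                              # -0.261
w56 += l * err6 * o5                              # -0.138
w14 += l * err4 * x1                              #  0.192
t6  += l * err6                                   #  0.218
print(round(o6, 3), round(err6, 4), round(w46, 3))
```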


How Does Back-Propagation Work in Neural Networks?
Definition: Back-propagation is a method for supervised
learning used by a NN to update its parameters so that the
network's predictions become more accurate. The parameter
optimization process is achieved using an optimization
algorithm called gradient descent.
Loss Function: Cross Entropy Error
Initial Weights and Bias
Updating Parameters on the Output-Hidden Layer (worked in the slide figures)
Updating Parameters at Hidden-Input Layer (worked in the slide figures)
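Since those slides' derivation is available only as figures, here is a hedged, generic sketch (not the slides' exact derivation, and the numbers are illustrative) of one gradient-descent update of an output weight under cross-entropy loss with a sigmoid output:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(y_true, y_pred):
    """Cross-entropy error for a single binary target."""
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

# For a sigmoid output with cross-entropy loss, dE/dw = (y_pred - y_true) * hidden_output
hidden_output, w, bias = 0.6, 0.4, 0.1       # illustrative values
y_true, learning_rate = 1.0, 0.5

y_pred = sigmoid(w * hidden_output + bias)
grad_w = (y_pred - y_true) * hidden_output   # gradient of the loss w.r.t. w
w -= learning_rate * grad_w                  # gradient-descent step
print(cross_entropy(y_true, y_pred), w)
```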
Recurrent Network
● A FFNN is acyclic: data passes from the input nodes to the output nodes
and not vice versa.
− Once the FFNN is trained, its state is fixed and does not alter as new data is
presented to it. It does not have memory.
● A recurrent network can have connections that go backward from output to
input nodes, and it models dynamic systems.
− In this way, a recurrent network's internal state can be altered as sets of input
data are presented. It can be said to have memory (a minimal sketch of a
recurrent step follows after this list).
− It is useful in solving problems where the solution depends not just on the
current inputs but on all previous inputs.
● Applications
− predicting stock market prices,
− weather forecasting
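A minimal sketch (with illustrative scalar weights, not from the slides) of one recurrent step, where the previous hidden state acts as the network's memory:

```python
import math

def recurrent_step(x_t, h_prev, w_in, w_rec, bias):
    """One step of a simple recurrent unit: h_t = sigmoid(w_in*x_t + w_rec*h_prev + b)."""
    return 1.0 / (1.0 + math.exp(-(w_in * x_t + w_rec * h_prev + bias)))

# Feed a short input sequence; the hidden state carries information forward in time.
h = 0.0
for x in [1.0, 0.0, 1.0]:
    h = recurrent_step(x, h, w_in=0.5, w_rec=0.8, bias=-0.1)
    print(round(h, 3))
```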
Recurrent Network Architecture

● Recurrent network with hidden neurons: a unit-delay operator d is used to
model a dynamic system.

Figure: Recurrent network with input, hidden, and output nodes, where
unit-delay operators d feed outputs back as inputs.
A Bayesian Belief Network

A Bayesian belief network is made up of:

1. A directed acyclic graph (DAG)

Figure: A DAG over four variables, with edges A → B, B → C, and B → D.

2. A set of conditional probability tables, one for each node in the graph:

A      P(A)             A      B      P(B|A)
false  0.6              false  false  0.01
true   0.4              false  true   0.99
                        true   false  0.7
                        true   true   0.3

B      C      P(C|B)    B      D      P(D|B)
false  false  0.4       false  false  0.02
false  true   0.6       false  true   0.98
true   false  0.9       true   false  0.05
true   true   0.1       true   true   0.95
A Directed Acyclic Graph

Each node in the graph is a random variable.

A node X is a parent of another node Y if there is an arrow from node X to
node Y; e.g., A is a parent of B.

Informally, an arrow from node X to node Y means X has a direct influence on Y.
A Set of Tables for Each Node

Each node Xi has a conditional probability distribution P(Xi | Parents(Xi))
that quantifies the effect of the parents on the node.

The parameters are the probabilities in these conditional probability tables
(CPTs), shown above for the nodes A, B, C, and D.
A Set of Tables for Each Node (Cont…)

Conditional probability distribution for C given B:

B      C      P(C|B)
false  false  0.4
false  true   0.6
true   false  0.9
true   true   0.1

For a given combination of values of the parents (B in this example), the
entries for P(C=true | B) and P(C=false | B) must add up to 1,
e.g., P(C=true | B=false) + P(C=false | B=false) = 1.
Properties
Two important properties:
 Encodes the conditional independence relationships
between the variables in the graph structure
 Is a compact representation of the joint probability
distribution over the variables
Conditional Independence

The Markov condition: given its parents (P1, P2), a node (X) is conditionally
independent of its non-descendants (ND1, ND2).

Figure: Node X with parents P1 and P2, non-descendants ND1 and ND2, and
children C1 and C2.
The Joint Probability Distribution

Due to the Markov condition, we can compute the joint probability distribution
over all the variables X1, …, Xn in the Bayesian network using the formula:

$$P(X_1 = x_1, \dots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i \mid Parents(X_i))$$

where Parents(Xi) means the values of the parents of the node Xi with respect
to the graph.
Using a Bayesian Network Example

Using the network in the example, suppose we want to calculate:

P(A = true, B = true, C = true, D = true)
  = P(A = true) * P(B = true | A = true) * P(C = true | B = true) * P(D = true | B = true)
  = (0.4) * (0.3) * (0.1) * (0.95)
  = 0.0114

The factorization comes from the graph structure; the numbers come from the
conditional probability tables.
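A small Python sketch (the CPT encoding and names are illustrative) that computes this joint probability from the tables above:

```python
# CPTs from the example network: A -> B, B -> C, B -> D
p_a = {True: 0.4, False: 0.6}
p_b_given_a = {(True, True): 0.3, (True, False): 0.7,     # key: (a, b)
               (False, True): 0.99, (False, False): 0.01}
p_c_given_b = {(True, True): 0.1, (True, False): 0.9,     # key: (b, c)
               (False, True): 0.6, (False, False): 0.4}
p_d_given_b = {(True, True): 0.95, (True, False): 0.05,   # key: (b, d)
               (False, True): 0.98, (False, False): 0.02}

def joint(a, b, c, d):
    """P(A,B,C,D) = P(A) * P(B|A) * P(C|B) * P(D|B)."""
    return p_a[a] * p_b_given_a[(a, b)] * p_c_given_b[(b, c)] * p_d_given_b[(b, d)]

print(joint(True, True, True, True))   # 0.4 * 0.3 * 0.1 * 0.95 = 0.0114
```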
Inference

Using a Bayesian network to compute probabilities is called inference.

In general, inference involves queries of the form:

P(X | E)

where E is the evidence variable(s) and X is the query variable(s).
Bayesian belief network example

Figure: A simple Bayesian network with conditional probability tables.
Bayesian belief network example
Suppose that there are two events which could cause grass to
be wet: either the sprinkler is on or it's raining. Also, suppose that
the rain has a direct effect on the use of the sprinkler (namely that
when it rains, the sprinkler is usually not turned on). Then the
situation can be modeled with a Bayesian network (shown). All
three variables have two possible values, T (for true) and F (for
false).
The joint probability function is:

$$P(G, S, R) = P(G \mid S, R)\, P(S \mid R)\, P(R)$$

where the names of the variables have been abbreviated to G = Grass wet
(yes/no), S = Sprinkler turned on (yes/no), and R = Raining (yes/no).
Bayesian belief network example
The model can answer questions like "What is the probability that it is
raining, given the grass is wet?"

Using the conditional probability formula and summing over the remaining
random variables:

$$P(R = T \mid G = T) = \frac{P(G = T, R = T)}{P(G = T)}
= \frac{\sum_{S \in \{T,F\}} P(G = T, S, R = T)}{\sum_{S, R \in \{T,F\}} P(G = T, S, R)}$$
Bayesian belief network example
Using the expansion for the joint probability function P(G, S, R) and the
conditional probabilities from the conditional probability tables (CPTs)
stated in the diagram, we can evaluate each term in the sums in the
numerator and denominator, and obtain the numerical results (subscripted by
the associated variable values).
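Since the slide's CPT values appear only in the figure, the sketch below uses illustrative placeholder probabilities (commonly quoted for this sprinkler example, but not confirmed by this text) to show how the query is evaluated by summing over the joint distribution:

```python
from itertools import product

# Illustrative CPTs (placeholders; the actual values are in the slide's figure)
p_r = {True: 0.2, False: 0.8}
p_s_given_r = {(True, True): 0.01, (True, False): 0.99,   # key: (r, s) -> P(S=s | R=r)
               (False, True): 0.4, (False, False): 0.6}
p_g_given_sr = {(True, True): 0.99, (True, False): 0.9,   # key: (s, r) -> P(G=true | S=s, R=r)
                (False, True): 0.8, (False, False): 0.0}

def joint(g, s, r):
    """P(G, S, R) = P(G | S, R) * P(S | R) * P(R)."""
    p_g = p_g_given_sr[(s, r)] if g else 1.0 - p_g_given_sr[(s, r)]
    return p_g * p_s_given_r[(r, s)] * p_r[r]

# P(R = T | G = T) = sum_S P(G=T, S, R=T) / sum_{S,R} P(G=T, S, R)
numerator = sum(joint(True, s, True) for s in (True, False))
denominator = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
print(numerator / denominator)   # ≈ 0.36 with these placeholder values
```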
Thank You
