Week 2

Deep Feedforward Networks, Activation Functions

Network Topology
A network topology is the arrangement of a network, including its nodes and connecting
links. According to topology, ANNs can be classified into the following kinds −
Feedforward Network
This is a non-recurrent network whose processing units/nodes are arranged in layers,
with every node in a layer connected to the nodes of the previous layer. These
connections carry different weights. There is no feedback loop, meaning the signal can
only flow in one direction, from input to output. It may be divided into the following two
types −
 Single-layer feedforward network − A feedforward ANN having only one
weighted layer. In other words, the input layer is fully connected to the output
layer.

 Multilayer feedforward network − A feedforward ANN having more than one
weighted layer. Because this network has one or more layers between the input
and the output layer, these intermediate layers are called hidden layers.

Feedback Network
As the name suggests, a feedback network has feedback paths, which means the signal
can flow in both directions using loops. This makes it a non-linear dynamic system,
which changes continuously until it reaches a state of equilibrium. It may be divided into
the following types −
 Recurrent networks − These are feedback networks with closed loops. Following
are two types of recurrent networks.
 Fully recurrent network − The simplest neural network architecture, in which all
nodes are connected to all other nodes and each node works as both input
and output.

 Jordan network − A closed-loop network in which the output is fed back to the
input, as shown in the following diagram.

Adjustments of Weights or Learning

Learning, in an artificial neural network, is the method of modifying the weights of the
connections between the neurons of a specified network. Learning in ANN can be
classified into three categories, namely supervised learning, unsupervised learning, and
reinforcement learning.
Supervised Learning
As the name suggests, this type of learning is done under the supervision of a teacher,
so the learning process is dependent on that teacher.
During the training of an ANN under supervised learning, the input vector is presented
to the network, which produces an output vector. This output vector is compared with
the desired output vector, and an error signal is generated if the two differ. On the basis
of this error signal, the weights are adjusted until the actual output matches the desired
output.
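The supervised loop described above can be sketched in a few lines. This is a minimal illustration with a single linear neuron; the dataset (targets are simply the sum of the inputs), the learning rate, and the epoch count are all made-up choices, not part of the original text.

```python
# Minimal sketch of the supervised loop above: present an input, compare the
# actual output with the desired output, and adjust weights from the error.
inputs  = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [1.0, 1.0, 2.0]          # desired outputs (here: sum of the inputs)
weights = [0.0, 0.0]
lr = 0.1                           # learning rate

for epoch in range(200):
    for x, t in zip(inputs, targets):
        actual = sum(w * xi for w, xi in zip(weights, x))
        error = t - actual         # error signal: desired minus actual
        for i in range(len(weights)):
            weights[i] += lr * error * x[i]   # adjust weights from the error

print([round(w, 2) for w in weights])   # approaches [1.0, 1.0]
```

With these targets the error-driven updates converge to the weights that reproduce the desired outputs exactly.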
Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a teacher,
so the learning process is independent.
During the training of an ANN under unsupervised learning, input vectors of similar type
are combined to form clusters. When a new input pattern is applied, the neural network
gives an output response indicating the class to which the input pattern belongs.
There is no feedback from the environment as to what the desired output should be or
whether it is correct. Hence, in this type of learning, the network itself must discover
the patterns and features in the input data, and the relation between the input data and
the output.

Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen the
network based on critic information. This learning process is similar to supervised
learning, but much less information is available.
During the training of a network under reinforcement learning, the network receives
some feedback from the environment, which makes it somewhat similar to supervised
learning. However, the feedback obtained here is evaluative, not instructive, which
means there is no teacher as in supervised learning. After receiving the feedback, the
network adjusts its weights to obtain better critic information in the future.
Figure 3: Different Training methods of Artificial Neural Network

1.3.1.1 Supervised learning

Every input pattern that is used to train the network is associated with an output pattern,
which is the target or desired pattern.

A teacher is assumed to be present during the training process: a comparison is made
between the network's computed output and the correct expected output to determine
the error. The error can then be used to change the network parameters, resulting in an
improvement in performance.

1.3.1.2 Unsupervised learning

In this learning method the target output is not presented to the network. It is as if there
is no teacher to present the desired patterns, so the system learns on its own by
discovering and adapting to structural features in the input patterns.

1.3.1.3 Reinforced learning

In this method a teacher, though available, does not present the expected answer but
only indicates whether the computed output is correct or incorrect. This information
helps the network in the learning process.

1.3.1.4 Hebbian learning

This rule was proposed by Hebb and is based on correlative weight adjustment. It is the
oldest learning mechanism inspired by biology. In it, the input-output pattern pairs
(Xi, Yi) are associated by the weight matrix W, known as the correlation matrix.

It is computed as

W = Σ (i = 1 to n) Xi YiT ----------- eq (1)

Here YiT is the transpose of the associated output vector Yi. Numerous variants of the
rule have been proposed.
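Eq (1) can be sketched directly with outer products. The example pattern vectors below are made up for illustration; only the construction of the correlation matrix follows the equation.

```python
import numpy as np

# Sketch of eq (1): the correlation matrix W = sum_i Xi YiT associating input
# patterns Xi with output patterns Yi (the example vectors are made up).
X = [np.array([1, -1, 1]), np.array([-1, 1, 1])]   # input patterns
Y = [np.array([1, -1]), np.array([-1, 1])]         # associated output patterns

W = sum(np.outer(x, y) for x, y in zip(X, Y))      # np.outer(x, y) = x y^T

# Recall: presenting X1 should point toward Y1 (up to scaling).
print(X[0] @ W)
```

Presenting a stored input through W recovers the sign pattern of its associated output, which is the associative-recall behaviour the correlation matrix is built for.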

1.3.1.5 Gradient descent learning

This is based on the minimization of an error E defined in terms of the weights and the
activation function of the network. It is also required that the activation function
employed by the network be differentiable, as the weight update depends on the
gradient of the error E.
Thus if Δwij is the weight update of the link connecting the ith and jth neurons of two
neighbouring layers, then Δwij is defined as

Δwij = η (∂E/∂wij) ----------- eq (2)

where η is the learning rate parameter and ∂E/∂wij is the error gradient with reference
to the weight wij.
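A one-weight sketch of eq (2), assuming a single sigmoid neuron and a squared-error E; the input, target, and learning rate are illustrative. Note the update is applied in the negative gradient direction, since we are descending the error surface.

```python
import math

# Sketch of eq (2): weight update from the gradient of the error E, for one
# weight of a single sigmoid neuron (x, t, and eta are made-up values).
x, t = 1.0, 0.8        # input and desired output
w = 0.0                # weight of the link
eta = 0.5              # learning rate (eta in eq 2)

for _ in range(1000):
    y = 1.0 / (1.0 + math.exp(-w * x))   # differentiable activation (sigmoid)
    E = 0.5 * (t - y) ** 2               # error E defined in terms of w
    grad = -(t - y) * y * (1 - y) * x    # dE/dw via the chain rule
    w -= eta * grad                      # descend: w moves against the gradient

print(round(y, 3))   # the output approaches the target 0.8
```

The differentiability requirement in the text is visible here: the factor y·(1 − y) in the gradient is the derivative of the sigmoid, which a step function would not provide.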

1.3.1.6 Competitive learning

In this method, the neurons which respond most strongly to the input stimuli have their
weights updated.
When an input pattern is presented, all neurons in the layer compete and the winning
neuron undergoes weight adjustment. Hence it is a winner-takes-all strategy.
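A minimal winner-takes-all sketch of the rule above: only the neuron responding most strongly to the input has its weights adjusted. The layer size, input pattern, and learning rate are illustrative choices.

```python
import numpy as np

# Winner-takes-all: all neurons compete on the input; only the winner learns.
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 2))      # 3 competing neurons, 2-D inputs
x = np.array([1.0, 0.0])               # input pattern
eta = 0.5                              # learning rate

old = weights.copy()                   # kept only to show the effect below
winner = int(np.argmax(weights @ x))   # neuron with the strongest response
weights[winner] += eta * (x - weights[winner])   # move the winner toward x
```

After the update only the winning row has changed, and it has moved closer to the input pattern; repeated presentations make each winner a prototype of the inputs it wins.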

1.3.1.7 Stochastic learning

In this method, weights are adjusted in a probabilistic fashion. An example is evident in
simulated annealing, the learning mechanism employed by Boltzmann and Cauchy
machines, which are kinds of NN systems.

1.3.2 Different Learning Rules

1. Hebb's Learning Law
2. Perceptron Learning Law
3. Delta Learning Law
4. Widrow and Hoff LMS Learning Law
5. Correlation Learning Law
6. Instar (Winner-take-all) Learning Law
7. Outstar Learning Law

The different learning laws or rules, with their features, are given in Table 1 below.

Table 1: Different learning laws with their weight details and learning type

1.4 TYPES OF ACTIVATION FUNCTIONS

Common activation functions used in ANNs are listed below.

1.4.1 Identity Function

f(x) = x for all x ----------- eq (3)

Figure 4: Identity function

Linear functions are the simplest form of activation function. Refer to figure 4: f(x) is
just an identity function, usually used in simple networks. It collects the input and
produces an output which is proportional to the given input. It is better than the step
function because it gives multiple output values, not just True or False.

1.4.2. Binary Step Function (with threshold) (aka Heaviside Function or Threshold
Function)

f(x) = 1 if x ≥ θ
       0 if x < θ ----------- eq (4)

Figure 4: Binary step function

The binary step function is shown in figure 4. It is also called the Heaviside function; in
some literature it is known as the threshold function. Equation 4 gives the output for this
function.

1.4.3. Binary Sigmoid

This is also known as the logistic function. The graphical representation is provided in
figure 5, and equation 5 gives the output values for this function.

F(x) = 1 / (1 + e^(-ax)) ----------- eq (5)

Figure 5: Binary sigmoidal function

1.4.4. Bipolar Sigmoid

Also known as the hyperbolic tangent or tanh function. It is a bounded, non-linear
function whose values lie in the range (-1, +1), and it is a shifted and scaled version of
the binary sigmoid function. Equation 6 represents this type of function, and its pictorial
representation is given in figure 6.

F(x) = (1 - e^(-ax)) / (1 + e^(-ax)) ----------- eq (6)

It is better than the binary sigmoid function in that its output is zero-centered.

Figure 6: Bipolar sigmoidal function
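The four activation functions of eqs (3)-(6) can be written out directly. The slope parameter a = 1 and threshold θ = 0 are illustrative defaults, not values fixed by the text.

```python
import math

def identity(x):                 # eq (3): f(x) = x
    return x

def binary_step(x, theta=0.0):   # eq (4): 1 if x >= theta, else 0
    return 1 if x >= theta else 0

def binary_sigmoid(x, a=1.0):    # eq (5): 1 / (1 + e^(-ax)), range (0, 1)
    return 1.0 / (1.0 + math.exp(-a * x))

def bipolar_sigmoid(x, a=1.0):   # eq (6): (1 - e^(-ax)) / (1 + e^(-ax)), range (-1, 1)
    return (1.0 - math.exp(-a * x)) / (1.0 + math.exp(-a * x))
```

Note the tanh connection the text mentions: with a = 2, eq (6) equals tanh(x) exactly, and its output is zero-centered, unlike the binary sigmoid.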

1.5 PERCEPTRON MODEL
1.5.1 Simple Perceptron for Pattern Classification

A perceptron network is capable of performing pattern classification into two or more
categories. The perceptron is trained using the perceptron learning rule. We will first
consider classification into two categories and then the general multiclass classification
later. For classification into only two categories, all we need is a single output neuron.
Here we will use bipolar neurons. The simplest architecture that could do the job
consists of a layer of N input neurons, an output layer with a single output neuron, and
no hidden layers. This is the same architecture as we saw before for Hebb learning.
However, we will use a different transfer function here for the output neurons, as given
below in eq (7). Figure 7 represents a single-layer perceptron network.

f(x) = +1 if x ≥ θ
       -1 if x < θ ----------- eq (7)

Figure 7: Single Layer Perceptron

Equation 7 gives the bipolar activation function, which is the most common function
used in perceptron networks. The inputs arising from the problem space are collected
by the sensors and fed to the association units. Association units are responsible for
associating the inputs based on their similarities; this unit groups similar inputs, hence
the name association unit.

A single input from each group is given to the summing unit. Weights are randomly
fixed initially and assigned to these inputs. The net value is calculated using the
expression

x = Σ wiai – θ ----------- eq (8)

This value is given to the activation function unit to get the final output response. The
actual output is compared with the target (desired) output. If they are the same, we can
stop training; otherwise the weights have to be updated, since there is an error. The
error is given as δ = b - s, where b is the desired/target output and s is the actual
outcome of the machine. The weights are updated based on the perceptron learning
law as given in equation 9.

The weight change is given as Δw = η δ ai, so the new weight is

wi (new) = wi (old) + change in weight vector (Δw) ----------- eq (9)

1.5.2 Perceptron Algorithm

• Step 1: Initialize weights and bias. For simplicity, set weights and bias to zero.
Set the learning rate in the range of zero to one.
• Step 2: While the stopping condition is false, do steps 3-7.
• Step 3: For each training pair s:t, do steps 4-7.
• Step 4: Set activations of input units xi = ai.
• Step 5: Calculate the summing-part value Net = Σ aiwi - θ.
• Step 6: Compute the response of the output unit based on the activation function.
• Step 7: Update weights and bias if an error occurred for this pattern (if y is not
equal to t):
wi(new) = wi(old) + ηtxi, and b(new) = b(old) + ηt
Else wi(new) = wi(old) and b(new) = b(old)
• Step 8: Test the stopping condition.
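The steps above can be made runnable. This sketch trains a single perceptron on the AND gate with bipolar targets; zero initial weights/bias and a learning rate of 1 follow Step 1, and the stopping condition is an epoch with no weight updates.

```python
def step(net, theta=0.0):        # bipolar activation of eq (7)
    return 1 if net >= theta else -1

# AND gate with bipolar targets: output +1 only for input (1, 1).
patterns = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, b, eta = [0.0, 0.0], 0.0, 1.0

changed = True
while changed:                   # stop when a full epoch makes no updates
    changed = False
    for x, t in patterns:
        net = b + sum(wi * xi for wi, xi in zip(w, x))   # Step 5
        y = step(net)                                    # Step 6
        if y != t:                                       # Step 7: error occurred
            for i in range(len(w)):
                w[i] += eta * t * x[i]
            b += eta * t
            changed = True

print(w, b)
```

Because AND is linearly separable, the loop terminates with weights that classify all four patterns correctly; the same loop on XOR would never stop, as discussed in section 1.6.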

1.5.3 Limitations of Single-Layer Perceptrons

• Uses only a binary/bipolar step activation function
• Can be used only for linearly separable problems
• Since it uses supervised learning, a teacher signal is required
• Training time is high
• Cannot solve linearly inseparable problems

1.5.4 Multi-Layer Perceptron Model
Figure 8 is the general representation of a multi-layer perceptron network. Between the
input and output layers there are one or more additional layers, known as hidden layers.

Figure 8: Multi-Layer Perceptron

1.5.5 Multi-Layer Perceptron Algorithm

1. Initialize the weights (Wi) and bias (b0) to small random values near zero.
2. Set the learning rate η (or α) in the range of "0" to "1".
3. Check the stopping condition. While the stopping condition is false, do steps 4 to 8.
4. For each training pair, do steps 5 to 8.
5. Set activations of the input units: xi = si for i = 1 to N.
6. Calculate the output response: yin = b0 + Σ xiwi.
7. Apply the activation function (bipolar sigmoidal or bipolar step function).
For multi-layer networks, steps 6 and 7 are repeated for each layer.
8. If the target (ti) is not equal to the actual output (y), update the weights and bias
based on the perceptron learning law:
Wi (new) = Wi (old) + change in weight vector
Change in weight vector = ηtixi
where η = learning rate
ti = target output of the ith unit
xi = ith input vector
b0(new) = b0 (old) + change in bias
Change in bias = ηti
Else Wi (new) = Wi (old)
b0(new) = b0 (old)
9. Test for the stopping condition.
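Steps 5-7 repeated per layer, i.e. the forward pass of a multi-layer perceptron, can be sketched as follows. One hidden layer with the bipolar sigmoid of eq (6) is used; the layer sizes and random weights are illustrative, not from the text.

```python
import numpy as np

# Forward pass of an MLP with one hidden layer: steps 6 and 7 of the
# algorithm above, applied once per layer, using the bipolar sigmoid.
def bipolar_sigmoid(x, a=1.0):
    return (1.0 - np.exp(-a * x)) / (1.0 + np.exp(-a * x))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input (3) -> hidden (4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden (4) -> output (1)

x = np.array([1.0, -1.0, 0.5])                  # input pattern (step 5)
h = bipolar_sigmoid(W1 @ x + b1)                # steps 6-7 for the hidden layer
y = bipolar_sigmoid(W2 @ h + b2)                # steps 6-7 for the output layer

print(y)   # a value in the range (-1, 1)
```

Each layer computes the weighted sum of step 6 and then the activation of step 7; the hidden activations h are the inputs to the next layer, which is exactly the repetition the algorithm describes.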

1.6 LINEARLY SEPARABLE AND LINEARLY INSEPARABLE TASKS

Figure 9: Representation of linearly separable and linearly inseparable tasks

Perceptrons are successful only on problems with a linearly separable solution space.
Figure 9 represents both a linearly separable and a linearly inseparable problem.
Perceptrons cannot handle, in particular, tasks which are not linearly separable (known
as the linearly inseparable problem). Sets of points in two-dimensional space are
linearly separable if the sets can be separated by a straight line. Generalizing, a set of
points in n-dimensional space is called linearly separable if it can be separated by a
hyperplane, as represented in Figure 9.

A single-layer perceptron can be used for linear separation, for example the AND gate.
But it cannot be used for non-linear, inseparable problems, for example the XOR gate.
Consider figure 10.

Figure 10: XOR representation (linearly inseparable task)
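The AND/XOR contrast can be checked numerically. The sketch below brute-forces a coarse grid of decision lines sign(w1·x1 + w2·x2 − θ); the grid is an illustration rather than a proof, but a separating line exists for AND while none exists for XOR at any resolution.

```python
import itertools

# Try every line (w1, w2, theta) on a coarse grid and check whether any one
# of them classifies all four points of a truth table correctly.
def separable(table, grid):
    for w1, w2, theta in itertools.product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 - theta >= 0) == bool(t)
               for (x1, x2), t in table):
            return True
    return False

grid = [i / 2 for i in range(-4, 5)]   # -2.0 ... 2.0 in steps of 0.5
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(separable(AND, grid), separable(XOR, grid))   # True False
```

For AND the line x1 + x2 = 1.5 works; for XOR every candidate line fails on at least one corner, which is exactly the inseparability figure 10 depicts.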

Here a single decision line cannot separate the zeros and ones linearly; at least two
lines are required to separate them, as shown in Figure 10. Hence single-layer
networks cannot be used to solve inseparable problems. To overcome this problem we
create convex regions.

Convex regions can be created by the multiple decision lines arising from multi-layer
networks. Since a single-layer network cannot solve an inseparable problem, we use a
multi-layer network, thereby creating convex regions which solve the inseparable
problem.

1.6.1 Convex Region

Select any two points in a region and draw a straight line between them. If the points
selected and the line joining them both lie inside the region, then that region is known
as a convex region.

1.6.2 Types of convex regions

(a) Open convex region (b) Closed convex region

Figure 11: Open convex region

Figure 12 A: Circle - closed convex region
Figure 12 B: Triangle - closed convex region
