4 Neural Networks
Introduction:
A neural network is a machine learning model inspired by the structure and function of the human brain. It is made up of interconnected "neurons" modeled after biological neurons, and it makes decisions in a manner similar to the human brain, using processes that mimic the way biological neurons work together to identify phenomena, weigh options and arrive at conclusions.
Neural networks are used for a wide variety of tasks in AI, including image and speech recognition, natural language processing (NLP) and control systems.
They are trained on a large dataset; the training process involves adjusting the weights and biases of the neurons to minimize the error between the network's outputs and the true outputs.
They are used in deep learning and are capable of handling non-linearity and complexity in the data.
Biological neuron → Artificial neural network
Dendrites → Inputs
Cell nucleus → Nodes
Synapse → Weights
Axon → Output
Input Layer:
The input layer accepts the inputs provided by the programmer and passes them on to the hidden layer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations in the hidden layer, finally resulting in the output that is conveyed through this layer.
The artificial neural network takes the inputs, computes their weighted sum and adds a bias. This computation is represented in the form of a transfer function.
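In symbols (a standard formulation; the bias b is sometimes expressed instead as a threshold θ subtracted from the sum):

X = \sum_{i=1}^{n} x_i w_i + b, \qquad Y = f(X)

where f is the transfer (activation) function.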
• Consists of a number of very simple and highly interconnected processors called neurons.
• The neurons are connected by weighted links passing signals from one neuron to another.
Characteristics of ANN
• Adaptive learning
• Self-organization
• Error tolerance
• Real-time operation
• Parallel information processing
Supervised learning
• Uses a set of inputs for which the desired outputs are known
• Example: the backpropagation algorithm
Unsupervised learning
• Uses a set of inputs for which the desired outputs are not known; the network must discover patterns in the data on its own
The Perceptron
The aim of the perceptron is to classify inputs x_1, x_2, x_3, …, x_n into one of two classes, say A_1 and A_2. In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions:

X = \sum_{i=1}^{n} x_i w_i, \qquad Y = \begin{cases} +1 & \text{if } X \ge \theta \\ -1 & \text{if } X < \theta \end{cases}

The perceptron uses an error-correction learning algorithm in which the weights are modified after erroneous decisions, as follows.
If at iteration p the actual output is Y(p) and the desired output is Y_d(p), then the error is given by

e(p) = Y_d(p) - Y(p), where p = 1, 2, 3, …
Iteration p here refers to the pth training example presented to the perceptron
If the error e(p) is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p). This gives the perceptron learning rule:

w_i(p + 1) = w_i(p) + \alpha \cdot x_i(p) \cdot e(p), where p = 1, 2, 3, …
Here α is the learning rate, a positive constant less than unity. The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.
Step 1: Initialization
Set initial weights w_1, w_2, …, w_n and threshold θ to random numbers in the range [−0.5, +0.5].
Step 2: Activation
Activate the perceptron by applying inputs x_1(p), x_2(p), …, x_n(p) and desired output Y_d(p). Calculate the actual output at iteration p = 1:

Y(p) = step\left[ \sum_{i=1}^{n} x_i(p) \, w_i(p) - \theta \right]

where n is the number of the perceptron inputs, and step is the step activation function.
Step 3: Weight Training
Update the weights of the perceptron:

w_i(p + 1) = w_i(p) + \Delta w_i(p)

where Δw_i(p) is the weight correction at iteration p. The weight correction is computed by the delta rule:

\Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)
Step 4: Iteration
Increase iteration p by 1, go back to Step 2 and repeat the process until convergence
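A minimal sketch of Steps 1-4 in Python, assuming a 0/1 step output (the ±1 sign convention from the classification equation above works the same way); the function names and the AND-gate training set are illustrative, not from the original slides:

import random

def step(x, theta):
    # Step activation: fire 1 if the net input reaches the threshold, else 0
    return 1 if x >= theta else 0

def train_perceptron(samples, alpha=0.1, max_epochs=100):
    # samples: list of (inputs, desired_output) pairs
    n = len(samples[0][0])
    # Step 1: Initialization - weights and threshold in [-0.5, +0.5]
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    theta = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        converged = True
        for x, y_d in samples:
            # Step 2: Activation - compute the actual output Y(p)
            y = step(sum(xi * wi for xi, wi in zip(x, w)), theta)
            # Step 3: Weight training - delta rule dw_i = alpha * x_i * e
            e = y_d - y
            if e != 0:
                converged = False
                for i in range(n):
                    w[i] += alpha * x[i] * e
        # Step 4: Iteration - repeat until an epoch produces no errors
        if converged:
            break
    return w, theta

# Example: learn the logical AND operation
w, theta = train_perceptron([([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)])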
• Commercial ANNs incorporate three and sometimes four layers, including one or
two hidden layers. Each layer can contain from 10 to 1000 neurons.
• Experimental neural networks may have five or six layers, including three or four
hidden layers, and utilize millions of neurons.
Recursive Neural Network (RvNN)
Recursive Structure Handling: RvNN is designed to handle recursive structures, which means it can naturally process hierarchical relationships in data by combining information from child nodes to form representations for parent nodes.
Parameter Sharing: RvNN often uses shared parameters across different levels of the hierarchy which
enables the model to generalize well and learn from various parts of the input structure.
Tree Traversal: RvNN traverses the tree structure in a bottom-up or top-down manner, updating node representations based on the information gathered from their children.
Composition Function: The composition function in RvNN combines information from child nodes to
create a representation for the parent node. This function is crucial in capturing the hierarchical
relationships within the data.
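A minimal sketch of such a composition function in Python with numpy; the tanh(W·[left; right] + b) form, the dimensionality and the variable names are assumptions for illustration, and the single shared (W, b) pair reflects the parameter sharing described above:

import numpy as np

D = 4                                        # dimensionality of node representations
rng = np.random.default_rng(0)
W = rng.standard_normal((D, 2 * D)) * 0.1    # composition weights, shared by all nodes
b = np.zeros(D)                              # composition bias, shared by all nodes

def compose(left, right):
    # Composition function: merge two child vectors into one parent vector
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode(tree):
    # Bottom-up traversal: a leaf is already a vector; an internal node is
    # composed from the representations of its two children
    if isinstance(tree, np.ndarray):
        return tree
    left, right = tree
    return compose(encode(left), encode(right))

# Example: the tree ((w1 w2) w3), with random vectors standing in for words
w1, w2, w3 = (rng.standard_normal(D) for _ in range(3))
root = encode(((w1, w2), w3))                # representation of the whole tree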
Gradient Descent
If we move towards the negative gradient, i.e. away from the gradient of the function at the current point, we will reach the local minimum of that function. Whenever we move towards the positive gradient, i.e. towards the gradient of the function at the current point, we will reach the local maximum of that function.
The algorithm calculates the first-order derivative of the function to compute the gradient, or slope, of that function. It then moves away from the direction of the gradient, stepping from the current point by α times the gradient, where α (alpha) is the learning rate: a tuning parameter in the optimization process which helps to decide the length of the steps.
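A minimal sketch of this update rule, x ← x − α·f′(x), in Python; the target function and starting point are illustrative:

def gradient_descent(df, x0, alpha=0.1, steps=100):
    # Repeatedly step opposite the gradient, scaled by the learning rate alpha
    x = x0
    for _ in range(steps):
        x = x - alpha * df(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose derivative is 2 * (x - 3);
# the iterates converge towards the minimum at x = 3
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)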
Back-Propagation Training
In back-propagation, the error gradient of output neuron k is the derivative of its activation with respect to its net input, multiplied by the error at its output:

\delta_k(p) = \frac{\partial y_k(p)}{\partial X_k(p)} \times e_k(p)

where y_k(p) is the output of neuron k at iteration p, and X_k(p) is the net weighted input of neuron k at the same iteration.
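With the sigmoid activation used in the algorithm below, this derivative has the standard closed form

\delta_k(p) = y_k(p) \, [1 - y_k(p)] \, e_k(p)

which is the form applied in the weight-training step.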
Step 1: Initialization
Set all the weights and threshold levels of the network to random
numbers uniformly distributed inside a small range:
(−2.4/F_i, +2.4/F_i)

where F_i is the total number of inputs of neuron i in the network. The weight initialization is done on a neuron-by-neuron basis.
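A minimal sketch of this initialization rule in Python with numpy; the function name and shapes are illustrative:

import numpy as np

def init_weights(fan_in, fan_out, rng=np.random.default_rng()):
    # Each of the fan_out neurons receives fan_in weights drawn uniformly
    # from (-2.4/F_i, +2.4/F_i), where F_i is its number of inputs
    limit = 2.4 / fan_in
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Example: weights for a layer with 2 inputs feeding 3 hidden neurons
w_hidden = init_weights(2, 3)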
Step 2: Activation
a) Calculate the actual outputs of the neurons in the hidden layer:

y_j(p) = sigmoid\left[ \sum_{i=1}^{n} x_i(p) \, w_{ij}(p) - \theta_j \right]

where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.
b) Calculate the actual outputs of the neurons in the output layer:

y_k(p) = sigmoid\left[ \sum_{j=1}^{m} y_j(p) \, w_{jk}(p) - \theta_k \right]

where m is the number of inputs of neuron k in the output layer.
Step 3: Weight Training
(a) Calculate the error gradient for the neurons in the output layer and update the hidden-to-output weights:

\delta_k(p) = y_k(p) \, [1 - y_k(p)] \, e_k(p), \qquad w_{jk}(p + 1) = w_{jk}(p) + \alpha \, y_j(p) \, \delta_k(p)

(b) Calculate the error gradient for the neurons in the hidden layer and update the input-to-hidden weights:

\delta_j(p) = y_j(p) \, [1 - y_j(p)] \sum_{k=1}^{l} \delta_k(p) \, w_{jk}(p), \qquad w_{ij}(p + 1) = w_{ij}(p) + \alpha \, x_i(p) \, \delta_j(p)

where l is the number of neurons in the output layer.
Now the actual output of neuron 5 in the output layer is determined using the output-layer formula from Step 2.
• Then we determine the weight corrections, assuming that the learning rate parameter α is equal to 0.1.
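Putting the steps together, below is a minimal Python sketch of this algorithm for a 2-2-1 network of the kind used in this example (two inputs feeding hidden neurons 3 and 4, which feed output neuron 5), with α = 0.1; the XOR data, random seed and epoch count are illustrative assumptions, not the slide's actual numbers:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
lim = 2.4 / 2                                  # initialization range, F_i = 2
w_ih = rng.uniform(-lim, lim, size=(2, 2))     # input -> hidden (neurons 3, 4)
w_ho = rng.uniform(-lim, lim, size=(2, 1))     # hidden -> output (neuron 5)
th_h = rng.uniform(-lim, lim, size=2)          # hidden thresholds
th_o = rng.uniform(-lim, lim, size=1)          # output threshold
alpha = 0.1

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # XOR inputs
Y_d = np.array([[0], [1], [1], [0]], dtype=float)             # desired outputs

for _ in range(10000):
    for x, y_d in zip(X, Y_d):
        # Step 2: activation, hidden layer then output layer
        y_h = sigmoid(x @ w_ih - th_h)         # outputs of neurons 3 and 4
        y_o = sigmoid(y_h @ w_ho - th_o)       # output of neuron 5
        # Step 3: error gradients, output layer then hidden layer
        e = y_d - y_o
        delta_o = y_o * (1 - y_o) * e
        delta_h = y_h * (1 - y_h) * (w_ho @ delta_o)
        # Delta-rule weight and threshold corrections
        w_ho += alpha * np.outer(y_h, delta_o)
        th_o += alpha * -delta_o
        w_ih += alpha * np.outer(x, delta_h)
        th_h += alpha * -delta_h
    # Step 4: iteration - repeat until the error is acceptably small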
Thank You !!