ML Unit-5 Final

This document provides an overview of Artificial Neural Networks (ANNs), detailing their structure, function, and learning algorithms, including the perceptron and multilayer perceptron. It discusses the biological basis of neural networks, characteristics, advantages, applications, and various activation functions used in ANN models. Additionally, it outlines the learning steps for perceptrons and multilayer perceptrons, emphasizing their capability to classify both linearly and non-linearly separable datasets.

Unit-5

Syllabus: ARTIFICIAL NEURAL NETWORKS


Introduction, The perceptron, the perceptron learning algorithm, Multilayer neural networks,
activation functions, Back Propagation algorithm and introduction to Deep learning models: CNN.

INTRODUCTION TO ARTIFICIAL NEURAL NETWORK:

a) Artificial Neural Networks (ANNs) are programs designed to solve problems by mimicking
the structure and the function of our nervous system.
b) Neural networks are based on simulated neurons, which are joined together in a variety of ways
to form networks.
c) A neural network resembles the human brain in the following two ways:
A neural network acquires knowledge through learning.
A neural network's knowledge is stored within the interconnection strengths, known as synaptic
weights.

Fig: Artificial Neural Network Model

Biological neurons (also called nerve cells), or simply neurons, are the fundamental units of the brain
and nervous system: the cells responsible for receiving sensory input from the external world via
dendrites, processing it, and giving output through axons.
Cell body (Soma): The body of the neuron cell contains the nucleus and carries out the biochemical
transformations necessary to the life of the neuron. The soma sums all the incoming signals.
Dendrites: Dendrites receive signals from other neurons. Each neuron has fine, hair-like tubular
structures (extensions) around it. They branch out into a tree around the cell body and accept
incoming signals.
Axon: The axon transmits signals to other cells. It is a long, thin, tubular structure that works like a
transmission line.
Synapse: Neurons are connected to one another in a complex spatial arrangement. When an axon
reaches its final destination, it branches again; this is called terminal arborization. At the ends of
these branches are highly complex and specialized structures called synapses. The connection
between two neurons takes place at these synapses.
Dendrites receive input through the synapses of other neurons. The soma processes these incoming
signals over time and converts that processed value into an output, which is sent out to other
neurons through the axon and the synapses.
Relationship between a biological neural network and an artificial neural network:

Biological Neuron vs. Artificial Neuron:
Dendrites (receive signals) → Inputs
Cell body / Soma (sums the signals) → Summing node
Axon (transmits the output) → Output
Synapse (interconnection strength) → Weight

Characteristics of Artificial Neural Network


 It is a neurally implemented mathematical model.
 It contains a huge number of interconnected processing elements, called neurons, which
perform all operations.
 The information stored in the neurons is essentially the weighted linkage between neurons.
 The input signals arrive at the processing elements through connections and connection
weights.
 It has the ability to learn, recall, and generalize from the given data by suitable assignment
and adjustment of weights.
 The collective behavior of the neurons describes its computational power; no single
neuron carries specific information.

Advantages of Using Artificial Neural Networks:


 Problems in ANNs can have instances that are represented by many attribute-value pairs.
 ANNs are used for problems where the target function output may be discrete-valued, real-
valued, or a vector of several real- or discrete-valued attributes.
 ANN learning methods are quite robust to noise in the training data. The training examples
may contain errors, which do not severely affect the final output.
 ANNs are generally used where fast evaluation of the learned target function may be
required.
 ANNs can bear long training times, depending on factors such as the number of weights in
the network, the number of training examples considered, and the settings of various
learning algorithm parameters.

Applications of Neural Networks:


 Every new technology needs assistance from the previous one, i.e., data from previous
technologies, and these data must be analyzed so that the pros and cons can be studied
correctly. All of this is possible with the help of neural networks.
 Neural networks are suitable for research on animal behavior, predator/prey
relationships, and population cycles.
 It becomes easier to do proper valuation of property, buildings, automobiles, machinery,
etc. with the help of neural networks.
 Neural networks can be used in betting on horse races, sporting events, and most
importantly in the stock market.
 They can be used to predict the correct judgment for any crime by using a large body of
crime details as input and the resulting sentences as output.
 Data mining, cleaning, and validation can be achieved through neural networks by
analyzing the data and determining which of the data has any fault (files diverging from
peers).
 Neural networks can be used to predict targets with the help of echo patterns obtained
from sonar, radar, seismic, and magnetic instruments.
 They can be used efficiently in employee hiring, so that any company can hire the right
employee depending upon the skills the employee has and what his or her future
productivity is likely to be.
 They have large applications in medical research.
 They can be used for fraud detection regarding credit cards, insurance, or taxes by
analyzing past records.

Feedforward network
• The information moves in only one direction - forward - from the input nodes, through the
hidden nodes (if any), and to the output nodes.
• There are no cycles or loops in the network.
• Two examples of feedforward networks are given below:
• Single Layer Perceptron - This is the simplest feedforward neural network and does not
contain any hidden layer.
– A single layer perceptron can only learn linear functions.
• Multi Layer Perceptron - A Multi Layer Perceptron has one or more hidden layers.
– A multi layer perceptron can also learn non-linear functions.
– Computation is performed in the Hidden and Output layers, but not in the Input layer.

Architecture of a single layer and multi-layer perceptron

• A feedforward neural network can consist of three types of nodes:


• Input Nodes - The Input nodes provide information from the outside world to the network
and are together referred to as the "Input Layer".
• No computation is performed in any of the Input nodes - they just pass the information on
to the hidden nodes.
• Hidden Nodes - The Hidden nodes have no direct connection with the outside world (hence
the name "hidden").
• They perform computations and transfer information from the input nodes to the output
nodes.
• Output Nodes - The Output nodes are collectively referred to as the "Output Layer" and are
responsible for computations and for transferring information from the network to the
outside world.

Perceptron or single layer neural network:


A perceptron, a computational prototype of a neuron, is the simplest form of a
neural network. Frank Rosenblatt invented the perceptron at the Cornell Aeronautical
Laboratory in 1957.

 The perceptron is a supervised machine learning algorithm for learning binary
classifiers, i.e., a function that determines the class to which an input vector belongs.
 A perceptron has one or more inputs, a process, and only one output.
 A single layer neural network, or single artificial neuron, is called a perceptron. It gives
a single output.
 The following diagram represents the general model of an ANN, which is inspired by a
biological neuron.

Primary components of a perceptron (terminology of above diagram):


1. Input: All the features of the model we want to train the neural network on are passed as
input to it, as a set of features [X1, X2, X3, ..., Xn], where n represents the total number of
features and X represents the value of a feature. There is also a special input type, which is
called the bias.
2. Weights: Initially, we pass some random values as the weights, and these values are updated
automatically after each training error; that is, the final values are generated during the
training of the model. In some cases, weights are also called weight coefficients.
3. Weighted Sum: Each input value is first multiplied by the weight assigned to it, and the
sum of all the multiplied values is known as the weighted sum.
4. Bias: Bias is a special input type. It allows the classifier to move the decision boundary
from its original position to the right, left, up, or down. In terms of algebra, the bias
allows the classifier to shift its decision boundary without rotating it. The objective of the bias
is to shift each point in a particular direction by a specified distance. Bias allows for higher
quality and faster model training.
5. Step or Activation Function:
The activation function applies a step rule, which converts the numerical value to 0 or 1 so that
the data set is easy to classify. Based on the type of value we need as output, we can change
the activation function; many activation functions exist, and the particular one is chosen based
on the requirement.
Sigmoid function: if we want values between 0 and 1, we can use a sigmoid function, which
has a smooth gradient as well. Sign function: if we want the values to be +1 and -1, we can use
the sign function. The hyperbolic tangent function is a zero-centered function, making it easier
to train multilayer neural networks. The ReLU function is computationally cheap, but it outputs
zero for all inputs at or below zero, so it works best when the input values range both above
and below zero.
Perceptron learning algorithm or Perceptron Learning Steps:

Perceptron algorithms can be divided into two types: single-layer perceptrons and multi-layer
perceptrons. In a single-layer perceptron, neurons are organized in one layer, whereas in a
multilayer perceptron, groups of neurons are organized in multiple layers. Every neuron
in the first layer takes the input signal and sends a response to the neurons in
the second layer, and so on.

Steps to perform a perceptron learning algorithm:


1. The features of the model we want to train are passed as input to the perceptrons in
the first layer.
2. These inputs are multiplied by the weights (weight coefficients), and the products
from all inputs are added together.
3. The bias value is added, to move the output function away from the origin.
4. This computed value is fed to the activation function (chosen based on the
requirement).
5. The resulting value from the activation function is the output value.
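
A minimal Python sketch of these steps, including the error-driven weight update (the class name, learning rate, and AND-gate data below are illustrative assumptions, not taken from the notes):

import numpy as np

# Minimal perceptron sketch of the five steps above plus the error-driven
# weight update. The class name, learning rate, and AND-gate data are
# illustrative assumptions, not taken from the notes.
class Perceptron:
    def __init__(self, n_features, lr=0.1):
        self.w = np.random.randn(n_features) * 0.01  # random initial weights
        self.b = 0.0                                 # bias input
        self.lr = lr

    def predict(self, x):
        z = np.dot(self.w, x) + self.b               # steps 2-3: weighted sum + bias
        return 1 if z > 0 else 0                     # steps 4-5: step activation

    def train(self, X, y, epochs=20):
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                error = yi - self.predict(xi)        # desired minus actual output
                self.w += self.lr * error * xi       # adjust weights
                self.b += self.lr * error            # adjust bias

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])       # AND function: linearly separable
y = np.array([0, 0, 0, 1])
p = Perceptron(n_features=2)
p.train(X, y)
print([p.predict(xi) for xi in X])                   # expected: [0, 0, 0, 1]

Trained instead on the XOR targets [0, 1, 1, 0], the same loop never converges, which is exactly the limitation discussed in the note below.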
Note:
The perceptron is very useful for classifying data sets that are linearly separable.
It encounters serious limitations with data sets that do not conform to this
pattern, as discovered with the XOR problem: XOR shows that there exist
labelings of four points that are not linearly separable.

The Multilayer Perceptron (MLP) breaks this restriction and classifies datasets
which are not linearly separable. It does this by using a more robust and complex
architecture to learn regression and classification models for difficult datasets.

Multi-Layer Perceptron(MLP) or multi-layer neural network:


This class of networks consists of multiple layers of computational units, usually
interconnected in a feed-forward way. Each neuron in one layer has directed
connections to the neurons of the subsequent layer.
A feedforward neural network or multi-layer neural network can consist of three types of
nodes:
1. Input Nodes - The Input nodes provide information from the outside world to the network
and are together referred to as the "Input Layer". No computation is performed in any of the
Input nodes - they just pass on the information to the hidden nodes.
2. Hidden Nodes - The Hidden nodes have no direct connection with the outside world (hence
the name "hidden"). They perform computations and transfer information from the input
nodes to the output nodes. A collection of hidden nodes forms a "Hidden Layer". While a
feedforward network will only have a single input layer and a single output layer, it can have
zero or multiple Hidden Layers.
3. Output Nodes - The Output nodes are collectively referred to as the "Output Layer" and are
responsible for computations and transferring information from the network to the outside
world
The algorithm for the MLP is as follows:
1. Just as with the perceptron, the inputs are pushed forward through the MLP by taking the
dot product of the input with the weights that exist between the input layer and the hidden
layer (WH). This dot product yields a value at the hidden layer. We do not push this value
forward as we would with a perceptron though.
2. MLPs utilize activation functions at each of their calculated layers. There are many activation
functions to choose from, e.g., rectified linear units (ReLU), the sigmoid function, and tanh. Push
the calculated output at the current layer through any of these activation functions.
3. Once the calculated output at the hidden layer has been pushed through the activation
function, push it to the next layer in the MLP by taking the dot product with the corresponding
weights.
4. Repeat steps two and three until the output layer is reached.
5. At the output layer, the calculations will either be used for a backpropagation algorithm that
corresponds to the activation function that was selected for the MLP (in the case of training) or
a decision will be made based on the output (in the case of testing).
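
A short sketch of this forward pass (the layer sizes of 3 inputs, 4 hidden units, and 2 outputs, the weight initialization, and the choice of sigmoid are illustrative assumptions):

import numpy as np

# Sketch of the MLP forward pass described above: dot product with the layer
# weights, activation, repeat until the output layer.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_h = rng.normal(size=(3, 4))    # weights between input and hidden layer (WH)
b_h = np.zeros(4)
W_o = rng.normal(size=(4, 2))    # weights between hidden and output layer
b_o = np.zeros(2)

x = np.array([0.5, -1.0, 2.0])   # input vector

h = sigmoid(x @ W_h + b_h)       # steps 1-2: dot product, then activation
y = sigmoid(h @ W_o + b_o)       # steps 3-4: push to the next layer, activate
print(y)                         # step 5: used for backpropagation or a decision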
Activation Function:
Activation functions are mathematical equations that determine the output of a neural
network. The function is attached to each neuron in the network and determines whether it
should be activated ("fired") or not, based on whether each neuron's input is relevant for the
model's prediction. Activation functions also help normalize the output of each neuron to a
range between 0 and 1 or between -1 and 1.

Role of the Activation Function in a Neural Network Model


In a neural network, numeric data points, called inputs, are fed into the neurons in the input
layer. Each input to a neuron has a weight; multiplying the inputs by their weights and summing
them gives the output of the neuron, which is transferred to the next layer.
The activation function is a mathematical “gate” in between the input feeding the current
neuron and its output going to the next layer. It can be as simple as a step function that turns
the neuron output on and off, depending on a rule or threshold. Or it can be a transformation
that maps the input signals into output signals that are needed for the neural network to
function.

Increasingly, neural networks use non-linear activation functions, which can help the network
learn complex data, compute and learn almost any function representing a question, and
provide accurate predictions.

Types of Activation Functions:


1. Binary Step Function

A binary step function is a threshold-based activation function. If the input value is above a
certain threshold, the neuron is activated and sends exactly the same signal to the next layer;
otherwise it sends nothing. The output values are either 0 or 1.
f(x) = 1 if x >= 0, else f(x) = 0
The problem with a step function is that it does not allow multi-value outputs; for example, it
cannot support classifying the inputs into one of several categories.
2. Linear Activation Function:
 Equation : A linear function has an equation similar to that of a straight line, i.e., y = ax.
 No matter how many layers we have, if all of them are linear in nature, the final activation
function of the last layer is nothing but a linear function of the input of the first layer.
 Range : -inf to +inf
 Uses : The linear activation function is used in just one place, i.e., the output layer.
 Issues : The derivative of a linear function is a constant that no longer depends on the
input "x", so the gradient is the same everywhere; it brings no non-linearity and no
ground-breaking behavior to our algorithm.

For example: Calculating the price of a house is a regression problem. The house price may take any
large or small value, so we can apply a linear activation at the output layer. Even in this case, the
neural network must have a non-linear function in the hidden layers.

3.Non-Linear Activation Functions:


Modern neural network models use non-linear activation functions. They allow the model to
create complex mappings between the network’s inputs and outputs, which are essential for
learning and modeling complex data, such as images, video, audio, and data sets which are non-
linear or have high dimensionality.
Four common nonlinear activation functions:
1). Sigmoid Function :-
 It is a function which is plotted as an 'S'-shaped graph.
 Equation : A = 1/(1 + e^(-x))
 Nature : Non-linear. Notice that for x values between -2 and 2, the y values are very
steep; small changes in x bring about large changes in the value of y.
 Value Range : 0 to 1
 Uses : Usually used in the output layer of a binary classifier, where the result is either 0
or 1. Since the value of the sigmoid function lies between 0 and 1 only, the result can
easily be predicted to be 1 if the value is greater than 0.5 and 0 otherwise.

2). Tanh Function :- The activation that almost always works better than the sigmoid function
is the Tanh function, also known as the hyperbolic tangent function. It is actually a mathematically
shifted version of the sigmoid function; both are similar and can be derived from each
other.
Equation :-
f(x) = tanh(x) = 2/(1 + e^(-2x)) - 1
OR
tanh(x) = 2 * sigmoid(2x) - 1
 Value Range :- -1 to +1
 Nature :- non-linear
 Uses :- Usually used in the hidden layers of a neural network. Since its values lie
between -1 and 1, the mean of the hidden layer's outputs comes out to be 0 or very
close to it, which helps center the data by bringing the mean close to 0. This
makes learning for the next layer much easier.
3). ReLU :- Stands for Rectified Linear Unit. It is the most widely used activation function,
chiefly implemented in the hidden layers of a neural network.
 Equation :- A(x) = max(0, x). It gives an output of x if x is positive and 0 otherwise.
 Value Range :- [0, inf)
 Nature :- non-linear, which means we can easily backpropagate the errors and
have multiple layers of neurons being activated by the ReLU function.
 Uses :- ReLU is less computationally expensive than tanh and sigmoid because it
involves simpler mathematical operations. At any given time only a few neurons are
activated, making the network sparse and therefore efficient and easy to
compute.

In simple words, ReLU learns much faster than the sigmoid and Tanh functions.

4). Softmax Function :- The softmax function is also a type of sigmoid function, but it is
handy when we are trying to handle multi-class classification problems.
 Nature :- non-linear
 Uses :- Usually used when trying to handle multiple classes. The softmax
function squeezes the output for each class to a value between 0 and 1 and
divides by the sum of the outputs.
 Output :- The softmax function is ideally used in the output layer of the classifier,
where we are actually trying to obtain the probabilities that define the class of each
input.
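
The four functions above (plus the binary step) in a short NumPy sketch; the equations follow the text, while the function names and test vector are our own:

import numpy as np

# Plain-NumPy versions of the activation functions described above.
def binary_step(x):
    return np.where(x >= 0, 1, 0)         # threshold at 0, outputs 0 or 1

def sigmoid(x):
    return 1 / (1 + np.exp(-x))           # A = 1/(1 + e^(-x)), range (0, 1)

def tanh(x):
    return 2 * sigmoid(2 * x) - 1         # identical to np.tanh(x), range (-1, 1)

def relu(x):
    return np.maximum(0, x)               # A(x) = max(0, x), range [0, inf)

def softmax(x):
    e = np.exp(x - np.max(x))             # subtracting the max avoids overflow
    return e / e.sum()                    # outputs are positive and sum to 1

z = np.array([1.0, 2.0, -1.0])
print(binary_step(z), sigmoid(z), tanh(z), relu(z), softmax(z))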
Backpropagation Algorithm:
Back propagation is an algorithm commonly used to train neural networks. When the neural
network is initialized, weights are set for its individual elements, called neurons. Inputs are
loaded, they are passed through the network of neurons, and the network provides an output
for each one, given the initial weights. Back propagation helps to adjust the weights of the
neurons so that the result comes closer and closer to the known true result.

Why We Need Back propagation?


Most prominent advantages of Back propagation are:
 Back propagation is fast, simple and easy to program
 It has no parameters to tune apart from the number of inputs
 It is a flexible method as it does not require prior knowledge about the network
 It is a standard method that generally works well
 It does not need any special mention of the features of the function to be learned

How Backpropagation Works: Simple Algorithm

Simple Algorithm:
1. Inputs X, arrive through the preconnected path
2. Input is modeled using real weights W. The weights are usually randomly
selected.
3. Calculate the output for every neuron from the input layer, to the hidden layers, to
the output layer.
4. Calculate the error in the outputs:
Error = Actual Output – Desired Output
5. Travel back from the output layer to the hidden layer to adjust the weights such
that the error is decreased.
6. Keep repeating the process until the desired output is achieved.

Procedure:
Backpropagation algorithm - Example

• Sample calculations for learning by the backpropagation algorithm.
• Consider a multilayer feed-forward neural network. Let the learning rate be 0.9. The initial
weight and bias values of the network are given in the table, along with the first training
tuple, X = (1, 0, 1), whose class label is 1.
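
A hedged sketch of one backpropagation step for this example. Since the weight/bias table is not reproduced above, the initial values below are random placeholders; only the learning rate 0.9, the input X = (1, 0, 1), and the target class 1 come from the text, and the 3-2-1 network shape with sigmoid units is an assumption:

import numpy as np

# One backpropagation step: forward pass, error terms, weight/bias updates.
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(42)
W1, b1 = rng.uniform(-0.5, 0.5, (3, 2)), rng.uniform(-0.5, 0.5, 2)
W2, b2 = rng.uniform(-0.5, 0.5, (2, 1)), rng.uniform(-0.5, 0.5, 1)
lr, x, t = 0.9, np.array([1.0, 0.0, 1.0]), np.array([1.0])

h = sigmoid(x @ W1 + b1)                   # forward pass: input -> hidden
o = sigmoid(h @ W2 + b2)                   # forward pass: hidden -> output

delta_o = (t - o) * o * (1 - o)            # output error, sigmoid derivative o(1-o)
delta_h = (delta_o @ W2.T) * h * (1 - h)   # error propagated back to hidden layer

W2 += lr * np.outer(h, delta_o)            # weight update: lr * delta * input
b2 += lr * delta_o
W1 += lr * np.outer(x, delta_h)
b1 += lr * delta_h
print("output before update:", o)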

Introduction to Deep Learning:

Deep learning is a branch of machine learning which is based entirely on artificial
neural networks; since a neural network mimics the human brain, deep learning is
also a kind of mimicry of the human brain. In deep learning, we don't need to explicitly program
everything. The concept of deep learning is not new; it has been around for a number of
years. It is hyped nowadays because earlier we did not have that much processing
power or that much data. As processing power has increased exponentially over the last 20
years, deep learning and machine learning have come into the picture. The basic building
block of deep learning is the neuron.
The human brain contains approximately 100 billion neurons altogether, and each neuron is
connected to thousands of its neighbours. The question here is how we recreate these
neurons in a computer. So, we create an artificial structure called an artificial neural
network, where we have nodes or neurons. We have some neurons for input values and some
for output values, and in between there may be many interconnected neurons in the hidden
layers.
Architectures:
1. Deep Neural Network – A neural network with a certain level of complexity (having
multiple hidden layers between the input and output layers). Deep neural networks are
capable of modelling and processing non-linear relationships.
2. Deep Belief Network (DBN) – A class of deep neural network composed of multiple layers
of belief networks. Steps for training a DBN: a. Learn a layer of features from the visible units
using the Contrastive Divergence algorithm. b. Treat the activations of previously trained
features as visible units and then learn features of features. c. The whole DBN is trained
when the learning for the final hidden layer is complete.
3. Recurrent Neural Network (performs the same task for every element of a sequence) –
Allows for parallel and sequential computation, similar to the human brain (a large feedback
network of connected neurons). RNNs are able to remember important things about the
input they have received, which enables them to be more precise.

The Relationship Between AI, ML and DL:


Machine Learning is a sub-category of AI, and Deep Learning is a sub-category of ML,
meaning they are both forms of AI.

Artificial intelligence is the broad idea that machines can intelligently execute tasks by
mimicking human behaviors and thought processes.

Machine learning, a subset of AI, revolves around the idea that machines can learn and
adapt through experiences and data to complete specific tasks. An example would be
predicting the weather forecast for the next seven days based on data from the previous
year and the previous week. Every day, the data from the previous year/week changes, so
the ML model must adapt to the new data.

Deep learning is a subset of ML. DL models are based on highly complex neural networks
that mimic how the brain works. With many layers of processing units, deep learning takes
it a step further to learn complex patterns in large amounts of data. For example, deep
learning (combined with computer vision) in a driverless car can identify a person crossing
the road

Difference between Machine Learning and Deep Learning


Working : First, we need to identify the actual problem in order to get the right solution,
and the problem should be well understood; the feasibility of deep learning should also be
checked (whether the problem fits deep learning or not). Second, we need to identify the
relevant data, which should correspond to the actual problem, and prepare it accordingly.
Third, choose the deep learning algorithm appropriately. Fourth, train the algorithm on the
dataset. Fifth, perform final testing on the dataset.

Limitations :
1. Learning through observations only.
2. The issue of biases.

Advantages :
1. Best in-class performance on problems.
2. Reduces need for feature engineering.
3. Eliminates unnecessary costs.
4. Identifies defects easily that are difficult to detect.

Disadvantages :
1. Large amount of data required.
2. Computationally expensive to train.
3. No strong theoretical foundation.

Applications :
1. Automatic Text Generation – A corpus of text is learned, and from this model new text is
generated, word-by-word or character-by-character. Such a model is capable of learning
how to spell, punctuate, and form sentences, and it may even capture the style.
2. Healthcare – Helps in diagnosing various diseases and treating them.
3. Automatic Machine Translation – Certain words, sentences, or phrases in one language are
transformed into another language (deep learning is achieving top results in the areas of
text and images).
4. Image Recognition – Recognizes and identifies people and objects in images, as well as
understanding content and context. This area is already being used in gaming, retail,
tourism, etc.
5. Predicting Earthquakes – Teaches a computer to perform the viscoelastic computations
which are used in predicting earthquakes.

Convolutional Neural Network (CNN):


A convolutional neural network (CNN) is a type of artificial neural network used in image
recognition and processing that is specifically designed to process pixel data.
CNNs are powerful image-processing, artificial intelligence (AI) systems that use deep learning to
perform both generative and descriptive tasks, often using machine vision that includes
image and video recognition, along with recommender systems and natural language
processing (NLP).
A CNN uses a system much like a multilayer perceptron that has been designed for reduced
processing requirements. The layers of a CNN consist of an input layer, an output layer, and
hidden layers that include multiple convolutional layers, pooling layers, fully connected
layers, and normalization layers. The removal of limitations and the increase in efficiency for
image processing result in a system that is far more effective and simpler to train, though
limited to image processing and natural language processing.
Convolutional neural networks, or covnets, are neural networks that share their parameters.
Imagine you have an image. It can be represented as a cuboid having a length and width
(the dimensions of the image) and a depth (as images generally have red, green, and blue
channels).
Now imagine taking a small patch of this image and running a small neural network on it,
with, say, k outputs, and representing them vertically. Now slide that neural network across the
whole image; as a result, we will get another image with a different width, height, and depth.
Instead of just the R, G, and B channels, we now have more channels, but smaller width and
height. This operation is called convolution. If the patch size is the same as that of the image,
it is a regular neural network. Because of this small patch, we have fewer weights.

Now let's talk about a bit of the mathematics involved in the whole convolution
process.
 Convolution layers consist of a set of learnable filters (the patch described above).
Every filter has a small width and height and the same depth as the input
volume (3 if the input layer is an image input).
 For example, if we have to run a convolution on an image with dimensions
34x34x3, the possible filter sizes are axax3, where 'a' can be 3, 5, 7, etc., but
small compared with the image dimension.
 During the forward pass, we slide each filter across the whole input volume step by
step, where each step is called the stride (which can have a value of 2, 3, or even 4 for
high-dimensional images), and compute the dot product between the weights of
the filter and the patch from the input volume.
 As we slide our filters, we get a 2-D output for each filter; we stack these
together and, as a result, get an output volume with a depth equal to the
number of filters. The network will learn all the filters.
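
A naive NumPy sketch of this sliding-filter dot product for a single filter (no padding; the array sizes are illustrative, and real frameworks vectorize this heavily):

import numpy as np

# Naive convolution of one learnable filter over an input volume, exactly as
# described above: slide the filter with a given stride and take a dot product
# at every step.
def conv2d_single_filter(volume, filt, stride=1):
    H, W, D = volume.shape                     # e.g. a 34 x 34 x 3 image
    f = filt.shape[0]                          # filter is f x f x D
    out_h = (H - f) // stride + 1              # output size: (W - F)/S + 1
    out_w = (W - f) // stride + 1
    out = np.zeros((out_h, out_w))             # one 2-D map per filter
    for i in range(out_h):
        for j in range(out_w):
            patch = volume[i*stride:i*stride+f, j*stride:j*stride+f, :]
            out[i, j] = np.sum(patch * filt)   # dot product of weights and patch
    return out

image = np.random.rand(34, 34, 3)
filt = np.random.rand(5, 5, 3)                 # a = 5, depth matches the input
print(conv2d_single_filter(image, filt).shape) # (30, 30); stacking k such
                                               # filters gives a 30 x 30 x k volume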

Layers used to build ConvNets


A covnet is a sequence of layers, and every layer transforms one volume into another
through a differentiable function.
Fig: Complete architecture of CNN

Types of layers:

Let's take an example by running a covnet on an image of dimension 32 x 32 x 3.


1. Input Layer: This layer holds the raw input image, with width 32, height 32, and depth
3.
2. Convolution Layer: This layer computes the output volume by computing the dot product
between each filter and the image patches. Suppose we use a total of 12 filters for this layer;
we get an output volume of dimension 32 x 32 x 12.
3. Activation Function Layer: This layer applies an element-wise activation function to the
output of the convolution layer. Some common activation functions are ReLU: max(0, x),
Sigmoid: 1/(1+e^(-x)), Tanh, Leaky ReLU, etc. The volume remains unchanged, hence the output
volume has dimension 32 x 32 x 12.
4. Pool Layer: This layer is periodically inserted in the covnet, and its main function is to
reduce the size of the volume, which makes the computation fast, reduces memory, and also
prevents overfitting. Two common types of pooling layers are max pooling and
average pooling. If we use a max pool with 2 x 2 filters and stride 2, the resultant volume
will be of dimension 16 x 16 x 12.
5. Fully-Connected Layer: This is a regular neural network layer which takes input
from the previous layer, computes the class scores, and outputs a 1-D array of size
equal to the number of classes.
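
A small sketch tracing these shapes; note the convolution layer needs zero-padding (our assumption, e.g. 3 x 3 filters with padding 1) to keep the 32 x 32 spatial size stated above, and the class count of 10 is illustrative:

# Trace of the volume shapes through the example covnet above.
def conv_out(size, f, stride, pad):
    return (size - f + 2 * pad) // stride + 1     # standard output-size formula

h = w = 32; d = 3                          # 1. input layer: 32 x 32 x 3
h = w = conv_out(h, f=3, stride=1, pad=1)  # 2-3. conv (12 filters) + activation
d = 12                                     #      output: 32 x 32 x 12
h, w = h // 2, w // 2                      # 4. max pool, 2 x 2, stride 2: 16 x 16 x 12
n_classes = 10                             # 5. fully connected layer (assumed classes)
print((h, w, d), "-> fully connected ->", n_classes, "class scores")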
