ML Unit-5 Final
a) Artificial Neural Networks (ANNs) are programs designed to solve problems by trying to mimic
the structure and the function of our nervous system.
b) Neural networks are based on simulated neurons, which are joined together in a variety of ways
to form networks.
c) A neural network resembles the human brain in the following two ways:
A neural network acquires knowledge through learning.
A neural network's knowledge is stored within the interconnection strengths, known as synaptic
weights.
Biological neurons (also called nerve cells), or simply neurons, are the fundamental units of the brain
and nervous system: the cells responsible for receiving sensory input from the external world via
dendrites, processing it, and giving output through axons.
Cell body (Soma): The body of the neuron cell contains the nucleus and carries out the biochemical
transformations necessary to the life of the neuron. The soma sums all the incoming signals.
Dendrites: Dendrites receive signals from other neurons. Each neuron has fine, hair-like tubular
structures (extensions) around it. They branch out into a tree around the cell body. They accept
incoming signals.
Axon: The axon transmits signals to other cells. It is a long, thin, tubular structure that works like a
transmission line.
Synapse: Neurons are connected to one another in a complex spatial arrangement. When the axon
reaches its final destination it branches again; this branching is called terminal arborization. At the
end of the axon are highly complex and specialized structures called synapses. The connection
between two neurons takes place at these synapses.
Dendrites receive input through the synapses of other neurons. The soma processes these incoming
signals over time and converts that processed value into an output, which is sent out to other
neurons through the axon and the synapses.
Relationship between a biological neural network and an artificial neural network:
Dendrites → Inputs
Cell body (Soma) → Node (sums the weighted inputs)
Synapses → Weights (interconnection strengths)
Axon → Output
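To make the mapping above concrete, the following is a minimal illustrative sketch (not part of the
original notes) of a single artificial neuron in Python/NumPy: the inputs play the role of dendrites,
the weighted sum plays the role of the soma, and the activated output plays the role of the axon
signal. The weights and bias are made-up example values.

    import numpy as np

    def artificial_neuron(inputs, weights, bias):
        # Soma: sum of the weighted inputs plus a bias term
        total = np.dot(inputs, weights) + bias
        # Axon output: fire (1) if the summed signal is positive, else stay silent (0)
        return 1 if total > 0 else 0

    x = np.array([0.5, 0.3, 0.2])      # incoming signals (dendrites)
    w = np.array([0.4, -0.1, 0.7])     # synaptic weights
    print(artificial_neuron(x, w, bias=-0.2))   # -> 1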
Feedforward network
• The information moves in only one direction - forward - from the input nodes, through the
hidden nodes (if any) and to the output nodes.
• There are no cycles or loops in the network.
• Two examples of feedforward networks are given below:
• Single Layer Perceptron - This is the simplest feedforward neural network and does not
contain any hidden layer.
– A single layer perceptron can only learn linear functions
• Multi Layer Perceptron - A Multi Layer Perceptron has one or more hidden layers.
– A multilayer perceptron can also learn non-linear functions
– Computation is performed in Hidden and Output layers but not in the Input layer
Perceptron algorithms can be divided into two types: the single-layer perceptron and the multi-
layer perceptron. In a single-layer perceptron the neurons are organized in one layer, whereas in a
multilayer perceptron the neurons are organized in multiple layers. Every neuron present in the
first layer takes the input signal and sends a response to the neurons in the second layer, and so
on.
The multilayer perceptron (MLP) breaks the single-layer perceptron's restriction to linearly
separable data and can classify datasets which are not linearly separable. It does this by using a
more robust and complex architecture to learn regression and classification models for difficult
datasets.
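As an illustration of this layer-by-layer computation, here is a minimal NumPy sketch of a forward
pass through an MLP with one hidden layer; the layer sizes and random weights are assumptions
made only for the example (no computation is performed at the input layer).

    import numpy as np

    def forward(x, W1, b1, W2, b2):
        # Hidden layer: weighted sum of inputs followed by a non-linear activation
        h = np.tanh(W1 @ x + b1)
        # Output layer: weighted sum of the hidden-layer responses
        return W2 @ h + b2

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)                           # 3 input features
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # input (3) -> hidden (4)
    W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)    # hidden (4) -> output (2)
    print(forward(x, W1, b1, W2, b2))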
Increasingly, neural networks use non-linear activation functions, which help the network
learn complex data, compute and learn almost any function mapping inputs to outputs, and
provide accurate predictions.
1). Binary Step Function :- A binary step function is a threshold-based activation function. If the
input value is above the threshold, the neuron is activated and sends exactly the same signal to the
next layer; otherwise it is not activated. The output is either 0 or 1.
f(x) = 1 if x >= 0 else 0
The problem with a step function is that it does not allow multi-value outputs; for example, it
cannot support classifying the inputs into one of several categories.
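A quick NumPy sketch of the binary step activation (the threshold is assumed to be 0 for the
illustration):

    import numpy as np

    def binary_step(x):
        # 1 where the input reaches the threshold, 0 everywhere else
        return np.where(x >= 0, 1, 0)

    print(binary_step(np.array([-2.0, -0.1, 0.0, 3.5])))   # [0 0 1 1]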
2). Linear Activation Function :-
Equation :- A linear function has an equation similar to that of a straight line, i.e. y = ax
No matter how many layers we have, if all of them are linear in nature, the final activation
of the last layer is nothing but a linear function of the input of the first layer.
Value Range :- -inf to +inf
Uses :- The linear activation function is used in just one place, i.e. the output layer.
Issues :- If we differentiate a linear function, the result is a constant that no longer
depends on the input "x", so gradient-based learning gets no useful signal, and stacking
linear layers introduces no new behaviour into our algorithm.
For example : Predicting the price of a house is a regression problem. The house price may take any
big/small value, so we can apply a linear activation at the output layer. Even in this case the neural
network must have non-linear functions in the hidden layers.
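A short sketch (with made-up values) showing why a stack of purely linear layers collapses to a
single linear function of the input:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=3)
    W1 = rng.normal(size=(4, 3))    # first linear layer
    W2 = rng.normal(size=(2, 4))    # second linear layer

    two_layers = W2 @ (W1 @ x)      # two linear layers, no activation in between
    one_layer = (W2 @ W1) @ x       # one equivalent linear layer
    print(np.allclose(two_layers, one_layer))   # True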
3). Tanh Function :- The activation that almost always works better than the sigmoid function
is the tanh function, also known as the hyperbolic tangent function. It is actually a mathematically
shifted and scaled version of the sigmoid function; the two are similar and can be derived from
each other.
Equation :-
f(x) = tanh(x) = 2 / (1 + e^(-2x)) - 1
OR
tanh(x) = 2 * sigmoid(2x) - 1
Value Range :- -1 to +1
Nature :- non-linear
Uses :- Usually used in the hidden layers of a neural network, since its values lie
between -1 and 1; the mean of the hidden-layer activations therefore comes out at 0 or very
close to it, which helps centre the data by bringing the mean close to 0. This
makes learning for the next layer much easier.
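A small illustrative check of the relation tanh(x) = 2 * sigmoid(2x) - 1:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-3, 3, 7)
    print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))   # True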
4). ReLU :- Stands for rectified linear unit. It is the most widely used activation function,
chiefly implemented in the hidden layers of a neural network.
Equation :- A(x) = max(0, x). It gives an output of x if x is positive and 0 otherwise.
Value Range :- [0, inf)
Nature :- non-linear, which means we can easily backpropagate the errors and
have multiple layers of neurons being activated by the ReLU function.
Uses :- ReLU is less computationally expensive than tanh and sigmoid because it
involves simpler mathematical operations. At any time only a few neurons are
activated, making the network sparse and therefore efficient and easy to
compute.
In simple words, ReLU learns much faster than the sigmoid and tanh functions.
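A minimal sketch of ReLU and its sparsifying effect (input values chosen only for illustration):

    import numpy as np

    def relu(x):
        # max(0, x): passes positive values through, zeroes out the rest
        return np.maximum(0, x)

    a = np.array([-1.5, -0.2, 0.0, 0.7, 2.3])
    print(relu(a))   # [0.  0.  0.  0.7 2.3]  -- negative inputs map exactly to 0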
5). Softmax Function :- The softmax function is also a type of sigmoid function, but it is
handy when we are trying to handle classification problems.
Nature :- non-linear
Uses :- Usually used when handling multiple classes. The softmax
function squeezes the output for each class to between 0 and 1 and
divides by the sum of the outputs, so the outputs form a probability distribution.
Output :- The softmax function is ideally used in the output layer of the classifier,
where we are actually trying to obtain the probabilities that define the class of each
input.
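A short softmax sketch; subtracting the maximum before exponentiating is an added numerical-
stability detail, not something stated in the notes:

    import numpy as np

    def softmax(z):
        # Exponentiate, then divide by the sum so the outputs lie in (0, 1) and sum to 1
        e = np.exp(z - np.max(z))
        return e / e.sum()

    scores = np.array([2.0, 1.0, 0.1])
    p = softmax(scores)
    print(p, p.sum())   # class probabilities, summing to 1.0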
Backpropagation Algorithm:
Backpropagation is an algorithm commonly used to train neural networks. When the neural
network is initialized, weights are set for its individual elements, called neurons. Inputs are
loaded, they are passed through the network of neurons, and the network provides an output
for each one, given the initial weights. Backpropagation helps to adjust the weights of the
neurons so that the result comes closer and closer to the known true result.
Simple Algorithm:
1. Inputs X arrive through the preconnected path.
2. The input is modelled using real weights W. The weights are usually randomly
selected.
3. Calculate the output for every neuron from the input layer, through the hidden layers, to
the output layer.
4. Calculate the error in the outputs:
Error = Actual Output – Desired Output
5. Travel back from the output layer to the hidden layers to adjust the weights such
that the error is decreased.
6. Keep repeating the process until the desired output is achieved.
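A compact NumPy sketch of the steps above for a tiny network trained on the XOR problem; the
layer sizes, learning rate, and data are assumptions made for the illustration only:

    import numpy as np

    # Steps 1-2: inputs X and randomly selected weights W
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input (2) -> hidden (4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden (4) -> output (1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    lr = 1.0

    for _ in range(5000):
        # Step 3: calculate the output of every neuron, layer by layer
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Step 4: error in the outputs
        err = out - y
        # Step 5: travel back from the output layer and adjust the weights
        d_out = err * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0, keepdims=True)

    # Step 6: after repeating, the predictions should move toward [0, 1, 1, 0]
    print(out.round(2))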
Artificial intelligence is the broad idea that machines can intelligently execute tasks by
mimicking human behaviors and thought processes.
Machine learning, a subset of AI, revolves around the idea that machines can learn and
adapt through experiences and data to complete specific tasks. An example would be
predicting the weather forecast for the next seven days based on data from the previous
year and the previous week. Every day, the data from the previous year/week changes, so
the ML model must adapt to the new data.
Deep learning is a subset of ML. DL models are based on highly complex neural networks
that mimic how the brain works. With many layers of processing units, deep learning takes
it a step further to learn complex patterns in large amounts of data. For example, deep
learning (combined with computer vision) in a driverless car can identify a person crossing
the road.
Limitations :
1. It learns only from the observations it is trained on.
2. The issue of biases: biases in the training data are learned as well.
Advantages :
1. Best in-class performance on problems.
2. Reduces need for feature engineering.
3. Eliminates unnecessary costs.
4. Easily identifies defects that are difficult to detect.
Disadvantages :
1. Large amount of data required.
2. Computationally expensive to train.
3. No strong theoretical foundation.
Applications :
1. Automatic Text Generation – A corpus of text is learned, and from this model new text is
generated word-by-word or character-by-character. The model is capable of learning
how to spell, punctuate and form sentences, and it may even capture the style.
2. Healthcare – Helps in diagnosing various diseases and treating them.
3. Automatic Machine Translation – Words, sentences or phrases in one language are
transformed into another language (deep learning is achieving top results in the areas of
text and images).
4. Image Recognition – Recognizes and identifies people and objects in images, and is used
to understand content and context. This area is already being used in gaming, retail,
tourism, etc.
5. Predicting Earthquakes – Teaches a computer to perform the viscoelastic computations
which are used in predicting earthquakes.
Now let's talk about a bit of the mathematics involved in the whole convolution
process.
Convolution layers consist of a set of learnable filters (small patches of weights).
Every filter has a small width and height and the same depth as that of the input
volume (3 if the input layer is an image with three color channels).
For example, if we have to run a convolution on an image with dimensions
34x34x3, the possible filter sizes are a x a x 3, where 'a' can be 3, 5, 7, etc., but
small compared to the image dimensions.
During the forward pass, we slide each filter across the whole input volume step by
step, where each step is called the stride (which can have a value of 2 or 3 or even 4 for
high-dimensional images), and compute the dot product between the weights of the
filter and the patch from the input volume.
As we slide our filters we get a 2-D output (activation map) for each filter; we stack them
together and, as a result, we get an output volume with a depth equal to the
number of filters. The network will learn all the filters.
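A naive sketch of this sliding-window computation for a single filter (stride 1, no padding); the
sizes match the 34x34x3 example above, and the random values are illustrative only:

    import numpy as np

    def conv2d_single_filter(volume, filt, stride=1):
        # volume: H x W x D input, filt: a x a x D filter -> 2-D activation map
        H, W, D = volume.shape
        a = filt.shape[0]
        out_h = (H - a) // stride + 1
        out_w = (W - a) // stride + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                patch = volume[i*stride:i*stride+a, j*stride:j*stride+a, :]
                out[i, j] = np.sum(patch * filt)   # dot product of filter and patch
        return out

    rng = np.random.default_rng(0)
    image = rng.normal(size=(34, 34, 3))   # 34x34x3 input volume
    filt = rng.normal(size=(3, 3, 3))      # one 3x3x3 learnable filter
    print(conv2d_single_filter(image, filt).shape)   # (32, 32) activation map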
Types of layers: