
ARTIFICIAL NEURAL NETWORK

Ishaya, Jeremiah Ayock


Lecture 8

May 1, 2023
Academic City University College, Agbogba Haatso, Ghana.
Artificial Neural Network
NEURAL NETWORK

Artificial neural networks are inspired by the biological neurons within the human body, which activate under certain circumstances, resulting in a related action performed by the body in response.
Artificial neural nets consist of various layers of interconnected artificial neurons powered by activation functions that help switch them ON/OFF.
Like traditional machine learning algorithms, here too, there are certain values that neural nets learn in the training phase.

NEURAL NETWORK

Each neuron receives the inputs multiplied by their weights (initialized randomly), and the result is added to a bias value; this sum is then passed to an appropriate activation function, which decides the final value given out by the neuron.
There are various activation functions available, chosen according to the nature of the input values. Once the output is generated from the final neural net layer, the loss function (predicted vs actual output) is calculated, and backpropagation is performed, in which the weights are adjusted to minimize the loss.
Finding optimal values of the weights is what the overall operation focuses on.
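To make this concrete, here is a minimal Python/NumPy sketch of a single neuron's computation; the input values, weights, bias, and the choice of sigmoid are purely illustrative, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    # One possible activation function: squashes z into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # Multiply inputs by weights, add the bias, then apply the activation.
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.2])   # input values X1, X2
w = np.array([0.8, 0.3])    # connection weights (learned during training)
b = 0.1                     # bias
print(neuron(x, w, b))      # the neuron's output
```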
NEURAL NETWORK

Neural networks can help computers make intelligent decisions with limited human assistance. This is because they can learn and model relationships between input and output data that are nonlinear and complex. For instance, they can do the following tasks.
Make generalizations and inferences
Neural networks can understand unstructured data and make general observations without explicit training. For instance, they can recognize that two different input sentences have a similar meaning:

MAKE GENERALIZATIONS AND INFERENCES CON’T

• Can you tell me how to make the payment?
• How do I transfer money?

A neural network would know that both sentences mean the same thing. Or it would be able to broadly recognize that Baxter Road is a place, but Baxter Smith is a person's name.

WHAT ARE NEURAL NETWORKS USED FOR?

Neural networks have several use cases across many industries, such as the following:

• Medical diagnosis by medical image classification
• Targeted marketing by social network filtering and behavioural data analysis
• Financial predictions by processing historical data of financial instruments
• Electrical load and energy demand forecasting
• Process and quality control
• Chemical compound identification

APPLICATIONS OF NEURAL NETWORKS

Computer vision
Computer vision is the ability of computers to extract information and insights from images and videos. With neural networks, computers can distinguish and recognize images similarly to humans. Computer vision has several applications, such as the following:
• Visual recognition in self-driving cars so they can recognize road signs and other road users
• Content moderation to automatically remove unsafe or inappropriate content from image and video archives
• Facial recognition to identify faces and recognize attributes like open eyes, glasses, and facial hair
• Image labelling to identify brand logos, clothing, safety gear, and other image details
APPLICATIONS OF NEURAL NETWORKS CON’T

Speech recognition
Neural networks can analyze human speech despite varying
speech patterns, pitch, tone, language, and accent. Virtual
assistants like Amazon Alexa and automatic transcription
software use speech recognition to do tasks like these:

• Assist call centre agents and automatically classify calls
• Convert clinical conversations into documentation in real time
• Accurately subtitle videos and meeting recordings for wider content reach

APPLICATIONS OF NEURAL NETWORKS

Natural language processing (NLP) is the ability to process natural, human-created text. Neural networks help computers gather insights and meaning from text data and documents. NLP has several use cases, including these functions:
• Automated virtual agents and chatbots
• Automatic organization and classification of written data
• Business intelligence analysis of long-form documents like emails and forms
• Indexing of key phrases that indicate sentiment, like positive and negative comments on social media
• Document summarization and article generation for a given topic
APPLICATIONS OF NEURAL NETWORKS CON’T

Neural networks can track user activity to develop personalized recommendations. They can also analyze all user behaviour and discover new products or services that interest a specific user.
For example, Curalate, a Philadelphia-based startup, helps
brands convert social media posts into sales. Brands use
Curalate’s intelligent product tagging (IPT) service to
automate the collection and curation of user-generated social
content.
IPT uses neural networks to automatically find and recommend
products relevant to the user’s social media activity.

HOW DO NEURAL NETWORKS WORK?

The human brain is the inspiration behind neural network architecture. Human brain cells, called neurons, form a complex, highly interconnected network and send electrical signals to each other to help humans process information.
Similarly, an artificial neural network is made of artificial neurons that work together to solve a problem. Artificial neurons are software modules, called nodes, and artificial neural networks are software programs or algorithms that, at their core, use computing systems to perform mathematical calculations.

BUILDING BLOCKS OF A NEURAL NETWORK: LAYERS AND
NEURONS

There are two building blocks of a Neural Network:

• Layers
• Neurons

BUILDING BLOCKS OF A NEURAL NETWORK

A neural network is made up of vertically stacked components called layers. Each dotted line in the image represents a layer. There are three types of layers in a Neural Network.

LAYERS

Figure 1: Layers of a Neural Network


SIMPLE NEURAL NETWORK ARCHITECTURE

A basic neural network has interconnected artificial neurons in three layers:

1. Input Layer
2. Hidden Layer
3. Output Layer

INPUT LAYER

Information from the outside world enters the artificial neural network from the input layer. Input nodes process the data, analyze or categorize it, and pass it on to the next layer.

HIDDEN LAYER

Hidden layers take their input from the input layer or other
hidden layers. Artificial neural networks can have a large
number of hidden layers. Each hidden layer analyzes the
output from the previous layer, processes it further, and passes
it on to the next layer.

OUTPUT LAYER

The output layer gives the final result of all the data
processing by the artificial neural network.
It can have single or multiple nodes. For instance, if we have a
binary (yes/no) classification problem, the output layer will
have one output node, which will give the result as 1 or 0.
However, if we have a multi-class classification problem, the
output layer might consist of more than one output node.

NEURONS IN A NEURAL NETWORK

A layer consists of small individual units called neurons.
A neuron in a neural network can be better understood with the help of biological neurons.
An artificial neuron is similar to a biological neuron. It receives input from the other neurons, performs some processing, and produces an output.

NEURON

Figure 2: Biological Neuron

Now let’s look at an artificial neuron.

ARTIFICIAL NEURON

Figure 3: Artificial Neuron

Here, X1 and X2 are inputs to the artificial neuron, f(X) represents the processing done on the inputs, and y represents the output of the neuron.

WHAT IS A FIRING OF A NEURON?

In real life, we have all heard the phrase “fire up those neurons” in one form or another. The same applies to artificial neurons as well.
Every neuron has a tendency to fire, but only under certain conditions.

ARTIFICIAL NEURON

Figure 4: Caption

If we represent this f(X) by addition, then this neuron may fire when the sum is greater than, say, 100, while another neuron may fire only when the sum is greater than 10.

ARTIFICIAL NEURON

Figure 5: Caption

These conditions, which differ from neuron to neuron, are called thresholds. For example, suppose the input X1 into the first neuron is 30 and X2 is 0:

ARTIFICIAL NEURON (CONNECTIONS)

Figure 6: Caption

A connection links one neuron in one layer to another neuron in another layer or in the same layer. A connection always has a weight value associated with it. The goal of training is to update this weight value to decrease the loss (error).

BIAS(OFFSET)

The bias is an extra input to neurons; it is always 1 and has its own connection weight.
This makes sure that even when all the inputs are zero (all 0s), there will still be an activation in the neuron.

ARTIFICIAL NEURON

Figure 7: Artificial Neuron

This neuron will not fire, since the sum 30 + 0 = 30 is not greater than the threshold, i.e., 100. If the input had remained the same for the other neuron, that neuron would have fired, since the sum of 30 is greater than its threshold of 10.
ARTIFICIAL NEURON CON’T

Figure 8: Caption

Now, the negative threshold is called the bias of a neuron.
Let us represent this a bit mathematically. We can represent the firing and non-firing conditions of a neuron using a couple of equations.
ARTIFICIAL NEURON CON’T

Figure 9: A N Expression

If the sum of the inputs is greater than the threshold, the neuron will fire; otherwise, it will not. Let’s simplify this equation a bit and bring the threshold to the left side of the equation. This negative threshold is called the bias:
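Since Figure 9 is not reproduced here, the following is a plausible reconstruction of the firing condition it expresses, based on the surrounding text:

```latex
\[
\text{fire:}\quad \sum_i x_i > \text{threshold}
\qquad\Longleftrightarrow\qquad
\sum_i x_i + b > 0,
\quad \text{where } b = -\text{threshold}.
\]
```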

ARTIFICIAL NEURON

Figure 10: Caption

One thing to note: in this simplified picture, all the neurons in a layer are shown with the same bias (in practice, each neuron typically learns its own bias value). Now that we have a good understanding of bias and how it represents the condition for a neuron to fire, let’s move to another aspect of an artificial neuron called weights.
ARTIFICIAL NEURON

Figure 11: Caption

Here X1 has a weight of 1, X2 has a weight of 1, and the bias has a weight of 1. But what if we want to attach different weights to different inputs?

ACTIVATION FUNCTION(TRANSFER FUNCTION)

Activation functions are used to introduce non-linearity into neural networks. They also squash values into a smaller range; for example, a sigmoid activation function squashes values into the range 0 to 1.
There are many activation functions used in the deep learning industry, and ReLU, SELU, and tanh are often preferred over the sigmoid activation function.
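As an illustration, here are minimal NumPy versions of common activation functions (including softmax, which appears later in the CNN section); this is a sketch, not a library implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                 # like sigmoid but symmetric, range (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # keeps positives, zeroes out negatives

def softmax(z):
    e = np.exp(z - np.max(z))         # shift by the max for numerical stability
    return e / e.sum()                # outputs form a probability distribution

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```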

ACTIVATION FUNCTION

Figure 12: Activation functions


WEIGHTS(PARAMETERS)

Weight represents the strength of the connection between units. If the weight from node 1 to node 2 has a greater magnitude, it means that neuron 1 has greater influence over neuron 2.
A weight scales the importance of the input value. Weights near zero mean changing this input will not change the output, and negative weights mean increasing this input will decrease the output.
In short, a weight decides how much influence the input will have on the output.

WEIGHTS(PARAMETERS)

Figure 13: Weights

FORWARD PROPAGATION

Forward propagation is the process of feeding input values to the neural network and getting an output, which we call the predicted value. Sometimes we refer to forward propagation as inference.
When we feed the input values to the neural network’s first layer, they pass through without any operations. The second layer takes values from the first layer and applies multiplication, addition, and activation operations, then passes the result to the next layer.
The same process repeats for subsequent layers, and finally we get an output value from the last layer.
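A minimal NumPy sketch of this multiply-add-activate loop; the layer sizes and random weights are illustrative, and ReLU is an arbitrary choice of activation:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    # Each layer: multiply by weights, add the bias, apply the activation,
    # then pass the result on to the next layer.
    a = x
    for W, b in layers:
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
# Illustrative network: 3 inputs -> 4 hidden units -> 1 output.
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(1, 4)), np.zeros(1))]
x = np.array([0.2, -0.5, 1.0])
print(forward(x, layers))   # the predicted value
```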

FORWARD PROPAGATION

Figure 14: Forward Propagation

BACK-PROPAGATION

After forward propagation, we get an output value, which is the predicted value. To calculate the error, we compare the predicted value with the actual output value.
We use a loss function (discussed later) to calculate the error value.
Then we calculate the derivative of the error value with respect to each and every weight in the neural network. Back-propagation uses the chain rule of differential calculus.

BACK-PROPAGATION

Using the chain rule, we first calculate the derivatives of the error value with respect to the weight values of the last layer. We call these derivatives gradients, and we use these gradient values to calculate the gradients of the second-to-last layer.
We repeat this process until we get gradients for each and every weight in the neural network. Then we subtract each gradient value from its weight value to reduce the error. In this way, we move closer (descend) to the local minimum, i.e., the point of minimum loss.
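A single-neuron sketch of this chain-rule computation; the squared-error loss and sigmoid activation are chosen purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 1.5, 1.0   # one training example: input and actual output
w, b = 0.2, 0.0   # initial weight and bias

# Forward pass: predicted value and its error.
z = w * x + b
y_hat = sigmoid(z)
loss = 0.5 * (y_hat - y) ** 2

# Backward pass: chain rule, dL/dw = dL/dy_hat * dy_hat/dz * dz/dw.
dL_dyhat = y_hat - y
dyhat_dz = y_hat * (1.0 - y_hat)   # derivative of the sigmoid
grad_w = dL_dyhat * dyhat_dz * x   # dz/dw = x
grad_b = dL_dyhat * dyhat_dz       # dz/db = 1
print(grad_w, grad_b)
```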

BACK-PROPAGATION

Figure 15: Caption

LEARNING RATE

When we train neural networks, we usually use gradient descent to optimize the weights. At each iteration, we use back-propagation to calculate the derivative of the loss function with respect to each weight and subtract it from that weight.
The learning rate determines how quickly or how slowly you update your weight (parameter) values.
The learning rate should be high enough that training does not take ages to converge, and low enough that it finds the local minimum.
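Building on the single-neuron sketch above, here is a self-contained gradient-descent loop with an explicit learning rate (the value 0.5 is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 1.5, 1.0
w, b = 0.2, 0.0
learning_rate = 0.5   # illustrative: too high overshoots, too low converges slowly

for step in range(200):
    y_hat = sigmoid(w * x + b)                    # forward pass
    grad_z = (y_hat - y) * y_hat * (1.0 - y_hat)  # dL/dz via the chain rule
    w -= learning_rate * grad_z * x               # update each weight by
    b -= learning_rate * grad_z                   # subtracting its scaled gradient

print(w, b, sigmoid(w * x + b))   # the prediction moves toward y = 1.0
```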

CONVERGENCE

Convergence: convergence is when, as the iterations proceed, the output gets closer and closer to a specific value.
Regularization: regularization is used to overcome the over-fitting problem. In regularization, we penalize the loss term by adding an L1 (LASSO) or an L2 (Ridge) norm on the weight vector w (the vector of learned parameters in the given algorithm):
L + λN(w), where L is the loss function, λ is the regularization term, and N(w) is the L1 or L2 norm.
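A small sketch of the penalized loss L + λN(w); using the squared L2 norm here is a common convention, and the numbers are illustrative:

```python
import numpy as np

def regularized_loss(loss, w, lam, norm="l2"):
    # Add an L1 (LASSO) or L2 (Ridge) penalty on the weight vector w.
    if norm == "l1":
        penalty = np.sum(np.abs(w))   # L1 norm: encourages sparse weights
    else:
        penalty = np.sum(w ** 2)      # squared L2 norm: shrinks large weights
    return loss + lam * penalty

w = np.array([0.5, -1.2, 0.0, 2.0])
print(regularized_loss(loss=0.8, w=w, lam=0.01))
```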

NORMALISATION

Data normalization is the process of rescaling one or more attributes to the range of 0 to 1.
Normalization is a good technique to use when you do not know the distribution of your data, or when you know the distribution is not Gaussian (a bell curve).
It also helps speed up the learning process.
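A minimal min-max rescaling sketch (the sample values are arbitrary):

```python
import numpy as np

def min_max_normalize(x):
    # Rescale an attribute so its smallest value maps to 0 and its largest to 1.
    return (x - x.min()) / (x.max() - x.min())

ages = np.array([18.0, 25.0, 40.0, 63.0])
print(min_max_normalize(ages))   # [0.0, 0.156, 0.489, 1.0] (approximately)
```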

FULLY CONNECTED LAYERS

Fully connected layers are layers in which the activations of all nodes in one layer go to each and every node in the next layer.
When all the nodes in the Lth layer connect to all the nodes in the (L + 1)th layer, we call these layers fully connected layers.

FULLY CONNECTED LAYERS

Figure 16: Fully Connected Layer

LOSS FUNCTION/COST FUNCTION

The loss function computes the error for a single training example. The cost function is the average of the loss functions over the entire training set. Common choices include:

• mse: mean squared error.
• binary_crossentropy: binary logarithmic loss (logloss).
• categorical_crossentropy: multi-class logarithmic loss (logloss).
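Minimal NumPy versions of two of these losses (a sketch; eps is a small constant used to guard against log(0)):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average of squared differences.
    return np.mean((y_true - y_pred) ** 2)

def binary_crossentropy(y_true, y_pred, eps=1e-12):
    # Binary log loss; clip predictions to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_true, y_pred), binary_crossentropy(y_true, y_pred))
```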

MODEL OPTIMIZERS

The optimizer is a search technique used to update the weights in the model.

• SGD: Stochastic Gradient Descent, with support for momentum.
• RMSprop: adaptive learning rate optimization method proposed by Geoff Hinton.
• Adam: Adaptive Moment Estimation, which also uses adaptive learning rates.

PERFORMANCE METRICS

Performance Metrics: performance metrics are used to measure the performance of the neural network. Accuracy, loss, validation accuracy, validation loss, mean absolute error, precision, recall, and F1 score are some performance metrics.
Batch Size: the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you’ll need.
Training Epochs: the number of times that the model is exposed to the training dataset. One epoch = one forward pass and one backward pass of all the training examples.
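These pieces come together when a model is compiled and trained. Here is a hedged sketch using the Keras API; the layer sizes are illustrative, and x_train/y_train stand for a prepared dataset that is not defined here:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    keras.layers.Dense(1, activation="sigmoid"),   # binary output
])
model.compile(optimizer="adam",                    # adaptive moment estimation
              loss="binary_crossentropy",          # binary log loss
              metrics=["accuracy"])                # performance metric
# model.fit(x_train, y_train, batch_size=32, epochs=10)
```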
DEEP NEURAL NETWORK ARCHITECTURE

Deep neural networks, or deep learning networks, have several hidden layers with millions of artificial neurons linked together. A number, called a weight, represents each connection between one node and another. The weight is positive if one node excites another, or negative if one node suppresses the other. Nodes with higher weight values have more influence on the other nodes.
Theoretically, deep neural networks can map any input type to any output type. However, they also need much more training than other machine learning methods: millions of examples of training data rather than the hundreds or thousands that a simpler network might need.
TYPES OF NEURAL NETWORKS

Artificial neural networks (ANN) can be categorized by how the data flows from the input node to the output node. Below are some examples:

FEEDFORWARD NEURAL NETWORKS

Feedforward neural networks process data in one direction, from the input node to the output node. Every node in one layer is connected to every node in the next layer.
During training, a feedforward network uses a feedback process (back-propagation) to improve its predictions over time.

BACK PROPAGATION ALGORITHM

Artificial neural networks learn continuously by using corrective feedback loops to improve their predictive analytics.
Example: you can think of the data flowing from the input node to the output node through many different paths in the neural network. Only one path is the correct one that maps the input node to the correct output node.

BACK PROPAGATION ALGORITHM CON’T

To find this path, the neural network uses a feedback loop, which works as follows:

• Each node makes a guess about the next node in the path.
• It checks if the guess was correct. Nodes assign higher weight values to paths that lead to more correct guesses and lower weight values to paths that lead to incorrect guesses.
• For the next data point, the nodes make a new prediction using the higher-weight paths and then repeat from the first step.

CONVOLUTIONAL NEURAL NETWORKS

The hidden layers in convolutional neural networks perform specific mathematical functions, like summarizing or filtering, called convolutions.
They are very useful for image classification because they can extract relevant features from images that are useful for image recognition and classification.
The new form is easier to process without losing features that are critical for making a good prediction. Each hidden layer extracts and processes different image features, like edges, colour, and depth.

Convolutional Neural Networks (CNN)
CNN INTRODUCTION

Figure 17: CNN


CNN

A Convolutional Neural Network, also known as a CNN or ConvNet, is a class of neural networks that specializes in processing data that has a grid-like topology, such as an image.
A digital image is a binary representation of visual data. It contains a series of pixels arranged in a grid-like fashion, with pixel values denoting how bright and what colour each pixel should be.

CNN CON’T

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other.
The pre-processing required in a ConvNet is much lower compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets are able to learn these filters/characteristics.

CNN CON’T

The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex.
Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. A collection of such fields overlap to cover the entire visual area.

CONVNET

A ConvNet is able to successfully capture the spatial and temporal dependencies in an image through the application of relevant filters.
The architecture achieves a better fit to the image dataset due to the reduction in the number of parameters involved and the reusability of weights.
In other words, the network can be trained to understand the sophistication of the image better.

INTRODUCTION

Figure 18: Input colour image

CONVNET

The role of the ConvNet is to reduce the images into a form that is easier to process, without losing features that are critical for getting a good prediction.
This is important when we are to design an architecture that is not only good at learning features but is also scalable to massive datasets.

KERNEL OR FILTER OR FEATURE DETECTORS

In a convolutional neural network, the kernel is nothing but a filter that is used to extract the features from the images.
Output size: o = (i − k) + 1, where i is the size of the input and k is the size of the kernel.
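A minimal "valid" (no padding, stride 1) convolution sketch in NumPy, assuming square inputs and kernels; the 3×3 filter values are illustrative:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    # Slide the kernel over the image with stride 1 and no padding.
    # Output size per dimension: o = (i - k) + 1.
    i, k = image.shape[0], kernel.shape[0]
    o = i - k + 1
    out = np.zeros((o, o))
    for r in range(o):
        for c in range(o):
            out[r, c] = np.sum(image[r:r+k, c:c+k] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # a 5x5 "image"
kernel = np.array([[1., 0., -1.]] * 3)             # a 3x3 edge-style filter
print(convolve2d_valid(image, kernel).shape)       # (3, 3): (5 - 3) + 1
```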

KERNEL

Figure 19: Feature Extraction with kernel

STRIDE

Stride is a parameter of the neural network’s filter that modifies the amount of movement over the image or video. With stride 1, the filter moves one pixel at a time; with stride 2, it moves two pixels at each step.
Output size: o = ⌊(i − k)/s⌋ + 1, where i is the size of the input, k is the size of the kernel, and s is the stride.

STRIDE CON’T

Figure 20: Stride

PADDING

Padding refers to the number of pixels added around an image when it is processed by the kernel of a CNN. For example, with zero-padding, every pixel value that is added has the value zero.
When we use the filter or kernel to scan the image, the size of the image becomes smaller. We often want to avoid that, because we want to preserve the original size of the image to extract low-level features. Therefore, we add some extra pixels outside the image.
Output size: o = ⌊(i − k + 2p)/s⌋ + 1, where i is the size of the input, k is the size of the kernel, s is the stride, and p is the padding.
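A small helper that evaluates this formula for the kernel, stride, and padding cases above (a sketch; the example sizes are illustrative):

```python
import math

def conv_output_size(i, k, s=1, p=0):
    # o = floor((i - k + 2p) / s) + 1
    return math.floor((i - k + 2 * p) / s) + 1

print(conv_output_size(5, 3))            # 3: kernel only, (5 - 3) + 1
print(conv_output_size(5, 3, s=2))       # 2: stride 2 skips positions
print(conv_output_size(5, 3, s=1, p=1))  # 5: padding preserves the input size
```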
PADDING CON’T

Figure 21: Padding with Zeros

POOLING

Pooling in convolutional neural networks is a technique for generalizing features extracted by convolutional filters and helping the network recognize features independent of their location in the image.
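A minimal non-overlapping 2×2 max-pooling sketch in NumPy (the input values are illustrative):

```python
import numpy as np

def max_pool2d(x, size=2):
    # Keep the largest value in each non-overlapping size x size window.
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]   # trim to a multiple of the window size
    x = x.reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

x = np.array([[1., 3., 2., 0.],
              [5., 6., 1., 2.],
              [7., 2., 9., 4.],
              [0., 1., 3., 8.]])
print(max_pool2d(x))   # [[6. 2.]
                       #  [7. 9.]]
```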

POOLING CON’T

Figure 22: Pooling Operation

FLATTEN

Flattening is used to convert all the resultant 2-dimensional arrays from the pooled feature maps into a single long continuous linear vector.
The flattened matrix is fed as input to the fully connected layer to classify the image.

FLATTEN CON’T

Figure 23: Flattening

LAYERS USED TO BUILD CNN

Convolutional neural networks are distinguished from other neural networks by their superior performance with image, speech, or audio signal inputs.
They have three main types of layers, which are:

• Convolutional layer
• Pooling layer
• Fully-connected (FC) layer

CONVOLUTIONAL LAYER

This layer is the first layer used to extract the various features from the input images. In this layer, we use a filter or kernel to extract features from the input image.

CONVOLUTIONAL LAYER CON’T

Figure 24: Convolutional Layer

POOLING LAYER

The primary aim of this layer is to decrease the size of the convolved feature map to reduce computational costs.
This is performed by decreasing the connections between layers and operating independently on each feature map. Depending upon the method used, there are several types of pooling operations, such as max pooling and average pooling.

POOLING LAYER

Figure 25: Pooling layer

FULLY-CONNECTED LAYER

The Fully Connected (FC) layer consists of the weights and biases along with the neurons, and is used to connect the neurons between two different layers.
These layers are usually placed before the output layer and form the last few layers of a CNN architecture.

DROPOUT

Another typical characteristic of CNNs is a Dropout layer.
The Dropout layer is a mask that nullifies the contribution of some neurons towards the next layer and leaves all others unmodified.
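Putting the layer types above together, here is a hedged sketch of a small image classifier in the Keras API; the filter counts, kernel sizes, input shape, and class count are illustrative choices, not taken from the slides:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(28, 28, 1)),       # convolutional layer
    layers.MaxPooling2D((2, 2)),                  # pooling layer shrinks feature maps
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                             # 2-D feature maps -> 1-D vector
    layers.Dropout(0.5),                          # randomly silence half the neurons
    layers.Dense(10, activation="softmax"),       # fully connected output layer
])
model.summary()
```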

DROPOUT

Figure 26: Dropout

ACTIVATION FUNCTION

An activation function decides whether a neuron should be activated or not. This means that it will decide whether the neuron’s input to the network is important or not in the process of prediction.
There are several commonly used activation functions, such as ReLU, Softmax, tanh, and the sigmoid function. Each of these functions has a specific usage.

ACTIVATION FUNCTION CON’T

• Sigmoid: used for binary classification in a CNN model.
• tanh: the tanh function is very similar to the sigmoid function. The only difference is that it is symmetric around the origin; its range of values is from -1 to 1.

ACTIVATION FUNCTION CON’T

• Softmax:
It is used in multinomial logistic regression and is often
used as the last activation function of a neural network to
normalize the output of a network to a probability
distribution over predicted output classes.
• ReLU:
The main advantage of using the ReLU function over
other activation functions is that it does not activate all
the neurons at the same time.

USE CASES

• Image Recognition and Prediction (Manufacturing, Medical, Research, Agriculture, etc.)
• Text Classification (sentiment analysis, translation, or contextual entity linking)
• Automotive and Self-Driving Cars
• Retail and Popular Voice Assistants
• Voice-to-Voice Translators for Business Travel
• Predictive Advertising
• Recommendation Engines

DEEP LEARNING ALGORITHMS

• Convolutional Neural Networks (CNNs)
• Recurrent Neural Networks (RNNs)
• Long Short-Term Memory Networks (LSTMs)
• Generative Adversarial Networks (GANs)
• Self-Organizing Maps (SOMs)
• Radial Basis Function Networks (RBFNs)
• Multilayer Perceptrons (MLPs)
• Deep Belief Networks (DBNs)
• Restricted Boltzmann Machines (RBMs)
• Autoencoders

DEEP LEARNING FRAMEWORKS

• TensorFlow
• Keras
• PyTorch
• Theano
• Caffe
• Deeplearning4j
• MXNet
• Chainer

END OF PRESENTATION

THANK YOU
