FDP on Image Processing and Deep Learning using Python

DEEP LEARNING & IMAGE PROCESSING [DAY-1 NOTES]

What is Artificial Intelligence?

AI is a broad term that describes the capability of a machine to learn
and solve problems the way humans do. In other words, AI refers to the
replication of human intelligence: how humans think, work and function.
Artificial Intelligence is the concept of creating smart, intelligent
machines.
Machine Learning is a subset of artificial intelligence that helps you
build AI-driven applications.
Deep Learning is a subset of machine learning that uses vast volumes of
data and complex algorithms to train a model.

How Does Machine Learning Work?


Machine learning accesses vast amounts of data (both structured and
unstructured) and learns patterns from it in order to make predictions
about new data. It learns from the data by using multiple algorithms and
techniques. Below is a diagram that shows how a machine learns from data.

Prepared by Megha B S MeVi Technologies LLP Page 1



Now that you have been introduced to the basics of machine learning and
how it works, let’s see the different types of machine learning methods.

Types of Machine Learning


Machine learning algorithms are classified into three main categories:

1. Supervised Learning
In supervised learning, the data is already labelled, which means you
know the target variable. Using this method of learning, systems can
predict future outcomes based on past data. It requires that at least an
input and output variable be given to the model for it to be trained.
Below is an example of a supervised learning method. The algorithm is
trained using labelled data of dogs and cats. The trained model predicts
whether the new image is that of a cat or a dog.

Some examples of supervised learning include linear regression, logistic
regression, support vector machines, Naive Bayes, and decision trees.
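As a small illustration of supervised learning, the sketch below fits a straight line to labelled data with ordinary least squares (simple linear regression) and then predicts the target for an unseen input. The data points and the function name `fit_line` are made up for this example.

```python
# Hypothetical labelled data: inputs xs with known targets ys (here y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict the target for an unseen input
print(slope, intercept, prediction)
```

The input and output variables play the roles described above: the model is trained on known (input, target) pairs and is then asked about an input it has not seen.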

2. Unsupervised Learning
Unsupervised learning algorithms employ unlabelled data to discover
patterns from the data on their own. The systems are able to identify
hidden features from the input data provided. Once the data is more
readable, the patterns and similarities become more evident.
Below is an example of an unsupervised learning method that trains a
model using unlabelled data. In this case, the data consists of different
vehicles. The purpose of the model is to group similar vehicles together,
so that each kind of vehicle ends up in its own cluster.

Some examples of unsupervised learning include k-means clustering,
hierarchical clustering, and anomaly detection.
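To make the idea of grouping unlabelled data concrete, here is a minimal sketch of k-means clustering on one-dimensional points. The points, the starting centroids and the function name `kmeans_1d` are assumptions for illustration; real data (such as vehicle features) would have many dimensions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabelled data with two visible groups.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```

No labels are supplied at any point; the two groups emerge only from the distances between the points.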

3. Reinforcement Learning
The goal of reinforcement learning is to train an agent to complete a task
within an uncertain environment. The agent receives observations and a
reward from the environment and sends actions to the environment. The
reward measures how successful an action is with respect to completing
the task goal.
Below is an example that shows how a machine is trained to identify
shapes.

Examples of reinforcement learning algorithms include Q-learning and
Deep Q-learning Neural Networks.
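The observation, action and reward loop can be sketched with a single tabular Q-learning update. The two-state environment, the learning rate `alpha` and the discount factor `gamma` below are hypothetical values chosen only to show the arithmetic.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update for an observed transition."""
    best_next = max(q_table[next_state].values())
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]

# Hypothetical two-state environment with two actions per state.
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}
new_value = q_update(q_table, "s0", "right", reward=1.0, next_state="s1")
print(new_value)
```

Each update nudges the stored value of the chosen action towards the reward received plus the discounted value of the best action available in the next state.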


Machine Learning Processes


Machine Learning involves seven steps:

Machine Learning Applications

• Sales forecasting for different products
• Fraud analysis in banking
• Product recommendations
• Stock price prediction

What Is Deep Learning?


Deep learning can be considered a subset of machine learning. It is a
field based on computer algorithms that learn and improve on their own.
While machine learning uses simpler concepts, deep learning works with
artificial neural networks, which are designed to imitate how humans
think and learn. Until recently, neural networks were limited by
computing power and thus were limited in complexity. However,
advancements in Big Data analytics have permitted larger, more
sophisticated neural networks, allowing computers to observe, learn, and
react to complex situations, in some cases faster than humans. Deep
learning has aided image classification, language translation, and
speech recognition. It can be applied to a wide range of pattern
recognition problems with little human intervention.
Artificial neural networks, comprising many layers, drive deep learning.
Deep Neural Networks (DNNs) are networks in which each layer can perform
complex operations such as representation and abstraction that make
sense of images, sound, and text. Considered the fastest-growing field in
machine learning, deep learning represents a truly disruptive digital
technology, and it is being used by an increasing number of companies to
create new business models.

Deep Learning vs. Machine Learning


| Aspect | Machine Learning | Deep Learning |
|---|---|---|
| Data Dependency | Requires less data to train effectively. | Needs large amounts of data to train effectively. |
| Hardware Requirements | Generally less demanding; can work on low-end machines. | Requires high-end hardware (especially GPUs) due to its computational complexity. |
| Interpretability | Often more interpretable due to simpler models. | Less interpretable because of complex model architectures. |
| Feature Engineering | Requires manual intervention for feature extraction and selection. | Learns features automatically, minimizing the need for manual feature engineering. |
| Training Time | Typically faster to train than deep learning models. | Requires longer training times due to more complex architectures. |
| Model Complexity | Utilizes simpler algorithms like linear regression, decision trees, etc. | Uses complex neural networks with multiple layers. |
| Application Scope | Well-suited for small to medium-sized data sets and simpler problems. | Excels in areas with substantial data and complex problems like image and speech recognition. |
| Output Interpretation | Outputs are generally in the form of numerical values, labels, or simple categories. | Outputs can be more complex, like entire new images or sequences of text. |
| Real-time Learning | More feasible in machine learning with models that require less computational power. | Less feasible due to the heavy computational requirements. |
| Algorithm Variability | Involves a variety of algorithms that can be applied depending on the type and structure of the data. | Primarily revolves around different architectures of deep neural networks. |
| Human Intervention | More dependent on human expertise for setting up models and choosing the right algorithms. | Less human intervention in processing raw data but requires careful network architecture design. |
| Software Libraries | Libraries like Scikit-learn and WEKA are commonly used. | Libraries like TensorFlow, Keras, and PyTorch are more tailored to deep learning. |
| Problem-Solving Approach | Approaches problems with traditional algorithms that may or may not involve iterative learning. | Approaches problems through layers of abstraction, learning from vast amounts of data. |
| Success with Unstructured Data | Less effective with unstructured data unless it is carefully pre-processed. | Highly effective with unstructured data like text, images, and audio. |
| Update and Re-training | Easier and quicker to update and retrain with new data. | More complex and time-consuming to update and retrain models with new data. |

Artificial Neuron
Neural networks are a collection of artificial neurons arranged in a
particular structure. In this segment, you will understand how a single
artificial neuron works, i.e., how it converts inputs into outputs. You will
also understand the topology or structure of large neural networks. Let’s
get started by understanding the basic structure of an artificial neuron.

However, in perceptrons, the commonly used activation/output function is
the step function, whereas in the case of ANNs, the activation functions
are non-linear functions.

Here, a represents the inputs, w represents the weights associated with
the inputs, and b represents the bias of the neuron.
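A single artificial neuron with a step activation (the perceptron case) can be sketched as follows. The names `a`, `w` and `b` follow the notation above; the numeric values and the function names are assumptions made for this illustration.

```python
def step(z):
    """Step activation used by a perceptron: outputs 1 when z >= 0, else 0."""
    return 1 if z >= 0 else 0

def neuron(a, w, b, activation=step):
    """Weighted sum of inputs a with weights w, plus bias b, then activation."""
    z = sum(ai * wi for ai, wi in zip(a, w)) + b
    return activation(z)

# Hypothetical inputs, weights and bias.
out_on = neuron(a=[1.0, 0.5], w=[0.6, 0.4], b=-0.5)   # weighted sum is about 0.3
out_off = neuron(a=[0.2, 0.1], w=[0.6, 0.4], b=-0.5)  # weighted sum is about -0.34
print(out_on, out_off)
```

Passing a different function as `activation` turns the same neuron into the kind used in ANNs, where the step is replaced by a smooth non-linear function.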
Multiple artificial neurons in a neural network are arranged in different
layers. The first layer is known as the input layer, and the last layer is
called the output layer. The layers in between these two are the hidden
layers.
The number of neurons in the input layer is equal to the number of
attributes in the data set, and the number of neurons in the output layer is
determined by the number of classes of the target variable (for a
classification problem).
For a regression problem, the number of neurons in the output layer
would be 1 (a numeric variable). Take a look at the image given below to
understand the topology of neural networks in the case of classification
and regression problems.

Note that the number of hidden layers, the number of neurons in each
hidden layer and the activation functions used in the neural network
change according to the problem, and these details determine the
topology or structure of the neural network.

So far, you have understood the basic structure of artificial neural
networks. To summarise, there are six main elements that must be
specified for any neural network. They are as follows:
• Input layer
• Output layer
• Hidden layers
• Network topology or structure
• Weights and biases
• Activation functions

Inputs and Outputs of a Neural Network


The number of neurons in the input layer is determined by the input
given to the network, and the number of neurons in the output layer is
equal to the number of classes (for a classification task) or is one (for a
regression task).

The most important thing to note is that inputs can only be numeric. For
different types of input data, you need to use different ways to convert
the inputs into a numeric form. The most commonly used inputs for
ANNs are as follows:
• Structured data: Data of the kind used in standard machine learning
algorithms, with multiple features arranged in two dimensions so that
it can be represented in a tabular format, can be used as input for
training ANNs. Such data can be stored in CSV files, MAT files, Excel
files, etc. This is highly convenient because the input to an ANN is
usually given as a numeric feature vector, so structured data eases
the process of feeding the input into the ANN.
• Text data: For text data, you can use a one-hot vector or a word
embedding corresponding to a certain word. For example, in one-hot
vector encoding, if the vocabulary size is |V|, then you can represent
the word w_n as a one-hot vector of size |V| with '1' at the nth element
and all other elements being zero. The problem with the one-hot
representation is that the vocabulary size |V| is usually huge, in the
tens of thousands at least; hence, it is often better to use word
embeddings, which are a lower-dimensional representation of each word.
The one-hot encoded array of the digits 0–9 will look as shown below.

import numpy as np

data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(data.shape)
# Row i of the 10 x 10 identity matrix is the one-hot vector for digit i
np.eye(10)[data]

(10,)
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
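The same idea applies to words. The sketch below, in plain Python, builds the one-hot vector of a word from a tiny made-up vocabulary; the vocabulary and the helper name `one_hot_word` are hypothetical.

```python
def one_hot_word(word, vocabulary):
    """Return a |V|-length list with a 1 at the word's index and 0 elsewhere."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

vocabulary = ["cat", "dog", "sat", "the"]  # hypothetical vocabulary, |V| = 4
encoded = one_hot_word("dog", vocabulary)
print(encoded)  # [0, 1, 0, 0]
```

With a realistic vocabulary of tens of thousands of words, each vector would be that long and almost entirely zeros, which is the motivation for lower-dimensional word embeddings.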

• Images: Images are naturally represented as arrays of numbers and
can thus be fed into the network directly. These numbers are the raw
pixels of an image. ‘Pixel’ is short for ‘picture element’. In images,
pixels are arranged in rows and columns (an array of pixel elements).
The figure given below shows the image of a handwritten 'zero' in the
MNIST data set (black and white) and its corresponding
representation in NumPy as an array of numbers. The pixel values are
high where the intensity is high, i.e., the colour is bright, while the
values are low in the black regions, as shown below.

In a neural network, each pixel of the input image is a feature. For
example, the image provided above is an 18 x 18 array. Hence, it will
be fed as a vector of size 324 into the network. Note that the image
given above is black and white (also called a grayscale image), and
thus, each pixel has only one ‘channel’. If it were a coloured image,
called an RGB (Red, Green and Blue) image, each pixel would have three
channels, one each for red, green, and blue. Hence, the number of
neurons in the input layer would be 18 x 18 x 3 = 972. The three
channels of an RGB image are shown below.
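The pixel counts quoted above can be checked with a short sketch that flattens an image into a feature vector. The images here are placeholder arrays of zeros stored as nested lists, and the helper `flatten` is an assumption for illustration.

```python
def flatten(image):
    """Flatten a nested list of pixel values into a single feature vector."""
    if not isinstance(image, list):
        return [image]
    flat = []
    for part in image:
        flat.extend(flatten(part))
    return flat

height, width = 18, 18
grayscale = [[0 for _ in range(width)] for _ in range(height)]    # 1 channel
rgb = [[[0, 0, 0] for _ in range(width)] for _ in range(height)]  # 3 channels

print(len(flatten(grayscale)))  # 324
print(len(flatten(rgb)))        # 972
```

The length of the flattened vector is the number of neurons needed in the input layer.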

• Speech: In the case of a speech/voice input, the basic input unit is
the phoneme. Phonemes are the distinct units of speech in any
language. The speech signal is in the form of waves, and to convert
these waves into numeric inputs, you need to use Fourier transforms
(this involves specialised mathematics that will not be covered in
this course). Note that the input after conversion should be numeric,
so that you are able to feed it into a neural network.
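For readers who are curious, the sketch below shows the general idea of turning a sampled wave into numbers with a naive discrete Fourier transform. It is only an illustration under simple assumptions (a single sine wave, eight samples); practical speech pipelines use optimised FFT routines and further processing that these notes do not cover.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform of a list of real samples."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Hypothetical signal: one full sine cycle sampled at 8 points.
samples = [math.sin(2 * math.pi * t / 8) for t in range(8)]
magnitudes = [abs(c) for c in dft(samples)]
print([round(m, 3) for m in magnitudes])
```

The energy shows up in the frequency bin that matches the sine wave (and its mirror image), and the resulting magnitudes form a numeric vector that could be fed to a network.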

Now that you have learnt how to feed input vectors into neural networks,
let’s understand how the output layers are specified.
Depending on the nature of the given task, the outputs of neural
networks can either be in the form of classes (if it is a classification
problem) or numeric (if it is a regression problem). One of the commonly
used output functions is the softmax function for classification. Take a
look at the graphical representation of the softmax function shown below.

Softmax Function
A softmax output is similar to what we get from a multiclass logistic
function, which is commonly used to compute the probability of an output
belonging to one of multiple classes. It is given by the following
formula:

$$p_i = \frac{e^{w_i \cdot x'}}{\sum_{t=0}^{c-1} e^{w_t \cdot x'}}$$

where c is the number of classes or neurons in the output layer, x′ is
the input to the network, and w_i are the weights associated with the
inputs.

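Suppose the output layer of a network has 3 neurons and all of them have
the same input x (coming from the previous layers in the network). The
weights associated with them are represented as w0, w1 and w2. In such a
case, the probabilities of the input belonging to each of the classes are
expressed as follows:

$$p_0 = \frac{e^{w_0 x}}{e^{w_0 x} + e^{w_1 x} + e^{w_2 x}}, \quad p_1 = \frac{e^{w_1 x}}{e^{w_0 x} + e^{w_1 x} + e^{w_2 x}}, \quad p_2 = \frac{e^{w_2 x}}{e^{w_0 x} + e^{w_1 x} + e^{w_2 x}}$$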
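As a rough numerical illustration of the three-neuron case, the sketch below computes softmax probabilities in plain Python. The input `x` and the three weights are made-up values, and subtracting the maximum score before exponentiating is a common numerical-stability step that is not part of these notes.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow; result is unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

x = 2.0                    # hypothetical input from the previous layer
weights = [0.5, 1.0, 1.5]  # hypothetical w0, w1, w2
probabilities = softmax([w * x for w in weights])
print(probabilities)
```

The neuron with the largest weighted input receives the largest probability, and the three probabilities always add up to 1.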

So, we have seen the softmax function as a commonly used output
function in multiclass classification. Now, let’s understand how the
softmax function translates to the sigmoid function in the special case
of binary classification.
In the case of a sigmoid output, there is only one neuron in the output
layer because if there are two classes with probabilities p0 and p1, we
know that p0 + p1 = 1. Hence, we need to compute the value of
either p0 or p1. In other words, the sigmoid function is just a special case
of the softmax function (since binary classification is a special case of
multiclass classification). In fact, we can derive the sigmoid function from
the softmax function, as shown below. Let's assume that the softmax
function has two neurons with the following outputs:
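With two output neurons sharing an input x, softmax gives p1 = e^(w1 x) / (e^(w0 x) + e^(w1 x)). Dividing the numerator and denominator by e^(w1 x) gives p1 = 1 / (1 + e^(-(w1 - w0) x)), which is a sigmoid with effective weight w1 - w0. The sketch below checks this numerically; the values of `x`, `w0` and `w1` are arbitrary assumptions.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(x, w0, w1):
    """Softmax probabilities (p0, p1) for two output neurons sharing input x."""
    e0, e1 = math.exp(w0 * x), math.exp(w1 * x)
    return e0 / (e0 + e1), e1 / (e0 + e1)

x, w0, w1 = 1.5, 0.2, 0.8  # hypothetical values
p0, p1 = softmax2(x, w0, w1)
p1_sigmoid = sigmoid((w1 - w0) * x)  # sigmoid with effective weight w1 - w0
print(p0, p1, p1_sigmoid)
```

Because p0 = 1 - p1, a single sigmoid neuron carries the same information as the two softmax neurons.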

Now that you have understood how the output is obtained from the
softmax function and how different types of inputs are fed into the ANN,
let's learn how to define inputs and outputs for image recognition on the
famous MNIST data set for multiclass classification.
There are various problems you will face while trying to recognise
handwritten text using an algorithm, including:
• Noise in the image
• The orientation of the text
• Non-uniformity in the spacing of text
• Non-uniformity in handwriting
The MNIST data set takes care of some of these problems, as the digits are
written in a box. Now the only problem the network needs to handle is
the non-uniformity in handwriting. Since the images in the MNIST data
set are 28 x 28 pixels, the input layer has 784 neurons (each neuron takes
1 pixel as an input) and the output layer has 10 neurons (each giving the
probability of the input image belonging to any of the 10 classes). The
image is classified into the class with the highest probability in the output
layer.
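The input and output sizes, and the final "pick the most probable class" step, can be sketched as below. The ten probabilities are invented for illustration; in practice they would come from the softmax output layer of a trained network.

```python
def classify(probabilities):
    """Return the class index with the highest probability."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

n_inputs = 28 * 28   # one input neuron per pixel
n_outputs = 10       # one output neuron per digit class 0-9

# Hypothetical output-layer probabilities for one image.
output = [0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03]
predicted_digit = classify(output)
print(n_inputs, n_outputs, predicted_digit)  # 784 10 3
```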

Workings of a Single Neuron


Now that you have seen how inputs are fed into a neuron and how
outputs are obtained using activation functions, let’s reiterate the
concepts with a short summary.


In the image above, you can see that x1, x2 and x3 are the inputs, and
their weighted sum along with bias is fed into the neuron to give the
calculated result as the output.

To summarise, the weights are applied to the inputs respectively, and
along with the bias, the cumulative input is fed into the neuron. An
activation function is then applied on the cumulative input to obtain the
output of the neuron. We have seen some of the activation functions such
as softmax and sigmoid in the previous segment. We will explore other
types of activation functions in the next segment. These functions apply
non-linearity to the cumulative input to enable the neural network to
identify complex non-linear patterns present in the data set.
An in-depth representation of the cumulative input as the output is given
below.

In the image above, z is the cumulative input. You can see how the
weights affect the inputs depending on their magnitudes. Also, z is the
dot product of the weights and inputs plus the bias.

The image provided below shows the graphical representation of a linear
function and one of the possible representations of a non-linear function.


The activation functions introduce non-linearity in the network, thereby
enabling the network to solve highly complex problems. Problems that
take the help of neural networks require the ANN to recognise complex
patterns and trends in the given data set. If we do not introduce
non-linearity, the output will be a linear function of the input vector.
This will not help us in understanding more complex patterns present in
the data set.
For example, as we can see in the image below, we sometimes have data
in non-linear shapes such as circular or elliptical. If you want to classify
the two circles into two groups, a linear model will not be able to do this,
but a neural network with multiple neurons and non-linear activation
functions can help you achieve this.
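One way to see why non-linearity is needed is to stack two layers without an activation function: the result is still a single linear function. The one-dimensional sketch below uses arbitrary assumed weights and biases, and ReLU as an example of a non-linear activation.

```python
def linear(w, b, x):
    """A 1-D linear layer with no activation: w * x + b."""
    return w * x + b

def relu(z):
    """Example non-linear activation: zero for negative inputs."""
    return max(0.0, z)

w1, b1, w2, b2 = 2.0, 1.0, -3.0, 0.5  # hypothetical weights and biases

def two_linear_layers(x):
    return linear(w2, b2, linear(w1, b1, x))

def single_equivalent(x):
    # Same mapping written as one linear layer: (w2*w1) * x + (w2*b1 + b2)
    return linear(w2 * w1, w2 * b1 + b2, x)

def with_relu(x):
    return linear(w2, b2, relu(linear(w1, b1, x)))

for x in (-2.0, 0.0, 3.0):
    print(x, two_linear_layers(x), single_equivalent(x), with_relu(x))
```

The first two functions agree for every input, so extra linear layers add no expressive power; inserting the activation between the layers breaks that equivalence.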

Non-Linear Activation Functions



While choosing activation functions, you need to ensure that they are:
• Non-linear,
• Continuous, and
• Monotonically increasing.
The different commonly used activation functions are represented below.

The features of these activation functions are as follows:
• Sigmoid: When this function is applied, the output is bound between 0
and 1 and is not centred on zero. A sigmoid activation function is
usually used when we want to bound the magnitude of the outputs we get
from a neural network and ensure that this magnitude does not blow up.
• Tanh (Hyperbolic Tangent): When this function is applied, the output
is centred around 0 and bound between -1 and 1, unlike a sigmoid
function, whose output is centred around 0.5 and is always positive.
• ReLU (Rectified Linear Unit): The output of this activation function is
linear when the input is positive and zero when the input is negative.
This activation function allows the network to converge very quickly,
and hence, its usage is computationally efficient. However, it does
not help the network to learn when the input values are negative,
because the output is zero there.
• Leaky ReLU (Leaky Rectified Linear Unit): This activation function is
similar to ReLU. However, it enables the neural network to learn even
when the values are negative. When the input to the function is
negative, it dampens the magnitude, i.e., the input is multiplied by
an epsilon factor that is usually a small positive number less than
one. On the other hand, when the input is positive, the function is
linear and gives the input value as the output. We can tune this
factor to control how much ‘learning emphasis’ is given to negative
values.
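The four activation functions described above can be written in a few lines of plain Python. This is a minimal sketch; the default `epsilon=0.01` for Leaky ReLU is an assumed, commonly used value rather than one given in these notes.

```python
import math

def sigmoid(z):
    """Output in (0, 1), centred around 0.5."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Output in (-1, 1), centred around 0."""
    return math.tanh(z)

def relu(z):
    """Identity for positive inputs, zero for negative inputs."""
    return max(0.0, z)

def leaky_relu(z, epsilon=0.01):
    # epsilon is the small slope applied to negative inputs (assumed default)
    return z if z > 0 else epsilon * z

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```

Comparing the outputs at z = -2.0 shows the difference between ReLU, which returns zero, and Leaky ReLU, which returns a small negative value so that some signal still passes through.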
