DEEP LEARNING & IMAGE PROCESSING [DAY-1 NOTES]
Now that you have been introduced to the basics of machine learning and
how it works, let’s see the different types of machine learning methods.
1. Supervised Learning
In supervised learning, the data is already labelled, which means you
know the target variable. Using this method, systems can predict future
outcomes based on past data. It requires at least one input variable and
one output variable to be provided for the model to be trained.
Below is an example of a supervised learning method. The algorithm is
trained using labelled data of dogs and cats. The trained model predicts
whether the new image is that of a cat or a dog.
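The cat/dog example above can be sketched with a tiny nearest-neighbour classifier. The 2-D feature vectors and labels below are hypothetical stand-ins for real image features; the point is only that labelled training data lets the model predict the class of a new example.

```python
import numpy as np

# A minimal supervised-learning sketch: a 1-nearest-neighbour classifier
# trained on labelled examples (0 = cat, 1 = dog). The 2-D feature
# vectors here are hypothetical stand-ins for real image features.
X_train = np.array([[0.2, 0.1], [0.25, 0.15], [0.9, 0.8], [0.85, 0.95]])
y_train = np.array([0, 0, 1, 1])

def predict(x):
    # The label of the closest training example wins
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]

print(predict(np.array([0.88, 0.9])))  # 1: classified as "dog"
```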
2. Unsupervised Learning
Unsupervised learning algorithms employ unlabelled data to discover
patterns from the data on their own. The systems are able to identify
hidden features from the input data provided. Once the data is more
readable, the patterns and similarities become more evident.
Below is an example of an unsupervised learning method that trains a
model using unlabelled data. In this case, the data consists of different
vehicles. The purpose of the model is to classify each kind of vehicle.
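The vehicle-grouping example above can be sketched with k-means, a standard unsupervised algorithm. The 2-D points below are hypothetical stand-ins for vehicle feature vectors; no labels are given, yet the algorithm separates the two groups on its own.

```python
import numpy as np

# A minimal unsupervised-learning sketch: k-means clustering groups
# unlabelled points on its own.
def kmeans(X, k, iters=10):
    centroids = X[:k].copy()  # initialise with the first k points
    for _ in range(iters):
        # assign each point to its nearest centroid
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.8, 8.2]])
labels, _ = kmeans(X, k=2)
print(labels)  # [1 1 0 0]: two groups found; the cluster IDs themselves are arbitrary
```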
3. Reinforcement Learning
The goal of reinforcement learning is to train an agent to complete a task
within an uncertain environment. The agent receives observations and a
reward from the environment and sends actions to the environment. The
reward measures how successful an action is with respect to completing the
task goal.
Below is an example that shows how a machine is trained to identify
shapes.
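The agent-environment loop described above can be sketched with a two-armed bandit, a much-simplified reinforcement-learning setting: the agent sends an action, the (hypothetical) environment sends back a noisy reward, and the agent updates its value estimates.

```python
import numpy as np

# An epsilon-greedy two-armed bandit: a simplified sketch of the
# observe-act-reward loop, not a full RL algorithm. The payoffs are
# hypothetical.
rng = np.random.default_rng(0)
true_rewards = [0.2, 0.8]    # hidden payoff of each action
estimates = np.zeros(2)      # agent's estimate of each action's value
counts = np.zeros(2)

for step in range(500):
    # choose an action: mostly exploit the best estimate, sometimes explore
    action = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(estimates))
    # environment returns a noisy reward for that action
    reward = true_rewards[action] + rng.normal(0, 0.1)
    # update the running-average estimate for this action
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(int(np.argmax(estimates)))  # 1: the agent learns that action 1 pays best
```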
Machine Learning vs. Deep Learning

Problem-Solving Approach
Machine learning: approaches problems with traditional algorithms that may or may not involve iterative learning.
Deep learning: approaches problems through layers of abstraction, learning from vast amounts of data.

Success with Unstructured Data
Machine learning: less effective with unstructured data unless it is carefully pre-processed.
Deep learning: highly effective with unstructured data like text, images, and audio.

Update and Re-training
Machine learning: easier and quicker to update and retrain with new data.
Deep learning: more complex and time-consuming to update and retrain models with new data.
Artificial Neuron
Neural networks are a collection of artificial neurons arranged in a
particular structure. In this segment, you will understand how a single
artificial neuron works, i.e., how it converts inputs into outputs. You will
also understand the topology or structure of large neural networks. Let’s
get started by understanding the basic structure of an artificial neuron.
Here, a represents the inputs, w represents the weights associated with
the inputs, and b represents the bias of the neuron.
Multiple artificial neurons in a neural network are arranged in different
layers. The first layer is known as the input layer, and the last layer is
called the output layer. The layers in between these two are the hidden
layers.
The number of neurons in the input layer is equal to the number of
attributes in the data set, and the number of neurons in the output layer is
determined by the number of classes of the target variable (for a
classification problem).
For a regression problem, the number of neurons in the output layer
would be 1 (a numeric variable). Take a look at the image given below to
understand the topology of neural networks in the case of classification
and regression problems.
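The sizing rules above can be sketched in a few lines. The numbers below are hypothetical: a data set with 4 attributes and a 3-class target gives an input layer of 4 neurons and an output layer of 3 (a regression target would give 1); the hidden-layer sizes are a design choice.

```python
# A sketch of network topology for a classification problem.
# All numbers are hypothetical examples.
n_attributes = 4        # input-layer size = number of attributes
n_classes = 3           # output-layer size = number of target classes
hidden_sizes = [8, 8]   # hidden-layer sizes are a design choice

layer_sizes = [n_attributes] + hidden_sizes + [n_classes]
print(layer_sizes)  # [4, 8, 8, 3]

# The weight matrix between consecutive layers has shape (fan_out, fan_in)
shapes = [(layer_sizes[i + 1], layer_sizes[i]) for i in range(len(layer_sizes) - 1)]
print(shapes)  # [(8, 4), (8, 8), (3, 8)]
```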
Note that the number of hidden layers, the number of neurons in each
hidden layer, and the activation functions used in the neural network
are design choices that you can vary when building the model.
Prepared by Megha B S MeVi Technologies LLP Page 8
FDP on Image Processing and Deep Learning using Python
The most important thing to note is that inputs can only be numeric. For
different types of input data, you need to use different ways to convert
the inputs into a numeric form. The most commonly used inputs for
ANNs are as follows:
Structured data: The type of data that we use in standard machine
learning algorithms with multiple features and available in two
dimensions, such that the data can be represented in a tabular format,
can be used as input for training ANNs. Such data can be stored
in CSV files, MAT files, Excel files, etc. This is highly convenient
because the input to an ANN is usually given as a numeric feature
vector. Such structured data eases the process of feeding the input into
the ANN.
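Loading such tabular data into a numeric feature matrix can be sketched as follows. The file contents and column names are hypothetical; NumPy's genfromtxt is one common way to read a CSV.

```python
import io
import numpy as np

# A sketch of turning structured (tabular) data into the numeric
# feature vectors an ANN expects. The CSV contents are hypothetical;
# an io.StringIO stands in for a file on disk.
csv_data = io.StringIO("weight,horsepower,label\n1200,85,0\n2400,180,1\n")
table = np.genfromtxt(csv_data, delimiter=",", skip_header=1)

X = table[:, :-1]   # feature vectors fed to the ANN
y = table[:, -1]    # target variable
print(X.shape)      # (2, 2)
```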
Text data: For text data, you can use a one-hot vector or a word
embedding corresponding to a certain word. For example, in one-hot
encoding, if the vocabulary size is |V|, then you can represent
the word wn as a one-hot vector of size |V| with '1' at the nth element
and all other elements set to zero. The problem with the one-hot
representation is that the vocabulary size |V| is usually huge, in the
tens of thousands at least; hence, it is often better to use word
embeddings, which are lower-dimensional representations of each word.
The one-hot encoded array of the digits 0–9 will look as shown below.
import numpy as np

# one_hot is not a NumPy built-in; a minimal implementation that
# indexes rows of an identity matrix:
def one_hot(data, num_classes=10):
    return np.eye(num_classes)[data]

data = np.array([0,1,2,3,4,5,6,7,8,9])
print(data.shape)
one_hot(data)
(10,)
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
Now that you have learnt how to feed input vectors into neural networks,
let’s understand how the output layers are specified.
Depending on the nature of the given task, the outputs of neural
networks can either be in the form of classes (if it is a classification
problem) or numeric (if it is a regression problem). One of the commonly
used output functions is the softmax function for classification. Take a
look at the graphical representation of the softmax function shown below.
Softmax Function
A softmax output is similar to what we get from a multiclass logistic
function, commonly used to compute the probability of an output
belonging to one of multiple classes. For output neuron i with weight
vector wi and input x, it is given by the following formula:

P(y = i | x) = exp(wi . x) / Σj exp(wj . x)
Suppose the output layer of a network has 3 neurons and all of them
receive the same input x (coming from the previous layers in the
network). The weights associated with them are represented as w0, w1
and w2. In such a case, the probability of the input belonging to
class 0 is expressed as follows:

P(class 0) = exp(w0 . x) / (exp(w0 . x) + exp(w1 . x) + exp(w2 . x))

and similarly for classes 1 and 2, with the corresponding weight vector
in the numerator.
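The 3-neuron case above can be sketched directly in code: each output neuron i gets a score wi . x, and softmax normalises the exponentials of those scores into probabilities. The weight vectors and input below are hypothetical numbers.

```python
import numpy as np

# Softmax over the scores of 3 output neurons sharing the same input x.
# The weights and input are hypothetical.
def softmax(scores):
    exp_scores = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    return exp_scores / exp_scores.sum()

x = np.array([1.0, 2.0])
w = np.array([[0.5, 0.1],    # w0
              [0.2, 0.4],    # w1
              [0.3, 0.3]])   # w2

scores = w @ x               # one score per output neuron: wi . x
probs = softmax(scores)
print(round(probs.sum(), 6))  # 1.0: the class probabilities sum to one
```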
Now that you have understood how the output is obtained from the
softmax function and how different types of inputs are fed into the ANN,
let's learn how to define inputs and outputs for image recognition on the
famous MNIST data set for multiclass classification.
There are various problems you will face while trying to recognise
handwritten text using an algorithm, including:
Noise in the image
The orientation of the text
Non-uniformity in the spacing of text
Non-uniformity in handwriting
The MNIST data set takes care of some of these problems, as the digits are
written in a box. Now the only problem the network needs to handle is
the non-uniformity in handwriting. Since the images in the MNIST data
set are 28 X 28 pixels, the input layer has 784 neurons (each neuron takes
1 pixel as an input) and the output layer has 10 neurons (each giving the
probability of the input image belonging to any of the 10 classes). The
image is classified into the class with the highest probability in the output
layer.
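The input/output sizes described above can be sketched as follows: a 28 x 28 image is flattened into a 784-element vector for the input layer, and the image is assigned to whichever of the 10 output neurons holds the highest probability. The image and probabilities below are hypothetical stand-ins for a real MNIST example.

```python
import numpy as np

# A 28 x 28 stand-in for one MNIST image (random noise, not a real digit)
image = np.random.rand(28, 28)
input_vector = image.flatten()   # one value per input neuron
print(input_vector.shape)        # (784,)

# Hypothetical output of the 10 output neurons (class probabilities)
output_probs = np.array([0.01, 0.02, 0.05, 0.02, 0.1, 0.05, 0.03, 0.6, 0.07, 0.05])
predicted_class = int(np.argmax(output_probs))
print(predicted_class)           # 7: the class with the highest probability
```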
In the image above, you can see that x1, x2 and x3 are the inputs, and
their weighted sum along with bias is fed into the neuron to give the
calculated result as the output.
In the image above, z is the cumulative input. You can see how the
weights affect the inputs depending on their magnitudes. Also, z is the
dot product of the weights and inputs plus the bias.
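The computation of z can be written out directly: it is the dot product of the weights and inputs plus the bias. The numbers below are hypothetical.

```python
import numpy as np

# The cumulative input z of a single neuron, with hypothetical values
x = np.array([1.0, 0.5, -0.5])   # inputs x1, x2, x3
w = np.array([0.6, -0.2, 0.3])   # weights
b = 0.1                          # bias

z = np.dot(w, x) + b             # cumulative input to the neuron
print(round(z, 2))               # 0.45
```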
While choosing activation functions, you need to ensure that they are:
Non-linear,
Continuous, and
Monotonically increasing.
The different commonly used activation functions are represented below.
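The specific functions from the original figure are not reproduced here; sigmoid, tanh, and ReLU are three standard choices that can be sketched as follows.

```python
import numpy as np

# Three commonly used activation functions
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes z into (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes z into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # zero for negative z, identity otherwise

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # approximately [0.119, 0.5, 0.881]
print(tanh(z))     # approximately [-0.964, 0.0, 0.964]
print(relu(z))     # [0., 0., 2.]
```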