
DEEP LEARNING

Deep learning takes inspiration from the human brain's structure and
function. Artificial neural networks, the core building blocks of deep
learning, mimic the interconnected neurons in our brains.

Deep learning models are trained on massive datasets. During training, the model adjusts the connections between its nodes (weights) to minimize errors in its predictions. This process, called backpropagation, allows the network to learn from its mistakes and improve its performance.
Deep learning thrives on large amounts of data. The more data a model is trained on, the
better it becomes at recognizing patterns and making accurate predictions. Training involves
feeding the model with data samples and comparing its predictions with the actual outcomes.
Through backpropagation, the model adjusts its internal connections (weights) to minimize
the difference between its predictions and the true values. This iterative process refines
the model's ability to learn and improve its performance.
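The training loop described above — predict, compare with the true values, and adjust the weights to shrink the difference — can be sketched with a toy example. Here a single-weight linear model is trained by gradient descent; the data, learning rate, and iteration count are illustrative choices, not from the text:

```python
import numpy as np

# Toy data: targets follow y = 3x, so training should drive w toward 3
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0      # initial weight (an untrained "connection")
lr = 0.01    # learning rate: how big each adjustment is

for step in range(500):
    pred = w * x                      # forward pass: the model's prediction
    error = pred - y                  # difference from the true values
    grad = 2 * np.mean(error * x)     # gradient of the mean squared error w.r.t. w
    w -= lr * grad                    # adjust the weight to reduce the error

print(round(w, 2))  # converges to 3.0
```

A deep network repeats exactly this idea, except backpropagation computes the gradient for millions of weights at once by applying the chain rule layer by layer.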

Image recognition: Deep learning excels at tasks like facial recognition, medical image
analysis, and self-driving car vision systems.

Natural language processing (NLP): Deep learning powers chatbots, machine translation,
and sentiment analysis in social media.

Speech recognition: Virtual assistants like Siri and Alexa leverage deep learning for
accurate voice recognition.
Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a type of Deep Learning neural network architecture commonly used in Computer Vision. Computer vision is a field of Artificial Intelligence that enables a computer to understand and interpret images and other visual data.
CNN architecture
A Convolutional Neural Network consists of multiple layers: an input layer, convolutional layers, pooling layers, and fully connected layers.

The Convolutional layer applies filters to the input image to extract features, the Pooling layer
downsamples the image to reduce computation, and the fully connected layer makes the final
prediction. The network learns the optimal filters through backpropagation and gradient descent.
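The pooling step mentioned above can be sketched concretely. This is a hypothetical max-pooling implementation with a 2×2 window (window size and example values are illustrative, NumPy assumed):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Downsample a feature map by keeping the max of each size x size block."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]   # trim to a multiple of the window
    blocks = trimmed.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))                        # max over each window

fm = np.array([[1.0, 2.0, 5.0, 0.0],
               [3.0, 4.0, 1.0, 2.0],
               [0.0, 1.0, 6.0, 3.0],
               [2.0, 0.0, 4.0, 7.0]])
print(max_pool(fm))  # the 4x4 map shrinks to 2x2: [[4., 5.], [2., 7.]]
```

Each 2×2 output halves the width and height, which is what cuts the computation for the layers that follow.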
The Convolutional Layer
First, a smidge of theoretical background
When you first saw the term “convolution,” you might have recognized it as the mathematical
operation commonly used in signal processing.
In neural networks, the mechanics of a convolutional layer is not exactly identical to the
mathematical operation, but the general idea is the same: something called a “kernel” gets
swept over an input array and generates an output array.

A kernel is a 2-D array of weights.


Remember how linear regression trains its weights using gradient descent? A CNN does the same
thing: it trains all of its weights during backpropagation. The weights associated with the
convolutional layers in a CNN are what make up the kernels (remember that not every layer in a
CNN is a convolutional layer). Until the weights are trained, none of the kernels know which
“features” they should detect.
So if each kernel is just an array of weights, how do these weights operate on the input image during convolution? The network simply performs an element-wise multiplication between the kernel and the input pixels within its receptive field, then sums everything up and sends that value to the output array.
Once the first element of the output array has been filled in, the kernel sweeps over to its next stop.
The next element of the output array is calculated, then the next, and so on until the kernel has swept
over the entire input image and the entire output array has been filled. Take a look at Figure 8 to help
visualize this process. The tiny numbers in the dark blue box sweeping over the input image
correspond to the kernel weights. Notice how these weights never change as the kernel performs its
full sweep:
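The sweep just described can be written out directly. Below is a minimal sketch of a valid convolution with stride 1 (the function name, image, and kernel values are illustrative, NumPy assumed):

```python
import numpy as np

def convolve2d(image, kernel):
    """Sweep a kernel over an image: element-wise multiply, sum, move on (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1   # output shrinks by (kernel size - 1) per axis
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i+kh, j:j+kw]        # receptive field under the kernel
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then sum
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                 # weights stay fixed during the sweep
print(convolve2d(image, kernel).shape)  # (3, 3)
```

Note how the same kernel weights are reused at every position — this is the parameter sharing that the next section describes.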

The set of weights assigned to a kernel have rich visual meaning and
encode which “feature” that kernel will look for as it sweeps across an
image.
Convolutional Neural Networks, or convnets, are neural networks that share their parameters. Imagine you have an image. It can be represented as a cuboid with a length and width (the dimensions of the image) and a height (the channels, as images generally have red, green, and blue channels).
Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel, on it, with, say, K outputs, representing them vertically. Now slide that neural network across the whole image; as a result, we get another image with a different width, height, and depth. Instead of just the R, G, and B channels, we now have more channels but a smaller width and height. This operation is called convolution. If the patch size were the same as that of the image, it would be a regular neural network. Because of this small patch, we have fewer weights.
Now let’s talk about a bit of mathematics that is involved in the whole
convolution process.
Convolution layers consist of a set of learnable filters (or kernels) having small widths and
heights and the same depth as that of input volume (3 if the input layer is image input).

For example, if we run convolution on an image with dimensions 34×34×3, the possible filter sizes are a×a×3, where 'a' can be 3, 5, or 7, but small compared to the image dimensions.
During the forward pass, we slide each filter across the whole input volume step by step where
each step is called stride (which can have a value of 2, 3, or even 4 for high-dimensional images)
and compute the dot product between the kernel weights and patch from input volume.

As we slide our filters, we get a 2-D output for each filter; stacking these together gives an output volume with a depth equal to the number of filters. The network learns all the filters during training.
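The arithmetic above can be made explicit. For a valid convolution, each spatial output dimension is (input − filter) / stride + 1, and the depth is the number of filters. A small sketch (the filter count of 6 is an assumed example, not from the text):

```python
def output_shape(width, height, a, stride, num_filters):
    """Output volume of a valid convolution: spatial dims shrink by the
    filter size, step by the stride; depth = number of stacked filters."""
    ow = (width - a) // stride + 1
    oh = (height - a) // stride + 1
    return (ow, oh, num_filters)

# The 34x34x3 image from the text, with 5x5x3 filters, stride 1, and 6 filters
print(output_shape(34, 34, 5, 1, 6))  # (30, 30, 6)
```

With stride 2 the same filters would give (15, 15, 6), which is why larger strides are used to shrink high-dimensional inputs quickly.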
Layers used to build ConvNets
Input Layer: The layer in which we give input to our model. In a CNN, the input will generally be an image or a sequence of images. For example, this layer might hold a raw input image with width 32, height 32, and depth 3.

Convolutional Layers: These layers extract features from the input. Each applies a set of learnable filters, known as kernels, to the input image. The filters/kernels are small matrices, usually of 2×2, 3×3, or 5×5 shape; each slides over the input image data and computes the dot product between the kernel weights and the corresponding input image patch. The output of this layer is referred to as feature maps. If we use a total of 12 filters for this layer (with padding to preserve the spatial size), we get an output volume of dimension 32×32×12.
Pooling Layers: These layers downsample the feature maps to reduce their spatial size and the computation required in later layers.
Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolutional and pooling layers so they can be passed into a fully connected layer for classification or regression.
Fully Connected Layers: These take the input from the previous layer and compute the final classification or regression output.

Output Layer: For classification tasks, the output from the fully connected layers is fed into a logistic function such as sigmoid or softmax, which converts the raw score for each class into a probability.
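The softmax step can be sketched in a few lines — a standard formulation, with example scores that are illustrative (NumPy assumed):

```python
import numpy as np

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])  # example raw outputs of the fully connected layer
probs = softmax(scores)
print(probs.argmax())  # 0 — the highest-scoring class also gets the largest probability
```

Sigmoid plays the same role for binary classification, squashing a single score into a probability between 0 and 1.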
