Introduction to Deep Learning

The document provides an introduction to Convolutional Neural Networks (CNNs), covering their structure, components, and functionalities such as convolution layers, activation functions, pooling, and classification. It discusses the limitations of CNNs, including their inability to understand image content and relationships between objects, as well as techniques like transfer learning and one-shot learning. Additionally, it highlights the importance of hyperparameters and the role of loss and dense layers in optimizing CNN performance.

Topics to be covered…
 Convolutional neural network
 Convolution layer, ReLU Activation Function
 Padding, Stride
 Pooling layer
 Flattening, Subsampling
 Loss layer, Dense layer
 1x1 convolution, Input channels
 Inception network
 Transfer learning, One shot learning
 Dimension reductions
 Implementation of CNN with TensorFlow, Keras
Introduction to Convolutional Neural Networks
 Convolutional Neural Networks are very similar to ordinary Neural Networks.
 They are made up of neurons that have learnable weights and biases.
 Convolutional neural networks, also called ConvNets, were first introduced in the 1980s by Yann LeCun, a postdoctoral computer science researcher.
 The early version of CNNs, called LeNet (after LeCun), could recognize handwritten digits. CNNs found a niche in banking and postal services, where they read zip codes on envelopes and digits on checks.
 But despite their ingenuity, ConvNets remained on the sidelines of computer vision and artificial intelligence because they faced a serious problem: they could not scale.
 In 2012, AlexNet showed that perhaps the time had come to revisit deep learning, the branch of AI that uses multi-layered neural networks.
 A Convolutional Neural Network has an input layer, an output layer, many hidden layers, and millions of parameters, giving it the ability to learn complex objects and patterns.
 Convolutional Neural Networks are organized a bit differently: the layers are arranged in 3 dimensions: width, height, and depth.
 Further, the neurons in one layer do not connect to all the neurons in the next layer but only to a small region of it. Lastly, the final output is reduced to a single vector of probability scores, organized along the depth dimension.
 The network sub-samples the given input through convolution and pooling processes and subjects it to activation functions; all of these are the hidden layers, which are partially connected, and at the very end is the fully connected layer that produces the output layer.
 The output retains a spatial structure similar to the input image dimensions.
Limitations of CNN
 When it comes to understanding the meaning of the contents of images, CNNs perform poorly.
 But despite the vast repositories of images and videos they’re
trained on, they still struggle to detect and block inappropriate
content.
 In one case, Facebook’s content-moderation AI banned the photo of
a 30,000-year-old statue as nudity.
 Several studies have shown that CNNs trained on ImageNet and
other popular datasets fail to detect objects when they see them
under different lighting conditions and from new angles.
 A recent study by researchers at the MIT-IBM Watson AI Lab
highlights these shortcomings. It also introduces ObjectNet, a
dataset that better represents the different nuances of how objects
are seen in real life.
 CNNs do not develop the mental models that humans have about different objects, nor humans' ability to imagine those objects in previously unseen contexts.
 Another problem with convolutional neural networks is their
inability to understand the relations between different objects.
Consider the following image, which is known as a “Bongard
problem”.
 Adversarial attacks have become a major source of concern as
deep learning and especially CNNs have become an integral
component of many critical applications such as self-driving cars.
Convolution Layer
 Convolution is one of the main building blocks of a CNN. The term convolution refers to the mathematical combination of two functions to produce a third function. It merges two sets of information.
 In the case of a CNN, the convolution is performed on the input data with the use of a filter or kernel (these terms are used interchangeably) to produce a feature map.
 We execute a convolution by sliding the filter over the input. At every location, an element-wise multiplication is performed and the results are summed onto the feature map.
 In the animation below, you can see the convolution operation. You can see the filter (the green square) sliding over our input (the blue square), and the sum of the convolution goes into the feature map (the red square).
 The area of our filter is also called the receptive field, named after the neuron cells! The size of this filter is 3x3.
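To make these mechanics concrete, here is a minimal NumPy sketch of the sliding-window operation described above; it is illustrative only (deep learning libraries technically compute cross-correlation, and real layers add channels and biases):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide a 2-D kernel over a 2-D image and sum the element-wise products."""
    h, w = image.shape
    kh, kw = kernel.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            feature_map[i, j] = np.sum(patch * kernel)  # multiply element-wise, then sum
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                     # 3x3 averaging filter
print(conv2d(image, kernel).shape)                 # (3, 3)
```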
 In reality, convolutions are performed in 3D. Each image is represented as a 3D matrix with dimensions for width, height, and depth. Depth is a dimension because of the color channels used in an image (RGB).
 We perform numerous convolutions on our input, where each
operation uses a different filter. This results in different feature
maps.
 In the end, we take all of these feature maps and put them
together as the final output of the convolution layer.
 Just like any other Neural Network, we use an activation function to
make our output non-linear. In the case of a Convolutional Neural
Network, the output of the convolution will be passed through the
activation function. This could be the ReLU activation function, y =
max(0, x).
 For example, given the matrix M = [ [ -3, 19, 5 ], [ 7, -6, 12 ], [ 4, -8, 17 ] ], ReLU converts it to [ [ 0, 19, 5 ], [ 7, 0, 12 ], [ 4, 0, 17 ] ].
 ReLU is used most often because it is fast, allows the network to converge quickly, and is computationally efficient.
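The matrix example above can be reproduced in one line of NumPy:

```python
import numpy as np

M = np.array([[-3, 19, 5], [7, -6, 12], [4, -8, 17]])
print(np.maximum(0, M))  # [[ 0 19  5] [ 7  0 12] [ 4  0 17]]
```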
Stride
 Stride is the size of the step the convolution filter moves each time.
A stride size is usually 1, meaning the filter slides pixel by pixel.
 By increasing the stride size, your filter is sliding over the input
with a larger interval and thus has less overlap between the cells.
 The larger the stride, the smaller the resulting output, and vice versa.
Padding
 Because the size of the feature map is always smaller than the
input, we have to do something to prevent our feature map from
shrinking. This is where we use padding.
 A layer of zero-value pixels is added to surround the input with
zeros, so that our feature map will not shrink. Padding also
improves performance and makes sure that the kernel and stride
size will fit in the input.
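The combined effect of filter size F, padding P, and stride S on an input of size N follows the standard formula output = floor((N + 2P - F) / S) + 1; a small sketch to check it (the numbers are illustrative):

```python
def out_size(n, f, s=1, p=0):
    # output size = floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

print(out_size(32, 3))            # 30: no padding shrinks the feature map
print(out_size(32, 3, p=1))       # 32: zero-padding by 1 preserves the input size
print(out_size(32, 3, s=2, p=1))  # 16: a stride of 2 roughly halves it
```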
Pooling
 After a convolution layer, it is common to add a pooling layer in
between CNN layers. The function of pooling is to continuously
reduce the dimensionality to reduce the number of parameters and
computation in the network. This shortens the training time and
controls overfitting.
 There can be any number of convolution, ReLU, and pooling layers. The initial convolution layers learn generic features, and the last layers learn more specific/complex features.
 Pooling can be done in the following ways:
 Max-pooling: selects the maximum element from each patch of the feature map. The resulting max-pooled layer holds the most important features of the feature map. It is the most common approach as it gives better results.
 Average pooling: computes the average of each patch of the feature map.
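A minimal NumPy sketch of 2x2 max-pooling with stride 2 (the values are illustrative):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Take the maximum over each size x size patch of a 2-D feature map."""
    h, w = feature_map.shape
    out = np.zeros((h // stride, w // stride))
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            out[i // stride, j // stride] = feature_map[i:i+size, j:j+size].max()
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]], dtype=float)
print(max_pool(fm))  # [[6. 4.] [8. 9.]]
```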
Classification
 After the convolution and pooling layers, the classification part consists of a few fully connected layers. However, these fully connected layers can only accept 1-dimensional data. To convert our 3D data to 1D, we flatten it; this essentially arranges the 3D volume into a 1D vector.
 The last layers of a Convolutional NN are fully connected layers. Neurons in a fully connected layer have full connections to all the activations in the previous layer. This part is in principle the same as a regular Neural Network.
 Because most of the parameters sit in the fully connected layers, they are prone to overfitting. Dropout is one of the techniques that reduces overfitting.
 Dropout is an approach used for regularization in neural networks. It is a technique where randomly chosen nodes are ignored during each stage of the training phase.
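As a small illustration of the flattening step (the shape is illustrative):

```python
import numpy as np

volume = np.zeros((4, 4, 64))  # a 3D activation volume: height, width, channels
vector = volume.reshape(-1)    # flattened into a 1D vector for the dense layers
print(vector.shape)            # (1024,)
```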
 Soft-max is an activation layer normally applied to the last layer of the network, where it acts as a classifier. Classification of the given input into distinct classes takes place at this layer. The soft-max function maps the non-normalized output of a network to a probability distribution.
 The output from the last fully connected layer is directed to the soft-max layer, which converts it into probabilities.
 Soft-max assigns decimal probabilities to each class in a multi-class problem; these probabilities sum to 1.0.
 This allows the output to be interpreted directly as a probability.
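A minimal NumPy sketch of the soft-max mapping (the logits are illustrative):

```python
import numpy as np

def softmax(logits):
    # subtract the max for numerical stability; the outputs sum to 1.0
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # approximately [0.659 0.242 0.099]
```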
Hyperparameters in CNN
 When using a CNN, the four important hyperparameters we have to decide on are:
 Kernel size
 Filter count (that is, how many filters do we want to use)
 Stride (how big are the steps of the filter)
 Padding
Loss Layer
 In the context of an optimization algorithm, the function used to evaluate
a candidate solution (i.e., a set of weights) is referred to as the objective
function.
 Typically, with neural networks, we seek to minimize the error. As such,
the objective function is often referred to as a cost function or a loss
function and the value calculated by the loss function is referred to as
simply “loss.”
 The cost or loss function has an important job in that it must faithfully
distill all aspects of the model down into a single number in such a way
that improvements in that number are a sign of a better model.
 The cost function reduces all the various good and bad aspects of a
possibly complex system down to a single number, a scalar value, which
allows candidate solutions to be ranked and compared.
 It is important, therefore, that the function faithfully represent our design goals. If we choose a poor error function and obtain unsatisfactory results, the fault is ours for badly specifying the goal of the search.
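As a concrete example (not from the slides), cross-entropy is the loss most commonly paired with a soft-max classifier; a minimal sketch with illustrative values:

```python
import numpy as np

def cross_entropy(probs, true_class):
    # the loss is small when the probability assigned to the true class is high
    return -np.log(probs[true_class])

probs = np.array([0.659, 0.242, 0.099])  # a soft-max output
print(cross_entropy(probs, 0))  # ~0.417: confident and correct
print(cross_entropy(probs, 2))  # ~2.313: confident and wrong
```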
Dense Layer
 The dense layer is the regular deeply connected neural network layer. It is the most common and frequently used layer. A dense layer performs the following operation on the input and returns the output:
 output = activation(dot(input, kernel) + bias)
 where,
 input represents the input data
 kernel represents the weight matrix
 dot represents the dot product of the input and its corresponding weights
 bias represents a bias value added to help the model fit the data
 activation represents the activation function
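A minimal NumPy sketch of that operation, using ReLU as the activation (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))         # input vector with 4 features
kernel = rng.normal(size=(4, 3))  # weight matrix mapping 4 inputs to 3 units
bias = np.zeros(3)

output = np.maximum(0, x @ kernel + bias)  # activation(dot(input, kernel) + bias)
print(output.shape)  # (3,)
```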
1x1 Convolution
 Convolution layers are lighter than fully connected ones, but they still connect every input channel with every output channel for every position in the kernel window.
 A 1x1 convolution kernel acts as an embedding solution: it reduces the number of channels of the input, making the representation more compact. The 1x1 convolutional layer is also called a pointwise convolution.
 A 1x1 convolution simply means the filter is of size 1x1 (yes, that means a single number, as opposed to a matrix like a 3x3 filter). This 1x1 filter convolves over the entire input image, pixel by pixel.
 Staying with our example input of 64x64x3, if we choose a 1x1 filter (which would be 1x1x3), then the output will have the same height and width as the input but only one channel: 64x64x1.
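The slide's example can be checked with a couple of lines of Keras (a sketch; the single output filter mirrors the 64x64x1 result above):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(1, (1, 1)),  # pointwise convolution: collapses 3 channels to 1
])
print(model.output_shape)  # (None, 64, 64, 1)
```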
 1x1 convolution is effectively used for:
 Dimensionality reduction/augmentation
 Reducing the computational load by shrinking the parameter map
 Adding additional non-linearity to the network
 Creating deeper networks through "bottleneck" layers
 Creating smaller CNNs that retain a high degree of accuracy
Inception Network
 Inception modules are used in convolutional neural networks to allow more efficient computation and deeper networks through dimensionality reduction with stacked 1x1 convolutions.
 The modules were designed to solve the problem of computational expense, as well as overfitting, among other issues. The solution, in short, is to take multiple kernel filter sizes within the CNN and, rather than stacking them sequentially, order them to operate on the same level.
 The most simplified version of an inception module performs a convolution on an input with not one but three different sizes of filters (1x1, 3x3, 5x5).
 In addition, max pooling is performed. The resulting outputs are then concatenated and sent to the next layer. By structuring the CNN to perform its convolutions on the same level, the network gets progressively wider, not deeper.
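A sketch of such a simplified (naive) inception module in Keras; the input shape and filter counts are illustrative, not taken from any specific paper:

```python
from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(32, 32, 64))
conv1 = layers.Conv2D(16, (1, 1), padding='same', activation='relu')(inputs)
conv3 = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(inputs)
conv5 = layers.Conv2D(16, (5, 5), padding='same', activation='relu')(inputs)
pool  = layers.MaxPooling2D((3, 3), strides=1, padding='same')(inputs)

# the parallel branches are concatenated along the channel axis: wider, not deeper
outputs = layers.concatenate([conv1, conv3, conv5, pool])  # (32, 32, 16+16+16+64)
model = Model(inputs, outputs)
```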
Transfer Learning
 Transfer learning is the idea of overcoming the isolated learning
paradigms and utilizing the knowledge acquired for one task to
solve related ones.
 In transfer learning we first train a base network on a base dataset
and task, and then we repurpose the learned features, or transfer
them, to a second target network to be trained on a target dataset
and task. This process will tend to work if the features are general,
that is, suitable to both base and target tasks, instead of being
specific to the base task.
 In practice, very few people train an entire Convolutional Network
from scratch because it is relatively rare to have a dataset of
sufficient size. Instead, it is common to pre-train a ConvNet on a
very large dataset (e.g. ImageNet, which contains 1.2 million
images with 1000 categories), and then use the ConvNet either as
an initialization or a fixed feature extractor for the task of interest.
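A minimal Keras sketch of the fixed-feature-extractor recipe; the base model choice, input size, and 10-class head are illustrative assumptions:

```python
import tensorflow as tf

# load an ImageNet-pretrained base and freeze it
base = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                         include_top=False,
                                         weights='imagenet')
base.trainable = False  # use the ConvNet as a fixed feature extractor

# train only a small classification head on the target task
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```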
One Shot Learning
 Deep Convolutional Neural Networks have become the state-of-the-art methods for image classification tasks. However, one of their biggest limitations is that they require a lot of labeled data.
 One-shot learning is a classification task where one, or a few, examples are
used to classify many new examples in the future.
 This characterizes tasks seen in the field of face recognition, such as face
identification and face verification, where people must be classified
correctly with different facial expressions, lighting conditions, accessories,
and hairstyles given one or a few template photos.
 Modern face recognition systems approach the problem of one-shot learning by learning a rich, low-dimensional feature representation, called a face embedding, that can be calculated for faces easily and compared for verification and identification tasks.
 In one-shot classification, we require only one training example for each class.
 Historically, embeddings were learned for one-shot learning problems using a Siamese network.
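As a tiny illustration of the embedding-comparison idea (the function name and threshold are hypothetical):

```python
import numpy as np

def same_person(embedding_a, embedding_b, threshold=0.7):
    # two faces match when their embeddings are close; the embeddings
    # would come from a trained network such as a Siamese model
    distance = np.linalg.norm(embedding_a - embedding_b)
    return distance < threshold
```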
Dimensionality Reduction
 In machine learning classification problems, there are often too many
factors on the basis of which the final classification is done. These
factors are basically variables called features.
 The higher the number of features, the harder it gets to visualize the
training set and then work on it.
 Sometimes, most of these features are correlated, and hence
redundant. This is where dimensionality reduction algorithms come
into play.
 Dimensionality reduction is the process of reducing the number of
random variables under consideration, by obtaining a set of principal
variables. It can be divided into feature selection and feature
extraction.
 A classification problem that relies on both humidity and rainfall can be collapsed into just one underlying feature, since the two are highly correlated. Hence, we can reduce the number of features in such problems.
 A 3-D classification problem can be hard to visualize, whereas a 2-D one can be mapped to a simple 2-dimensional space, and a 1-D problem to a simple line.
 The figure below illustrates this concept, where a 3-D feature space is split into two 2-D feature spaces, and later, if the features are found to be correlated, the number of features can be reduced even further.
 There are two components of dimensionality reduction:
 Feature selection: In this, we try to find a subset of the original set of
variables, or features, to get a smaller subset which can be used to
model the problem. It usually involves three ways:
 Filter
 Wrapper
 Embedded
 Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional space, i.e. a space with a smaller number of dimensions.
 Methods of Dimensionality Reduction: The various methods used for dimensionality reduction include:
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
 Generalized Discriminant Analysis (GDA)
Principal Component Analysis (PCA)
 Imagine you're analyzing a dataset with dozens of features, like a customer survey with age, income, purchase history, and website behavior.
 While rich, this high dimensionality can be a curse: complex models, slower training, and even irrelevant info hiding the good stuff. That's where Principal Component Analysis (PCA) comes in as your dimensionality reduction hero!
 The gist: PCA takes your high-dimensional data and squishes it into a lower-dimensional space, capturing the most important information but ditching the redundancy. Think of it like summarizing a long lecture into key points: you lose some detail, but the core meaning remains.
 For example, let's assume that the scatter plot of our data set is as shown below. Can we guess the first principal component?
 It's approximately the line that matches the purple marks, because it goes through the origin and it's the line in which the projection of the points (red dots) is the most spread out. Mathematically speaking, it's the line that maximizes the variance (the average of the squared distances from the projected points to the origin).
Principal Component Analysis (PCA): Working
 Imagine your data as points in a high-dimensional space. Each point represents a data instance, and each dimension is a feature.
 PCA finds the directions of maximum variance in this space. Think of these directions as lines or planes along which most of the data points "spread out".
 It projects your data points onto these new, lower-dimensional directions. These new directions are called principal components (PCs), and they're like the "best lines" to summarize the data's spread.
 You choose how many PCs to keep based on how much information you want to retain. Usually, the first few PCs capture most of the variance, so you can discard the rest without losing much.
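A minimal scikit-learn sketch of these steps on synthetic correlated data (the data and the choice of two components are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))  # 2 underlying factors
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)            # project onto the top 2 PCs
print(X_reduced.shape)                      # (200, 2)
print(pca.explained_variance_ratio_.sum())  # close to 1.0
```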
Principal Component Analysis (PCA): Benefits
 Simpler models and faster training: less data means less computation, making your algorithms more efficient.
 Improved visualization: lower dimensions are easier to visualize, helping you understand the data's structure.
 Reduced overfitting: fewer dimensions can help prevent models from learning just the noise in your data.
 Feature extraction: PCs can reveal hidden patterns and relationships between features, aiding feature engineering.
Principal Component Analysis (PCA): Limitations and Applications
 PCA assumes linear relationships between features. For non-linear data, consider alternatives like kernel PCA.
 You lose some information when reducing dimensions, so choose the number of PCs carefully.
 Interpreting PCs can be tricky, as they are linear combinations of the original features.
 Image compression: PCA reduces image dimensions while preserving key features.
 Recommender systems: PCA helps identify user preferences based on purchase history.
 Anomaly detection: deviations from the principal components might indicate anomalies in your data.
Implementation of CNN with TensorFlow
 Step 1: Import TensorFlow
 import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
 Step 2: Download and prepare the CIFAR10 dataset
 The CIFAR10 dataset contains 60,000 color images in 10 classes, with
6,000 images in each class. The dataset is divided into 50,000 training
images and 10,000 testing images. The classes are mutually exclusive
and there is no overlap between them.
 (train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
 Step 3: Verify the data
 class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    # The CIFAR labels happen to be arrays,
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
 Step 4: Create the convolutional base
 model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Add a dense classification head so the model can be trained on the 10 classes
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
 Step 5: Compile and train the model
 model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
 Step 6: Evaluate the model
 plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
Implementation of CNN with Keras
 Step 1: Create a model
 In Keras, you first create a new instance of a model object and then add layers to it one after another. This is called the sequential model API.
 # Importing the required Keras modules containing model and layers
 from keras.models import Sequential
from keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D
 # Creating a Sequential Model and adding the layers
 # input_shape is assumed here; (28, 28, 1) fits 28x28 grayscale images (e.g. MNIST)
input_shape = (28, 28, 1)

model = Sequential()
model.add(Conv2D(32, kernel_size=(5,5), input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(5,5)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # Flattening the 2D arrays for fully connected layers
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
 # Compile the model
 model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
 Step 2: Train the model
 model.fit(x=x_train, y=y_train, epochs=10)

 Step 3: Test the model
 test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print("The test accuracy is: {}".format(test_acc))

 Step 4: Save and load the model
 model.save("trained_model.h5")
 Your model will be saved in the Hierarchical Data Format (HDF5) with the .h5 extension. It contains multidimensional arrays of scientific data.
 We can load our previously trained model by calling the load_model function and passing in the file name. Then we call the predict function and pass in the new data for predictions.
 from keras.models import load_model

model = load_model("trained_model.h5")
predictions = model.predict(new_data)