
L09-10 DL and CNN

The document discusses convolutional neural networks (CNNs) and their use in computer vision tasks. It describes how CNNs learn features directly from input images through multiple processing layers including convolution, activation, pooling and fully connected layers. CNNs have achieved human-level performance on image classification tasks due to availability of large datasets and use of graphics processing units for efficient training. The document provides examples and illustrations of common CNN architectures and their individual components.


COMP2712 – NNML
• Lecturer: Paulo Santos
• [email protected]
• Office: 4.24
Convolutional NN
Deep Convolutional Neural Networks (CNN)
• Up to this point: patterns were organised in terms of feature vectors
• The form of these features is specified by a human designer
• Extracted from the images prior to being input to the NN
• Convolutional Neural Networks:
• Accept images as inputs
• Learn the features as well as the classification
DNN: Classical Pipeline
[Figure: classical pipeline. Stages: Obtain Data → Preprocess Data / Clean Data → Hand Craft Features (SIFT/SURF) → Blackbox (SVM). Roles shown: Domain Experts, Computer Vision, ML Assistant]
Deep Neural Networks: Why now?
• Data, Data, Data
• ImageNet (14,197,122 images, http://www.image-net.org/)
• AlexNet[7] achieved a top-5 error of 15.3% in the ImageNet 2012 challenge
• More than 10.8 percentage points lower than that of the runner-up
• GPU Accelerated Computation
• Smart People
DNN: Why is it exciting?
• Deep-learning networks perform automatic feature extraction without human intervention, unlike most traditional machine-learning algorithms.
• Given that feature extraction is a task that can take teams of data scientists years to accomplish, deep learning is a way to circumvent the chokepoint of limited experts.
• It augments the powers of small data science teams, which by their nature do not scale.
DNN: “New” Pipeline
[Figure: “new” pipeline. Stages: Obtain Data → Preprocess Data / Clean Data → Hand Craft Features (SIFT/SURF) → Blackbox (DeepLearning). Labels shown: Domain Experts, Computer Vision, “I’m a Feature Engineer”, ML Champion]
Deep Neural Networks: Champions
Vanishing and Exploding Gradients
• Vanishing Gradient
• Error travels backwards, from the output layer towards the input layer.
• The gradients often get smaller and smaller and approach zero.
• This eventually leaves the weights of the initial (lower) layers nearly unchanged.
• As a result, gradient descent may never converge to the optimum
• Gradient Explosion
• Error gradients can accumulate during an update and result in very large gradients
• These result in large updates to the network weights
• and, in turn, an unstable network
https://www.analyticsvidhya.com/blog/2021/06/the-challenge-of-vanishing-exploding-gradients-in-deep-neural-networks/
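The effect can be seen with a little arithmetic. The sketch below is an illustration only, not part of the lecture code: it assumes a chain of single-neuron sigmoid layers that all share one hypothetical weight and pre-activation, so that the chain rule contributes the same factor weight × sigmoid'(z) at every layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def backpropagated_gradient(weight, pre_activation, n_layers):
    """Gradient reaching the first layer of a chain of n_layers
    single-neuron sigmoid layers (one chain-rule factor per layer)."""
    grad = 1.0  # gradient at the output, assumed to be 1 for illustration
    for _ in range(n_layers):
        grad *= weight * sigmoid_derivative(pre_activation)
    return grad


# sigmoid'(0) = 0.25 is the largest value the derivative can take.
vanishing = backpropagated_gradient(weight=1.0, pre_activation=0.0, n_layers=10)
exploding = backpropagated_gradient(weight=8.0, pre_activation=0.0, n_layers=10)

print(f"10 layers, weight 1: {vanishing:.3e}")  # shrinks with depth
print(f"10 layers, weight 8: {exploding:.3e}")  # grows with depth
```

With a weight of 1 each layer multiplies the gradient by at most 0.25, so it shrinks geometrically; with a large weight the same product grows geometrically instead.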

Vanishing and Exploding Gradients
• Vanishing Gradient
• A saturating activation such as the sigmoid saturates at 0 or 1, with a derivative very close to zero
• Backpropagation has no gradients to propagate!
[Figure: activation curve, annotated “All the fun happens here” around zero and “No gradients to propagate” in the saturated regions]

Vanishing and Exploding Gradients
• Better, non-saturating activation functions
• ReLU and leaky ReLU
• In Keras: activation=tf.nn.relu
[Figure: Rectified Linear Unit]
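A minimal sketch of the two activations and their derivatives, written in plain Python so the arithmetic is visible. The leak slope alpha=0.01 is an assumed illustrative value, not one taken from the slides.

```python
def relu(z):
    return max(0.0, z)


def relu_grad(z):
    # Gradient is 1 for positive inputs, 0 otherwise: it never saturates
    # on the positive side.
    return 1.0 if z > 0 else 0.0


def leaky_relu(z, alpha=0.01):
    # alpha is the (assumed) small slope used for negative inputs.
    return z if z > 0 else alpha * z


def leaky_relu_grad(z, alpha=0.01):
    return 1.0 if z > 0 else alpha


for z in (-5.0, -0.5, 0.5, 5.0):
    print(z, relu(z), relu_grad(z), leaky_relu(z), leaky_relu_grad(z))
```

For positive inputs both functions pass the gradient through unchanged; leaky ReLU additionally keeps a small non-zero gradient for negative inputs.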


Regularization: Drop-out
from tensorflow.keras.layers import Dropout
• Avoids overspecialisation (overfitting)
• Not a real “layer”
• Randomly chooses a percentage of neurons on the preceding layer
• Temporarily disconnects their inputs and outputs
• For that training step they are removed from
• the forward pass,
• backpropagation
• the optimiser update
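As an illustration of the mechanism, here is a plain-Python sketch of "inverted" dropout, in which the surviving activations are scaled up by 1/(1 - rate) during training so that their expected value is unchanged. The function and parameter names are chosen for this example; this is a sketch of the idea, not the Keras implementation.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate` during training and
    scale the survivors by 1 / (1 - rate); do nothing at inference."""
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so the example is repeatable
layer_output = [1.0] * 10

dropped = dropout(layer_output, rate=0.5, rng=rng)
print(dropped)                                          # mix of 0.0 and 2.0
print(dropout(layer_output, rate=0.5, training=False))  # unchanged
```

A different random subset is dropped on every call, which is why no single neuron can become indispensable.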
Regularization: BatchNorm
from tensorflow.keras.layers import BatchNormalization
• Batch Normalisation
• Activation functions work best within a small range around 0
• batchnorm keeps values in that range by normalising, then scaling and shifting, all the outputs of a layer together
• It learns the parameters for this scaling and shifting
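The training-time computation can be sketched for a single feature as follows. The names gamma (scale) and beta (shift) and the small constant eps are the conventional ones and are assumptions here, since the slides do not name them; in a real layer gamma and beta are learned, whereas this sketch just takes them as arguments.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature over a mini-batch to zero mean and unit
    variance, then scale by gamma and shift by beta."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


activations = [10.0, 12.0, 14.0, 16.0]  # far away from zero
normalised = batch_norm(activations)
rescaled = batch_norm(activations, gamma=2.0, beta=3.0)

print(normalised)  # centred on 0 with roughly unit variance
print(rescaled)    # centred on 3 with a spread twice as large
```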
CNN
CNN: Layer Architecture

Deep neural networks + convolutions


Basic CNN architecture
http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf http://mlss.tuebingen.mpg.de/2015/slides/fergus/Fergus_1.pdf

CNN: Layer Architecture


LeCun (1998) Gradient-Based Learning Applied to Document Recognition

• There are four main operations


• Convolution
• Non-Linearity (ReLU)
• Pooling or Sub Sampling
• Classification (Fully Connected or Dense Layer)
https://cs231n.github.io/convolutional-networks/

CNN: Layer Architecture


The architecture shown here is a tiny VGG Net: http://www.robots.ox.ac.uk/~vgg/research/very_deep/
pre-trained weights: https://keras.io/api/applications/


What is a convolution
Find a vertical white stripe up the centre

a) 2D Filter (weights)
b) Random Image
c) Image convolved with Filter
d) Threshold to maximum filter value
e) Highlighted maximum values and surrounds

• https://colab.research.google.com/drive/1ZAmIKkU-enU8YDY4d3PyS1Xn0Mp-gqax?usp=sharing
• https://setosa.io/ev/image-kernels/
Basics of a CNN operation
• The type of neighbourhood processing in a CNN is spatial convolution
• Computes a sum of products between pixels and a set of kernel weights
• At every spatial location in the input image
• The result at each (x,y) is a scalar value
• This scalar value is the output of a neuron
• Adding a bias and passing the result through an activation function
• → we have our good old NN!
https://cs231n.github.io/convolutional-networks/
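The sum of products can be written out directly. The sketch below is a plain-Python illustration with a hypothetical 3x3 kernel chosen to respond to a bright vertical stripe; it slides the kernel over every valid location without flipping it, which is how CNN libraries usually implement "convolution" (strictly, cross-correlation). A ReLU is assumed as the activation.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Sum of products between the kernel and every image neighbourhood
    that fits entirely inside the image (no padding, stride 1)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = bias
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical kernel: positive centre column, negative sides.
stripe_kernel = [
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
]

# 5x5 image with a white (1.0) vertical stripe up the centre column.
image = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(5)]

response = conv2d_valid(image, stripe_kernel)
activated = [[max(0.0, v) for v in row] for row in response]  # ReLU

for row in activated:
    print(row)  # [0.0, 6.0, 0.0] on each of the three rows
```

The response peaks where the kernel is centred on the stripe and is suppressed by the activation everywhere else.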

CNN: Convolution
• Neighbourhoods → Receptive Fields (RF)
• The receptive fields move over the image executing convolution
• The set of weights, arranged as a receptive field, is a kernel
• Number of spatial increments by which the RF moves: stride
• To each convolution value we add a bias
• Then pass the result through an activation function to generate a single value
• This value is fed to the corresponding location in the input of the next layer
• This is repeated for all locations in the input image, resulting in a 2D set of values stored in the next layer as a 2D array called a feature map
• → the role of convolution here is to extract features, such as edges, points, blobs
• Convolutional layer:
• three feature maps, obtained from three distinct kernels!
• After convolution and activation:
• Subsampling (or pooling):
• Produces pooled feature maps: Pooling Layer
• Reduction in spatial resolution:
• gives a degree of translational invariance
• Reduces the volume of data
• Done by subdividing the feature maps into a set of small (typically 2x2) regions:
• Pooling neighbourhoods
• Replacing all the values of that neighbourhood by a single value
• Common pooling methods:
• Average pooling: substitute by the average
• Max-pooling: substitute by the max value
• L2 pooling: substitute by the square root of the sum of the squares
https://cs231n.github.io/convolutional-networks/
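The three pooling methods listed above can be sketched in a few lines of plain Python. The function name, the `mode` strings and the example feature map are made up for this illustration; the regions are non-overlapping 2x2 neighbourhoods.

```python
import math


def pool2d(feature_map, size=2, mode="max"):
    """Replace each non-overlapping size x size neighbourhood by one value."""
    reducers = {
        "max": max,
        "average": lambda values: sum(values) / len(values),
        "l2": lambda values: math.sqrt(sum(v * v for v in values)),
    }
    reducer = reducers[mode]
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            region = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reducer(region))
        pooled.append(row)
    return pooled


feature_map = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 1.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 2.0, 0.0, 0.0],
]

print(pool2d(feature_map, mode="max"))      # [[4.0, 2.0], [2.0, 4.0]]
print(pool2d(feature_map, mode="average"))  # [[2.5, 1.0], [0.5, 1.75]]
print(pool2d(feature_map, mode="l2"))
```

Each 4x4 map becomes a 2x2 map, so the volume of data passed to the next layer drops by a factor of four.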

CNN: Pooling or Sub-sampling


• Convolution:
• Filtered images
• Pooling:
• Filtered images of lower resolution
• The pooled feature maps in the first layer become the inputs to the next layer
• But we now have multiple pooled feature maps
• As convolution is a linear operation (remember assignment 1?)
• The values can be combined into a single one by superposition (summing the per-map results)
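A small sketch of the superposition step, under the assumption (standard for convolutional layers) that each input map has its own kernel and the per-map results are summed position by position before the bias is added. The maps and kernels below are arbitrary illustrative values, and square inputs are assumed to keep the helper short.

```python
def correlate2d(image, kernel):
    """Valid sliding sum of products for a square image and square kernel."""
    k = len(kernel)
    size = len(image) - k + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(k) for j in range(k))
         for c in range(size)]
        for r in range(size)
    ]


def conv_multi_channel(feature_maps, kernels, bias=0.0):
    """One output map from several input maps: filter each input map with
    its own kernel, then add the results together (superposition)."""
    per_map = [correlate2d(m, k) for m, k in zip(feature_maps, kernels)]
    rows, cols = len(per_map[0]), len(per_map[0][0])
    return [
        [bias + sum(result[r][c] for result in per_map) for c in range(cols)]
        for r in range(rows)
    ]


pooled_maps = [
    [[1.0, 2.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 2.0]],
    [[0.0, 1.0, 1.0], [2.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

combined = conv_multi_channel(pooled_maps, kernels)
print(combined)  # [[5.0, 3.0], [0.0, 5.0]]
```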

• The ultimate goal is classification:
• The final pooled feature maps are fed into a Fully Connected Neural Net
• As we’ve seen before → the input should be vectorized.
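Vectorising and classifying can be sketched as below. The weights are arbitrary illustrative numbers rather than trained values, and the softmax output is an assumption of this example (the slides only say "fully connected").

```python
import math


def flatten(feature_maps):
    """Concatenate all pooled feature maps into one input vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [sum(w * x for w, x in zip(w_row, inputs)) + b
            for w_row, b in zip(weights, biases)]


def softmax(scores):
    shifted = [math.exp(s - max(scores)) for s in scores]
    total = sum(shifted)
    return [e / total for e in shifted]


pooled_maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
vector = flatten(pooled_maps)  # 2 maps of 2x2 values -> 8 inputs

weights = [  # illustrative, untrained weights for two output classes
    [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
biases = [0.0, 0.0]

scores = dense(vector, weights, biases)
probabilities = softmax(scores)
print(vector)
print(scores)         # [2.0, 4.0]
print(probabilities)  # the second class gets the higher probability
```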
Example

• Think of each element of a 2D array in the top row as a neuron
• The outputs of these neurons are pixel values, creating feature maps
• The neurons in the feature map of the 1st layer have output values generated by convolving the input image with a kernel whose size and shape are the same as the receptive field
• and whose coefficients are learned during training
• To each convolution value we add a bias and pass the result through an activation function to generate the output value of the corresponding neuron in the feature map
• The output values of neurons in the pooled feature maps are generated by pooling the output values of neurons in the feature maps
• The kernel weights (shown as intensity values) are learned from sample images using backpropagation
• Therefore, the nature of the learned features is determined by the learned kernel coefficients
[Figure: Graphical illustration of the functions performed by the components of a CNN. Stages: feature maps → pooled feature maps → feature maps → pooled feature maps → vector → neural net (output labels shown: 0, 5, 9)]
Teaching a CNN to recognise simple images

Training Image Set Test Image Set


CNN to recognise handwritten numerals
(MNIST dataset)
• 60,000 training images
• 10,000 test images
• Grayscale images of size 28x28 pixels
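To get a feel for how a 28x28 input shrinks through the network, the feature-map size after each layer can be computed with the usual formula (input + 2·padding − kernel) // stride + 1. The layer settings below (5x5 kernels, stride 1, no padding, 2x2 pooling, two conv/pool stages) are hypothetical values chosen for illustration; they are not taken from the architecture on the slides.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Width (or height) of a feature map after a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def pool_output_size(input_size, pool_size=2):
    """Width (or height) after non-overlapping pooling."""
    return input_size // pool_size


sizes = [28]  # MNIST images are 28x28
for _ in range(2):  # two hypothetical conv + pool stages
    sizes.append(conv_output_size(sizes[-1], kernel_size=5))
    sizes.append(pool_output_size(sizes[-1]))

print(sizes)  # [28, 24, 12, 8, 4]
```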
Architecture of the CNN trained to recognise
ten digits in the MNIST dataset
Kernels
Same architecture as before
Results of a forward pass
CNN: Visualisation
• http://www.cs.cmu.edu/~aharley/vis/
• http://www.cs.cmu.edu/~aharley/vis/conv/
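Limitations
CNN: Intriguing properties
https://arxiv.org/pdf/1312.6199.pdf
[Figure: adversarial examples. Columns: correct (correctly classified image), diff (difference between the two images), ostrich (perturbed image, classified as “ostrich”)]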


Deep Learning: Pros and Cons
Pros
• Best performing method in many CV tasks
• No need for hand-crafting
• Robust to natural variation
• Many different applications
• Large-scale problems
• Improves with more data
• Easy parallelization on GPUs
Cons
• Need huge amount of data
• Hard to design and tune
• Difficult to analyse/understand
• SVMs easy to deploy and get good results
• Tend to learn everything
Deep Learning: Best Practices
• Check/clean your data
• Shuffle the training samples
• Split your data into training and testing samples
• Use validation data as well
• Never train on test data
• Start with an existing network and adapt it
• Start smallish, keep adding layers and nodes
• Check that you can achieve zero loss on a tiny subset

https://jeffmacaluso.github.io/post/DeepLearningRulesOfThumb/
https://towardsdatascience.com/17-rules-of-thumb-for-building-a-neural-network-93356f9930af
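The shuffle and split steps from the list above can be sketched with the standard library alone. The 70/10/20 proportions and the function name are assumptions made for this example, not recommendations from the slides.

```python
import random


def shuffle_and_split(samples, val_fraction=0.1, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed: repeatable split
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


samples = list(range(100))  # stand-in for 100 labelled examples
train, val, test = shuffle_and_split(samples)
print(len(train), len(val), len(test))  # 70 10 20
```

Because the three sets are cut from one shuffled list, no sample can end up in both the training and the test data.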
Deep Learning: When to use?

• You have a large amount of good-quality data
• You are modelling image/audio/language/time-series data
• Excels in tasks where the basic unit (pixel, word) has very little meaning in itself, but their combination has a useful meaning
• You need a model that is less reliant on handmade features and instead can learn features from the data
The need for more data…
https://www.tensorflow.org/tutorials/images/data_augmentation

Data Augmentation
• Input images can be cropped, rotated, or rescaled to create new
examples with the same labels as the original training set
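A toy sketch of the three transformations on an image stored as a nested list. The helper names and the tiny 3x3 "image" are invented for illustration; real pipelines would typically use a library's augmentation layers instead.

```python
def crop(image, top, left, height, width):
    return [row[left:left + width] for row in image[top:top + height]]


def rotate90(image):
    """Rotate clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def rescale_nearest(image, factor):
    """Upscale by an integer factor by repeating each pixel."""
    return [[v for v in row for _ in range(factor)]
            for row in image for _ in range(factor)]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
label = "example"  # every new example keeps the original label

augmented = [
    (crop(image, 0, 0, 2, 2), label),
    (rotate90(image), label),
    (rescale_nearest(image, 2), label),
]
for new_image, new_label in augmented:
    print(new_label, new_image)
```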

https://colab.research.google.com/drive/1dLVBk9E94tOLc5BdN3ClZ2tEPUHe2TMt?usp=sharing
https://keras.io/guides/transfer_learning/ https://cs231n.github.io/transfer-learning/

Transfer Learning
• Big networks need lots of data and lots of compute power
• Free Google Colab is not going to work!
• Transfer learning is using previously trained models as
• a starting point for further refinement and/or
• a front-end feature extractor for a classifier (with a little refinement)
Transfer Learning
• The most common transfer learning workflow:
1. Take layers from a previously trained model.
2. Freeze them, so as to avoid destroying any of the information they
contain during future training rounds.
3. Add some new, trainable layers on top of the frozen layers. They will
learn to turn the old features into predictions on a new dataset.
4. Train the new layers on your dataset.
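The effect of freezing can be shown on a deliberately tiny model with one "pre-trained" weight and one new weight. This is a conceptual sketch in plain Python with made-up numbers, not the Keras workflow itself: only the new head weight receives gradient updates, while the frozen base weight is left untouched.

```python
def train_head_only(base_weight, head_weight, data, learning_rate=0.1, epochs=50):
    """Stochastic gradient descent on 0.5 * error**2, updating the head only."""
    for _ in range(epochs):
        for x, target in data:
            feature = base_weight * x           # frozen feature extractor
            prediction = head_weight * feature  # new trainable layer
            error = prediction - target
            head_weight -= learning_rate * error * feature
            # base_weight is frozen: deliberately no update here
    return base_weight, head_weight


data = [(1.0, 6.0), (2.0, 12.0)]  # toy task: target = 6 * x
base, head = train_head_only(base_weight=2.0, head_weight=0.0, data=data)

print(base)            # 2.0, exactly as it started
print(round(head, 6))  # 3.0, so that head * base = 6
```

The head learns to turn the existing feature into the new prediction, which is the role of the trainable layers in steps 3 and 4.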
Transfer Learning – VGG16
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# set the shape to the CIFAR-10 image size and the number of classes
input_shape = (32, 32, 3)
classes = 10

# load the VGG16 model with the imagenet weights, but without the final 1000-class output layers
base_model = VGG16(
    weights="imagenet",  # Load weights pre-trained on ImageNet.
    input_shape=input_shape,
    include_top=False,  # Do not include the ImageNet classifier at the top.
)
# Freeze the base_model
base_model.trainable = False

[Figure: VGG16 architecture; include_top=False removes the final classifier layers]
Transfer Learning
[Figure: model summary listing vgg16. Non-trainable params are part of vgg16]


Transfer Learning – Fine Tuning
• The most common transfer learning workflow:
1. Take layers from a previously trained model.
2. Freeze them, so as to avoid destroying any of the information they
contain during future training rounds.
3. Add some new, trainable layers on top of the frozen layers. They will
learn to turn the old features into predictions on a new dataset.
4. Train the new layers on your dataset.
5. [Optional] Fine-tuning, which consists of unfreezing the entire model and re-training it on the new data with a very low learning rate.
Transfer Learning – Fine-tuning
# model, X_train and y_train come from the earlier build and data-loading steps
base_model.trainable = True  # Allow all weights to be trained
model.summary()

learning_rate = 1e-5  # Small learning rate

# Must re-compile the model for the change to trainable to take effect
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
    loss=keras.losses.categorical_crossentropy,
    metrics=[keras.metrics.categorical_crossentropy],
)

epochs = 5
model.fit(X_train, y_train, epochs=epochs)
Transfer Learning – Fine-tuning
[Figure: model summary listing vgg16. Vgg16 params now trainable!]


Further important reading:
• Bias in Machine Learning
https://towardsdatascience.com/june-edition-bias-in-the-machine-994eadbccec2
