Convolutional Neural Networks
& Computer Vision
with TensorFlow
Where can you get help?
“If in doubt, run the code”
• Follow along with the code
• Try it for yourself
• Press SHIFT + CMD + SPACE to read the docstring
• Search for it
• Try again
• Ask (don’t forget the Discord chat!)
(yes, including the “dumb” questions)
“What is a computer vision problem?”
Example computer vision problems
• Binary classification (one thing or another)
• Multiclass classification (more than one thing or another), e.g. “Is this a photo of sushi, steak or pizza?”
• Object detection (where’s the thing we’re looking for?)
What we’re going to cover
(broadly)
• Getting a dataset to work with (pizza_steak 🍕🥩)
• Architecture of a convolutional neural network (CNN) with TensorFlow
• An end-to-end binary image classification problem
• Steps in modelling with CNNs
• Creating a CNN, compiling a model, fitting a model, evaluating a model
• An end-to-end multi-class image classification problem
• Making predictions on our own custom images
How:
👩🍳 👩🔬
(we’ll be cooking up lots of code!)
Computer vision inputs and outputs
Input: an image with W = 224, H = 224, C = 3 (c = colour channels, R, G, B)
Numerical encoding (normalized pixel values):
[[0.31, 0.62, 0.44…],
[0.92, 0.03, 0.27…],
[0.25, 0.78, 0.07…],
…,
Model: This is often a convolutional neural network (CNN)! (often already exists, if not, you can build one)
Predicted output (comes from looking at lots of these), one column each for 🍣 🥩 🍕:
[[0.97, 0.00, 0.03],
[0.81, 0.14, 0.05],
[0.03, 0.07, 0.90],
…,
Actual output: Sushi 🍣, Steak 🥩, Pizza 🍕
Input and output shapes
(for an image classification example)
Input: a 224 x 224 image (gets represented as a tensor)
[[0.31, 0.62, 0.44…],
[0.92, 0.03, 0.27…],
[0.25, 0.78, 0.07…],
…,
Input shape: [batch_size, height, width, colour_channels]
Shape = [None, 224, 224, 3]
or
Shape = [32, 224, 224, 3] (32 is a very common batch size)
Model: We’re going to be building CNNs to do this part!
Output: 🍣 🥩 🍕
[0.00, 0.97, 0.03] (prediction probabilities)
Shape = [3]
These will vary depending on the problem you’re working on.
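The shapes above can be checked with a few lines of NumPy. This is only a sketch: the image is random noise standing in for a real photo, and the class names and prediction probabilities are the illustrative values from the slide, not real model output.

```python
import numpy as np

# A stand-in RGB image (random noise): 224 x 224 pixels, 3 colour channels.
image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)

# Models expect a batch dimension up front, even for a single image.
single_batch = np.expand_dims(image, axis=0)   # shape (1, 224, 224, 3)

# Stack 32 images into one batch (32 is a very common batch size).
batch = np.stack([image] * 32, axis=0)         # shape (32, 224, 224, 3)

# Illustrative prediction probabilities for one image (values from the slide).
class_names = ["sushi", "steak", "pizza"]
prediction = np.array([0.00, 0.97, 0.03])      # shape (3,)

# The predicted class is the one with the highest probability.
predicted_class = class_names[int(np.argmax(prediction))]
print(single_batch.shape, batch.shape, prediction.shape, predicted_class)
```

The `None` in `[None, 224, 224, 3]` simply means the batch size is left open, so the same model can take 1 image or 32.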
Steps in modelling with TensorFlow
1. Turn all data into numbers (neural networks can’t handle raw images, only numbers)
2. Make sure all of your tensors are the right shape
3. Scale features (normalize or standardize, neural networks tend to prefer normalization)
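Step 3 could look something like the sketch below. It assumes "normalize" means rescaling pixel values from 0-255 to 0-1 and "standardize" means shifting to zero mean and unit standard deviation; the function names are made up for illustration.

```python
import numpy as np

def normalize(pixels):
    """Rescale 0-255 pixel values to the range 0-1."""
    return pixels.astype(np.float32) / 255.0

def standardize(pixels):
    """Shift to zero mean and scale to unit standard deviation."""
    pixels = pixels.astype(np.float32)
    # The small constant avoids dividing by zero for a constant image.
    return (pixels - pixels.mean()) / (pixels.std() + 1e-7)

# A stand-in image (random noise) in place of a real photo.
image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)

normalized = normalize(image)
standardized = standardize(image)
print(normalized.min(), normalized.max())
print(standardized.mean(), standardized.std())
```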
“What is a convolutional neural network (CNN)?”
Architecture of a CNN (typical)*
(what we’re working towards building)
[Figure: CNN architecture diagram with output classes Steak 🥩 and Pizza 🍕]
*Note: there is an almost unlimited number of ways you could stack together a convolutional neural network; this slide demonstrates only one.
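One possible way to stack such a network for the steak vs. pizza problem is sketched below. The layer counts, filter numbers and the `build_simple_cnn` name are assumptions chosen for illustration, not the architecture from the slide.

```python
import tensorflow as tf

def build_simple_cnn(input_shape=(224, 224, 3)):
    """Hypothetical small CNN for binary image classification."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        # Convolutional layers learn features from the image.
        tf.keras.layers.Conv2D(filters=10, kernel_size=3, activation="relu"),
        tf.keras.layers.Conv2D(filters=10, kernel_size=3, activation="relu"),
        # Pooling layers shrink the feature maps.
        tf.keras.layers.MaxPool2D(pool_size=2),
        tf.keras.layers.Conv2D(filters=10, kernel_size=3, activation="relu"),
        tf.keras.layers.Conv2D(filters=10, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPool2D(pool_size=2),
        # Flatten the feature maps into one vector per image.
        tf.keras.layers.Flatten(),
        # One sigmoid output: a probability for one of the two classes.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=["accuracy"])
    return model

model = build_simple_cnn()
model.summary()
```

For a multi-class problem (sushi, steak, pizza), the last layer would instead have one unit per class with a softmax activation, and the loss would change to a categorical cross-entropy.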
Let’s code!
Architecture of a CNN
(coloured block edition)
Simple CNN
Deeper CNN
Breakdown of Conv2D layer
Example code: tf.keras.layers.Conv2D(filters=10, kernel_size=(3, 3), strides=(1, 1), padding="same")
Example 2 (same as above): tf.keras.layers.Conv2D(filters=10, kernel_size=3, strides=1, padding="same")
Hyperparameter name | What does it do? | Typical values
Filters | Decides how many filters should pass over an input tensor (e.g. sliding windows over an image). | 10, 32, 64, 128 (higher values lead to more complex models)
Kernel size (also called filter size) | Determines the shape of the filters (sliding windows) that pass over the input. | 3, 5, 7 (lower values learn smaller features, higher values learn larger features)
Padding | Pads the target tensor with zeroes (if "same") to preserve the input shape, or leaves the target tensor as is (if "valid"), lowering the output shape. | "same" or "valid"
Strides | The number of steps a filter takes across an image at a time (e.g. if strides=1, a filter moves across an image 1 pixel at a time). | 1 (default), 2
📖 Resource: For an interactive demonstration of the above hyperparameters, see the CNN explainer website.
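To see how kernel size, padding and strides change the output shape, here is a small pure-Python sketch of the standard output-size arithmetic for one spatial dimension. The `conv2d_output_size` helper is made up for illustration; it is not part of TensorFlow.

```python
import math

def conv2d_output_size(input_size, kernel_size, strides=1, padding="valid"):
    """Output length along one spatial dimension of a 2D convolution."""
    if padding == "same":
        # Zero-padding is added so only the stride shrinks the output.
        return math.ceil(input_size / strides)
    if padding == "valid":
        # No padding: the filter must fit entirely inside the input.
        return (input_size - kernel_size) // strides + 1
    raise ValueError('padding must be "same" or "valid"')

for padding in ("same", "valid"):
    for strides in (1, 2):
        size = conv2d_output_size(224, kernel_size=3, strides=strides, padding=padding)
        print(f"padding={padding!r}, strides={strides}: 224 -> {size}")
```

With a 224-pixel input and a 3x3 kernel, "same" padding with strides=1 keeps 224, while "valid" padding drops it to 222.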
What is overfitting?
Overfitting: when a model over-learns patterns in a particular dataset and isn’t able to
generalise to unseen data.
For example, a student who studies the course materials too hard and then isn’t able to perform
well on the final exam, or who tries to put their knowledge into practice at the workplace and finds
what they learned has nothing to do with the real world.
Underfitting | Balanced (goldilocks zone) | Overfitting
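One rough way to spot overfitting is to compare training and validation loss per epoch. The sketch below uses made-up loss values and hypothetical helper names; it is a simple heuristic, not a definitive test.

```python
def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)

def looks_overfit(train_losses, val_losses):
    """True if validation loss has risen since its best epoch
    while training loss has kept falling."""
    best = best_epoch(val_losses)
    val_got_worse = val_losses[-1] > val_losses[best]
    train_kept_improving = train_losses[-1] < train_losses[best]
    return val_got_worse and train_kept_improving

# Made-up loss curves for illustration only.
train_losses = [0.70, 0.50, 0.35, 0.22, 0.12, 0.06]
val_losses = [0.72, 0.55, 0.48, 0.50, 0.58, 0.69]

print(best_epoch(val_losses))                  # epoch where validation loss bottomed out
print(looks_overfit(train_losses, val_losses))
```

Here the training loss keeps dropping while the validation loss turns upward after epoch index 2, which is the classic overfitting picture.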
Improving a model (from a model’s perspective)
Common ways to improve a deep model:
• Adding layers
• Increasing the number of hidden units
• Changing the activation functions
• Changing the optimization function
• Changing the learning rate
• Fitting on more data
• Fitting for longer
(because you can alter each of these, they’re hyperparameters)
[Figure: smaller model vs. larger model]
Improving a model (from a data perspective)
Method to improve a model (reduce overfitting) | What does it do?
More data | Gives a model more of a chance to learn patterns between samples (e.g. if a model is performing poorly on images of pizza, show it more images of pizza).
Data augmentation | Increases the diversity of your training dataset without collecting more data (e.g. take your photos of pizza and randomly rotate them 30°). Increased diversity forces a model to learn more generalisable patterns.
Better data | Not all data samples are created equally. Removing poor samples from or adding better samples to your dataset can improve your model’s performance.
Use transfer learning | Take a model’s pre-learned patterns from one problem and tweak them to suit your own problem. For example, take a model trained on pictures of cars to recognise pictures of trucks.
What is data augmentation?
Looking at the same image but from different perspective(s)*.
Original | Rotate | Shift | Zoom
*Note: There are many more kinds of data augmentation, such as cropping, replacing and shearing. This slide only demonstrates a few.
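The three augmentations on the slide can be roughly sketched with plain NumPy. The helper names are made up, the rotation is fixed at 90° for simplicity, and the shift wraps pixels around the edges; in TensorFlow you would more likely reach for random Keras preprocessing layers such as tf.keras.layers.RandomRotation, RandomTranslation and RandomZoom.

```python
import numpy as np

def rotate_90(image):
    """Rotate an (H, W, C) image 90 degrees counter-clockwise."""
    return np.rot90(image, k=1, axes=(0, 1))

def shift(image, dx, dy):
    """Shift an image dx pixels right and dy pixels down (wraps around)."""
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

def zoom_centre(image, factor=2):
    """Zoom into the centre by cropping and repeating pixels.
    Assumes height and width are divisible by `factor`."""
    h, w = image.shape[:2]
    crop_h, crop_w = h // factor, w // factor
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    crop = image[top:top + crop_h, left:left + crop_w]
    # Nearest-neighbour upscaling back to the original size.
    return np.repeat(np.repeat(crop, factor, axis=0), factor, axis=1)

# A stand-in image (random noise) in place of a real photo.
image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)

rotated = rotate_90(image)
shifted = shift(image, dx=10, dy=5)
zoomed = zoom_centre(image, factor=2)
print(rotated.shape, shifted.shape, zoomed.shape)
```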
Popular & useful computer vision architectures
Architecture | Release date | Paper | Use in TensorFlow | When to use
ResNet (residual networks) | 2015 | https://arxiv.org/abs/1512.03385 | Find pre-trained versions on TensorFlow Hub or tf.keras.applications | A good backbone for many computer vision problems
EfficientNet(s) | 2019 | https://arxiv.org/abs/1905.11946 | Find pre-trained versions on TensorFlow Hub or tf.keras.applications | Typically now better than ResNets for computer vision
MobileNet(s) | 2017 | https://arxiv.org/abs/1704.04861 | Find pre-trained versions on TensorFlow Hub or tf.keras.applications | Lightweight architecture suitable for devices with less computing power
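A rough sketch of using one of these architectures through tf.keras.applications is below. The `build_transfer_model` name and the 3-class head are assumptions for illustration. The sketch passes weights=None so it runs without a download; for actual transfer learning you would pass weights="imagenet" to get the pre-trained patterns. Input preprocessing is not shown.

```python
import tensorflow as tf

def build_transfer_model(num_classes=3, weights=None):
    """Hypothetical classifier on top of a frozen MobileNetV2 backbone."""
    base_model = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3),
        include_top=False,   # drop the original classification head
        weights=weights,     # use "imagenet" for pre-trained weights
    )
    base_model.trainable = False  # freeze the backbone's patterns

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = base_model(inputs, training=False)
    # Collapse each feature map to a single number.
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    # New head: one output per class (e.g. sushi, steak, pizza).
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_transfer_model(num_classes=3)
print(model.output_shape)
```

Only the new Dense layer is trainable here, so training adjusts the head while the backbone stays as it was.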
Steps in modelling with TensorFlow
The machine learning explorer’s motto
“Visualize, visualize, visualize”
• Data
• Model
• Training
• Predictions
It’s a good idea to visualize these as often as possible.
The machine learning practitioner’s motto
“Experiment, experiment, experiment”
👩🍳 👩🔬
(try lots of things and see what tastes good)