VGG Image Classification Practical
The goal of this session is to get basic practical experience with classification. It
includes: (i) training a visual classifier for five different image classes (airplanes,
motorbikes, people, horses and cars); (ii) assessing the performance of the
classifier by computing a precision-recall curve; (iii) training set and testing set
augmentation; and (iv) obtaining training data for new classifiers using Bing image
search and using the classifiers to retrieve images from a dataset.
Stage 1.G: Learn a classifier for the other categories and assess its performance
Stage 1.H: Vary the image representation
Stage 1.I: Vary the CNN representation
Stage 1.J: Visualize class saliency
Part 2: Training an Image Classifier for Retrieval using Bing images
Links and further work
Acknowledgements
History
Getting started
Read and understand the requirements and installation instructions. The download
links for this practical are:
After the installation is complete, open and edit the script exercise1.m in the
MATLAB editor. The script contains commented code and a description for all
steps of this exercise, relative to Part I of this document. You can cut and paste this
code into the MATLAB window to run it, and will need to modify it as you go
through the session. Other files, such as exercise2.m, contain the code for
other parts of the practical, as indicated below.
Note: the student packages contain only the code required to run the practical. The
complete package, including code to preprocess the data, is available on GitHub.
www.robots.ox.ac.uk/~vgg/practicals/category-recognition-cnn/index.html 2/11
1/7/2019 VGG Practical
First, the network is pre-trained on the ImageNet dataset to classify an image into
one of a thousand categories. This determines all the parameters of the CNN, such
as the weights of the convolutional filters. Then, for a new image, the trained
network is used to generate a descriptor vector from the response of a layer in the
architecture with this image as input. For this practical we will use the VGG-M-128
network, which produces a 128-dimensional descriptor vector at the last fully-connected
layer before classification, called fc7. We also consider computationally
cheaper but weaker features extracted from the convolutional layers. In particular,
we consider the following encodings:
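The descriptor-extraction step can be sketched as follows. This is an illustrative Python/NumPy sketch, not the practical's MATLAB code: the random array stands in for the response of a network layer (in the practical, the fc7 output of VGG-M-128 or a convolutional feature map).

```python
import numpy as np

def descriptor_from_response(response, l2_normalize=True):
    """Turn a network layer response into a descriptor vector.

    `response` stands in for the output of a CNN layer (e.g. the
    128-D fc7 output of VGG-M-128, or an HxWxC convolutional map);
    the practical computes it with the pre-trained network.
    """
    h = np.asarray(response, dtype=float).ravel()  # flatten to one vector
    if l2_normalize:
        norm = np.linalg.norm(h)
        if norm > 0:
            h = h / norm
    return h

# A convolutional response is flattened to one vector per image:
fake_conv = np.random.rand(6, 6, 128)      # HxWxC toy response
d = descriptor_from_response(fake_conv)
print(d.shape)                             # (4608,)
print(np.linalg.norm(d))                   # 1.0 (up to floating point)
```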
Task: Look through example images of the motorbike class and the
background images by browsing the image files in the data directory.
The motorbike training images will be used as the positives, and the background
images as negatives. The classifier is a linear Support Vector Machine (SVM).
We will first assess qualitatively how well the classifier works by using it to rank all
the training images.
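This train-then-rank pipeline can be sketched as below (illustrative Python/NumPy, not the practical's MATLAB code: the Gaussian clusters stand in for CNN descriptors of motorbike and background images, and the sub-gradient trainer is a minimal stand-in for a real SVM solver).

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for 128-D CNN descriptors of positive (motorbike)
# and negative (background) training images.
pos = rng.normal(loc=+0.5, scale=1.0, size=(40, 128))
neg = rng.normal(loc=-0.5, scale=1.0, size=(40, 128))
X = np.vstack([pos, neg])
y = np.array([1] * 40 + [-1] * 40)

def train_linear_svm(X, y, C=10.0, epochs=200, lr=1e-3):
    """Minimize 0.5*||w||^2 + (C/n) * sum of hinge losses by
    sub-gradient descent (a minimal stand-in for a real solver)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1          # margin violations
        w -= lr * (w - C * (y[viol, None] * X[viol]).sum(axis=0) / n)
        b -= lr * (-C * y[viol].sum() / n)
    return w, b

w, b = train_linear_svm(X, y)               # C = 10, as in the practical
scores = X @ w + b                          # SVM score of each image
ranking = np.argsort(-scores)               # highest-scoring images first
print(y[ranking][:10])                      # top of the ranked list
```

Ranking the training images by decreasing score is exactly what the qualitative assessment above does: a good classifier puts the positives at the top of the list.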
Questions:
Is classifying the average vector for the test image the same as
classifying each vector independently and then averaging the classification
scores?
When would you expect flipping augmentation to be detrimental to
performance?
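For the first question, linearity gives the answer: for a linear SVM the score of the average descriptor equals the average of the scores, provided no non-linear step (such as re-L2-normalizing the averaged vector) sits in between. A small NumPy check (illustrative; the weight vector and descriptors are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
w, b = rng.normal(size=128), 0.3    # a linear classifier w.x + b
crops = rng.normal(size=(10, 128))  # descriptors of 10 augmented copies

score_of_mean = w @ crops.mean(axis=0) + b
mean_of_scores = (crops @ w + b).mean()
print(np.isclose(score_of_mean, mean_of_scores))  # True, by linearity

# The equivalence breaks if a non-linear step (e.g. re-L2-normalizing
# the averaged descriptor) is applied before scoring:
avg = crops.mean(axis=0)
renorm = avg / np.linalg.norm(avg)
print(np.isclose(w @ renorm + b, mean_of_scores))
```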
Note: when learning the SVM, to save training time we are not changing the C
parameter. This parameter influences the generalization error and should be
relearnt on a validation set when the training setting is changed (see Stage
1.F). However, in this case the influence of C is small, as can be verified
experimentally.
Questions:
Task: Edit exercise1.m to vary the C parameter in the range 0.1 to 1000
(the default is C = 10 ), and plot the AP on the training and test data as C
varies.
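Average precision itself is easy to compute from the ranked labels, and it is the number this task records for each value of C. A sketch (illustrative Python; in the practical the AP is computed for you in MATLAB):

```python
import numpy as np

def average_precision(labels_ranked):
    """Exact AP of a ranked list: labels_ranked[i] is 1 if the i-th
    ranked item is a true positive, 0 otherwise."""
    labels = np.asarray(labels_ranked, dtype=float)
    cum_pos = np.cumsum(labels)
    precision_at_k = cum_pos / np.arange(1, len(labels) + 1)
    return (precision_at_k * labels).sum() / labels.sum()

# Perfect ranking gives AP = 1; the worst ranking of 2 positives
# among 4 items gives (1/3 + 2/4) / 2:
print(average_precision([1, 1, 0, 0]))  # 1.0
print(average_precision([0, 0, 1, 1]))  # 0.4166...

# For the task itself, one would train an SVM for each C in, say,
# np.logspace(-1, 3, 5) and record the train and test AP at each C.
```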
Now repeat Stage B and C for each of the other two object categories: airplanes
and people. To do this you can simply rerun exercise1.m after changing the
dataset loaded at the beginning in stage (A). Remember to change both the
training and test data. In each case record the AP performance measure.
K(h, h′) = ∑_{i=1}^{d} h_i h′_i

to measure the similarity between a pair of objects h and h′ (in this case pairs of
d-dimensional CNN descriptors).
Question: What can you say about the self-similarity, K(h, h), of a descriptor
h that is L2 normalized?
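The answer can be checked numerically: for an L2-normalized descriptor, K(h, h) = ∑_i h_i² = ‖h‖² = 1. A one-line check (illustrative NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
h = rng.normal(size=128)
h = h / np.linalg.norm(h)          # L2-normalize the descriptor

K_self = h @ h                     # K(h, h) = sum_i h_i^2 = ||h||^2
print(np.isclose(K_self, 1.0))     # True: self-similarity is always 1
```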
Questions:
A useful rule of thumb is that better performance is obtained if the vectors that are
ultimately fed to a linear SVM (after any intermediate processing) are L2
normalized.
Tasks:
Question: How much does the choice of feature depth affect classification
performance?
Now make the setup even more extreme by considering the so-called one-shot
learning problem, i.e. learning an image classifier from a single training image.
Thus, set numPos=1 and rerun the experiments.
Questions:
Can you get good performance using a single training image and the
deeper features?
If so, how is it possible that the system can learn to recognize an object
from a single example?
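One intuition, sketched numerically below: deep features of a class cluster tightly around a direction, so even a single L2-normalized example already defines a good scoring direction. The Gaussian clusters here are illustrative stand-ins for CNN features, not real data:

```python
import numpy as np

rng = np.random.default_rng(3)

def l2n(X):
    """L2-normalize along the last axis."""
    return X / np.linalg.norm(X, axis=-1, keepdims=True)

# Deep features of one class cluster around a direction; toy version:
class_dir = l2n(rng.normal(size=128))
pos_test = l2n(class_dir + 0.05 * rng.normal(size=(20, 128)))
neg_test = l2n(rng.normal(size=(20, 128)))

one_shot = l2n(class_dir + 0.05 * rng.normal(size=128))  # one example

# Scoring by similarity to the single positive already separates
# the class from the background:
scores_pos = pos_test @ one_shot
scores_neg = neg_test @ one_shot
print(scores_pos.mean() > scores_neg.max())  # True for tight clusters
```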
Use this function first to visualize the class saliency using vggm128-fc7 features
for an image containing the object category.
Question: Do the areas correspond to the regions that you would expect to be
selected?
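The idea behind the saliency map is the gradient of the class score with respect to the input image: pixels where the gradient magnitude is large are the ones the score is most sensitive to. For a linear scorer the gradient is simply the weight vector, which makes the idea easy to sketch; for the CNN the gradient is obtained by back-propagation instead. Everything below is an illustrative toy:

```python
import numpy as np

rng = np.random.default_rng(4)
H = W = 8                           # a tiny 8x8 "image"
w = rng.normal(size=(H, W))         # weights of a linear class scorer
image = rng.normal(size=(H, W))

# Class score s(I) = <w, I>; its gradient w.r.t. the image is w itself,
# so the saliency map is |w|. For a CNN, replace w by the image
# gradient computed via back-propagation through the network.
saliency = np.abs(w)
peak = np.unravel_index(saliency.argmax(), saliency.shape)
print(peak)                         # pixel the scorer is most sensitive to
```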
Tasks:
The MATLAB code exercise2.m provides the following functionality: it uses the
images in the directory data/myImages and the default negative list
data/background_train.txt to train a classifier and rank the test images. To get
started, we will train a classifier for horses:
Use Bing image search with "horses" as the text query (you can also set the photo
option on).
Pick 5 images and drag and drop (save) them into the directory data/myImages .
These will provide the positive training examples.
Tasks:
Run the code exercise2.m and view the ranked list of images. Note,
since feature vectors must be computed for all the training images, this
may take a few moments.
Now, add in 5 more images and retrain the classifier.
The test data set contains 148 images with horses. Your goal is to train a classifier
that can retrieve as many of these as possible in a high ranked position. You can
measure your success by how many appear in the first 36 images (this
performance measure is 'precision at rank-36'). Here are some ways to improve
the classifier:
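Precision at rank-36 is just the fraction of true positives among the top 36 ranked images. A sketch (illustrative Python; the ranking below is hypothetical):

```python
import numpy as np

def precision_at_k(labels_ranked, k=36):
    """Fraction of true positives among the top-k ranked images."""
    top = np.asarray(labels_ranked[:k], dtype=float)
    return top.sum() / k

# The test set has 148 horse images; suppose 30 of the top 36 ranked
# images are horses (a hypothetical, reasonably good result):
ranked = [1] * 30 + [0] * 6 + [1] * 118 + [0] * 200
print(precision_at_k(ranked))   # 30/36 = 0.8333...
```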
Tasks:
Note: all images are automatically normalized to a standard size, and descriptors
are saved for each new image added in the data/cache directory. The test data
also contains the category car. Train a classifier for it and compare the difficulty
of this class with that of the horse class.
Acknowledgements
Funding from ERC grant "Integrated and Detailed Image Understanding", and
the EPSRC Programme Grant "SeeBiByte".
History
Used in the Oxford AIMS CDT, 2017-18.
Used in the Oxford AIMS CDT, 2016-17.
First used in the Oxford AIMS CDT, 2015-16.
Replaces the Image Categorization practical based on hand-crafted features.