Stanford Dog Classification Using Convolutional Neural Network (CNN)

This document describes a project to classify dog breeds using convolutional neural networks. The team built a model on a pre-trained Inception v3 network and modified the fully connected layer to output 120 classes, one for each breed in the Stanford Dogs dataset. The model was trained for 20 epochs and evaluated on a test set, achieving an accuracy of 77.99%. The team also ran new dog images through the model to predict the breed and the top 5 class probabilities, and conducted experiments comparing different pre-trained models and optimizers.

Uploaded by

Wenhuan Song

Stanford Dog Classification using Convolutional Neural Network (CNN)

Research-based      Application-based ✓

Name          Song Wen Huan (Leader)  Hooi Ka Jing  Chuang Gao Jie
Programme     CS                      CS            CS
ID            1602672                 1605520       1605122
Contribution  1/3                     1/3           1/3

1. INTRODUCTION
Dogs are often called humankind's best friend. A dog owner can usually name the
breed of their own pet, but recognising the exact breed of an unfamiliar dog is
much harder, because the range of breeds is larger than most people realise.
According to the World Canine Organisation, best known by its French name
Fédération Cynologique Internationale (FCI) and the largest registry of dog breeds,
there are about 339 breeds of dogs. They are divided into 10 groups based on the
dog's appearance or size. Each group is then divided into subgroups, and each breed
is assigned a country or region of origin. Dog breed classification is therefore an
interesting and worthwhile task that shows how well a machine can solve this kind
of problem. The resulting classification model also provides an efficient
application for recognising dog breeds.

Dog breed classification falls under the problem of fine-grained image
classification (FGIC). Different approaches have been explored to solve FGIC, and
most of them use Convolutional Neural Networks (CNNs), a deep learning technique
that achieves high accuracy on classification problems. The approaches differ
mainly in the CNN architecture they use, such as LeNet, GoogLeNet or VGG19.

The main challenge in this project is to distinguish between very similar dog
breeds, such as a miniature poodle and a toy poodle, or a Samoyed and an Eskimo
dog. These breeds differ mainly in size or height, and traditional classification
methods are not effective enough to separate them. Time management is another
challenge: training on a large dataset takes a very long time, which limits how
many CNN architectures we can compare for accuracy.

The scope of this project is to classify the images in the Stanford Dogs Dataset,
downloaded from Kaggle, which contains 120 breeds of dogs. Google Colab was used as
the development environment, and the project was coded in Python with the PyTorch
library.

The purpose of this project is to develop a model that is able to solve the problem
described above. It is expected to produce an application that classifies the dog
in an image into its breed with high accuracy and helps human users identify a
dog's breed, especially for dogs that look very similar to each other but belong
to different breeds.

2. RELATED WORK
In the paper Transfer Learning for Image Classification of various dog breeds,
Pratik Devikar (2016) uses transfer learning to classify images of various dog
breeds. The author uses Google's Inception-V3 model, which is similar to our system,
as we also use Inception-v3. In that transfer learning setup, only the last layer
is retrained on the features of the different dog breeds. The paper uses 11
datasets of 25 images each, which is relatively small, and all images have the same
dimensions of 100x100 pixels. Our system differs in that we use a much larger
dataset of 20,580 images covering 120 breeds, and the images vary in size.

Furthermore, in that paper the next-to-last layer is retrieved as a feature vector
for each image, and these outputs can be used to train another classification
model. In transfer learning, the size of the new dataset and its similarity to the
original dataset are the most important factors. In Devikar's system, the new
dataset is small but its content is very different from the original dataset, so a
linear classifier is trained on it because the dataset is small.

3. SYSTEM DESIGN

Figure 1.1 shows the block diagram of the dog classifier.

The dog classification model was developed on Google Colab with PyTorch as the
machine learning framework. We used a pre-trained convolutional neural network
(CNN) as our model and modified its layers to fit our dataset. Because the
development environment is Google Colab, the dataset was uploaded to Google Drive,
which was then mounted in Google Colab for access. The project is described
following the stages of the machine learning pipeline.

1. Stanford Dogs Dataset (Data Collection)

The Stanford Dogs Dataset used for this project was downloaded from Kaggle. It
contains high-resolution images of 120 dog breeds, 20,580 images in total. The
dataset was built using images and annotations from ImageNet for the task of
fine-grained image categorization. It consists of an image folder and an annotation
folder. Inside the image folder, the images of each breed are placed in their own
subfolder. The annotation folder likewise has one subfolder per breed, containing
XML files that record the width, height and depth of each image, the breed name,
and the x and y values of the bounding boxes.
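As a rough illustration of how such an annotation file can be read, the sketch below parses one XML string with Python's standard library and returns the breed name with a crop box. The tag names (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`) assume a Pascal VOC-style layout, and the sample values are made up for the example; neither is taken from the project's code.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in an assumed Pascal VOC-style layout.
SAMPLE_ANNOTATION = """
<annotation>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>Chihuahua</name>
    <bndbox><xmin>25</xmin><ymin>10</ymin><xmax>276</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""


def read_bounding_boxes(xml_text):
    """Return a list of (breed_name, (left, upper, right, lower)) tuples."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        corners = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), corners))
    return boxes


boxes = read_bounding_boxes(SAMPLE_ANNOTATION)
```

The (left, upper, right, lower) ordering matches the box argument expected by Pillow's `Image.crop`, so each tuple could be passed to it directly when cropping.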

 
2. Data Preprocessing
In data preprocessing, the bounding box of each image in the breed folders is first
located using the information in the annotation file; the image is cropped to the
box and the cropped image is saved into a new folder. A transformation pipeline is
then defined for the train, validation and test sets. For all three sets, the
transformation converts the image into a PyTorch tensor and normalizes it with a
mean of (0.485, 0.456, 0.406) and a standard deviation of (0.229, 0.224, 0.225),
the standard ImageNet values. Data augmentation techniques such as random rotation,
color jitter and center crop are also applied to the training set to increase the
diversity of the available data. The dataset is loaded with the ImageFolder class
and split into train, validation and test sets with the random_split function, and
the corresponding transform is assigned to each set. The transformed sets are then
passed to DataLoader, which iterates over each dataset with a batch size of 128,
shuffle set to True and num_workers set to 4. Finally, one batch is drawn from each
data loader with an iterator to check that the loaders output the correct
dimensions.

3. Network Architecture for Dog Classification Model


We chose Inception-v3 as our pre-trained model because it performs better on our
dataset than the other pre-trained models we tried. We freeze all layers except the
fully connected layer. We then modify the fully connected layer by adding an extra
classifier built with nn.Sequential, so that the final output has 120 classes, and
only this classifier is trained.
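A minimal sketch of this freeze-and-replace step is shown below. The layer order (linear, ReLU, dropout, linear, LogSoftmax) follows the description in the experiment section, but the hidden width of 512 and dropout rate of 0.2 are assumptions, and `TinyBackbone` is a hypothetical stand-in so the example runs without downloading weights. Any model exposing an `fc` layer, such as a torchvision Inception v3 instance, could be passed instead.

```python
import torch
from torch import nn


def freeze_and_replace_head(model, num_classes=120, hidden_units=512, dropout=0.2):
    """Freeze every existing parameter, then attach a new trainable classifier."""
    for param in model.parameters():
        param.requires_grad = False

    in_features = model.fc.in_features
    # Newly created layers have requires_grad=True by default,
    # so only this head is updated during training.
    model.fc = nn.Sequential(
        nn.Linear(in_features, hidden_units),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(hidden_units, num_classes),
        nn.LogSoftmax(dim=1),
    )
    return model


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pre-trained network with an `fc` layer."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(self.features(x))


model = freeze_and_replace_head(TinyBackbone())
log_probs = model(torch.randn(2, 3, 32, 32))  # shape: (2, 120)
```

Because the head ends in LogSoftmax, its outputs are log-probabilities, which is what nn.NLLLoss expects as input.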

4. Implementation details
4.1 Training network
The model is trained for 20 epochs using the Adam optimizer with a learning rate of
0.0005, and the loss is evaluated with nn.NLLLoss. In each epoch, the inputs in
each batch of the training and validation data loaders are passed through the model
to produce outputs, from which the loss is computed and the predictions are made.
Backpropagation and the update of the weights w are performed only for the training
batches. The training loss, training accuracy, validation loss and validation
accuracy are calculated and displayed in each epoch and stored in an array. The
best validation accuracy is stored in a variable and displayed, and a deep copy of
the best model is also stored in a variable. After training, the trained model is
saved to a .pt file using torch.save.
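The training procedure described above might look roughly like the sketch below. It assumes the model outputs log-probabilities, runs on the CPU for simplicity, and uses the hypothetical helper names `run_epoch` and `fit`; the project's own loop may differ in detail.

```python
import copy

import torch
from torch import nn


def run_epoch(model, loader, criterion, optimizer=None):
    """Run one pass over a loader; weights are updated only if an optimizer is given."""
    training = optimizer is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for inputs, labels in loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()   # backpropagation
                optimizer.step()  # weight update
            total_loss += loss.item() * inputs.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            seen += inputs.size(0)
    return total_loss / seen, correct / seen


def fit(model, train_loader, valid_loader, epochs=20, lr=0.0005):
    """Train with Adam and NLLLoss, keeping a deep copy of the best model."""
    criterion = nn.NLLLoss()
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=lr)

    history = []
    best_acc, best_model = -1.0, copy.deepcopy(model)
    for epoch in range(epochs):
        train_loss, train_acc = run_epoch(model, train_loader, criterion, optimizer)
        valid_loss, valid_acc = run_epoch(model, valid_loader, criterion)
        history.append((train_loss, train_acc, valid_loss, valid_acc))
        print(f"epoch {epoch + 1}: train loss {train_loss:.4f}, "
              f"train acc {train_acc:.4f}, valid loss {valid_loss:.4f}, "
              f"valid acc {valid_acc:.4f}")
        if valid_acc > best_acc:
            best_acc, best_model = valid_acc, copy.deepcopy(model)
    return best_model, history, best_acc


# After training, the best model could be saved with, for example:
# torch.save(best_model.state_dict(), "dog_classifier.pt")  # file name is illustrative
```

Passing no optimizer to `run_epoch` switches the model to evaluation mode and disables gradient tracking, which is why the validation pass never changes the weights.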

4.2 Testing network  


After training, the model is evaluated on the test set: the test inputs are passed
through the model to predict the image labels, and the accuracy of each class and
the overall accuracy are reported.
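Per-class and overall accuracy can be computed from the true and predicted labels alone. The sketch below uses only the standard library; the function name and the toy labels are illustrative and not taken from the project.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return ({label: accuracy}, overall_accuracy) for two equal-length sequences."""
    totals, hits = Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == guess:
            hits[truth] += 1
    per_class = {label: hits[label] / totals[label] for label in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return per_class, overall


# Toy example with two breeds.
truth = ["pug", "pug", "beagle", "beagle", "beagle"]
guess = ["pug", "beagle", "beagle", "beagle", "pug"]
per_class, overall = per_class_accuracy(truth, guess)
```

In the toy example one of the two pugs and two of the three beagles are predicted correctly, so three of five predictions are right overall.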

5. Model Deployment
To check whether the model is good enough, it is tested on new dog images.
First, the image is resized and normalized, and the color channel is moved from the
3rd dimension to the 1st, as PyTorch expects. The image is then converted from a
NumPy array to a PyTorch tensor and unsqueezed to add a batch dimension of size 1
at the front so that it can be passed into the network. The class probabilities of
the image are then computed, and the top 5 probabilities with their labels and
classes are obtained. Finally, the result is plotted with the input image on top
and the predicted classes and probabilities shown below in a horizontal bar chart.
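The inference steps can be sketched as below, leaving out the resizing and the plot. The sketch assumes the image has already been resized and loaded as a uint8 NumPy array of shape (H, W, 3), and that the model outputs log-probabilities as it would with a LogSoftmax head. The function names are hypothetical.

```python
import numpy as np
import torch

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_model_input(image_hwc):
    """Convert an (H, W, 3) uint8 array into a normalized (1, 3, H, W) tensor."""
    scaled = image_hwc.astype(np.float32) / 255.0
    normalized = (scaled - MEAN) / STD
    # Move the color channel from the 3rd dimension to the 1st.
    chw = np.ascontiguousarray(normalized.transpose(2, 0, 1))
    # Add a batch dimension of size 1 at the front.
    return torch.from_numpy(chw).unsqueeze(0)


def top_k_predictions(model, batch, class_names, k=5):
    """Return the k most probable (class_name, probability) pairs for one image."""
    model.eval()
    with torch.no_grad():
        probabilities = torch.exp(model(batch))  # log-probabilities -> probabilities
    top_probs, top_indices = probabilities.topk(k, dim=1)
    return [
        (class_names[index], prob)
        for index, prob in zip(top_indices[0].tolist(), top_probs[0].tolist())
    ]
```

The returned pairs are ordered from most to least probable, so they can be passed to a horizontal bar chart as they are.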

EXPERIMENT AND EVALUATION


In this phase, two experiments were conducted: a comparison of different
pre-trained models, and tuning of the hyper-parameters, to determine which model
and which hyper-parameters are best suited to classifying dog images.

Experiment 1

In Experiment 1, we use different pre-trained CNN models on the Stanford Dogs
dataset. The test-set accuracies of the trained models are compared, and the
pre-trained model with the highest accuracy is chosen for this project. We test
three pre-trained CNN models: Inception_v3, ResNet18 and GoogLeNet. The data loaded
into each model is the same, with a batch size of 128, and the output of each model
is modified to match the number of breeds to predict.

All the models are trained under the same conditions: pre-trained weights are used,
every layer is frozen except the fully connected layer, and an extra ReLU
activation, dropout, linear transformation and LogSoftmax are added to the fully
connected layer to further improve the classifier. The optimizer is Adam with a
learning rate of 0.0005.

No.  CNN Model     Accuracy
1    Inception_v3  77.99%
2    ResNet18      74.05%
3    GoogLeNet     72.93%
Table 1.1

Table 1.1 above shows the test accuracy of the three pre-trained models on the
Stanford Dogs dataset. Inception_v3 achieved the best accuracy, 77.99%, compared
with the other two models.

Experiment 2

For the second experiment, both the Adam optimizer and the Stochastic Gradient
Descent (SGD) optimizer are used with the Inception_v3 model to see which performs
better, again with a learning rate of 0.0005 for Adam, and a learning rate of
0.0005 and momentum of 0.9 for SGD.
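The two optimizer configurations can be set up as in the sketch below. The `make_optimizer` helper and the stand-in linear layer are hypothetical; only the learning rates and the momentum value come from the text.

```python
import torch
from torch import nn


def make_optimizer(name, parameters, lr=0.0005):
    """Build one of the two optimizers compared in this experiment."""
    trainable = [p for p in parameters if p.requires_grad]
    if name == "adam":
        return torch.optim.Adam(trainable, lr=lr)
    if name == "sgd":
        return torch.optim.SGD(trainable, lr=lr, momentum=0.9)
    raise ValueError(f"unknown optimizer: {name}")


head = nn.Linear(8, 120)  # stand-in for the trainable classifier
adam = make_optimizer("adam", head.parameters())
sgd = make_optimizer("sgd", head.parameters())
```

Filtering on `requires_grad` keeps frozen layers out of the optimizer, so both runs update the same set of weights.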

CNN Model     Accuracy with SGD optimizer  Accuracy with Adam optimizer
Inception_v3  63.80%                       77.99%
Table 1.2

According to Table 1.2, accuracy with the Adam optimizer is higher than with the
SGD optimizer, so Adam is the better choice of optimizer for the Inception_v3
model.

Model and Hyper-parameter

The final model and hyper-parameters are therefore the Inception_v3 model with the
Adam optimizer and a learning rate of 0.0005. The model was trained for 20 epochs,
and the training loss decreased over the epochs, as shown in Figure 1.2 below.

Figure 1.2

Evaluation

In the evaluation, the accuracy for each target dog breed is displayed, as shown
in Figure 1.3. Because there are 120 breeds, Figure 1.3 shows the accuracy for only
some of them. The test set accuracy is 77.99%, which is quite good, as the gap with
the training accuracy is not large.

Figure 1.3

Figure 1.4 and Figure 1.5 below plot the training and validation accuracy and loss.
The model did not overfit: training accuracy and validation accuracy are both good,
and the difference between them is small.

Figure 1.4

Figure 1.5

CONCLUSION
In conclusion, this project proposed a CNN-based classifier for dog breeds using
the Stanford Dogs Dataset from Kaggle. Three pre-trained models were tested and
compared, and we achieved 77.99% accuracy with Inception_v3. The Inception_v3 model
was then compared under two optimizers, Adam and Stochastic Gradient Descent (SGD).
Inception_v3 worked best with the Adam optimizer, which gave the higher accuracy,
and new images were imported to further test the model.

CITATION
Devikar, P., 2016. Transfer Learning for Image Classification of various dog
breeds. International Journal of Advanced Research in Computer Engineering &
Technology (IJARCET), 5(12), pp.2707-2715. [Accessed 5 April 2020].

Hsu, D., 2015. Using convolutional neural networks to classify dog breeds. CS231n:
Convolutional Neural Networks for Visual Recognition [course webpage], 2.
[Accessed 5 April 2020].

Dog Breed Classification. 2020. CS109 Final Project: Dog Superbreed Classification.
[online] Available at:
<https://hljames.github.io/dog-breed-classification/?fbclid=IwAR2UBcn5boDHxWiitVWLq7dd7zSzZmYw3cpvIow9EgMmKkCPs0DGGJ40bKU>
[Accessed 5 April 2020].

Kaggle.com. 2020. Stanford Dogs Dataset. [online] Available at:
<https://www.kaggle.com/jessicali9530/stanford-dogs-dataset>
[Accessed 5 April 2020].

Medium. 2020. Deep Learning With Pytorch. [online] Available at:
<https://medium.com/@josh_2774/deep-learning-with-pytorch-9574e74d17ad>
[Accessed 5 April 2020].
