IET Computer Vision - 2018 - Hiary - Flower Classification Using Deep Convolutional Neural Networks
Research Article
Abstract: Flower classification is a challenging task due to the wide range of flower species, many of which share a similar shape,
appearance, or surrounding objects such as leaves and grass. In this study, the authors propose a novel two-step deep learning
classifier to distinguish flowers of a wide range of species. First, the flower region is automatically segmented to allow
localisation of the minimum bounding box around it. The proposed flower segmentation approach is modelled as a binary
classifier in a fully convolutional network framework. Second, they build a robust convolutional neural network classifier to
distinguish the different flower types. They propose novel steps during the training stage to ensure robust, accurate and real-
time classification. They evaluate their method on three well-known flower datasets. Their classification results exceed 97% on
all datasets, outperforming the state of the art in this domain.
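As a rough sketch of this two-step pipeline (not the authors' released code), the fragment below segments the flower, takes the minimum bounding box around the predicted mask, and passes the crop to the species classifier. The names segmentation_fcn and flower_cnn are placeholders for the trained FCN and classification CNN, and the small margin around the box is an assumption.

import numpy as np

def flower_bbox(mask, margin=0.05):
    # Minimum bounding box around the binary flower mask, padded by a small assumed margin.
    ys, xs = np.nonzero(mask)
    if xs.size == 0:  # no flower pixels found: fall back to the full image
        return 0, 0, mask.shape[1], mask.shape[0]
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    dx, dy = int(margin * (x1 - x0)), int(margin * (y1 - y0))
    return (max(x0 - dx, 0), max(y0 - dy, 0),
            min(x1 + dx + 1, mask.shape[1]), min(y1 + dy + 1, mask.shape[0]))

def classify_flower(image, segmentation_fcn, flower_cnn):
    # Step 1: binary flower/background segmentation of the whole image.
    mask = segmentation_fcn(image) > 0.5
    # Localise the flower via its minimum bounding box and crop the image to it.
    x0, y0, x1, y1 = flower_bbox(mask)
    crop = image[y0:y1, x0:x1]
    # Step 2: classify the cropped flower region into one of the species.
    return flower_cnn(crop)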
… representation, which is further combined with visual representations to form a hierarchical deep semantic model. In work related to ours, a CNN-based method to perform flower classification was proposed in [43]. They used luminance and saliency map approaches to select the flower region, and the method was evaluated on a flower dataset.

A convolutional fusion networks (CFNs) model to fuse multi-scale deep representations was proposed in [44]. This model adds more parameters to generate new side branches from the intermediate layers, and learns adaptive weights for these branches. However, the accuracy reported on flower classification is limited compared with our work and other published work such as [40]. Chakraborti et al. [45] proposed a collaborative representation-based classification (CRC) approach, which represents the image as a weighted collaboration of features over all classes. They extracted features using different descriptors, including CNN-based features, and used these features in the classification task.

An approach based on the CNN Inception model was proposed in [46] for flower classification. The method was applied to the Oxford 17 and Oxford 102 datasets and achieved good results. A selective convolutional descriptor aggregation (SCDA) approach based on unsupervised fine-grained image retrieval in different applications, including flowers, was proposed in [47]. No annotation was needed to cluster the objects, as the method relies on detecting the main object in an image to create deep descriptors for image categorisation. Finally, a fine-grained recognition approach based on local parts and global discrimination CNN (LG-CNN) was proposed in [48]. The method was applied to different sets including Oxford 102. The proposed CNN consists of two networks with shared weights, such that one network focuses on the local parts of the input image while the second focuses on the global geometry of the image.

The segmentation FCN is initialised by the VGG-16 model [19], while the classification CNN is initialised by the segmentation FCN.

3.1 Network initialisation via transfer-learned ImageNet features

Although the kernels in a CNN can be initialised randomly, most deep learning methods instead initialise their models from networks pre-trained on large datasets such as ImageNet, i.e. transfer learning. This helps train networks for problems with small numbers of training examples, since many image classification applications share similar low-level features, e.g. edges and blobs.
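To illustrate this kind of transfer learning (a generic sketch, not the authors' exact training setup), the snippet below loads an ImageNet-pretrained VGG-16 from torchvision and swaps its final fully connected layer for a new flower-class head; the class count is only an example.

import torch.nn as nn
from torchvision import models

def vgg16_flower_classifier(num_classes=102):
    # Start from VGG-16 weights learned on ImageNet (transfer learning);
    # num_classes=102 is only an example (e.g. the Oxford 102 dataset).
    model = models.vgg16(pretrained=True)
    # Replace the 1000-way ImageNet head with a randomly initialised flower head;
    # the convolutional blocks keep their pre-trained low-level filters.
    in_features = model.classifier[6].in_features
    model.classifier[6] = nn.Linear(in_features, num_classes)
    return model

Only the new head starts from random weights; the convolutional filters that capture low-level features such as edges and blobs are reused and would typically be fine-tuned on the flower images.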
We initialise the proposed FCN from the VGG-16 model [19], which provided robust results for classifying images from the ImageNet dataset. The trained FCN is then used to initialise the classification CNN. The VGG-16 model consists of five convolutional blocks followed by three fully connected layers. Each convolutional block consists of two or three convolutional layers with ReLU activations. At the end of each convolutional block, a max-pooling layer is used to downsample the feature maps, which makes the features translation- and scale-invariant. Fig. 2 shows a detailed description of the VGG-16 model alongside the parameters of the proposed FCN.
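A minimal sketch of this initialisation, assuming the torchvision VGG-16 layout, is given below: the five pre-trained convolutional blocks become the FCN encoder, and a 1 × 1 convolution followed by bilinear upsampling produces the two-class (flower/background) score map. The scoring head and the upsampling scheme are illustrative assumptions, not the exact configuration of Fig. 2.

import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class FlowerSegmentationFCN(nn.Module):
    # Binary segmentation FCN whose encoder is transferred from ImageNet-pretrained VGG-16.
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(pretrained=True)
        # vgg.features holds the five convolutional blocks (conv + ReLU + max pooling);
        # the three fully connected layers are dropped for the FCN.
        self.encoder = vgg.features
        # A 1x1 convolution scores every spatial location as flower vs. background.
        self.score = nn.Conv2d(512, 2, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = self.encoder(x)   # feature maps downsampled 32x by the five pooling layers
        logits = self.score(feats)
        # Upsample the coarse score map back to the input resolution.
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)

The encoder trained within such an FCN can then be reused to seed the classification CNN, in line with the strategy of initialising the classification CNN from the trained segmentation FCN.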
Although newer published models such as [33, 34] have exceeded the VGG-16 ImageNet classification accuracy (CA), we have chosen this model to initialise our FCN, and consequently the CNN model, because it better suits the flower classification task. Deeper models such as ResNet [33] are generally too complex for this task, as their number of parameters is overkill. In fact, we show here how we initialise our models from a reduced version of the VGG-16 model with no compromise on accuracy.