
IET Computer Vision

Research Article

Flower classification using deep convolutional neural networks

ISSN 1751-9632
Received on 12th March 2017
Revised 28th March 2018
Accepted on 10th April 2018
E-First on 10th May 2018
doi: 10.1049/iet-cvi.2017.0155
www.ietdl.org

Hazem Hiary1 , Heba Saadeh1, Maha Saadeh1, Mohammad Yaqub2


1Computer Science Department, The University of Jordan, Amman, Jordan
2Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, UK
E-mail: [email protected]

Abstract: Flower classification is a challenging task due to the wide range of flower species, which have a similar shape,
appearance or surrounding objects such as leaves and grass. In this study, the authors propose a novel two-step deep learning
classifier to distinguish flowers of a wide range of species. First, the flower region is automatically segmented to allow
localisation of the minimum bounding box around it. The proposed flower segmentation approach is modelled as a binary
classifier in a fully convolutional network framework. Second, they build a robust convolutional neural network classifier to
distinguish the different flower types. They propose novel steps during the training stage to ensure robust, accurate and real-
time classification. They evaluate their method on three well known flower datasets. Their classification results exceed 97% on
all datasets, which are better than the state-of-the-art in this domain.

1 Introduction

Unlike simple object classification such as distinguishing cats from dogs, flower recognition and classification is a challenging task due to the wide range of flower classes that share similar features: several flowers from different types share similar colour, shape and appearance. Furthermore, images of different flowers usually contain similar surrounding objects such as leaves, grass etc. There are more than 250,000 known species of flowering plants classified into about 350 families [1]. A wide range of applications including content-based image retrieval for flower representation and indexing [2], plant monitoring systems, the floriculture industry [3], live plant identification and educational resources on flower taxonomy [4] depend on successful flower classification. Manual classification is possible but time consuming and tedious with a large number of images, and potentially erroneous for some flower classes, especially when the image background is complex. Thus, robust techniques for flower segmentation, detection and classification have great value.

Conventional flower classification techniques use a combination of features extracted from the flower images with the aim of improving classification performance [5–7]. Colour, texture, shape and some statistical information are among the main sources of features that are widely used to identify the different flower species [5, 8–10]. Some methods rely on human interaction to further enhance the classification results [7, 11, 12]. In addition, support vector machines (SVMs) are among the most commonly used types of classifiers [5, 13, 14]. Many flower classification techniques rely on learning their features from a segmented flower region to improve accuracy [5, 15–17].

Hand-crafted traditional discriminative features that can be used in a classification task, such as histograms of oriented gradients (HOGs), the scale-invariant feature transform (SIFT), speeded up robust features etc., cannot be easily applied to the flower classification problem due to the problem complexity as well as the numerous flower classes. In addition, the robustness of a flower classification technique applied to one flower dataset is not guaranteed on a different flower dataset. This is mainly because conventional methods rely heavily on specific hand-made features, which might not be generalisable to other flower images or similar flower images with different conditions such as change of lighting, flower pose or variation of surrounding objects.

Deep learning techniques, especially convolutional neural networks (CNNs), have recently gained wide interest due to superior accuracy compared with classical machine learning methods, which rely on hand-crafted features. In addition, the advance of hardware capabilities, particularly with the use of graphics processing units (GPUs), sped up the processing time of deep learning techniques significantly [18, 19].

In this work, we show how we utilise recent developments in deep learning methods such as CNNs, alongside the existence of reasonably sized flower datasets, to tackle the flower classification task robustly. Our automatic method detects the region around the flower in an image, and then uses the cropped images to learn a strong CNN classifier to distinguish different flower classes. The detection is performed by finding the minimum bounding box around an automatically segmented flower. The segmentation is achieved as a binary classification task within a fully convolutional network (FCN) [20] framework. Our robust method is evaluated on different known flower datasets and results show that the proposed technique achieves at least 97% classification accuracy (CA) on all datasets.

The rest of this paper is organised as follows: in Section 2 we present the background and related work. Section 3 presents the proposed method. The experimental setup is described in Section 4, followed by results and comparisons in Section 5. We then conclude our work in Section 6.

2 Related work

In this section, we describe CNN and its application in image classification and segmentation. We then present related work which addresses the flower segmentation and classification task. We generally split the techniques into deep learning and non-deep learning-based techniques.

2.1 CNN for image classification and segmentation

A CNN consists of a number of convolutional and subsampling layers optionally followed by fully connected layers. For the sake of this work, we focus on two-dimensional (2D) CNNs which typically work on 2D images; though 1D or higher-dimensional CNNs have similar concepts.

The input to a convolutional layer is an (r × c × n) image I, where r is the number of rows, c is the number of columns and n is the number of channels. The convolutional layer is meant to learn K filters (or kernels) of size (k_r × k_c × k_n).

In addition, padding the input image with p = (p_r, p_c) pixels permits convolution for pixels at the border of the image; p is typically set as half of the kernel size (k_r/2, k_c/2). Furthermore, a stride value s defines the kernel movement over the image. After the convolution of an input image with K kernels, the resultant K feature maps have the following size:

$$M_r^k = \frac{r - k_r + 2 p_r}{s_r} + 1 \quad (1)$$

where M_r^k is the number of rows in the kth feature map. The number of columns in a feature map, M_c^k, is similarly derived.
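For illustration, the following Python sketch evaluates (1) for one spatial dimension; the example values (a 224-row input, a 3 × 3 kernel, padding 1 and stride 1) are assumptions chosen to show a size-preserving configuration, not values fixed at this point in the text.

```python
def feature_map_rows(r, k_r, p_r, s_r):
    """Number of rows M_r of each output feature map, following (1):
    M_r = (r - k_r + 2 * p_r) / s_r + 1. Columns are derived analogously."""
    return (r - k_r + 2 * p_r) // s_r + 1

# Assumed example: 224 rows, 3x3 kernel, padding 1, stride 1 -> 224 rows out.
print(feature_map_rows(r=224, k_r=3, p_r=1, s_r=1))  # 224
```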
A non-linearity transformation, e.g. a rectified linear unit (ReLU), is typically applied to all feature maps after the convolutional layer to speed up the training process [18]. Moreover, each feature map is then downsampled, typically with a pooling step, which reduces the size of the feature maps and allows the next convolutional layer to work on a larger receptive field compared with the first one. This helps the network learn features at multiple scales. The feature maps generated after the first convolutional, non-linearity and pooling layers are then passed as input to the next block of layers to compute the next set of feature maps, and so on. Optionally, fully connected layers can be used at the end of the CNN to determine which features most correlate to a particular class. The output of the last layer is an N-dimensional vector, where N is the number of classes in a given problem.

During the training stage, a loss function such as the mean square error in (2) is used to compute the difference between the actual (y) and predicted (ỹ) labels

$$\mathrm{err} = \frac{1}{2} \sum \left( y - \tilde{y} \right)^{2} \quad (2)$$

The use of the CNN has been expanded to allow image segmentation and object detection. Image segmentation using a CNN can be performed using a concept called FCN for semantic segmentation [20]. In addition, methods have been proposed to allow the CNN to do object detection, such as region proposals with CNN (R-CNN) [21], fast R-CNN [22], faster R-CNN [23] and YOLO [24]. Overall, these techniques and the FCN method provide similar results on benchmarked models such as AlexNet [18] and VGG-16 [19]. In this paper, we focus on the FCN model to segment and then detect the flower region, mainly because it can be easily reused in the classification model as described in Section 3.

FCN can be considered as a special type of CNN in which de-convolutional layer(s) can be used to upsample and fuse the feature maps from some convolutional layer(s) such that a segmentation mask can be learnt. Typically, the segmentation mask is the same size as the input image, which provides a pixel-wise classification for each pixel.
2.2 Flower segmentation and classification

Various approaches have been proposed to classify flower images. The majority of researchers have used machine learning-based methods. For instance, the work in [5] segmented and classified flowers using SVM and multiple kernel learning. They extracted features from SIFT, HOG and the hue, saturation and value (HSV) colour model. This work was later improved in [15] and then further advanced in [13]. In [15], they used the concept of bi-level co-segmentation (BiCoS) and BiCoS-multi-task (BiCoS-MT) in an SVM classifier, whereas in [13] they used tri-level CoS (TriCoS) to tackle the flower segmentation and classification; an SVM model was used with SIFT, the Lab colour model, principal component analysis, Fisher vectors (FVs) and a Gaussian mixture model (GMM). A user-interactive method, computer-assisted visual interactive recognition (CAVIAR), was proposed in [7], which extracts shape features from a rose curve model, and hue and saturation colour moments. A classification approach is proposed in [11] using weighted Euclidean distance with features from the HSV colour model and the boundary shape of flowers. They also extracted colour and shape features from the flower centre area. However, this method requires manual user interaction.

Other approaches in flower classification have been proposed, such as pairwise rotation invariant co-occurrence local binary pattern (PRICoLBP) [16]; metric forests with GMM [25]; generalised max pooling (GMP) with FV and power normalisation [26]; visual adjectives (VAs) with SIFT and improved FV [27]; saliency driven image multi-scale nonlinear diffusion filtering [17]; heterogeneous co-occurrence features [28]; generalised hierarchical matching (GHM) with saliency map (LocSaliency) [14]; contextual exemplar classifier (CEC) [29]; Fisher discrimination dictionary learning (FDDL) with frequent local histograms (FLHs) [8]; grid-specific bag-of-FLH (GRID-FLH) [30]; colour attention-based bag-of-words [9]; Haar-like transformation of local features [31]; and graph-regularised robust late fusion (GRLF) [32]. All the aforementioned methods rely on hand-crafted features, which use classical classifiers such as SVM.

On the other hand, deep learning techniques, especially CNN-based ones, were proposed to tackle the flower classification task. CNNs have recently gained a lot of interest in solving several learning problems due to superior accuracy compared with classical methods. They have been recently used in several natural image classification tasks [18, 19, 33, 34].

There are a handful of works in the literature which use CNNs to address the flower classification problem [10, 35–48]. For instance, the work in [35] approached the problem using two-level hierarchical feature learning (HFL) that used a deep CNN. They first used a transfer-learning method, HFL, to initialise a pre-trained deep CNN model for the new target dataset. The deep feature extractors at different levels were then trained. This method effectively increases the CA in comparison with other classification methods.

A combined online nearest-neighbour estimation (ONE) algorithm was proposed for both image classification and retrieval [36]. Manual object definition, regional description and nearest-neighbour search of extracted CNN features were involved in this algorithm by computing the similarity between the query and each category or image candidate. Results show state-of-the-art accuracy in a wide range of image classification and retrieval datasets with reasonable computational overheads. The work in [37] addressed different recognition tasks including flower classification using a standard CNN representation called OverFeat. The experimental study shows significant results in the different classification tasks on various datasets.

Xie et al. in [10], on the other hand, proposed reversal-invariant deep features (RI-Deep) and RI convolution (RI-Conv) layers to increase the CNN capacity without affecting the model complexity. On various image classification tasks, this approach shows an improvement in CA including scene understanding, fine-grained object recognition, and large-scale visual recognition. Qian et al. in [38] proposed an approach to extract deep convolutional activation features (DeCAF) to use with a k-NN classifier. Their empirical study shows that the proposed method can yield improved accuracy and performance compared with state-of-the-art approaches.

A task-driven pooling (TDP) model to learn a pooled representation implicitly from data was presented in [39]. TDP was used to replace average or max pooling in CNN models to achieve a better pooled representation. The proposed method was extended to multi-task (mTDP) classification to maximise the accuracy on a flower dataset. In different work, guidelines on how to properly transfer CNN features to solve a specific task were discussed in [40]. Their evaluation showed state-of-the-art improvement on different datasets including a flower dataset.

Recently, a method to speed up the computational time of the CNN forward and backward propagation steps using winner-takes-all (WTA) hashing was described in [41]. In a different approach, a hierarchical deep semantic representation (H-DSR), which combines semantic context modelling with visual features, was proposed in [42]. Deep CNN features were extracted from spatially fixed image grids to detect a response map using pre-learned classifiers. The response map was then used to extract a semantic representation, which is further combined with visual representations to form a hierarchical deep semantic model.

Fig. 1 Flow diagram of the proposed flower segmentation and classification methods

In work related to ours, a CNN-based method to perform flower classification was proposed in [43]. They used luminance and saliency map approaches to select the flower region. The method was evaluated on a flower dataset.

A convolutional fusion networks (CFNs) model to fuse multi-scale deep representations was proposed in [44]. This model adds more parameters to generate new side branches from the intermediate layers, and learns adaptive weights for these branches. However, the accuracy reported on flower classification is limited compared with our work and other published work such as [40]. Chakraborti et al. in [45] proposed a collaborative representation-based classification (CRC) approach, which represents the image as a weighted collaboration of features over all classes. They extracted features using different descriptors, including CNN-based features, and used these features in the classification task.

An approach based on the CNN Inception model was proposed in [46] for flower classification. The method was applied to the Oxford 17 and Oxford 102 datasets and achieved good results. A selective convolutional descriptor aggregation (SCDA) approach based on unsupervised fine-grained image retrieval in different applications, including flowers, was proposed in [47]. No annotation was needed to cluster the objects as the proposed method relies on detecting the main object in an image to create deep descriptors for image categorisation. Finally, a fine-grained recognition approach based on local parts and global discrimination CNN (LG-CNN) was proposed in [48]. The method was applied to different sets including Oxford 102. The proposed CNN consists of two networks with shared weights such that one network is focused on the local parts of the input image while the second focuses on the global geometry of the image.

3 Proposed method

We propose a two-step approach for the flower classification problem. The first step localises the flower by detecting the minimum bounding box around it. The localisation is performed by segmenting the flower region using an FCN method [20]. The second step learns a CNN to accurately classify the different flower classes. Fig. 1 shows the overall framework for the proposed method. Here, we show how the segmentation FCN is initialised by the VGG-16 model [19] while the classification CNN is initialised by the segmentation FCN.

3.1 Network initialisation via transfer-learned ImageNet features

Although kernels in CNNs can be initialised randomly, most deep learning methods utilise the existence of models pre-trained on large datasets such as ImageNet for initialising, i.e. transfer learning, their models. This helps train networks for problems with small numbers of training examples since many image classification applications share similar low-level features, e.g. edges, blobs etc.

We initialise the proposed FCN from the VGG-16 model [19], which provided robust results for classifying images from the ImageNet dataset. The trained FCN is then used to initialise the classification CNN. The VGG-16 model consists of five convolutional blocks followed by three fully connected layers. Each convolutional block consists of two or three convolutional layers and ReLU. At the end of each convolutional block, a max pooling layer is used to downsample the feature maps, which makes the features translation and scale invariant. Fig. 2 shows a detailed description of the VGG-16 model alongside the parameters of the proposed FCN.

Although there are newer published models such as [33, 34] that exceeded the VGG-16 ImageNet CA, we have chosen this model to initialise our FCN, and consequently the CNN model, because it better suits the flower classification task. Deeper models such as ResNet [33] are generally too complex to handle this task because the number of parameters is an overkill. In fact, we show here how we initialise our models by a reduced version of the VGG-16 model with no compromise on accuracy.

3.2 FCN for semantic flower segmentation

Flower images usually contain wide surrounding clutter, which makes the problem of automatic flower classification challenging. Therefore, we propose an automatic step which allows the detection of the flower within the image by segmenting the flower region only, using FCN [20]. We formulate the segmentation task as a binary classification problem, i.e. 0 for background and 1 for the flower region(s).

Our proposed FCN consists of several convolutional layers and three de-convolutional layers. The network is initialised by the first five blocks from the VGG-16 model (Fig. 2). Each convolutional layer learns K kernels and produces K feature maps by sliding each 2D kernel over the input image from the previous layer [49]. The 2D feature map value at the position (i, j) is computed as

$$f_{i,j} = \sum_{a=-k_r/2}^{k_r/2} \; \sum_{b=-k_c/2}^{k_c/2} k_{a,b} \times I_{i+a,\, j+b} \quad (3)$$

Fig. 2 Proposed FCN model and its detailed parameters shown in the boxes with solid border. The blocks with dashed border show the VGG-16 blocks which we excluded

The three de-convolutional layers upsample the feature maps generated from the fifth block gradually. The first and second de-convolutional layers use a stride of two while the final de-convolutional layer uses a stride of eight. This is usually referred to as FCN-8s [20]. We use bi-linear interpolation upsampling to ensure smooth reconstruction of the edges of the flower. In addition, the (2, 2, 8) upsampling also allows fine-grained interpolation which permits the segmentation of fine-detailed structures. Note that our model size is ∼80% smaller than the VGG-16 model because we dropped the fully connected layers from the model, which are shown as blocks with a dashed border in Fig. 2.
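A minimal PyTorch-style sketch of this (2, 2, 8) upsampling path is shown below. It is an illustration only, not the authors' Caffe implementation: the 1 × 1 scoring convolutions, the skip fusions from blocks 3 and 4, the channel counts and the transposed-convolution kernel sizes are assumptions in the spirit of FCN-8s [20], and the bilinear initialisation of the upsampling kernels is omitted.

```python
import torch.nn as nn

class FCN8sHead(nn.Module):
    """Sketch: upsample block-5 features by 2, 2 and then 8 back to the
    input resolution, fusing block-4 and block-3 features on the way."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.score5 = nn.Conv2d(512, num_classes, kernel_size=1)  # assumed channels
        self.score4 = nn.Conv2d(512, num_classes, kernel_size=1)
        self.score3 = nn.Conv2d(256, num_classes, kernel_size=1)
        self.up2a = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1)
        self.up2b = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1)
        self.up8 = nn.ConvTranspose2d(num_classes, num_classes, 16, stride=8, padding=4)

    def forward(self, f3, f4, f5):
        x = self.up2a(self.score5(f5))        # 1/32 -> 1/16 of input size
        x = self.up2b(x + self.score4(f4))    # fuse block 4, 1/16 -> 1/8
        x = self.up8(x + self.score3(f3))     # fuse block 3, 1/8 -> full size
        return x                              # per-pixel class scores
```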
We use backpropagation to train the FCN. However, we initially
fix the kernels on blocks 1 and 2 (i.e. use the ImageNet kernels)
and only let the model learn the kernels on blocks 3–5. This allows
it to learn the mid-to-global feature maps without optimising the
low-level kernels. When the validation accuracy saturates, we stop
training and then restart again starting from the last learned model
to let the FCN learn the kernels in the first two blocks. This permits
the model to learn local features. We found experimentally that this
process improves segmentation accuracy compared with learning
all kernels in one step. In addition, due to having a small dataset,
we augment the images during training by allowing small rotation,
horizontal flipping, random small cropping of the image border, or
any combination of these transformations. This helps create a more
robust FCN model and avoids overfitting.
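This two-stage schedule can be sketched as follows, assuming a PyTorch-style model whose parameter names expose the block structure; the block labels and the commented calls are illustrative assumptions, not the authors' training script.

```python
def set_blocks_trainable(model, block_prefixes, trainable):
    """Freeze or unfreeze all parameters whose names start with one of the
    given block prefixes (e.g. 'block1', 'block2')."""
    for name, param in model.named_parameters():
        if any(name.startswith(prefix) for prefix in block_prefixes):
            param.requires_grad = trainable

# Stage 1 (assumed workflow): keep the ImageNet kernels in blocks 1-2 fixed
# and learn the remaining layers until validation accuracy saturates.
# set_blocks_trainable(fcn, ['block1', 'block2'], trainable=False)

# Stage 2: restart from the last saved model and let blocks 1-2 learn too.
# set_blocks_trainable(fcn, ['block1', 'block2'], trainable=True)
```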
During testing an unseen flower image, the output of the model
is a mask of the same size, as shown in Fig. 3b. Before the masked
flower is fed to the next step, i.e. flower classification, we perform
two pre-processing steps. First, we find the largest-connected
component in the segmentation mask, as in the white region in Fig.
3b, to keep the largest segmented flower region. This is important
only when multiple flowers exist in one image; keeping only one
flower region is sufficient and possibly less confusing for the
classification task. Second, we use the minimum bounding box
around the largest-connected component (red box in Fig. 3c) to
crop the original flower image while keeping the objects near the
flower (as in Fig. 3d). These objects are mostly leaves, and it turns
out that keeping them in the cropped image provides discriminative
features when training the flower classifier since they retain
important context for the flower.
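These two pre-processing steps can be sketched with SciPy connected-component labelling as below; the code assumes a binary NumPy mask and is an illustration rather than the authors' implementation.

```python
import numpy as np
from scipy import ndimage

def crop_to_largest_flower(image, mask):
    """Keep the largest-connected foreground component of the binary mask
    and crop the original image to its minimum bounding box."""
    labels, num = ndimage.label(mask)
    if num == 0:
        return image                               # no flower found
    sizes = ndimage.sum(mask, labels, index=range(1, num + 1))
    largest = labels == (int(np.argmax(sizes)) + 1)
    rows, cols = np.nonzero(largest)
    r0, r1 = rows.min(), rows.max() + 1            # minimum bounding box
    c0, c1 = cols.min(), cols.max() + 1
    return image[r0:r1, c0:c1]
```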
Fig. 3 FCN for flower segmentation: (a) Original image, (b) Mask of automatic segmentation (white region is the largest-connected component and red regions are the small regions, which are ignored during cropping the images), (c) Masked image that shows the minimum bounding box around the largest segmented flower region, (d) Cropped image region which is used in the classification step

3.3 CNNs for flower classification

After generating cropped flower images, the task is simplified
since the highly discriminative regions are mainly kept while other
possible misleading regions are removed. In this work, we address
the flower classification problem as a multi-class CNN
classification of N classes. The problem is simply formulated as a
function F which predicts the class c of an image x such as
c = F(x).
We propose a CNN which initialises its first five blocks from
the FCN model which was already initialised by the VGG-16
model. However, instead of using 3 fully connected layers in
blocks 6–8 (recall Fig. 2), we use 3 convolutional layers with 512
feature maps. The kernel size of the convolutional layer in block 6
is 7 × 7, while the number of output parameters from the
convolutional layer in block 8 is N.
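A PyTorch-style sketch of such a head is shown below. The three 512-channel convolutional layers, the 7 × 7 kernels and the N outputs at block 8 follow the text; the padding at block 7, the 1 × 1 kernel at block 8 and the absence of dropout are illustrative assumptions.

```python
import torch.nn as nn

def classification_head(num_classes):
    """Blocks 6-8 as convolutions over the 7 x 7 x 512 block-5 feature maps."""
    return nn.Sequential(
        nn.Conv2d(512, 512, kernel_size=7), nn.ReLU(inplace=True),             # block 6: 7x7 -> 1x1
        nn.Conv2d(512, 512, kernel_size=7, padding=3), nn.ReLU(inplace=True),  # block 7 (assumed padding)
        nn.Conv2d(512, num_classes, kernel_size=1),                            # block 8: N class scores
    )
```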
We use a multi-class Softmax loss function as a measure of the
quality of a particular set of parameters based on how well the
predicted outcomes match the ground truth labels in the training
data. Softmax computes the probabilistic distribution over N
different possible outcomes. We also use stochastic gradient
descent (SGD) to optimise and update the set of parameters aiming
to minimise the loss function. SGD and Softmax loss are commonly used in other CNN-based applications such as [18, 19, 33, 34]. Our loss function takes as input an N-dimensional vector X and outputs an N-dimensional vector Y of real values between 0 and 1. This function is a normalised exponential and is defined as

$$\sigma(X)_j = \frac{e^{X_j}}{\sum_{n=1}^{N} e^{X_n}} \quad (4)$$

where j = 1, …, N.
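In code, the normalised exponential in (4) is a one-liner; the max subtraction in the NumPy sketch below is a standard numerical-stability detail that does not change the result, and the example scores are arbitrary.

```python
import numpy as np

def softmax(x):
    """sigma(X)_j = exp(X_j) / sum_n exp(X_n), as in (4)."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # three assumed class scores -> probabilities summing to 1
```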
We have noted that the loss function was not performing well initially during CNN parameter optimisation. There are several possible reasons for this, but it is mainly due to the complexity of the multi-class classification problem compared with the learnt weights in the binary segmentation FCN. In addition, the function we are learning is not convex, not smooth and has many local minima with flat regions. Therefore, we propose two novel steps to improve the convergence of the algorithm.
First, since we have more convolutional layers in the CNN than
the FCN, we propose to learn kernel parameters in a three-step
approach. First, we let the CNN learn the kernels in the
convolutional layers at blocks 6–8 while fixing the first five blocks.
We then allow the CNN to learn the parameters in blocks 3–5. Finally, we let all parameters from all blocks be learned simultaneously. This provides better convergence for the CNN.
Second, because of the existence of a large number of flat local
minima, the optimiser is prevented from reaching a good solution. Therefore, it is important to allow the optimiser in some scenarios to restart its search while finding a good minimum. To address this issue, we propose a multi-step training approach during which we force the learning rate to decrease in each step and then make a sudden large increase. The increase in the learning rate allows the optimiser to ‘restart’ itself to allow searching for other nearby solutions. More details about different approaches to restart SGD are described in [50].
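A toy sketch of such a schedule is given below: the rate decays within a cycle and jumps back to the base value at the start of the next cycle, which acts as the ‘restart’. The base rate, decay factor and cycle length are assumed values, and SGDR [50] describes a cosine-shaped variant of the same idea.

```python
def restart_learning_rate(step, base_lr=0.01, decay=0.5, cycle=1000):
    """Decay the rate in four stages inside each cycle, then jump back up."""
    phase = step % cycle                     # position inside the current cycle
    return base_lr * decay ** (phase // (cycle // 4))

# Example: steps 0, 250, 500, 750 give 0.01, 0.005, 0.0025, 0.00125;
# step 1000 starts a new cycle and the rate 'restarts' at 0.01.
```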
In addition, thanks to the flower detection step described in Section 3.2, a wider range of augmentation can be used. For instance, a larger range of rotation angles and vertical flipping are used here than in the FCN model because the image is already cropped around the flower and a large rotation does not affect the overall appearance of the whole cropped image. However, performing a large rotation on the whole (non-cropped) flower image may create a completely unrealistic image. Finally, with this augmentation, the generated CNN can be more robust to a wider range of transformations, especially object rotation.
wider range of transformations especially object rotation. measures the box overlap between the manual and detected boxes.
In addition, to decide the most acceptable threshold for BIoU, i.e.
4 Experimental setup the IoU threshold above which the two boxes are considered
4.1 Datasets overlapped enough, we find Boverlap th
6 which computes the
percentage of images having BIoU greater or equal to an IoU
Three datasets are used to test the proposed method; the Oxford threshold th. The value of th is varied between 0 (no overlap) and 1
102 [5], the Oxford 17 [6] and Zou–Nagy [7]. Oxford 102 and (complete overlap)
Oxford 17 are two publicly available sets of flowers that have been
widely used. The images have large-scale, pose and light images: BIoU ≥ th
variations. The former set contains 8189 images from 102 flower
th
Boverlap = (6)
images
categories with 40–258 images per category of various image sizes,
while the latter consists of 1360 flower images from 17 categories, CA (acc) is measured as the number of correctly classified images
with 80 images in each category of various image sizes. Some over the total number of images such as
flowers from the Oxford 17 are part of the Oxford 102. The third
dataset which was compiled by Zou and Nagy consists of 612 images: predicted class = manual class
flower images from 102 categories. Each category consists of six acc = (7)
images
images and each image size is 300 × 240 pixels.
The variability of flower appearance, pose, zoom and To understand the importance of the data augmentation step, we
surrounding objects is large in the Oxford images compared with report the results of the evaluation metrics with and without data
Zou–Nagy. Flower images in the latter were consistently taken augmentation.
from a specific range of camera angle and distance. Therefore, this Finally, we perform cross-fold validation to ensure that our
allows the flower images in this dataset to be more consistent and reported result is complete. Owing to the difference in the number
easier to distinguish (Figs. 4a and b show some random but of images in each dataset, we use three-fold cross-validation for the
representative examples). On the other hand, Oxford flower images Oxford 17 and Nagy datasets and five-fold cross-validation for the
have a greater variation as shown in Figs. 4c and d. Oxford 102 dataset.
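The overlap measures in (5) and (6) are straightforward to compute; the NumPy sketch below assumes binary masks and bounding boxes given as (row0, col0, row1, col1) corner coordinates.

```python
import numpy as np

def mask_iou(manual, automatic):
    """Foreground overlap of (5): |A intersect B| / |A union B| for binary masks."""
    inter = np.logical_and(manual, automatic).sum()
    union = np.logical_or(manual, automatic).sum()
    return inter / union if union else 0.0

def box_iou(a, b):
    """BIoU between two boxes (r0, c0, r1, c1)."""
    r0, c0 = max(a[0], b[0]), max(a[1], b[1])
    r1, c1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, r1 - r0) * max(0, c1 - c0)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def b_overlap(bious, th):
    """Fraction of images whose BIoU reaches the threshold th, as in (6)."""
    return float(np.mean(np.asarray(bious) >= th))
```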

4.3 Implementation details

In the segmentation FCN and classification CNN, we resize all images to 224 × 224 × 3 to provide a unified and normalised set of images to pass through the networks. This also allows faster computation of convolutions and pooling [19]. All kernel sizes at the first five convolutional blocks in the FCN and CNN were 3 × 3 to provide fine-detailed features at multiple scales, as suggested by Simonyan and Zisserman [19] and Szegedy et al. [34].

In the classification CNN, a 7 × 7 kernel size is used at blocks 6 and 7 to generate 1 × K feature maps, which are then mapped at block 8 to a 1 × N feature vector to represent the probabilistic values for the N classes. The value of N is 17 in the Oxford 17 dataset, and 102 in both the Oxford 102 and Zou–Nagy datasets.

All our implementation code is written in C++ and uses the Caffe deep learning framework [53]. Training was performed on a GTX Titan X graphics processing unit (GPU) with 12 GB of memory, while testing was performed on a GPU and a central processing unit (CPU) to report performance measures. Training an FCN model takes between 4 and 8 h depending on dataset size, while training a classification CNN ranges between 16 and 36 h. Testing an unseen image on the FCN takes approximately the same time as on the classification CNN. Overall, the system processes ∼15 unseen images in 1 s on the GPU and 2 s per image on the CPU (Intel Core i7, 4 GHz).

5 Results

5.1 Flower segmentation and detection

Experiments were conducted over the three datasets. Table 1 shows the mean ± standard deviation of the segmentation IoU and the detection accuracy (BIoU), with and without the data augmentation step. The accuracy has improved with this step by an average of 7.5% in segmentation and 4.1% in detection.

Table 1 Overall mean (μ) and standard deviation (σ) of flower segmentation (IoU) and detection (BIoU) results on the different flower datasets, with and without data augmentation
Dataset      Augmentation (Y/N)   Segmentation μ ± σ IoU, %   Detection μ ± σ BIoU, %
Oxford 102   N                    73.6 ± 15.1                 81.0 ± 16.4
Oxford 102   Y                    80.3 ± 14.7                 85.9 ± 15.5
Oxford 17    N                    70.1 ± 11.2                 78.7 ± 12.8
Oxford 17    Y                    79.7 ± 6.5                  82.2 ± 11.5
Zou–Nagy     N                    71.4 ± 16.6                 77.3 ± 16.7
Zou–Nagy     Y                    79.0 ± 17.2                 81.3 ± 19.3

We show in Fig. 5 the box overlap (Boverlap) accuracy at different thresholds on the different datasets, which shows no substantial difference between the datasets. We achieve 95% if we consider a 50% threshold, while a stricter threshold such as 80% achieves 81%. No image has a BIoU < 10%, which means that our segmentation model will always locate part of the flower in all images we used. Therefore, because of the robustness of the FCN, all images were used in the next step, i.e. the classification CNN.

Fig. 5 Box detection accuracy at different thresholds

Our mean IoU segmentation accuracy is ∼80% and is consistent across the different datasets. Other published flower segmentation methods tested their work on Oxford 17 only, such as [54–57], and presented a larger mean IoU. On the other hand, the work in [52] achieved a larger mean IoU but they dropped the most complex four flower classes in their testing. Furthermore, we have experimented on three datasets and all images have been included in our evaluation. Fig. 6 shows an example case where the detection is accurate though the segmentation accuracy is not perfect. This suggests that our detection and consequently the classification methods are generally not sensitive to partial segmentation errors.

Fig. 6 Segmentation and detection example: (a) Original image, (b) Manual segmentation, (c) Automatic segmentation, (d) Minimum bounding box of (b), (e) Minimum bounding box of (c)

5.2 Flower classification

Our CNN classifiers provide excellent results on all datasets. We achieved a CA of 99.0, 98.5 and 97.1% on Zou–Nagy, Oxford 17 and Oxford 102, respectively. Tables 2–4 show the accuracy we achieved on the different datasets alongside recent state-of-the-art results from other groups. For each entry, we report whether a segmentation step is used first to localise the flower region before the classification takes place. We also show results with and without data augmentation to demonstrate the importance of this step.

Table 2 Flower CA on Zou–Nagy dataset
Method                               Segmentation (Y/N)   CA, %
FCN–CNN w/ augmentation (proposed)   Y                    99.0
FCN–CNN w/o augmentation             Y                    95.4
CNN only                             N                    96.1
CAVIAR [7]                           Y                    93.0
Hsu et al. [11]                      Y                    77.8
Saitoh et al. [54]                   Y                    65.5

To demonstrate the effect of the flower detection step on the accuracy of the proposed method, we report the result of flower classification using a CNN with no segmentation step in the tables. This classification CNN (‘CNN only’) has been trained and tested on the original images and shows reasonable results. However, the accuracy of the proposed FCN–CNN outperforms the ‘CNN only’ method. Furthermore, having a flower segmentation and detection step is more important in classifying flower images from the Oxford datasets than the Zou–Nagy dataset; it improves the accuracy in the former datasets by 7%, and by 3% in the latter.

CA has improved in all datasets when using data augmentation, as demonstrated in Tables 2–4. The improvement in CA is 3.6, 4.5 and 2.8% in Zou–Nagy, Oxford 17 and Oxford 102, respectively. Oxford 102 has the least improvement because it is the dataset with the largest number of images per class. Oxford 17 is the dataset which benefits the most, though Zou–Nagy has fewer images per class. This could have happened because the variability in the Oxford 17 dataset is much larger and hence data augmentation helps make the classification of this dataset more robust.

The proposed method fails in a few cases. The main reasons might be incorrect manual annotation, close similarity in appearance of different flower classes, and a large difference in flower appearance compared with other images from the same class. Fig. 7 shows two images from each of two classes, where one image is correctly classified while the other fails. It is clear that images from the same class can vary significantly in appearance, shape and pose.

Table 3 Flower CA on Oxford 17 dataset
Method                                      Segmentation (Y/N)   CA, %
FCN–CNN w/ augmentation (proposed)          Y                    98.5
FCN–CNN w/o augmentation                    Y                    93.8
CNN only                                    N                    91.4
Nilsback and Zisserman [5]                  Y                    88.33
FDDL–FLH [8]                                Y                    97.8
colour attention [9]                        N                    95.0
GHM LocSaliency [14]                        N                    93.5
BiCoS [15]                                  Y                    91.1
multi-scale fusion [17]                     Y                    91.39
heterogeneous co-occurrence features [28]   N                    94.19
CEC [29]                                    N                    93.7
GRID-FLH [30]                               N                    94.0
Haar-like transformation [31]               N                    91.87
GRLF [32]                                   N                    91.7
mTDP [39]                                   N                    94.8
H-DSR [42]                                  N                    87.1
inception-v3 [46]                           N                    95.0
Fig. 7 Example of correctly and incorrectly classified images: (a), (b) Same flower class, (c), (d) Another class; (a) and (c) are correctly classified while (b) and (d) are misclassified. (c) Cropped for better visualisation

Table 4 Flower CA on Oxford 102 dataset
Method                               Segmentation (Y/N)   CA, %
FCN–CNN w/ augmentation (proposed)   Y                    97.1
FCN–CNN w/o augmentation             Y                    94.3
CNN only                             N                    90.6
Nilsback and Zisserman [5]           Y                    72.8
CNN–RI-Deep [10]                     N                    94.01
TriCoS [13]                          Y                    85.2
PRICoLBP [16]                        Y                    84.2
metric forests [25]                  N                    93.51
GMP [26]                             N                    84.6
VA [27]                              N                    86.31
CNN–HFL [35]                         N                    83.35
ONE–SVM [36]                         N                    86.82
CNNaug–SVM [37]                      N                    86.8
msML+ [38]                           N                    89.45
Zheng et al. [40]                    N                    95.6
WTA [41]                             N                    83.2
Liu et al. [43]                      N                    84.0
CFN [44]                             N                    82.6
Pro-CRC [45]                         N                    94.8
inception-v3 [46]                    N                    94.0
SCDA [47]                            N                    92.1
LG-CNN [48]                          Y                    96.6

6 Conclusion

A deep learning-based method to segment, detect and classify flower images is presented in this paper. Novel ideas are demonstrated in this work which make the method robust and successful on a variety of datasets. Unlike other methods which rely on hand-crafted features, the proposed method learns the most discriminative features within a deep learning framework. Segmentation and detection of the minimal flower region allow for a more accurate classification because they let the classification CNN focus on the region of interest while excluding non-discriminative regions.

To our knowledge, this work demonstrates the best flower CA to date. The main contributions which helped in achieving superior results compared with other approaches can be summarised as follows: first, the use of CNN allows a more robust classifier because it allows learning better features compared with the hand-crafted features used in classical approaches. Second, localisation of the flower simplifies the classification task, which means that a two-step approach is better than a one-step classification in such applications. Third, the weights transferred from a pre-trained model such as VGG-16, and consequently from the segmentation FCN to the classification CNN, allow faster convergence and a more accurate solution when optimising the weights. Fourth, gradual CNN learning and avoiding local minima provide progressive learning of the classification CNN via (i) learning low-level, mid-level and high-level layers independently, then optimising all layers, and (ii) an automatic restart of the optimiser by suddenly increasing the learning rate a few times during training. Finally, the proposed data augmentation step makes the CNN more robust, as demonstrated by the results. This step improves the CNN classification because it adds rotation-aware information to the CNN and it allows robust learning when the variability of flower shape, pose and appearance is huge.

We developed a good binary segmentation method for the flower region, though the main aim of this work is to propose an accurate and robust classification method. The CA approaches perfection on the three datasets. The proposed method is very accurate: only 168 out of more than 10,000 images were misclassified across all datasets. Finally, although we show the applicability of the proposed method on the flower classification problem, our method can be applied to other applications which share similar challenges with flower classification. In addition, our proposed method might be suitable for use in applications which allow sharing, annotating and organising meaningful content in images, such as Visipedia [58].

7 Acknowledgments

We thank the research groups (the Oxford University VGG group and Rensselaer Polytechnic Institute) who provided the datasets and the manual ground truth. Dr. Mohammad Yaqub is funded by Innovate UK (Project 101684) and the UK Engineering and Physical Sciences Research Council (EP/L505316/1).

8 References

[1] Kenrick, P.: ‘Botany: the family tree flowers’, Nature, 1999, 402, (6760), pp. 358–359
[2] Das, M., Manmatha, R., Riseman, E.: ‘Indexing flower patent images using domain knowledge’, IEEE Intell. Syst. Appl., 1999, 14, (5), pp. 24–33
[3] Larson, R. (Ed.): ‘Introduction to floriculture’ (Academic Press, San Diego, CA, USA, 1992, 2nd edn.)
[4] Chi, Z.: ‘Data management for live plant identification’, in Feng, D., Siu, W.C., Zhang, H.J. (Eds.): ‘Multimedia information retrieval and management’ (Springer, Berlin Heidelberg, 2003), pp. 432–457
[5] Nilsback, M., Zisserman, A.: ‘Automated flower classification over a large number of classes’. Proc. Sixth Indian Conf. Computer Vision, Graphics & Image Processing, Bhubaneswar, India, December 2008, pp. 722–729
[6] Nilsback, M., Zisserman, A.: ‘A visual vocabulary for flower classification’. Proc. IEEE Conf. Computer Vision and Pattern Recognition, New York, NY, June 2006, 2, pp. 1447–1454
[7] Zou, J., Nagy, G.: ‘Evaluation of model-based interactive flower recognition’. Proc. Int. Conf. Pattern Recognition, Cambridge, UK, August 2004, 2, pp. 311–314
[8] Yang, M., Zhang, L., Feng, X., et al.: ‘Sparse representation based Fisher discrimination dictionary learning for image classification’, Int. J. Comput. Vis., 2014, 109, (3), pp. 209–232
[9] Khan, F., van de Weijer, J., Vanrell, M.: ‘Modulating shape features by color attention for object recognition’, Int. J. Comput. Vis., 2012, 98, (1), pp. 49–64
[10] Xie, L., Wang, J., Lin, W., et al.: ‘Towards reversal-invariant image representation’, Int. J. Comput. Vis., 2017, 123, (2), pp. 226–250
[11] Hsu, T., Lee, C., Chen, L.: ‘An interactive flower image recognition system’, Multimedia Tools Appl., 2011, 53, (1), pp. 53–73
[12] Mottos, A., Feris, R.: ‘Fusing well-crafted feature descriptors for efficient fine-grained classification’. Proc. IEEE Int. Conf. Image Processing, Paris, France, October 2014, pp. 5197–5201
[13] Chai, Y., Rahtu, E., Lempitsky, V., et al.: ‘TriCoS: a tri-level class-discriminative co-segmentation method for image classification’. Proc. European Conf. Computer Vision, Florence, Italy, October 2012, I, pp. 794–807
[14] Chen, Q., Song, Z., Hua, Y., et al.: ‘Hierarchical matching with side information for image classification’. Proc. IEEE Conf. Computer Vision and Pattern Recognition, Providence, RI, June 2012, pp. 3426–3433
[15] Chai, Y., Lempitsky, V., Zisserman, A.: ‘BiCoS: a bi-level co-segmentation method for image classification’. Proc. Int. Conf. Computer Vision, Barcelona, Spain, November 2011, pp. 2579–2586
[16] Qi, X., Xiao, R., Li, C., et al.: ‘Pairwise rotation invariant co-occurrence local binary pattern’, IEEE Trans. Pattern Anal. Mach. Intell., 2014, 36, (11), pp. 2199–2213
[17] Hu, W., Hu, R., Xie, N., et al.: ‘Image classification using multiscale information fusion based on saliency driven nonlinear diffusion filtering’, IEEE Trans. Image Process., 2014, 23, (4), pp. 1513–1526
[18] Krizhevsky, A., Sutskever, I., Hinton, G.: ‘ImageNet classification with deep convolutional neural networks’, in Pereira, F., Burges, C., Bottou, L., et al. (Eds.): ‘Advances in neural information processing systems’ (Curran Associates, Inc., Red Hook, NY, USA, 2012), pp. 1097–1105
[19] Simonyan, K., Zisserman, A.: ‘Very deep convolutional networks for large-scale image recognition’. Proc. Int. Conf. Learning Representations, San Diego, CA, May 2015, arXiv preprint arXiv:1409.1556
[20] Shelhamer, E., Long, J., Darrell, T.: ‘Fully convolutional networks for semantic segmentation’, IEEE Trans. Pattern Anal. Mach. Intell., 2017, 39, (4), pp. 640–651
[21] Girshick, R., Donahue, J., Darrell, T., et al.: ‘Rich feature hierarchies for accurate object detection and semantic segmentation’. Proc. IEEE Conf. Computer Vision and Pattern Recognition, Columbus, OH, June 2014, pp. 580–587
[22] Girshick, R.: ‘Fast R-CNN’. Proc. IEEE Int. Conf. Computer Vision, Santiago, Chile, December 2015, pp. 1440–1448
[23] Ren, S., He, K., Girshick, R., et al.: ‘Faster R-CNN: towards real-time object detection with region proposal networks’, IEEE Trans. Pattern Anal. Mach. Intell., 2017, 39, (6), pp. 1137–1149
[24] Redmon, J., Divvala, S., Girshick, R., et al.: ‘You only look once: unified, real-time object detection’. Proc. IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, NV, June 2016, pp. 779–788
[25] Xu, Y., Zhang, Q., Wang, L.: ‘Metric forests based on Gaussian mixture model for visual image classification’, Soft Comput., 2018, 22, (2), pp. 499–509
[26] Murray, N., Perronnin, F.: ‘Generalized max pooling’. Proc. IEEE Conf. Computer Vision and Pattern Recognition, Columbus, OH, June 2014, pp. 2473–2480
[27] Xie, L., Wang, J., Zhang, B., et al.: ‘Incorporating visual adjectives for image classification’, Neurocomputing, 2016, 182, pp. 48–55
[28] Ito, S., Kubota, S.: ‘Object classification using heterogeneous co-occurrence features’. Proc. European Conf. Computer Vision, Heraklion, Crete, Greece, September 2010, V, pp. 701–714
[29] Zhang, C., Huang, Q., Tian, Q.: ‘Contextual exemplar classifier based image representation for classification’, IEEE Trans. Circuits Syst. Video Technol., 2017, 27, (8), pp. 1691–1699
[30] Fernando, B., Fromont, E., Tuytelaars, T.: ‘Mining mid-level features for image classification’, Int. J. Comput. Vis., 2014, 108, (3), pp. 186–203
[31] Zhang, C., Liu, J., Liang, C., et al.: ‘Image classification using Haar-like transformation of local features with coding residuals’, Signal Process., 2013, 93, (8), pp. 2111–2118
[32] Ye, G., Liu, D., Jhuo, I., et al.: ‘Robust late fusion with rank minimization’. Proc. IEEE Conf. Computer Vision and Pattern Recognition, Providence, RI, June 2012, pp. 3021–3028
[33] He, K., Zhang, X., Ren, S., et al.: ‘Deep residual learning for image recognition’. Proc. IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, NV, June 2016, pp. 770–778
[34] Szegedy, C., Liu, W., Jia, Y., et al.: ‘Going deeper with convolutions’. Proc. IEEE Conf. Computer Vision and Pattern Recognition, Boston, MA, June 2015, pp. 1–9
[35] Song, G., Jin, X., Chen, G., et al.: ‘Two-level hierarchical feature learning for image classification’, Front. Inf. Technol. Electron. Eng., 2016, 17, (9), pp. 897–906
[36] Xie, L., Hong, R., Zhang, B., et al.: ‘Image classification and retrieval are ONE’. Proc. Fifth ACM Int. Conf. Multimedia Retrieval, Shanghai, China, June 2015, pp. 3–10
[37] Razavian, A., Azizpour, H., Sullivan, J., et al.: ‘CNN features off-the-shelf: an astounding baseline for recognition’. Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Columbus, OH, June 2014, pp. 512–519
[38] Qian, Q., Jin, R., Zhu, S., et al.: ‘Fine-grained visual categorization via multi-stage metric learning’. Proc. IEEE Conf. Computer Vision and Pattern Recognition, Boston, MA, June 2015, pp. 3716–3724
[39] Xie, G., Zhang, X., Shu, X., et al.: ‘Task-driven feature pooling for image classification’. Proc. IEEE Int. Conf. Computer Vision, Santiago, Chile, December 2015, pp. 1179–1187
[40] Zheng, L., Zhao, Y., Wang, S., et al.: ‘Good practice in CNN feature transfer’, arXiv preprint arXiv:1604.00133, 2016
[41] Bakhtiary, A., Lapedriza, A., Masip, D.: ‘Winner takes all hashing for speeding up the training of neural networks in large class problems’, Pattern Recognit. Lett., 2017, 93, pp. 38–47
[42] Zhang, C., Li, R., Huang, Q., et al.: ‘Hierarchical deep semantic representation for visual categorization’, Neurocomputing, 2017, 257, pp. 88–96
[43] Liu, Y., Tang, F., Zhou, D., et al.: ‘Flower classification via convolutional neural network’. Proc. IEEE Int. Conf. Functional-Structural Plant Growth Modeling, Simulation, Visualization and Applications, Qingdao, China, November 2016, pp. 110–116
[44] Liu, Y., Guo, Y., Lew, M.: ‘On the exploration of convolutional fusion networks for visual recognition’. Proc. Int. Conf. MultiMedia Modeling, Reykjavik, Iceland, January 2017, pp. 227–289
[45] Chakraborti, T., McCane, B., Mills, S., et al.: ‘Collaborative representation based fine-grained species recognition’. Proc. Int. Conf. Image and Vision Computing New Zealand, Palmerston North, New Zealand, November 2016, pp. 1–6
[46] Xia, X., Xu, C., Nan, B.: ‘Inception-v3 for flower classification’. Proc. Int. Conf. Image, Vision and Computing (ICIVC), Chengdu, China, June 2017, pp. 783–787
[47] Wei, X., Luo, J., Wu, J., et al.: ‘Selective convolutional descriptor aggregation for fine-grained image retrieval’, IEEE Trans. Image Process., 2017, 26, (6), pp. 2868–2881
[48] Xie, G., Zhang, X., Yang, W., et al.: ‘LG-CNN: from local parts to global discrimination for fine-grained recognition’, Pattern Recognit., 2017, 71, pp. 118–131
[49] Shapiro, L., Stockman, G.: ‘Computer vision’ (Prentice-Hall, Upper Saddle River, NJ, USA, 2001), pp. 53–54
[50] Loshchilov, I., Hutter, F.: ‘SGDR: stochastic gradient descent with warm restarts’. Proc. Int. Conf. Learning Representations, Toulon, France, April 2017, arXiv preprint arXiv:1608.03983
[51] Nilsback, M., Zisserman, A.: ‘Delving into the whorl of flower segmentation’. Proc. British Machine Vision Conf., Warwick, UK, September 2007, pp. 54.1–54.10
[52] Nilsback, M., Zisserman, A.: ‘Delving deeper into the whorl of flower segmentation’, Image Vis. Comput., 2010, 28, (6), pp. 1049–1062
[53] Jia, Y., Shelhamer, E., Donahue, J., et al.: ‘Caffe: convolutional architecture for fast feature embedding’. Proc. 22nd ACM Int. Conf. Multimedia, Orlando, FL, November 2014, pp. 675–678
[54] Saitoh, T., Aoki, K., Kaneko, T.: ‘Automatic recognition of blooming flowers’. Proc. Int. Conf. Pattern Recognition, Cambridge, UK, August 2004, 1, pp. 27–30
[55] Aydin, D., Uğur, A.: ‘Extraction of flower regions in color images using ant colony optimization’, Procedia Comput. Sci., 2011, 3, pp. 530–536
[56] Visin, F., Romero, A., Cho, K., et al.: ‘ReSeg: a recurrent neural network-based model for semantic segmentation’. Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, June 2016, pp. 426–433
[57] Liu, F., Lin, G., Qiao, R., et al.: ‘Structured learning of tree potentials in CRF for image segmentation’, IEEE Trans. Neural Netw. Learn. Syst., 2017, pp. 1–7, doi: 10.1109/TNNLS.2017.2690453
[58] Belongie, S., Perona, P.: ‘Visipedia circa 2015’, Pattern Recognit. Lett., 2016, 72, pp. 15–24
