Paper 34-Birds Identification System Using Deep Learning
All content following this page was uploaded by Suleyman Al-Showarah on 03 June 2021.
Abstract—Identifying birds is a challenging task for birdwatchers because of the similarity of birds' forms and image backgrounds, and because of the watchers' lack of experience. An image-based computer system is therefore needed to help birdwatchers identify birds. This study investigates the use of deep learning for bird identification, using a convolutional neural network to extract features from images. The investigation was performed on a database of 4340 images collected by the paper's author from Jordan. Principal Component Analysis (PCA) was applied to layers 6 and 7, as well as to the results of the statistical operations that merge the two layers: average, minimum, maximum, and combination of both layers. The datasets were investigated with the following classifiers: Artificial Neural Networks, K-Nearest Neighbor, Random Forest, Naïve Bayes, and Decision Tree. The metrics used for each classifier are accuracy, precision, recall, and F-Measure. The results include, but are not limited to, the following: applying PCA to the deep features not only reduces the dimensionality, and therefore significantly reduces the training/testing time, but also increases the identification accuracy, particularly with the Artificial Neural Networks classifier. Among the classifiers, Artificial Neural Networks showed higher classification accuracy (70.9908), precision (0.718), recall (0.71), and F-Measure (0.708) than the other classifiers.

Keywords—Birds identification; deep learning; convolutional neural networks (CNN); VGG-19; principal component analysis (PCA)

I. INTRODUCTION
Many people are interested in observing and studying wildlife, especially through birdwatching. The role of birdwatching is to preserve nature by observing birds' behavior and migration patterns. Identifying birds from images remains difficult for birdwatchers because of the similarity of birds' forms and image backgrounds, and because of the watchers' lack of experience in this field [1].

As mentioned in [17], earlier techniques used bird voices or videos to predict the species, but these techniques struggle to give accurate results because of other bird and animal voices in the background. Images can therefore be the best choice for identifying bird species. To implement this technique, a model is trained on images of all bird species. The deep learning algorithm then converts an uploaded image into grayscale format and applies it to the trained model to predict the best-matching species name for the uploaded image.

In recent years, artificial intelligence has also been used for image-based birdwatching with different algorithms and methods [1][3][4][7][14], but this study differs from the others in using the following operations: combining fc6 and fc7, the maximum of fc6/fc7, the minimum of fc6/fc7, and the average of fc6/fc7, all based on VGG-19. Hence, the field of birdwatching needs more investigation to develop systems with new techniques that help to identify birds.

The image database was collected from Jordan, where, as stated in [13], 434 bird species belonging to 66 families have been recorded.

This study investigates the use of deep learning for bird identification, using VGG-19 to extract features from images. To achieve this aim, the performance of different classifiers (KNN, Decision Tree, Random Forest, and ANN) was investigated on the collected, reliable database of images of birds found in Jordan.

VGG-19 is considered one of the most important Convolutional Neural Network (CNN) models, and CNNs are considered the strongest deep learning technique for image identification [9].

The main reason for using VGG-19 is to provide high precision by finding features with distinctive details in the image, such as differences in lighting conditions and other objects surrounding the birds [3]. Moreover, PCA can be employed as a dimensionality reduction tool on these features, which reduces the number of features and therefore the training time.

The motivation for this study is as follows: 1) There is a shortage of work on identifying birds based on images. 2) To the best of our knowledge, we have not come across any study that uses VGG-19 for identifying birds. 3) There is a shortage of available databases worldwide, apart from the two databases in [1] and [18]. This also applies to Jordan, where there is no image database for birds and no program has been developed to identify birds.

Based on the features extracted using VGG-19, the contribution of this study is to provide the research field with a comparison between the results of the aforementioned classifiers.

This study is organized into six sections. Section II gives an overview of previous studies on the related subjects. Section III describes the database used. Section IV discusses the model design and the methodology of the experiment. Section V discusses the experimental results, and finally, Section VI presents the conclusion of the paper.
251 | P a g e
www.ijacsa.thesai.org
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 12, No. 4, 2021
II. RELATED WORK
Machine learning (ML) represents a set of techniques that allow systems to discover, from raw data, the representations required for feature detection or classification. The performance of a classification system depends on the quality of the features. Since this study falls under the field of ML, this area was searched for studies on bird identification.

The literature contains a number of studies on identifying birds, conducted with different algorithms and methods, as follows.

A number of studies identify birds based on audio/video, such as [4][11][6][10], while other studies identify birds based on images using AI algorithms [1][3][14], but not in the way conducted in this study. This study used different operations: MAX, MIN, AVERAGE, and Combine between the layers fc6/fc7, based on the VGG-19 algorithm.

In the field of bird image databases and bird identification systems, the researchers in [19] conducted a study on data for 200 bird species, collected mostly from North America, which they called Caltech-UCSD Birds 200 (CUB-200). Their study was based on two simple features: image sizes and color histograms. For image sizes, they represented each image by its width and height in pixels. For color histograms, they used 10 bins per channel, to which Principal Component Analysis was applied. Their results showed how the performance of the NN classifier degrades as the number of classes in the dataset increases, as in [18]. The performance of the image size features is close to chance, at 0.6% for the 200 classes, while the color histogram features increase the performance to 1.7%. In another study in the same field, the researchers in [18] increased the number of images to 11788, from 6033 in [19]. They used RGB color histograms and histograms of vector-quantized SIFT descriptors with a linear SVM. The classification accuracy obtained in their study is 17.3%.

Also in the field of bird identification systems, the researchers in [14] proposed a new feature to distinguish the types of birds: the ratio of the distance from the eye to the beak root, and the beak width. This feature was integrated into a decision tree and then into an SVM. The proposal was applied to the CUB-200-2011 dataset mentioned in [18]. The correct classification rate achieved is 84%.

Another study on bird identification is available in [1]. Its database was collected in India by the researchers and consisted of 300-400 different images of a number of bird species. Image features were extracted using AlexNet and then classified using an SVM classifier. The resulting accuracy is 85%.

The researchers in [11] used multiple pre-trained CNN networks (AlexNet, VGG19, and GoogleNet) on the bird dataset called Caltech-UCSD Birds-200-2011. The results showed that combining the aforementioned networks improved the accuracy, which reached 81.91% on the Caltech-UCSD Birds-200-2011 dataset compared to the other datasets used in the same study.

Another study in the field of bird image databases and bird identification systems was conducted by [4]. It aimed to classify birds in flight from video clips. The researchers collected approximately 952 clips and extracted about 161,907 frames of 13 bird species. To improve the accuracy, they used two kinds of features: appearance and motion features. They then compared their proposed method with the classifiers (VGG, MobileNet). The proposed method achieved a 90% correct classification rate when using the Random Forest classifier.

In the field of bird identification systems, the researchers in [3] applied three methods: 1) softmax regression on manually selected features from the Caltech-UCSD-Birds-200 dataset [19]; 2) a multi-class SVM applied to HOG and RGB features extracted from images; 3) a CNN applied with a transfer learning algorithm to classify birds. The comparison of the three methods reported 46% when using the CNN.

The next section explains the database content, the number of images, the source of the images, and the challenges of classifying the images.

III. DATABASE DESIGN
The database of bird images was collected from Jordan and consists of 4340 images of 434 bird species. The images were obtained from scientific sources and were approved by the Jordanian Bird Watching Association based on their scientific names [13].

The images have different backgrounds: some were taken in shadow, some against bright backgrounds, and some contain other objects in the background. This made it considerably more challenging for the researchers to extract features and to achieve high accuracy.

IV. PROPOSED METHOD
This section presents the procedures used in the proposed method for identifying birds using VGG-19. Fig. 1 shows the proposed model.
The following steps explain the proposed model of this study:

Step 1): The feature vectors are extracted from the images automatically, using the pretrained VGG-19 in MATLAB, to build the fc6 and fc7 feature datasets. Each dataset (e.g. fc6) contains 4096 columns (representing the feature vector) and 4340 rows (representing the number of samples, i.e. images).

Step 2): The statistical operations (min, max, average, and combining them together) are performed on the original/pure fc6 and fc7 layers to obtain new datasets for the classifiers. The statistical operations are as follows:

• Max: finds the larger of the two values in fc6 and fc7 and puts that value in a new group.
• Min: finds the smaller of the two values in fc6 and fc7 and puts that value in a new group.
• Average: finds the average of the two values in fc6 and fc7 and puts that value in a new group.
• Combined them together: places the first group (4096) next to the second group (4096), giving a new group that contains 8192 features in this study.

Step 3): PCA is applied to the original/pure fc6 and fc7 and to the datasets obtained from the previous stage (step 2), to produce new datasets.

The data obtained using the pre-trained VGG-19 is very large (4096 features); therefore, PCA was implemented to reduce the number of features. A set of percentages was used in PCA for the variance of the data retained in the results: 95%, 97%, and 99% of the variance of the data (the 4096 features).

Step 4): The results are obtained by applying a set of classifiers to the datasets obtained from steps 2 and 3.

V. EXPERIMENTAL RESULTS AND DISCUSSIONS
This section presents the performance evaluation results for the study dataset, which include the accuracy, F-measure, recall, precision, and training time for each of the following classifiers: 1KNN, 3KNN, 5KNN, ANN, Naïve Bayes, Random Forest, and Decision Tree.

The results of this study are displayed as follows:

A. Results of both Original/Pure fc6/fc7 Datasets Separately
Table I shows the results for the original fc6 and fc7 datasets. Naïve Bayes achieved the highest accuracy for fc6 and fc7, which are (59.002) and (56.106). Regarding the time spent on testing and training, Decision Tree took a long time (1406.69s), while the KNNs took less time (0s) than the other classifiers. This is because KNN has no training model: the test example is compared directly with the examples in the training set, which is why it is slow in testing, particularly when the training set contains a large number of examples [8][16]. These results match the results in [5][12].

B. Results of the Statistical Operations on fc6 and fc7 Datasets
This section shows the results for the three datasets obtained by applying the statistical operations (average, maximum, minimum) between the fc6 and fc7 layers.

Table II shows the results of the statistical operations on the fc6/fc7 datasets, where Naïve Bayes achieved the highest accuracy for AVERAGE, MAX, and MIN, which are (57.30), (60.99), and (57.60), respectively. Naïve Bayes not only achieved acceptable accuracy, F-measure, recall, and precision, outperforming all other classifiers, but did so with an acceptable training time. This result does not match other studies [2][15].

C. Results of Combine between (Original fc6/fc7) Dataset
A new dataset, called combine, was obtained by combining fc6 (4096) and fc7 (4096) into a feature vector of 8192 features, and the results were obtained accordingly:
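The four merging operations of Step 2, including the combined 8192-feature dataset, can be sketched for a single image as follows. This is a minimal illustration, not the authors' MATLAB code: it assumes the max, min, and average are taken element by element between the fc6 and fc7 vectors, and the function name `merge_features` and the toy 4-value vectors are hypothetical.

```python
from typing import List

Vector = List[float]


def merge_features(fc6: Vector, fc7: Vector, mode: str) -> Vector:
    """Merge two equal-length feature vectors extracted from one image.

    Assumption: max/min/average are applied element-wise, i.e. value i of
    fc6 is paired with value i of fc7.
    """
    if len(fc6) != len(fc7):
        raise ValueError("fc6 and fc7 must have the same length")
    if mode == "max":
        return [max(a, b) for a, b in zip(fc6, fc7)]
    if mode == "min":
        return [min(a, b) for a, b in zip(fc6, fc7)]
    if mode == "average":
        return [(a + b) / 2.0 for a, b in zip(fc6, fc7)]
    if mode == "combine":
        # Place the fc7 values next to the fc6 values (4096 + 4096 = 8192).
        return list(fc6) + list(fc7)
    raise ValueError(f"unknown mode: {mode}")


# Toy 4-value vectors standing in for the 4096-value fc6/fc7 outputs.
fc6 = [0.0, 2.0, 5.0, 1.0]
fc7 = [1.0, 2.0, 3.0, 4.0]
merged = {
    mode: merge_features(fc6, fc7, mode)
    for mode in ("max", "min", "average", "combine")
}
```

Applied row by row to the 4340 images, the first three modes keep 4096 columns per image, while "combine" doubles the width to 8192.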
Table III shows the bird identification results, where Naïve Bayes achieved the highest accuracy on the combine dataset (59.4009). The second-highest accuracy, 50.2074, was achieved by 1KNN. Regarding the time spent on testing and training, Decision Tree took a long time (2484.01s), while the KNNs took less time (0s) than the other classifiers.

D. Results of Both Original/Pure fc6/fc7 after Applying PCA
Tables IV and V show the identification results for each classifier after applying PCA (95%, 97%, and 99%).

Table IV includes the ANN classifier, which was not used in Tables I to III. This can be explained as follows: ANN is the best classifier for deep features if and only if it is given a smaller number of deep features. Otherwise, i.e. if it is applied to the original/pure deep features obtained from VGG-19 layer 6 or 7, or from any merging of the two, the training time would be unacceptably long [2][15][11].
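The accuracy, precision, recall, and F-measure reported for each classifier can be computed from true and predicted species labels, as in the sketch below. The source does not state how the per-class scores are averaged, so macro-averaging is an assumption made here for illustration; the function name `evaluate` and the species labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def evaluate(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Accuracy (in percent) plus macro-averaged precision, recall, F-measure.

    Assumption: macro-averaging over classes; the paper does not say which
    averaging scheme its tables use.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # guessed class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f_measures = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)
    n = len(labels)
    return {
        "accuracy": 100.0 * sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }


# Hypothetical labels for six test images of three species.
y_true = ["hoopoe", "hoopoe", "sparrow", "sparrow", "eagle", "eagle"]
y_pred = ["hoopoe", "sparrow", "sparrow", "sparrow", "eagle", "hoopoe"]
scores = evaluate(y_true, y_pred)
```

Accuracy is returned in percent and the other three metrics as fractions, mirroring how the values are quoted in the text, e.g. (70.9908) versus (0.718).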
TABLE III. IDENTIFICATION RESULTS OF COMBINE BETWEEN (ORIGINAL FC6/ FC7) DATASET
TABLE IV. IDENTIFICATION RESULTS OF ORIGINAL/PURE FC6 AFTER APPLYING PCA (95%,97%,99%)
Applying PCA reduced the training time on fc6 for all classifiers in Table IV, compared to the training times in Tables I to III before applying PCA, especially for Random Forest and Naïve Bayes. The highest accuracy after applying PCA at (95%, 97% and 99%) was achieved by ANN with (68.8018, 70 and 62.3733%), respectively, which can be attributed to the reduced feature vector.

It is worth mentioning that the ANN classifier was used only with the sets obtained after applying PCA, because of its unacceptable training time otherwise. This result matches previous studies, which stated that the training time for ANN is long compared with other classifiers [2][15].

Table V shows the bird identification results for fc7, where the highest accuracy after applying PCA at (95%, 97% and 99%) was achieved by ANN with (65.2995, 65.2995 and 67.9493), respectively.

The second-highest accuracy after applying PCA at (95%, 97% and 99%) was achieved by Naïve Bayes, with (58.3641, 56.9585 and 56.3825%), respectively.

E. Results of the Statistical Operations on (fc6 and fc7) after Applying PCA
This section presents the identification results of each of the statistical operations (average, maximum and minimum) between fc6 and fc7 after applying PCA
(95%, 97%, 99%), as well as the training time for each classifier, as follows:

Table VI shows the bird identification results for the average between fc6 and fc7, where the highest accuracy after applying PCA at (95%, 97% and 99%) was achieved by ANN with (69.5622, 69.9078 and 65.5069), respectively. The second-highest accuracy after applying PCA at (95%, 97% and 99%) was achieved by Naïve Bayes, with (53.3871, 49.7926 and 39.8157%), respectively. Regarding the time spent on testing and training, ANN took a long time (58379.22s), and PCA 95 took less time than PCA 97 and PCA 99.

Table VII shows the bird identification results for the maximum between fc6 and fc7, where the highest accuracy after applying PCA at (95%) was achieved by ANN with (66.9816). Note that ANN results appear only for PCA (95%), not for (97% and 99%). This is because of the large number of features for PCA (97% and 99%), which reached (1428 and 2117) features, respectively. There are therefore no ANN results for these settings, because of the unacceptable training time (it takes days to produce the results). Regarding the time spent on testing and training, ANN took a long time (54151.88s).

Table VIII shows the bird identification results for the minimum between fc6 and fc7, where the highest accuracy after applying PCA at (95%) was achieved by ANN with (70.8295). Note that ANN results appear only for PCA (95%), not for (97% and 99%). This is because of the large number of features for PCA (97% and 99%), which reached (1205 and 1910) features, respectively, and because of the unacceptable training time (it takes days to produce the results). Naïve Bayes achieved accuracies of (48.7327, 44.1014 and 35%) after applying PCA at (95%, 97% and 99%), respectively. Regarding the time spent on testing and training, ANN took a long time (42677.02s).

F. Results of Combining Feature Vector after Applying PCA
This section shows the results of combining fc6 (4096) and fc7 (4096) into 8192 features. This number of features was reduced after applying PCA (95%, 97%, 99%) to (250, 440 and 1080) features, respectively. The results of combine are as follows:
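How a retained-variance percentage translates into a number of kept features can be sketched as follows: PCA keeps the smallest number of leading components whose eigenvalues account for at least the requested share of the total variance. The eigenvalues below are invented for illustration and are not taken from the study's data; `components_for_variance` is a hypothetical helper name.

```python
from typing import Sequence


def components_for_variance(eigenvalues: Sequence[float], threshold: float) -> int:
    """Smallest number of principal components whose cumulative explained
    variance ratio reaches `threshold` (e.g. 0.95 for PCA 95%)."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0.0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)


# Invented eigenvalue spectrum (sums to 100) for a 7-feature toy dataset.
eigenvalues = [60.0, 20.0, 10.0, 6.0, 2.0, 1.5, 0.5]
kept = {t: components_for_variance(eigenvalues, t) for t in (0.95, 0.97, 0.99)}
```

Because the leading components carry most of the variance, each extra percentage point costs progressively more components, which is consistent with the reported growth from 250 to 440 to 1080 features for the combined dataset.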
TABLE VI. IDENTIFICATION RESULTS OF AVERAGE BETWEEN (FC6 AND FC7) AFTER APPLYING OF PCA (95%,97%,99%)
TABLE VII. IDENTIFICATION RESULTS OF MAXIMUM BETWEEN (FC6 AND FC7) AFTER APPLYING OF PCA (95%,97%,99%)
Table IX shows the bird identification results for the combination of fc6 and fc7, where the highest accuracy after applying PCA at (95%, 97% and 99%) was achieved by ANN with (69.5392, 70.9908 and 67.9263), respectively. The second-highest accuracy after applying PCA at (95%, 97% and 99%) was achieved by Naïve Bayes, with (57.235, 54.1475 and 43.7558%), respectively.

Regarding the time spent on testing and training, ANN took a long time (56279.29s).

Comparison between the proposed work and previous researchers' works.

Table X compares the results of the proposed approach with three similar approaches for bird identification.

Table X indicates that the output of our proposal can be considered an interesting study compared to previous research, for several reasons:

1) Some previous studies were conducted on a small number of bird categories, such as [4] and [7], which used (13) and (16) categories, respectively, compared to this study, which used (434).

2) Some other previous studies were conducted on datasets containing a large number of training images (examples), such as [4], [3], and [14], which used (161907), (11788), and (11788) examples, respectively, compared to this study, which contained few images (4340 examples). A small number of images (examples) per bird usually leads to lower accuracy than a large number of examples, but in contrast, this was not the case here. This lends more confidence to the results of this study.

3) There were studies on identifying birds using different algorithms and methods based on audio/video, such as [4][11][6][10], while other studies identified birds based on images using AI algorithms [1][3][17], but not in the way conducted in this study, which used deep learning algorithms and different statistical operations: MAX, MIN, AVERAGE, and combine between the layers fc6/fc7, based on the VGG-19 algorithm.

4) This study was conducted with different methods: combining fc6/fc7, the max of fc6/fc7, the min of fc6/fc7, and the average of fc6/fc7, based on VGG-19.
TABLE VIII. IDENTIFICATION RESULTS OF MINIMUM BETWEEN (FC6 AND FC7) AFTER APPLYING OF PCA (95%,97%,99%)
TABLE IX. IDENTIFICATION RESULTS OF COMBINE ON (FC6 AND FC7) AFTER APPLYING PCA (95%, 97%, 99%)
training/testing time is reduced significantly, but also allows for increasing the identification accuracy, particularly when using the ANN classifier. Based on the results of the classifiers, ANN showed higher classification accuracy (70.9908), precision (0.718), recall (0.71) and F-Measure (0.708) than the other classifiers.

It is recommended to conduct more investigation to improve the accuracy results and to reduce the training time using different algorithms.

REFERENCES
[1] Tayal, Madhuri, Atharva Mangrulkar, Purvashree Waldey, and Chitra Dangra. 2018. “Bird Identification by Image Recognition.” Helix 8(6): 4349–4352.
[2] Albustanji, Abeer. 2019. “Veiled-Face Recognition Using Deep Learning.” Mutah University.
[3] Alter, Anne L, and Karen M Wang. 2017. “An Exploration of Computer Vision Techniques for Bird Species Classification.”
[4] Atanbori, John et al. 2018. “Classification of Bird Species from Video Using Appearance and Motion Features.” Ecological Informatics 48: 12–23.
[5] Brownlee, Jason. 2016. “How To Use Classification Machine Learning Algorithms in Weka.” Retrieved from https://machinelearningmastery.com/use-classification-machine-learning-algorithms-weka/.
[6] Cai, J., Ee, D., Pham, B., Roe, P., & Zhang, J. (2007, December). Sensor network for the monitoring of ecosystem: Bird species recognition. In 2007 3rd international conference on intelligent sensors, sensor networks and information 293-298. IEEE.
[7] Kumar, A., & Das, S. D. 2018. “Bird Species Classification Using Transfer Learning with Multistage Training”. In Workshop on Computer Vision Applications 28-38. Springer, Singapore.
[8] Hassanat, A. (2018). “Furthest-pair-based binary search tree for speeding big data classification using k-nearest neighbors”. Big Data, 6(3): 225-235.
[9] Hijazi, Samer, Rishi Kumar, and Chris Rowen. 2015. “Using Convolutional Neural Networks for Image Recognition.” IP Group, Cadence. Retrieved from https://ip.cadence.com/uploads/901/cnn_wp-pdf.
[10] Incze, A., Jancsó, H. B., Szilágyi, Z., Farkas, A., & Sulyok, C. 2018. Bird sound recognition using a convolutional neural network. In 2018 IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY): 295-300. IEEE.
[11] Korzh, Oxana, Mikel Joaristi, and Edoardo Serra B. 2018. “Convolutional Neural Network Ensemble Fine-Tuning for Extended Transfer.” In International Conference on Big Data, 110–23. Retrieved from http://dx.doi.org/10.1007/978-3-319-94301-5_9.
[12] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” 25: 1–9. Technologies.
[13] The Royal Society For The Conservation Of Nature. 2017. “Birdwatching in Jordan”. Retrieved from https://migratorysoaringbirds.birdlife.org/sites/default/files/jordan_birding_brochure.pdf.
[14] Qiao, Baowen, Zuofeng Zhou, Hongtao Yang, and Jianzhong Cao. 2017. “Bird Species Recognition Based on SVM Classifier and Decision Tree.” In 2017 First International Conference on Electronics Instrumentation & Information Systems 1–4.
[15] Sarayrah, Bayan mahmoud. 2019. “Finger Knuckle Print Recognition Using Deep Learning.” Mutah University.
[16] S. Al-Showarah et. al. 2020. “The Effect of Age and Screen Sizes on the Usability of Smartphones Based on Handwriting of English Words on the Touchscreen”, Mu’tah Lil-Buhuth wad-Dirasat, Natural and Applied Sciences series, Vol. 35, No. 1, 2020. ISSN: 1022-6812.
[17] Triveni, G., Malleswari, G. N., Sree, K. N. S., & Ramya, M. (2020). Bird Species Identification using Deep Fuzzy Neural Network. Int. J. Res. Appl. Sci. Eng. Technol. (IJRASET), 8: 1214-1219.
[18] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. 2011. “The Caltech-UCSD Birds-200-2011 Dataset.” Technical Report CNS-TR-2011-001, California Institute of Technology.
[19] Welinder, Peter et al. 2010. “Caltech-UCSD Birds 200.” Technical Report CNS-TR-2010-001, California Institute of Technology, 2010.