Comparison of faster region-based convolutional neural network algorithms for grape leaves classification
Moechammad Sarosa1, Puteri Nurul Ma’rifah1, Mila Kusumawardani1, Dimas Firmanda Al Riza2
1Department of Electrical Engineering, State Polytechnic of Malang, Malang, Indonesia
2Department of Biosystems Engineering, Faculty of Agricultural Technology, University of Brawijaya, Malang, Indonesia
Corresponding Author:
Moechammad Sarosa
Department of Electrical Engineering, State Polytechnic of Malang
St. Soekarno-Hatta 09, Malang 65141, Indonesia
Email: [email protected]
1. INTRODUCTION
Grapes belong to the Vitaceae family [1] and are known to have a number of health benefits [2], [3]. Understanding the grape variant is imperative for determining the best cultivation technique, the achievable quality, and the commercial value [4], [5]. Grape growers therefore work on identifying grape varieties precisely, on determining how to grow cuttings of each variety, and on calculating its supply price. Grape varieties can be distinguished on the basis of their leaf shapes [6]. Over the last few years, various studies have designed different methods of classifying the leaves of various types of plants. Some of these methods are the mask region-based convolutional neural network (Mask R-CNN) algorithm with VGG16 used to distinguish leaf shapes [7], the convolutional neural network (CNN) technique [8], CNN to analyze leaf disease [9], [10], and the attention residual learning strategy applied to the standard ResNet-50 CNN model (AResNet-50) [11].
Most studies classify plant leaves using deep learning techniques, which help classify objects more accurately. The use of deep learning methods for image recognition and classification has spread widely in research [12]–[14]. CNN is one of the most common and widely used deep learning models and has been proven to perform well owing to its excellent capability of learning the properties of an object across a large number of network architectures [15], [16]. Meanwhile, a newer method applied by Liu et al. [17], Faster R-CNN, is still being developed. In this study, grape leaf variants are identified using a deep learning method based on the Faster R-CNN algorithm combined with the Inception ResNet V2, ResNet-152, ResNet-101, and ResNet-50 network architectures and pre-trained COCO weights. This research employs five types of grape leaves, namely academic, jupiter, local, taldun, and transfiguration.
2. METHOD
2.1. Research data collection
In the data collection process, images of grape leaves were taken and used to build a dataset. The dataset contains 500 images, with 100 images for each grape leaf variety. This research focuses on five grape varieties, namely academic, jupiter, local, taldun, and transfiguration. Figures 1(a) to 1(e) show grape leaves photographed in one of the researchers' gardens. The resolution of the images is adjusted to the requirements of the pre-trained weights, namely the COCO pre-trained model, which expects 640×640 inputs. This adjustment is made after the grape leaf images have been collected. Figure 1 shows examples of the images in this dataset, one for each class, where class refers to the type of grape variety.
Figure 1. Datasets (a) academic, (b) jupiter, (c) local, (d) taldun, and (e) transfiguration
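As a minimal sketch of this preprocessing step (the paper does not state the exact tooling, so the use of Pillow and the folder layout below are assumptions), each leaf image can be resized to the 640×640 input expected by the COCO pre-trained weights:

```python
from pathlib import Path
from PIL import Image

TARGET_SIZE = (640, 640)          # input size expected by the COCO pre-trained weights
RAW_DIR = Path("dataset/raw")     # assumed layout: one sub-folder per grape variety
OUT_DIR = Path("dataset/resized")

for img_path in RAW_DIR.rglob("*.jpg"):
    out_path = OUT_DIR / img_path.relative_to(RAW_DIR)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Resize every leaf image to 640x640 before annotation and training
    Image.open(img_path).convert("RGB").resize(TARGET_SIZE).save(out_path)
```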
Once all the images have been annotated and labeled, the data is divided into categories, namely training and testing data. Table 1 shows the partition scheme for the dataset. The
results of image annotation and labeling cannot be used directly for training. To be usable in the training and modeling activities, the annotation and label files, which are generated in the '.xml' format, have to be converted into '.csv' files. Table 2 illustrates the results of the conversion to '.csv'.
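A short sketch of this conversion is given below. It assumes Pascal VOC-style '.xml' files (as produced by common labeling tools) and an output column layout chosen for illustration; neither the file names nor the exact columns are specified in the paper.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

ANNOT_DIR = Path("dataset/annotations")   # assumed location of the '.xml' label files
CSV_PATH = Path("dataset/train_labels.csv")

rows = []
for xml_file in ANNOT_DIR.glob("*.xml"):
    root = ET.parse(xml_file).getroot()
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    # One CSV row per bounding box: class name plus pixel coordinates
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])

with CSV_PATH.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "width", "height", "class",
                     "xmin", "ymin", "xmax", "ymax"])
    writer.writerows(rows)
```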
2.3. Modeling
Faster R-CNN is an improved CNN-based method developed from R-CNN and Fast R-CNN [18]–[20]. What sets this method apart from its predecessors is the replacement of the selective search step with a region proposal network (RPN) [21]. This study employs the Faster R-CNN algorithm with the Inception ResNet V2, ResNet-152, ResNet-101, and ResNet-50 architectures. Modeling was carried out before the experiment using Google Colaboratory. Figure 3 shows a model of the Faster R-CNN algorithm.
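The paper does not state the implementation framework, so the following is an illustration only: it builds a comparable Faster R-CNN detector with a ResNet-50 backbone and COCO pre-trained weights using torchvision, then replaces the classification head for the five grape varieties plus background.

```python
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Faster R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box predictor: 5 grape-leaf classes + 1 background class
num_classes = 6
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

model.eval()  # switch to model.train() for fine-tuning on the grape leaf dataset
```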
The Faster R-CNN architecture begins with an input image, from which the leaf features are extracted by a backbone CNN. A CNN uses a stack of convolutional layers to extract features sequentially [22], building up representations that contribute to the final result at each phase of training [23]. This method can identify the objects represented in an image [24]. The research also considered large-scale resources used for pre-training, such as COCO and ImageNet in computer vision and analogous corpora in natural language processing (NLP), which provide large amounts of image classification data.
The CNN backbones used in this study are Inception ResNet V2, ResNet-152, ResNet-101, and ResNet-50. ResNet was one of the best deep neural networks in the 2015 classification competition known as the ImageNet large scale visual recognition challenge (ILSVRC2015). ResNet-18, ResNet-50, ResNet-101, ResNet-110, ResNet-152, ResNet-164, and ResNet-1202 are structurally identical variants with a varying number of layers [25]. The number at the end of a ResNet architecture name indicates the total number of layers used for feature extraction from an image. Inception ResNet V2, in turn, is a combination of the Inception and ResNet designs, in which Inception-style convolution modules are combined with residual (ResNet) connections [26].
The next part is the RPN, which takes the feature map output of the backbone network as its input. The RPN works by placing a set of "anchors" on the input image for each location in the output feature map. The anchors represent objects of different sizes and aspect ratios in the image. For PASCAL VOC, the anchors have three scale box sizes (128², 256², and 512²) and three aspect ratios (1:1, 1:2, and 2:1), so nine possible anchors are placed on the input image for each location of the output feature map [27]. The output of this process is the probability that each of the nine anchors at a given point on the backbone feature map contains an object [28].
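To make the anchor layout concrete, the short sketch below (an illustration only, not the authors' code) enumerates the nine PASCAL VOC anchor shapes from the three scales and three aspect ratios mentioned above:

```python
import math

scales = [128, 256, 512]                  # anchor areas are scale**2 pixels
aspect_ratios = [(1, 1), (1, 2), (2, 1)]  # width:height ratios

anchors = []
for s in scales:
    area = s * s
    for rw, rh in aspect_ratios:
        # Choose width and height so that w * h == area and w / h == rw / rh
        w = math.sqrt(area * rw / rh)
        h = area / w
        anchors.append((round(w), round(h)))

print(anchors)  # nine (width, height) pairs, one per scale/ratio combination
```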
The third part is the region of interest (ROI) pooling layer, which uses a max pooling operation to collect the features of each proposal from the feature map and to resize them to a fixed size. The output of the pooling layer, organized as a one-dimensional feature vector, becomes the input to the fully connected layers. After passing through the fully connected layers, the features are fed into the regression and classification branches in the final section, which predict the object's best match. In this way, it is possible, for example, to generate an image of the objects with their bounding boxes and a possible classification result [28].
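As a minimal illustration of this step (again an assumption about tooling, not the authors' implementation), torchvision provides an ROI pooling operator that crops each proposal from the feature map and max-pools it to a fixed grid:

```python
import torch
from torchvision.ops import roi_pool

# A dummy backbone feature map: batch of 1, 256 channels, 50x50 spatial size
feature_map = torch.randn(1, 256, 50, 50)

# Two region proposals in (batch_index, x1, y1, x2, y2) format, in feature-map coordinates
proposals = torch.tensor([[0, 4.0, 4.0, 20.0, 30.0],
                          [0, 10.0, 8.0, 45.0, 40.0]])

# Max-pool every proposal to a fixed 7x7 grid, then flatten to one vector per ROI
pooled = roi_pool(feature_map, proposals, output_size=(7, 7))
vectors = pooled.flatten(start_dim=1)   # shape: (2, 256 * 7 * 7)
print(vectors.shape)
```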
2.4. Measurement
True positive (TP), false positive (FP), and false negative (FN) values were obtained from the measurement method used to test the grape leaf classification system. FP denotes that a bounding box identified an object but failed to identify a grape leaf, while FN indicates that the bounding box did not contain any object in the provided figure. TP denotes that a bounding box detected a grape plant leaf successfully [29], [30]. The F1 score, recall, and precision are computed from these values. Recall is the degree of detection success, whereas precision is the accuracy of the detection results. The F1 score represents a balance between recall and precision. The following formulas were used to determine the precision, recall, and F1 score [29].
Precision = TP / (TP + FP) (1)

Recall = TP / (TP + FN) (2)

F1 Score = 2 × (Precision × Recall) / (Precision + Recall) (3)
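For completeness, a small helper (illustrative only, not part of the authors' pipeline) that evaluates (1)-(3) from raw TP, FP, and FN counts:

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 score from detection counts, per (1)-(3)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Example with hypothetical counts: 95 correct detections, 3 spurious boxes, 2 missed leaves
print(detection_metrics(95, 3, 2))
```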
3.4. True positive, false positive, and false negative test results
The TP, FP, and FN test results show the system's ability to identify and classify grape leaf objects.
The test results in Table 6 on the Inception ResNet V2 architecture show a higher success rate in detecting
grape leaves with higher TP values and lower FP and FN compared to other architectures. This indicates that
this model is able to recognize objects more accurately, although it requires more computation time, making it
a better choice for applications with high precision requirements.
Figure 6 shows an example of the detection results from this test. In addition, the image detection accuracy reached 100%, which shows that the system is able to apply the model accurately and to detect and categorize the leaves correctly. Detection errors are caused by the limited variation in the dataset and by the lighting conditions when the pictures were taken.
Figure 4. Effects of training on the F1 score (a) first trial and (b) second trial
Figure 6. Detection results of (a) academic, (b) jupiter, (c) local, (d) taldun, and (e) transfiguration
4. CONCLUSION
This study describes techniques for classifying grape leaves using the Faster R-CNN algorithm with the Inception ResNet V2, ResNet-152, ResNet-101, and ResNet-50 network architectures, all of which serve as pre-trained networks for feature extraction. The experiment results show that the average F1 scores are 93%, 90%, 88%, and 86%, respectively, with the best F1 score obtained by the Inception ResNet V2 network architecture (with an average loss of 0.943). However, the time this architecture needs for training and testing is far greater than for the other network architectures. Conversely, the architecture with the fastest computing time for training and testing is ResNet-50, but its F1 score is the lowest of the four. Furthermore, the training step limit for Inception ResNet V2 is 3,000 steps, while the ResNet-152 and ResNet-101 network architectures have a 4,000-step limit. Meanwhile, the training step limit for the ResNet-50 architecture is unknown, given that its F1 score continues to increase as the number of training steps increases. The research establishes a general limit on the number of training steps because the F1 score tends to stabilize or decline at some point during training. It can be concluded that the Faster R-CNN-based detection and classification system for grape leaves can analyze object properties more effectively if the total losses during training are lowered.
ACKNOWLEDGEMENTS
The researchers would like to express gratitude towards the Directorate General of Research
Enhancement and Development, Ministry of Education, Culture, Research, and Technology, for the grant
provided with Decree Number 173/SPK/D.D4/PPK.01.APTV/VI/2023 and Agreement/Contract Number
12882/PL2.1/HK/2023.
REFERENCES
[1] M. Sarosa, P. N. Maa’rifah, M. Kusumawardani, and D. F. Al Riza, “Vitis Vinera L. leaf detection using faster R-CNN,” BIO Web
Conference, vol. 117, pp. 1–9, 2024, doi: 10.1051/bioconf/202411701021.
[2] A. Rahemi, J. C. D. Peterson, and K. T. Lund, “Grape Species,” in Grape Rootstocks and Related Species, Springer, Cham, 2022,
pp. 5–21, doi: 10.1007/978-3-030-99407-5_2.
[3] A. M. Walker, C. Heinitz, S. Riaz, and J. Uretsky, “Grape taxonomy and germplasm,” in The Grape Genome, Springer, Cham,
2019, pp. 25–38, doi: 10.1007/978-3-030-18601-2_2.
[4] G. Hasanaliyeva et al., “Effects of production region, production systems and grape type/variety on nutritional quality parameters
of table grapes; results from a UK retail survey,” Foods, vol. 9, no. 12, 2020, doi: 10.3390/foods9121874.
[5] B. Suter, A. D. Irvine, M. Gowdy, Z. Dai, and C. V. Leeuwen, “Adapting wine grape ripening to global change requires a multi-
trait approach,” Frontiers in Plant Science, vol. 12, pp. 1–17, 2021, doi: 10.3389/fpls.2021.624867.
[6] Y. Khan et al., “Antioxidant potential in the leaves of grape varieties (Vitis vinifera L.) grown in different soil compositions,”
Arabian Journal of Chemistry, vol. 14, no. 11, 2021, doi: 10.1016/j.arabjc.2021.103412.
[7] K. Yang, W. Zhong, and F. Li, “Leaf segmentation and classification with a complicated background using deep learning,”
Agronomy, vol. 10, no. 11, 2020, doi: 10.3390/agronomy10111721.
[8] N. A. Othman, N. S. Damanhuri, N. M. Ali, B. C. C. Meng, and A. A. A. Samat, “Plant leaf classification using convolutional
neural network,” 2022 8th International Conference on Control, Decision and Information Technologies (CoDIT), Istanbul, Turkey,
pp. 1043–1048, 2022, doi: 10.1109/CoDIT55151.2022.9804121.
[9] J. Hang, D. Zhang, P. Chen, J. Zhang, and B. Wang, “Classification of plant leaf diseases based on improved convolutional neural
network,” Sensors, vol. 19, no. 19, pp. 1–14, 2019, doi: 10.3390/s19194161.
[10] Y. Toda and F. Okura, “How convolutional neural networks diagnose plant disease,” Plant Phenomics, vol. 2019, 2019, doi:
10.34133/2019/9237136.
[11] A. Pandey and K. Jain, “Plant leaf disease classification using deep attention residual network optimized by opposition-based
symbiotic organisms search algorithm,” Neural Computing and Applications, vol. 34, pp. 21049–21066, 2022, doi: 10.1007/s00521-
022-07587-6.
[12] P. N. Ma’rifah, M. Sarosa, and E. Rohadi, “Garbage classification using Faster R-CNN,” 2023 International Conference on
Electrical and Information Technology (IEIT), Malang, Indonesia, pp. 196–201, 2023, doi: 10.1109/IEIT59852.2023.10335519.
[13] Q. Lv, S. Zhang, and Y. Wang, “Deep learning model of image classification using machine learning,” Advances in Multimedia,
vol. 2022, 2022, doi: 10.1155/2022/3351256.
[14] Z. Li et al., “A high-precision detection method of hydroponic lettuce seedlings status based on improved Faster RCNN,” Computers
and Electronics in Agriculture, vol. 182, 2021, doi: 10.1016/j.compag.2021.106054.
[15] M. M. Taye, “Theoretical understanding of convolutional neural network: concepts, architectures, applications, future directions,” Computation, vol. 11, no. 52, 2023.
[16] X. Wang, Y. Zhao, and F. Pourpanah, “Recent advances in deep learning,” International Journal of Machine Learning and
Cybernetics, vol. 11, no. 4, pp. 747–750, 2020, doi: 10.1007/s13042-020-01096-5.
[17] S. Liu, H. Ban, Y. Song, M. Zhang, and F. Yang, “Method for detecting Chinese texts in natural scenes based on improved faster
R-CNN,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 34, no. 2, 2020, doi:
10.1142/S021800142053002X.
[18] L. Jiang, J. Chen, H. Todo, Z. Tang, S. Liu, and Y. Li, “Application of a Fast RCNN based on upper and lower layers in face
recognition,” Computational Intelligence and Neuroscience, vol. 2021, 2021, doi: 10.1155/2021/9945934.
[19] P. N. M. Ma’rifah, M. Sarosa, and E. Rohadi, “Comparison of Faster R-CNN ResNet-50 and ResNet-101 methods for recycling
waste detection,” International Journal of Computer Applications Technology and Research, vol. 12, no. 12, pp. 26–32, 2023, doi:
10.7753/ijcatr1212.1006.
[20] W. Liu, S. Liao, W. Hu, X. Liang, and X. Chen, “Learning efficient single-stage pedestrian detectors by asymptotic localization fitting,”
Proceedings of the European Conference on Computer Vision (ECCV), pp. 643–659, 2018, doi: 10.1007/978-3-030-01264-9_38.
[21] W. Zou, Z. Zhang, Y. Peng, C. Xiang, S. Tian, and L. Zhang, “SC-RPN: A strong correlation learning framework for region
proposal,” IEEE Trans. Image Process., vol. 30, pp. 4084–4098, 2021, doi: 10.1109/TIP.2021.3069547.
[22] L. Fan, T. Zhang, and W. Du, “Optical-flow-based framework to boost video object detection performance with object enhancement,” Expert Systems with Applications, vol. 170, 2021, doi: 10.1016/j.eswa.2020.114544.
[23] A. Ajit, K. Acharya, and A. Samanta, “A review of convolutional neural networks,” 2020 International Conference on Emerging
Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, pp. 1–5, 2020, doi: 10.1109/ic-
ETITE47903.2020.049.
[24] J. Li et al., “Automatic detection and classification system of domestic waste via multimodel cascaded convolutional neural
network,” IEEE Transactions on Industrial Informatics, vol. 18, no. 1, pp. 163–173, 2022, doi: 10.1109/TII.2021.3085669.
[25] L. Ichim and D. Popescu, “Melanoma detection using an objective system based on multiple connected neural networks,” IEEE
Access, vol. 8, pp. 179189–179202, 2020, doi: 10.1109/ACCESS.2020.3028248.
[26] J. Wang, X. He, S. Faming, G. Lu, H. Cong, and Q. Jiang, “A real-time bridge crack detection method based on an improved
Inception ResNet V2 structure,” IEEE Access, vol. 9, pp. 93209–93223, 2021, doi: 10.1109/ACCESS.2021.3093210.
[27] Y. P. Chen, Y. Li, and G. Wang, “An enhanced region proposal network for object detection using deep learning method,” PLoS
One, vol. 13, no. 9, pp. 1–26, 2018, doi: 10.1371/journal.pone.0203897.
[28] W. Gu et al., “High accuracy thyroid tumor image recognition based on hybrid multiple models optimization,” IEEE Access, vol.
8, pp. 128426–128439, 2020, doi: 10.1109/ACCESS.2020.3008401.
[29] N. A. Prasetyo, Pranowo, and A. J. Santoso, “Automatic detection and calculation of palm oil fresh fruit bunches using faster R-
CNN,” International Journal of Applied Science and Engineering, vol. 17, no. 2, pp. 121–134, 2020, doi:
10.6703/IJASE.202005_17(2).121.
[30] M. Sarosa, N. Muna, and E. Rohadi, “Performance of faster R-CNN to detect plastic waste,” International Journal of Advanced
Trends in Computer Science and Engineering, vol. 9, no. 5, pp. 7756–7762, 2020, doi: 10.30534/ijatcse/2020/120952020.
BIOGRAPHIES OF AUTHORS
Dimas Firmanda Al Riza received a Doctoral degree from Kyoto University, Japan, in the field of Bio-sensing Engineering in 2019. The doctoral study was completed with an LPDP presidential scholarship. He has published more than 70 scientific papers up to 2022, including dozens in Q1 reputable international journals. He recently received the Young Researcher's Academic Encouragement Award 2021 from the Japanese Society of Agricultural Machinery and Food Engineers (JSAM). Currently, he is the Head of the Mechatronics Laboratory of Agro-industry Tools and Machinery, Department of Agricultural Engineering, Faculty of Agricultural Technology, Universitas Brawijaya. He can be contacted at email: [email protected].