
IAES International Journal of Artificial Intelligence (IJ-AI)

Vol. 14, No. 1, February 2025, pp. 222~230


ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i1.pp222-230

Comparison of faster region-based convolutional neural network algorithms for grape leaves classification

Moechammad Sarosa1, Puteri Nurul Ma'rifah1, Mila Kusumawardani1, Dimas Firmanda Al Riza2
1Department of Electrical Engineering, State Polytechnic of Malang, Malang, Indonesia
2Department of Biosystems Engineering, Faculty of Agricultural Technology, University of Brawijaya, Malang, Indonesia

Article Info

Article history:
Received Dec 15, 2023
Revised Jul 2, 2024
Accepted Jul 26, 2024

Keywords:
Grape plant
Grape varieties
Inception ResNet
Leaf detection
ResNet

ABSTRACT
The shapes of leaves distinguish the Indonesian grape variants. The grape leaves might look the same at first glance, but there are differences in leaf shapes and characteristics when observed closely. This research uses a deep learning method that combines the faster region-based convolutional neural network (R-CNN) algorithm with the Inception ResNet V2, ResNet-152, ResNet-101, and ResNet-50 network architectures and pre-trained COCO weights to classify five grape varieties through leaf images. The study collected 500 images to be used as an independent dataset. The results show that network improvements can effectively improve operating efficiency. There is also a limit to the number of training steps, because the F1 score tends to stabilize or decrease beyond a certain point. The Inception ResNet V2 architecture achieves the highest average F1 score of 92%, although its average computing time for training and testing is longer than that of the other network architectures. This suggests that the algorithm can classify types of grapes based on their leaves.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Moechammad Sarosa
Department of Electrical Engineering, State Polytechnic of Malang
St. Soekarno-Hatta 09, Malang 65141, Indonesia
Email: [email protected]

1. INTRODUCTION
Grapes belong to the Vitaceae family [1], which are known to have a number of health benefits
[2], [3]. It is imperative to understand the grape variant to determine the best cultivation technique, possible
quality, and commercial value [4], [5]. Grape growers are working to ensure precise identification of grape
varieties, as well as determining how to grow cuttings from each variety and calculating their supply price.
The varieties of grapes can be distinguished on the basis of their leaf shapes [6], so precise identification of
grape varieties also helps growers determine the best cultivation technique for different variants with the best
commercial values. Various studies have designed different methods of classifying leaves of various types of
plants over the last few years. Some of these methods are the mask region-based convolutional neural network
(Mask R-CNN) algorithm and VGG16 used to distinguish leaf
shapes [7], the convolutional neural network (CNN) technique [8], CNN to analyze leaf disease [9], [10], and
the standard ResNet-50 CNN model’s attention residual learning strategy (AResNet-50) [11].
Deep learning techniques, which help classify objects more accurately, are used by most studies to
classify plant leaves. The use of deep learning methods for image recognition and classification has spread
widely in research [12]–[14]. The CNN is one of the most common and widely used deep learning models and
has been proven to perform well thanks to its excellent capability of learning the properties of an object across
a large number of network architectures [15], [16]. Meanwhile, a newer method, Faster R-CNN, suggested by
Liu et al. [17], is currently being developed. In this research, grape leaf variants are


identified using a deep learning method with the Faster R-CNN algorithm combined with the Inception
ResNet V2, ResNet-152, ResNet-101, and ResNet-50 network architectures and pre-trained COCO weights.
This research employs five types of grape leaves, namely academic, jupiter, local, taldun, and transfiguration.

2. METHOD
2.1. Research data collection
In the data collection process, images of grape leaves are taken and used to generate a dataset. The
dataset contains 500 images, with 100 for each grape leaf variety. This research focuses on five grape varieties,
which are academic, jupiter, local, taldun, and transfiguration. Figures 1(a) to 1(e) show grape leaves
photographed in one of our researchers' gardens. The resolution of the images is adjusted to match the input
conditions of the pre-trained COCO weights. The adjustment is made after collecting the grape plant leaf
images. The COCO pre-trained weights expect an input size of 640×640 pixels. Figure 1 shows example
images from this dataset, one for each class, where a class refers to a grape variety.
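As a rough illustration of this preprocessing step, the short Python sketch below resizes raw leaf photos to the 640×640 input size expected by the COCO pre-trained weights. The folder names and the use of the Pillow library are assumptions for illustration, not details taken from the paper.

# Hypothetical preprocessing sketch: resize raw leaf photos to the 640x640
# input size expected by COCO-pretrained detection checkpoints.
# Folder layout and file names are assumptions, not from the paper.
from pathlib import Path
from PIL import Image

SRC_DIR = Path("raw_leaf_images")   # assumed source folder
DST_DIR = Path("dataset_640")       # assumed output folder
DST_DIR.mkdir(exist_ok=True)

for img_path in SRC_DIR.glob("*.jpg"):
    with Image.open(img_path) as img:
        resized = img.convert("RGB").resize((640, 640), Image.BILINEAR)
        resized.save(DST_DIR / img_path.name, quality=95)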


Figure 1. Datasets (a) academic, (b) jupiter, (c) local, (d) taldun, and (e) transfiguration

2.2. Annotation and labeling


The process of marking the grape leaf in a picture is called annotation and labeling. The initial step is
to draw a box on the leaf surface of the grape plant and mark each of the boxes with labels "Academic",
"Jupiter", "Local", "Taldun", and "Transfiguration". A file with an extension '.xml' is generated by image
annotation and labeling in the PASCAL VOC format. The annotation and labeling phase produces the ground
truth, which is then used to calculate the bounding-box regression loss for objects detected during training.
The process employs the LabelImg image-annotation software. An example of the annotation and labeling is
shown in Figure 2.

Figure 2. Labeling process using software called LabelImg

Subsequently, once all the images have been annotated and labeled, the data is divided into training and
testing sets. Table 1 shows the partition scheme for the datasets.


The result of image annotation and labeling cannot be used directly in training. To be used in training and
modeling activities, the generated '.xml' annotation and label files have to be converted into the '.csv' format.
Table 2 illustrates the results of converting the files to '.csv'.

Table 1. Table of dataset splits


Experiment scheme   Total training data   Total test data
1 400 50
2 450 50

Table 2. Conversion results of XML to CSV files


File name   Width   Height   Class   Xmin   Ymin   Xmax   Ymax
Academic5.jpg 640 640 academic 55 142 543 622
Jupiter6.jpg 640 640 jupiter 87 6 615 596
Local4.jpg 640 640 local 6 9 590 512
Taldun3.jpg 640 640 taldun 22 7 507 537
Transfiguration2.jpg 640 640 transfiguration 114 85 635 545
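A minimal Python sketch of this conversion step is shown below, assuming the LabelImg '.xml' files sit in a single folder. The column names mirror Table 2, but the folder path and output file name are hypothetical.

# Hedged sketch of converting PASCAL VOC '.xml' annotations (from LabelImg)
# into a single '.csv' file like Table 2. Paths are assumptions.
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

ANNOT_DIR = Path("annotations")   # assumed folder of LabelImg output
rows = []
for xml_file in ANNOT_DIR.glob("*.xml"):
    root = ET.parse(xml_file).getroot()
    width = root.find("size/width").text
    height = root.find("size/height").text
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([
            root.find("filename").text, width, height,
            obj.find("name").text,
            box.find("xmin").text, box.find("ymin").text,
            box.find("xmax").text, box.find("ymax").text,
        ])

with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "width", "height", "class",
                     "xmin", "ymin", "xmax", "ymax"])
    writer.writerows(rows)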

2.3. Modeling
Faster R-CNN is an improved CNN method that is developed from R-CNN and Fast R-CNN
[18]–[20]. What sets this method apart from its predecessors is the upgrade from the selective search feature
to a region proposal network (RPN) [21]. This study employs the Faster R-CNN algorithm with Inception
ResNet V2, ResNet-152, ResNet-101, and ResNet-50 architectures. Modeling was carried out before the
experiment using Google Colaboratory. Figure 3 shows an illustration of the Faster R-CNN algorithm model.

Figure 3. Model of the Faster R-CNN algorithm illustration
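The paper does not state which object-detection framework was run on Google Colaboratory, so the following is only an illustrative analogue using torchvision: it loads a COCO pre-trained Faster R-CNN with a ResNet-50 FPN backbone and replaces the prediction head for the five grape classes plus background. It is a sketch of the general approach, not the authors' exact pipeline.

# Illustrative sketch only: COCO-pretrained Faster R-CNN (ResNet-50 backbone)
# from torchvision, re-headed for five grape classes + background.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 6  # five grape varieties + background

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)
# The model can now be fine-tuned on the 640x640 grape-leaf dataset.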

The Faster R-CNN architectural model begins with inputting an image, from which the leaf features are
extracted by a backbone CNN. A CNN uses a sequence of convolutions to extract features in order [22],
creating feature layers that feed into the final result at each phase of training [23]. This method can identify
the items represented in an image [24]. The research also makes use of COCO, a large-scale dataset which,
like ImageNet in computer vision and large corpora in natural language processing (NLP), supports training
on large amounts of image classification data.
The CNN backbones used in this study are Inception ResNet V2, ResNet-152, ResNet-101, and
ResNet-50. ResNet was one of the best deep neural networks in the 2015 classification competition known as
the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2015). ResNet-18, ResNet-50, ResNet-101,
ResNet-110, ResNet-152, ResNet-164, and ResNet-1202 are variants of the same architecture with different
numbers of layers [25]. The number in a ResNet architecture's name indicates the total number of layers used
for feature extraction from an image.


Inception ResNet V2 is a blend of both approaches: its convolution layers combine Inception modules with
ResNet residual connections [26].
The next section is the RPN, which takes the feature map output of the backbone network. The RPN works
by placing a set of "anchors" on the input image for each location in the output feature map. The anchors
represent objects of different sizes and aspect ratios in the images. For PASCAL VOC, the anchors have three
scale box sizes (128², 256², 512²) and three aspect ratios (1:1, 1:2, and 2:1), so there are nine possible anchors
placed on the input image for each location of the output feature map [27]. The output of this process is the
probability that each of the nine anchors at a given point on the backbone feature map contains an object [28].
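To make the anchor scheme concrete, the Python sketch below enumerates the nine anchor boxes (three scales × three aspect ratios) for a single feature-map location; the centre coordinates are arbitrary and only serve as an assumption for illustration.

# Minimal sketch: enumerate the nine RPN anchors (3 scales x 3 aspect ratios)
# for one feature-map location, following the PASCAL VOC setup above.
scales = [128, 256, 512]           # anchor box side lengths
ratios = [(1, 1), (1, 2), (2, 1)]  # height:width aspect ratios

def anchors_at(cx, cy):
    boxes = []
    for s in scales:
        area = s * s
        for rh, rw in ratios:
            h = (area * rh / rw) ** 0.5
            w = area / h
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

print(len(anchors_at(320, 320)))  # 9 anchors per location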
The third part is the region of interest (ROI) pooling layer, which uses a max-pooling operation to collect
features from the feature map and resize them to a fixed size. The pooling layer's output, organized as a
one-dimensional feature vector, becomes the input of the fully connected layers. After passing through the
fully connected layers, the features are fed into the regression and classification branches in the final section,
which predict the object's correct match. In this way it is possible, for example, to generate an image of the
detected objects with bounding boxes and a classification result [28].

2.4. Measurement
True positive (TP), false positive (FP), and false negative (FN) values were obtained from the measurement
method used to test the grape leaf classification system. FP denotes that the bounding box identified an object
but failed to identify a grape leaf, while FN indicates that the bounding box did not contain any object in the
provided figure. TP denotes that the bounding box detected grape plant leaves successfully [29], [30]. F1
scores, recall, and precision are computed using these parameters. Recall is the degree of detection success,
whereas precision is the accuracy of the detection results. The F1 score provides a balance between recall and
precision. The following formulas were used to determine precision, recall, and the F1 score [29].

Precision = TP / (TP + FP)    (1)

Recall = TP / (TP + FN)    (2)

F1 score = 2 × (Precision × Recall) / (Precision + Recall)    (3)
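As a quick check of equations (1)-(3), the Python sketch below applies them to one row of Table 6 (Inception ResNet V2, experiment 1, step 3,000: TP = 44, FP = 5, FN = 1) and reproduces the corresponding Table 7 values.

# Small sketch applying equations (1)-(3) to one row of Table 6.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    return 2 * p * r / (p + r)

p = precision(44, 5)   # ~0.90
r = recall(44, 1)      # ~0.98
print(round(f1_score(p, r), 2))  # ~0.94, matching Table 7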

3. RESULTS AND DISCUSSION


3.1. Data training process computing time
The computing time is the time needed by a computer to process an algorithm and train data. Table 3
shows the training time required for each experiment. Furthermore, the Inception ResNet V2 architecture takes
the longest to complete the training, while the ResNet-50 architecture has the fastest training time.

Table 3. Training process computing time


Network architecture   Experiment   Computing time (minutes) at 3,000 / 4,000 / 5,000 steps
Inception ResNet V2 1 69 69 73
2 81 85 91
ResNet-152 1 25 31 43
2 35 35 61
ResNet-101 1 25 31 41
2 31 31 49
ResNet-50 1 19 29 39
2 29 29 47

3.2. Data testing process computing time


Each network architecture for the testing process requires varying computational time depending on
the complexity of the model. Testing time on the Inception ResNet V2 architecture tends to be longer compared
to other architectures, with an average testing time reaching more than 100 seconds. Table 4 shows that the
computational cost of the Inception ResNet V2 architecture is much higher, making it an important
consideration in field applications, even though it has better performance in terms of detection accuracy.


Table 4. Testing process computing time


Network architecture   Experiment   Computing time (seconds) at 3,000 / 4,000 / 5,000 steps
Inception ResNet V2 1 107 103 101
2 101 103 101
ResNet-152 1 78 78 80
2 80 73 73
ResNet-101 1 55 58 57
2 58 55 65
ResNet-50 1 48 45 45
2 44 43 42

3.3. Total loss while modeling


The loss function is a key output of the modeling process for grape leaf shapes. Loss values are recorded at
every stage of the modeling process. Table 5 shows where the performance of the modeling process differs
between architectures. Regarding the average loss of the different architectures, Inception ResNet V2 shows
the lowest average loss, while ResNet-50 has the highest.

Table 5. Loss of Faster R-CNN in the modeling process


Step   Inception ResNet V2 (exp. 1 / exp. 2)   ResNet-152 (exp. 1 / exp. 2)   ResNet-101 (exp. 1 / exp. 2)   ResNet-50 (exp. 1 / exp. 2)
3,000 0.0872 0.0723 0.1065 0.1016 0.1375 0.1302 0.1507 0.1464
4,000 0.1052 0.0859 0.0825 0.0767 0.1232 0.1114 0.1361 0.1346
5,000 0.1096 0.1055 0.1334 0.1287 0.1398 0.1328 0.1146 0.1096
Average per experiment 0.1007 0.0879 0.1074 0.1023 0.1335 0.1248 0.1338 0.1302
Average per architecture 0.0943 0.1049 0.1291 0.1320

3.4. True positive, false positive, and false negative test results
The TP, FP, and FN test results show the system's ability to identify and classify grape leaf objects.
The test results in Table 6 on the Inception ResNet V2 architecture show a higher success rate in detecting
grape leaves with higher TP values and lower FP and FN compared to other architectures. This indicates that
this model is able to recognize objects more accurately, although it requires more computation time, making it
a better choice for applications with high precision requirements.

Table 6. TP, FP, and FN test results


Network architecture Experiment to- Step TP FP FN
Inception ResNet V2 1 3,000 44 5 1
4,000 42 7 1
5,000 42 6 2
2 3,000 44 5 1
4,000 44 5 1
5,000 43 6 1
ResNet-152 1 3,000 39 9 2
4,000 41 7 2
5,000 40 8 2
2 3,000 40 8 2
4,000 42 6 2
5,000 42 7 1
ResNet-101 1 3,000 36 11 3
4,000 39 9 2
5,000 39 10 1
2 3,000 39 8 3
4,000 41 8 1
5,000 41 8 1
ResNet-50 1 3,000 34 13 3
4,000 37 11 2
5,000 39 9 2
2 3,000 36 11 3
4,000 38 10 2
5,000 41 7 2


3.5. Precision, recall, and F1 score test results


The precision, recall, and F1 scores are calculated using the TP, FP, and FN results in Table 6. Table 7
contains the precision, recall, and F1 scores for the Faster R-CNN network architecture models. The
computations demonstrate that, using the Faster R-CNN algorithm, the Inception ResNet V2, ResNet-152,
ResNet-101, and ResNet-50 network architecture models achieve average F1 scores of 93%, 90%, 88%, and
86%, respectively. This indicates that the Faster R-CNN algorithm with Inception ResNet V2 can detect and
classify objects most effectively.

Table 7. Precision, recall, and F1 score test results


Network architecture Experiment to- Step Precision (%) Recall (%) F1 score (%)
Inception ResNet V2 1 3,000 90 98 94
4,000 86 98 91
5,000 88 95 91
Average 92
2 3,000 90 98 94
4,000 90 98 94
5,000 88 98 92
Average 93
Average 92
ResNet-152 1 3,000 81 95 88
4,000 85 95 90
5,000 83 95 89
Average 89
2 3,000 83 95 89
4,000 88 95 91
5,000 86 98 91
Average 90
Average 90
ResNet-101 1 3,000 77 92 84
4,000 81 95 88
5,000 80 98 88
Average 86
2 3,000 83 93 88
4,000 84 98 90
5,000 84 98 90
Average 89
Average 88
ResNet-50 1 3,000 72 92 81
4,000 77 95 85
5,000 81 95 88
Average 85
2 3,000 77 92 84
4,000 79 95 86
5,000 85 95 90
Average 87
Average 86

3.6. Effects of the total number of training steps on the F1 score results


Figures 4(a) and 4(b) show the effect of the total number of training steps on the F1 score in the first and
second trials. There is a training step limit of 3,000 steps on the Inception ResNet V2 network architecture.
For the ResNet-152 and ResNet-101 network architectures, the training step limit is 4,000 steps. For
ResNet-50, the limit on training steps is not yet known because the F1 score keeps rising as additional training
steps are added. The overall training step limit is set because the F1 score tends to stabilize or decrease at a
given point during training.

3.7. The effects of average loss on average F1 score


Figure 5 shows the effect of the average total loss on the average F1 score. A model whose average loss during
training is lower is considered to perform better at learning the characteristics of an object; hence, the average
F1 score increases. A failure to detect the grape leaf type is generally due to an object being too small,
indistinct, or ambiguous, or due to the light affecting the image taken.

3.8. Detection results


Based on the results of the algorithm comparison carried out above, we chose the Faster R-CNN ResNet-101
algorithm to test the detection of grape leaf images, considering the loss results during modeling, the testing
computing time, and the average F1 score. Figures 6(a) to 6(e) show examples of the detection results from
this research test. In addition, the image detection accuracy reached 100%, which shows that the system is
able to apply the model accurately and to detect and categorize the leaves correctly. Detection errors are due
to the limited variation in the dataset and to the lighting when the picture was taken.


Figure 4. Effects of training on the F1 score (a) first trial and (b) second trial

Figure 5. The effect of average loss on average F1 score


Figure 6. Detection results of (a) academic, (b) jupiter, (c) local, (d) taldun, and (e) transfiguration

4. CONCLUSION
This study describes techniques for classifying grape leaves using the Faster R-CNN algorithm with the
Inception ResNet V2, ResNet-152, ResNet-101, and ResNet-50 network architectures, all of which serve as
pre-trained networks for feature extraction. The experiment results show that the average F1 scores are 93%,
90%, 88%, and 86%, respectively, with the best F1 score on the Inception ResNet V2 network architecture
(an average loss of 0.0943). However, the time needed for training and testing is far longer than for the other
network architectures. While the architecture with the fastest computing time for training and testing is
ResNet-50, its F1 score is the lowest compared to the others. Furthermore, the training step limit for Inception
ResNet V2 is at step 3,000, while the ResNet-152 and ResNet-101 network architectures have a training step
limit of 4,000 steps. Meanwhile, the training step limit for the ResNet-50 architecture is unknown, given that
its F1 score continues to increase as the number of training steps increases. The research establishes a general
limit on the number of training steps because the F1 score tends to stabilize or decline at some point during
training. It can be concluded that the Faster R-CNN-based detection and classification system for grape leaves
can analyze object properties more effectively if total losses during training are lowered.

ACKNOWLEDGEMENTS
The researchers would like to express gratitude towards the Directorate General of Research
Enhancement and Development, Ministry of Education, Culture, Research, and Technology, for the grant
provided with Decree Number 173/SPK/D.D4/PPK.01.APTV/VI/2023 and Agreement/Contract Number
12882/PL2.1/HK/2023.

REFERENCES
[1] M. Sarosa, P. N. Maa’rifah, M. Kusumawardani, and D. F. Al Riza, “Vitis Vinera L. leaf detection using faster R-CNN,” BIO Web
Conference, vol. 117, pp. 1–9, 2024, doi: 10.1051/bioconf/202411701021.
[2] A. Rahemi, J. C. D. Peterson, and K. T. Lund, “Grape Species,” in Grape Rootstocks and Related Species, Springer, Cham, 2022,
pp. 5–21, doi: 10.1007/978-3-030-99407-5_2.
[3] A. M. Walker, C. Heinitz, S. Riaz, and J. Uretsky, “Grape taxonomy and germplasm,” in The Grape Genome, Springer, Cham,
2019, pp. 25–38, doi: 10.1007/978-3-030-18601-2_2.
[4] G. Hasanaliyeva et al., “Effects of production region, production systems and grape type/variety on nutritional quality parameters
of table grapes; results from a UK retail survey,” Foods, vol. 9, no. 12, 2020, doi: 10.3390/foods9121874.
[5] B. Suter, A. D. Irvine, M. Gowdy, Z. Dai, and C. V. Leeuwen, “Adapting wine grape ripening to global change requires a multi-
trait approach,” Frontiers in Plant Science, vol. 12, pp. 1–17, 2021, doi: 10.3389/fpls.2021.624867.
[6] Y. Khan et al., “Antioxidant potential in the leaves of grape varieties (Vitis vinifera L.) grown in different soil compositions,”
Arabian Journal of Chemistry, vol. 14, no. 11, 2021, doi: 10.1016/j.arabjc.2021.103412.
[7] K. Yang, W. Zhong, and F. Li, “Leaf segmentation and classification with a complicated background using deep learning,”
Agronomy, vol. 10, no. 11, 2020, doi: 10.3390/agronomy10111721.
[8] N. A. Othman, N. S. Damanhuri, N. M. Ali, B. C. C. Meng, and A. A. A. Samat, “Plant leaf classification using convolutional
neural network,” 2022 8th International Conference on Control, Decision and Information Technologies (CoDIT), Istanbul, Turkey,
pp. 1043–1048, 2022, doi: 10.1109/CoDIT55151.2022.9804121.
[9] J. Hang, D. Zhang, P. Chen, J. Zhang, and B. Wang, “Classification of plant leaf diseases based on improved convolutional neural
network,” Sensors, vol. 19, no. 19, pp. 1–14, 2019, doi: 10.3390/s19194161.
[10] Y. Toda and F. Okura, “How convolutional neural networks diagnose plant disease,” Plant Phenomics, vol. 2019, 2019, doi:
10.34133/2019/9237136.
[11] A. Pandey and K. Jain, “Plant leaf disease classification using deep attention residual network optimized by opposition-based
symbiotic organisms search algorithm,” Neural Computing and Applications, vol. 34, pp. 21049–21066, 2022, doi: 10.1007/s00521-
022-07587-6.
[12] P. N. Ma’rifah, M. Sarosa, and E. Rohadi, “Garbage classification using Faster R-CNN,” 2023 International Conference on
Electrical and Information Technology (IEIT), Malang, Indonesia, pp. 196–201, 2023, doi: 10.1109/IEIT59852.2023.10335519.
[13] Q. Lv, S. Zhang, and Y. Wang, “Deep learning model of image classification using machine learning,” Advances in Multimedia,
vol. 2022, 2022, doi: 10.1155/2022/3351256.
[14] Z. Li et al., “A high-precision detection method of hydroponic lettuce seedlings status based on improved Faster RCNN,” Computers
and Electronics in Agriculture, vol. 182, 2021, doi: 10.1016/j.compag.2021.106054.
[15] M. M. Taye, “Theoretical understanding of convolutional neural network : concepts, architectures, applications, future directions,”
Computation, vol. 11, no. 52, 2023.
[16] X. Wang, Y. Zhao, and F. Pourpanah, “Recent advances in deep learning,” International Journal of Machine Learning and
Cybernetics, vol. 11, no. 4, pp. 747–750, 2020, doi: 10.1007/s13042-020-01096-5.
[17] S. Liu, H. Ban, Y. Song, M. Zhang, and F. Yang, “Method for detecting Chinese texts in natural scenes based on improved faster
R-CNN,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 34, no. 2, 2020, doi:
10.1142/S021800142053002X.
[18] L. Jiang, J. Chen, H. Todo, Z. Tang, S. Liu, and Y. Li, “Application of a Fast RCNN based on upper and lower layers in face
recognition,” Computational Intelligence and Neuroscience, vol. 2021, 2021, doi: 10.1155/2021/9945934.
[19] P. N. M. Ma’rifah, M. Sarosa, and E. Rohadi, “Comparison of Faster R-CNN ResNet-50 and ResNet-101 methods for recycling
waste detection,” International Journal of Computer Applications Technology and Research, vol. 12, no. 12, pp. 26–32, 2023, doi:
10.7753/ijcatr1212.1006.
[20] W. Liu, S. Liao, W. Hu, X. Liang, and X. Chen, “Learning efficient single-stage pedestrian detectors by asymptotic localization fitting,”
Proceedings of the European Conference on Computer Vision (ECCV), pp. 643–659, 2018, doi: 10.1007/978-3-030-01264-9_38.
[21] W. Zou, Z. Zhang, Y. Peng, C. Xiang, S. Tian, and L. Zhang, “SC-RPN: A strong correlation learning framework for region
proposal,” IEEE Trans. Image Process., vol. 30, pp. 4084–4098, 2021, doi: 10.1109/TIP.2021.3069547.
[22] L. Fan, T. Zhang, and W. Du, “Optical-flow-based framework to boost video object detection performance with object
enhancement,” Expert Systems with Applications, vol. 170, 2020, 2021, doi: 10.1016/j.eswa.2020.114544.
[23] A. Ajit, K. Acharya, and A. Samanta, “A review of convolutional neural networks,” 2020 International Conference on Emerging
Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, pp. 1–5, 2020, doi: 10.1109/ic-
ETITE47903.2020.049.
[24] J. Li et al., “Automatic detection and classification system of domestic waste via multimodel cascaded convolutional neural


network,” IEEE Transactions on Industrial Informatics, vol. 18, no. 1, pp. 163–173, 2022, doi: 10.1109/TII.2021.3085669.
[25] L. Ichim and D. Popescu, “Melanoma detection using an objective system based on multiple connected neural networks,” IEEE
Access, vol. 8, pp. 179189–179202, 2020, doi: 10.1109/ACCESS.2020.3028248.
[26] J. Wang, X. He, S. Faming, G. Lu, H. Cong, and Q. Jiang, “A real-time bridge crack detection method based on an improved
Inception ResNet V2 structure,” IEEE Access, vol. 9, pp. 93209–93223, 2021, doi: 10.1109/ACCESS.2021.3093210.
[27] Y. P. Chen, Y. Li, and G. Wang, “An enhanced region proposal network for object detection using deep learning method,” PLoS
One, vol. 13, no. 9, pp. 1–26, 2018, doi: 10.1371/journal.pone.0203897.
[28] W. Gu et al., “High accuracy thyroid tumor image recognition based on hybrid multiple models optimization,” IEEE Access, vol.
8, pp. 128426–128439, 2020, doi: 10.1109/ACCESS.2020.3008401.
[29] N. A. Prasetyo, Pranowo, and A. J. Santoso, “Automatic detection and calculation of palm oil fresh fruit bunches using faster R-
CNN,” International Journal of Applied Science and Engineering, vol. 17, no. 2, pp. 121–134, 2020, doi:
10.6703/IJASE.202005_17(2).121.
[30] M. Sarosa, N. Muna, and E. Rohadi, “Performance of faster R-CNN to detect plastic waste,” International Journal of Advanced
Trends in Computer Science and Engineering, vol. 9, no. 5, pp. 7756–7762, 2020, doi: 10.30534/ijatcse/2020/120952020.

BIOGRAPHIES OF AUTHORS

Moechammad Sarosa received the diploma of engineering technology from Universite de Nancy I, France, in
1989. He obtained his master's and doctoral degrees from
Bandung Institute of Technology, Indonesia in 2002 and 2007 respectively. He has been the
recipient of several research grants funded by the Ministry of Research, Technology and Higher
Education of the Republic of Indonesia. His current research interests lie in information and
communication technology, artificial intelligence, mobile computing, and IoT. He can be
contacted at email: [email protected].

Puteri Nurul Ma'rifah completed a Diploma in Telecommunications and a Master of Electrical Engineering
with a concentration in Telecommunication Science and Information Technology at the State Polytechnic of
Malang. In 2020 she qualified for the 33rd PIMNAS in the field of Community Service Research, and in 2023
she received a Master's Thesis Research grant from the Ministry of Education, Culture, Research, and
Technology. During her master's studies she was a teaching assistant for digital image processing and artificial
intelligence workshop courses, and after graduating she has been involved in research at the Integrated Applied
Technology Research Center (PRITTI) of the State Polytechnic of Malang. She can be contacted at email:
[email protected].

Mila Kusumawardani received the M.T. degree in Electrical Engineering from Brawijaya University, Malang,
Indonesia, in 2010. She is a lecturer in the Digital Telecommunications Networks Study Program of the
Electrical Engineering Department of the State Polytechnic of Malang. She can be contacted at email:
[email protected].

Dimas Firmanda Al Riza received a doctoral degree from Kyoto University, Japan, in the field of Bio-sensing
Engineering in 2019. The doctoral study was completed with the LPDP presidential scholarship. As of 2022,
he has published more than 70 scientific papers, including dozens in Q1 reputable international journals.
Recently, he received the Young Researcher's Academic Encouragement Award 2021 from the Japanese
Society of Agricultural Machinery and Food Engineers (JSAM). Currently, he is the Head of the Mechatronics
Laboratory of Agro-industry Tools and Machinery, Department of Agricultural Engineering, Faculty of
Agricultural Technology, Universitas Brawijaya. He can be contacted at email: [email protected].

