Image Analysis For Classifying Coffee Bean Quality Using A Multi Feature and Machine Learning Approach
Corresponding Author:
Anindita Septiarini
Department of Informatics, Faculty of Engineering, Mulawarman University
St. Sambaliung, No. 9, Samarinda, Indonesia
Email: [email protected]
1. INTRODUCTION
The use of computers and related technologies is expanding and diversifying rapidly, and agriculture is no exception. Computers have been applied in the agricultural sector to tasks such as monitoring fruit ripeness [1], [2], land management [3], and plant development [4], [5]. Coffee, one of the most widely consumed beverages in the world, is also an important economic commodity. Its global popularity can be attributed to its stimulating properties and the preference for its bitter flavor, and it is a major source of caffeine for many people. Earlier research linked coffee and caffeine intake to adverse health effects. More recent studies, however, suggest that compounds found in coffee, such as caffeine, chlorogenic acids, kahweol, cafestol, and various micronutrients (such as magnesium, potassium, and phosphorus), may enhance the immune system and protect against conditions such as obesity, diabetes, neurological diseases, osteoporosis, and pancreatic cancer [6].
Quality matters to the coffee industry because of its relationship to coffee bean scarcity, monetary compensation, and consumer satisfaction. Robusta coffee beans are widely grown and have a distinct taste and aroma. Their quality depends on soil composition, climate, and processing method, and coffee prices in turn depend on bean quality. However, not all growers and coffee shop owners can identify coffee bean quality, and errors may occur when they lack this expertise. Manual grading is also time-consuming and produces inconsistent outcomes, which arise from the limits of visual perception, fatigue, and differences between evaluators. Because robusta coffee beans are often evaluated by their visual characteristics, computer vision is a promising alternative. It can extract visual traits of robusta coffee beans that are highly predictive of quality, such as color, shape, and texture.
Numerous studies have applied computer vision to food processing, covering items such as banana [7], honey [8], date fruit [9], palm oil [10], and coffee [11], [12]. Such systems generally involve pre-processing, segmentation, feature extraction, and classification [13]. Common pre-processing tasks include scaling [14] and color space conversion [15]. Established segmentation methods include Otsu thresholding [13], K-means clustering [16], and edge detection [17]. Features that can be extracted from food items include color [10], shape [18], and texture [11]. Naïve Bayes (NB) [10], k-nearest neighbor (KNN) [19], and support vector machines (SVM) [10] are frequently used in the classification process.
Recent studies have used machine learning to classify coffee beans in a range of agricultural settings. One study used color and shape to identify high-quality green coffee beans, running image processing and machine learning on an Arduino Mega board. Essential criteria were assessed to determine high-quality green coffee beans, and KNN was used to evaluate the beans and classify them by defect type. The logic, image processing, and supervised learning algorithms were coded and executed on the Arduino board. The machine vision system achieved an average accuracy of 94.79% for quality evaluation and 95.78% for defect-type evaluation, while the classification of long berry beans reached 98.05% [20]. Another study applied SVM, deep neural networks (DNN), and random forest (RF) to evaluate the importance of shape and color characteristics for detecting defects in coffee beans. Its results highlight the importance of color descriptors in defect classification. The most significant features were the mean G component in the RGB color space and the mean V component in the HSV color space. All classifier models performed comparably, with the best accuracy above 88% [12].
Other efforts have addressed detecting and classifying coffee fruits and mapping their stage of maturation during harvest. One such method, implemented in the Darknet framework, used the YOLOv3-tiny object detection system to detect and classify coffee fruits. The dataset contained 90 videos from the 2020 arabica coffee (Catuaí 144) harvest, recorded at the termination point of a coffee harvester's discharge conveyor in a commercial area in Patos de Minas, Minas Gerais, Brazil. The model performed best at around the 3300th iteration with an 800×800-pixel input image. It reached 84% mean average precision (mAP), 82% F1-score, 83% precision, and 82% recall on the validation set. The precision values for unripe, ripe, and overripe coffee fruits were 86%, 85%, and 80%, respectively [21]. Another study used a convolutional neural network on an inexpensive microcontroller board to classify coffee leaf diseases locally, without an internet connection. Early diagnosis of coffee plant diseases is crucial for optimal yield and production quality. The study used two datasets together with images captured by the development board, totaling around 6000 images from six disease classes. The cascade and single-stage systems achieved accuracies of 98% and 96%, respectively. These findings suggest that such architectures can detect diseases in coffee plantations [22].
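The F1-score reported above is the harmonic mean of precision and recall. The minimal Python sketch below checks that the reported 82% F1-score is consistent with 83% precision and 82% recall. The function name `f1_score` is our own and is not taken from the cited study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Precision 0.83 and recall 0.82, as reported for the YOLOv3-tiny validation set
print(round(f1_score(0.83, 0.82), 2))  # 0.82
```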
This study proposes a method for classifying coffee bean quality based on computer vision techniques. The method uses color, shape, and texture features, with the color features extracted from the RGB, HSV, and L*a*b color spaces. A backpropagation neural network (BPNN) was employed as the classifier. The objective is to classify coffee beans by quality from image data into four classes: intact, perforated, wrinkled, and cracked. In the final step, the prediction class (intact/perforated/wrinkled/cracked) was determined from the selected features.
Figure 1 illustrates the robusta coffee bean quality classification.
Figure 1. Overview of all steps in the proposed method for quality classification of robusta coffee bean
2.1. Dataset
The dataset in this study consists of images of robusta coffee beans. The JPEG images were captured with the built-in 13-megapixel camera of a Xiaomi 5A smartphone. Each coffee bean was placed on a white background at the center of a 28×19×18 cm studio minibox. The camera was positioned and oriented to keep a 10 cm distance between the camera and the bean. Each image measures 1560×1560 pixels. The dataset contains 1440 coffee bean images, 360 per class, divided into four classes: intact, perforated, wrinkled, and cracked. Example images are shown in Figures 2(a) to 2(d), respectively.
Figure 2. Examples of coffee bean images with various quality types: (a) intact, (b) perforated, (c) wrinkled,
and (d) cracked
$$d = \lVert p(x, y) - c_k \rVert \tag{1}$$
Image analysis for classifying coffee bean quality using a multi-feature … (Anindita Septiarini)
4244 ISSN: 2252-8938
− Step 3: Assign all the pixels to the nearest centre based on distance d.
− Step 4: After all pixels have been assigned, recalculate the new position of each centre using (2):
$$C_i = \frac{1}{k} \sum_{y \in c_k} \sum_{x \in c_k} p(x, y) \tag{2}$$
− Step 5: Repeat the process until the tolerance or error criterion is satisfied.
− Step 6: Reshape the clustered pixels into an image.
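The K-means steps above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation. Several choices are assumptions: random initialisation from distinct pixel positions, reading the 1/k factor in (2) as averaging over the pixels assigned to each cluster, and the parameters `tol` and `max_iter`. The function name `kmeans_pixels` is hypothetical. In the paper, the input would be the image in L*a*b color space (Figure 3(b)).

```python
import numpy as np


def kmeans_pixels(image: np.ndarray, k: int, tol: float = 1e-4,
                  max_iter: int = 100, seed: int = 0):
    """Cluster the pixels of an H x W x C image into k groups (hypothetical helper)."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)  # one row per pixel p(x, y)
    rng = np.random.default_rng(seed)
    # Initialise the centres from k randomly chosen pixel positions (assumption)
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(max_iter):
        # Distance d = ||p(x, y) - c_k|| from every pixel to every centre, as in (1)
        d = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)  # Step 3: assign each pixel to its nearest centre
        # Step 4: each new centre is the mean of its assigned pixels, cf. (2);
        # an empty cluster keeps its previous centre
        new_centres = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centres - centres)
        centres = new_centres
        if shift < tol:  # Step 5: stop once the centres no longer move
            break
    return labels.reshape(h, w), centres  # Step 6: reshape labels into an image
```

The broadcast distance matrix needs memory proportional to the number of pixels times k. For full 1560×1560 images, downsampling or a chunked loop may be preferable.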
The image resulting from the K-means algorithm is shown in Figure 3(c). A morphological dilation was then applied so that the segmented coffee bean area more closely matches the original bean. The result is depicted in Figure 3(d). The coffee bean area was then used as the basis for defining the ROI image boundary, indicated by the yellow box in Figure 3(e). The resulting ROI images in binary and RGB color space are shown in Figures 3(f) and 3(g).
Figure 3. The resulting image of each process in ROI detection: (a) original image in RGB color space,
(b) L*a*b color space, (c) K-means clustering, (d) morphological operation, (e) setting the area of ROI
image, and (f) ROI image
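The dilation and ROI cropping described above can be sketched with NumPy alone. This is an illustration under stated assumptions: the paper does not specify the structuring element, so a 3×3 square is assumed here, and the "yellow box" is taken to be the tight bounding box of the foreground. The helper names `dilate` and `crop_roi` are hypothetical.

```python
import numpy as np


def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary dilation with a 3 x 3 square structuring element (assumed)."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # A pixel becomes foreground if any pixel in its 3 x 3 neighbourhood is
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out


def crop_roi(rgb: np.ndarray, mask: np.ndarray):
    """Crop the RGB image and the binary mask to the foreground bounding box."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask contains no foreground pixels")
    top, bottom = rows.min(), rows.max() + 1
    left, right = cols.min(), cols.max() + 1
    return rgb[top:bottom, left:right], mask[top:bottom, left:right]
```

In practice, the mask would be the bean cluster selected from the K-means labels, and the two cropped outputs correspond to the binary and RGB ROI images.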
2.3. Pre-processing
This procedure generated the parameter values used for feature extraction. This study examined color, texture, and shape features. To create the color features, RGB images were converted to L*a*b and HSV. To create the texture features, RGB images were converted to grayscale. The shape features were built from binary images. Changing the color space during pre-processing can improve classification results. Agricultural research commonly uses RGB for object classification, although some investigations have employed the L*a*b and HSV color spaces. Converting to a different color space requires a transformation computed from the RGB values [23]. For the HSV color space, hue (H) is calculated using (3) and (4), and saturation (S) and value (V) are computed using (5) and (6).
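$$H = \begin{cases} \theta, & B \le G \\ 360 - \theta, & B > G \end{cases} \tag{3}$$
where:
$$\theta = \cos^{-1}\left\{ \frac{\frac{1}{2}\left[(R-G)+(R-B)\right]}{\left[(R-G)^2+(R-B)(G-B)\right]^{1/2}} \right\} \tag{4}$$
$$S = \begin{cases} 0, & \max(R,G,B) = 0 \\ 1 - \dfrac{\min(R,G,B)}{\max(R,G,B)}, & \text{otherwise} \end{cases} \tag{5}$$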
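A minimal per-pixel Python sketch of the color conversions follows, with RGB values scaled to [0, 1].

- `rgb_to_hsv_eq` follows (3) to (5) as written. It assumes V = max(R, G, B), the common HSV definition, because equation (6) is not reproduced here. It also assigns a hue of 0 to grey pixels, where (4) is undefined.
- `rgb_to_lab` is not given in the text. It uses the standard sRGB to CIE L*a*b* transform with a D65 white point, which is an assumption about the authors' setup.

Both function names are hypothetical.

```python
import math


def rgb_to_hsv_eq(r: float, g: float, b: float):
    """Hue (degrees) via (3)-(4), saturation via (5); V = max(R, G, B) is assumed."""
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:  # grey pixel: hue is undefined, set to 0 (assumption)
        h = 0.0
    else:
        # Clamp to [-1, 1] to guard acos against rounding error
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        h = theta if b <= g else 360.0 - theta
    mx, mn = max(r, g, b), min(r, g, b)
    s = 0.0 if mx == 0 else 1.0 - mn / mx
    v = mx  # common HSV definition of value (assumption)
    return h, s, v


def _srgb_to_linear(c: float) -> float:
    """Undo the sRGB gamma curve."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_lab(r: float, g: float, b: float):
    """sRGB in [0, 1] to CIE L*a*b* with a D65 white point (assumed setup)."""
    rl, gl, bl = (_srgb_to_linear(c) for c in (r, g, b))
    # Linear RGB to CIE XYZ (sRGB primaries, D65)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

Per-pixel functions keep the correspondence with the equations visible. For whole images, a vectorized version (or an established imaging library) would be used instead.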
Furthermore, because this work applied texture features, the RGB image also had to be converted to a grayscale image. These feature parameters are later used as input for the classification process. The RGB-to-grayscale conversion produces intensity (I) values using (7) [11].
$$I = \frac{1}{3}(R + G + B) \tag{7}$$
useful color characteristics in many applications. Converting RGB to HSV and L*a*b limits the color space dimensions and the number of features. Texture features are then extracted using the gray level co-occurrence matrix (GLCM). Shape features are extracted from the binary image using statistical characteristics and shape distance. Table 1 lists the number of features produced by each method. Adding features does not necessarily improve model performance, so accurate classification requires careful feature selection.
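The grayscale conversion in (7) and a GLCM can be sketched with NumPy. This is an illustration, not the authors' code.

- The paper does not state which offsets, number of gray levels, or GLCM properties it used. The single horizontal offset, the symmetric normalised matrix, and the four common Haralick-style properties below are therefore assumptions.
- `energy` here is the sum of squared entries (angular second moment). Some libraries define energy as its square root instead.

The helper names `to_gray`, `glcm`, and `glcm_features` are hypothetical.

```python
import numpy as np


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Intensity I = (R + G + B) / 3, as in (7), rounded to integer gray levels."""
    return np.round(rgb.astype(float).mean(axis=2)).astype(int)


def glcm(gray: np.ndarray, dx: int = 1, dy: int = 0, levels: int = 256) -> np.ndarray:
    """Normalised symmetric co-occurrence matrix for a non-negative offset (dy, dx)."""
    h, w = gray.shape
    ref = gray[0:h - dy, 0:w - dx]  # reference pixels
    nb = gray[dy:h, dx:w]           # neighbours at the given offset
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (ref.ravel(), nb.ravel()), 1)  # count each (i, j) pair
    m = m + m.T                     # count both directions (symmetric GLCM)
    return m / m.sum()


def glcm_features(p: np.ndarray) -> dict:
    """Common texture properties of a normalised GLCM (assumed feature set)."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    return {
        "contrast": ((i - j) ** 2 * p).sum(),
        "energy": (p ** 2).sum(),  # angular second moment
        "homogeneity": (p / (1 + np.abs(i - j))).sum(),
        # Undefined (division by zero) for a constant image
        "correlation": ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j),
    }
```

Quantising the grayscale image to fewer levels (for example 8 or 16) before building the GLCM is common, because it makes the matrix less sparse.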
$$e = \sqrt{1 - \frac{b^2}{a^2}} \tag{9}$$
$$p = 2\pi\sqrt{\frac{a^2 + b^2}{2}} \tag{10}$$
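Equations (9) and (10) give the eccentricity and an approximate perimeter of an ellipse from its semi-axes a and b. The minimal NumPy sketch below computes them from a binary bean mask. How the authors obtain a and b is not stated in this excerpt. The sketch assumes the semi-axes of the ellipse with the same second-order moments as the region, a = 2√λ₁ and b = 2√λ₂, where λ₁ ≥ λ₂ are the eigenvalues of the pixel-coordinate covariance. The function name `shape_features` is hypothetical.

```python
import math

import numpy as np


def shape_features(mask: np.ndarray) -> dict:
    """Area, moment-based semi-axes, eccentricity (9), and perimeter (10) of a region."""
    rows, cols = np.nonzero(mask)
    area = int(rows.size)  # number of foreground pixels
    coords = np.stack([cols, rows]).astype(float)
    cov = np.cov(coords, bias=True)  # second central moments of the coordinates
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative rounding errors
    a = 2 * math.sqrt(lam[0])  # semi-major axis (moment-based assumption)
    b = 2 * math.sqrt(lam[1])  # semi-minor axis
    e = math.sqrt(max(0.0, 1 - b ** 2 / a ** 2))  # eccentricity, (9)
    p = 2 * math.pi * math.sqrt((a ** 2 + b ** 2) / 2)  # perimeter approximation, (10)
    return {"area": area, "a": a, "b": b, "eccentricity": e, "perimeter": p}
```

A region of a single pixel has a = 0, so (9) is undefined there. Real bean masks are far larger, so this sketch does not handle that case.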
sets with many factors. It can improve object recognition performance and has been shown both to lower and to raise the accuracy value [25].
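PCA, used here for feature selection, can be sketched with NumPy's SVD of the mean-centred feature matrix. In this sketch, the rows of X are coffee bean images and the columns are the extracted features. The number of components to keep is an assumption, since it is not given in this excerpt. Strictly speaking, PCA constructs new combined features rather than picking a subset of the originals. The function name `pca_fit_transform` is hypothetical.

```python
import numpy as np


def pca_fit_transform(X: np.ndarray, n_components: int):
    """Project the rows of X onto the top principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean  # centre each feature
    # SVD of the centred data: rows of Vt are the principal directions
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / (S ** 2).sum()  # fraction of variance per component
    return Xc @ components.T, components, mean, explained[:n_components]
```

New samples are projected with `(X_new - mean) @ components.T` using the mean and components fitted on the training set. Because the features come from different sources (GLCM, shape, color), they are often standardised before PCA so that no single feature dominates the variance.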
2.6. Classification
Classification assigns data to predefined classes. Machine learning has been used for many plant-related objects. It automates such tasks by learning from data: instead of following fixed instructions, the algorithm builds a model from sample inputs and uses it to make estimates and decisions. Mathematical and statistical models fitted to training data are then used to predict unknown data. This study classified coffee bean quality using machine learning, comparing a backpropagation neural network (BPNN), linear discriminant analysis (LDA), KNN, NB, and SVM. Previous research on numerous plant specimens has used these methods [19], [25].
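To make the backpropagation idea concrete, here is a minimal NumPy sketch of a BPNN with one hidden layer. It uses a tanh hidden layer, a softmax output for the four quality classes, and full-batch gradient descent on the cross-entropy loss. The paper's architecture, activation functions, learning rate, and number of epochs are not given in this excerpt, so every hyperparameter below is an assumption. `train_bpnn` and `predict` are hypothetical names.

```python
import numpy as np


def train_bpnn(X, y, n_classes, hidden=16, lr=0.1, epochs=2000, seed=0):
    """One-hidden-layer network trained with backpropagation (assumed setup)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    losses = []
    for _ in range(epochs):
        # Forward pass
        A1 = np.tanh(X @ W1 + b1)
        Z2 = A1 @ W2 + b2
        Z2 -= Z2.max(axis=1, keepdims=True)  # numerical stability for softmax
        P = np.exp(Z2)
        P /= P.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.log(P[np.arange(n), y] + 1e-12)))
        # Backward pass: propagate the output error back through the layers
        dZ2 = (P - Y) / n
        dW2, db2 = A1.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * (1 - A1 ** 2)  # derivative of tanh
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        # Gradient descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    """Return the index of the most probable class for each row of X."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```

In the proposed method, X would hold the PCA-reduced feature vectors, and y the class indices for intact, perforated, wrinkled, and cracked.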
Table 2. Performance comparison of the classifiers with various feature sets based on accuracy value (%)
Classifier   Without feature selection                       With feature selection
             GLCM    Area-based   HSV     L*a*b   RGB        PCA
BPNN         94.83   97.86        84.79   97.71   97.71      98.54
KNN          85.21   84.38        85.42   90.83   89.79      90.83
LDA          92.08   91.46        77.92   91.46   68.33      80.63
NB           80.83   58.54        53.96   53.54   48.54      55.83
SVM          83.54   72.71        92.92   97.50   94.83      97.50
With PCA feature selection, the backpropagation classifier attained the maximum accuracy of 98.54%, whereas the NB classifier yielded the lowest accuracy, 55.83%. Even without feature selection, backpropagation achieves high accuracy, above 94% in every scenario except HSV (84.79%). The highest accuracy overall, 98.54%, was achieved by BPNN using the GLCM feature, the area-based method, HSV color space values, and PCA for feature selection. The NB classifier yields the lowest accuracy in every scenario. These results indicate that combining texture, shape, and color features, followed by feature selection to limit the number of features, can lead to high classification accuracy. The BPNN classifier achieves the lowest error in every scenario except HSV, where SVM performs better (92.92%).
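The per-scenario comparison above can be checked directly against Table 2. The short Python sketch below encodes the table's accuracy values and reports the best and worst classifier in each column. The dictionary layout and the function name `leaders` are our own choices.

```python
# Accuracy values (%) from Table 2, in column order
SCENARIOS = ["GLCM", "Area-based", "HSV", "L*a*b", "RGB", "PCA"]
ACCURACY = {
    "BPNN": [94.83, 97.86, 84.79, 97.71, 97.71, 98.54],
    "KNN":  [85.21, 84.38, 85.42, 90.83, 89.79, 90.83],
    "LDA":  [92.08, 91.46, 77.92, 91.46, 68.33, 80.63],
    "NB":   [80.83, 58.54, 53.96, 53.54, 48.54, 55.83],
    "SVM":  [83.54, 72.71, 92.92, 97.50, 94.83, 97.50],
}


def leaders(acc: dict, scenarios: list) -> dict:
    """Map each scenario to its (best, worst) classifier by accuracy."""
    out = {}
    for col, name in enumerate(scenarios):
        best = max(acc, key=lambda clf, c=col: acc[clf][c])
        worst = min(acc, key=lambda clf, c=col: acc[clf][c])
        out[name] = (best, worst)
    return out


for scenario, (best, worst) in leaders(ACCURACY, SCENARIOS).items():
    print(f"{scenario:<10} best: {best:<5} worst: {worst}")
```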
4. CONCLUSION
This study classifies robusta coffee beans by quality into four types: intact, perforated, wrinkled, and cracked. The procedure involves ROI detection, pre-processing, segmentation, feature extraction, feature selection, and classification, and each step contributes to accurately classifying coffee beans and determining their quality. The study tested scenarios using texture alone, texture with shape, and texture with color space values (HSV, L*a*b, and RGB). BPNN generally outperforms the other classifiers for coffee bean quality assessment. It achieves the best result, 98.54% accuracy, using the PCA feature selection technique on GLCM, area-based, and L*a*b features. Extending this work with additional scenarios and attributes could further improve the variety and quality of this research.
ACKNOWLEDGEMENTS
The author would like to thank the Faculty of Engineering at Mulawarman University in Samarinda,
Indonesia, for providing financial support for the research conducted in 2023 (No. 497/UN17.L1/HK/2023).
REFERENCES
[1] M. R. Fiona, S. Thomas, I. J. Maria, and B. Hannah, “Identification of ripe and unripe citrus fruits using artificial neural
network,” in Journal of Physics: Conference Series, IOP Publishing, 2019, doi: 10.1088/1742-6596/1362/1/012033.
[2] S. Munera, F. Hernández, N. Aleixos, S. Cubero, and J. Blasco, “Maturity monitoring of intact fruit and arils of pomegranate cv.
‘Mollar de Elche’ using machine vision and chemometrics,” Postharvest Biology and Technology, vol. 156, 2019, doi:
10.1016/j.postharvbio.2019.110936.
[3] Hamdani, A. Septiarini, and D. M. Khairina, “Model assessment of land suitability decision making for oil palm plantation,” in
2016 2nd International Conference on Science in Information Technology, ICSITech 2016: Information Science for Green Society
and Environment, IEEE, 2017, pp. 109–113, doi: 10.1109/ICSITech.2016.7852617.
[4] A. Yudhana, R. Umar, and F. M. Ayudewi, “The monitoring of corn sprouts growth using the region growing methods,” in
Journal of Physics: Conference Series, IOP Publishing, Nov. 2019, doi: 10.1088/1742-6596/1373/1/012054.
[5] A. Sezgin and V. Küçük, “Computer science monitoring plant growth with image processing methods and artificial intelligence supported
agriculture system,” in International Artificial Intelligence and Data Processing Symposium, 2022, pp. 165-176, doi: 10.53070/bbd.1172774.
[6] B. Açıkalın and N. Sanlier, “Coffee and its effects on the immune system,” Trends in Food Science & Technology, vol. 114, pp.
625–632, Aug. 2021, doi: 10.1016/j.tifs.2021.06.023.
[7] E. Piedad, J. I. Larada, G. J. Pojas, and L. V. V Ferrer, “Postharvest classification of banana (Musa acuminata) using tier-based
machine learning,” Postharvest Biology and Technology, vol. 145, pp. 93–100, 2018, doi: 10.1016/j.postharvbio.2018.06.004.
[8] A. Noviyanto and W. H. Abdulla, “Honey botanical origin classification using hyperspectral imaging and machine learning,”
Journal of Food Engineering, vol. 265, 2020, doi: 10.1016/j.jfoodeng.2019.109684.
[9] D. Zhang, D. J. Lee, B. J. Tippetts, and K. D. Lillywhite, “Date maturity and quality evaluation using color distribution analysis
and back projection,” Journal of Food Engineering, vol. 131, pp. 161–169, 2014, doi: 10.1016/j.jfoodeng.2014.02.002.
[10] A. Septiarini, H. Hamdani, T. Hardianti, E. Winarno, S. Suyanto, and E. Irwansyah, “Pixel quantification and color feature extraction
on leaf images for oil palm disease identification,” in 7th International Conference on Electrical, Electronics and Information
Engineering: Technological Breakthrough for Greater New Life, 2021, pp. 1–5, doi: 10.1109/ICEEIE52663.2021.9616645.
[11] W. G. D. Costa, I. D. P. Barbosa, J. E. D. Souza, C. D. Cruz, M. Nascimento, and A. C. B. D. Oliveira, “Machine learning and
statistics to qualify environments through multi-traits in Coffea arabica,” PLoS ONE, vol. 16, no. 1, pp. e0245298–e0245298, Jan.
2021, doi: 10.1371/journal.pone.0245298.
[12] F. F. L. D. Santos, J. T. F. Rosas, R. N. Martins, G. D. M. Araújo, L. D. A. Viana, and J. D. P. Gonçalves, “Quality assessment of coffee
beans through computer vision and machine learning algorithms,” Coffee Science, vol. 15, no. 1, pp. 1–9, 2020, doi: 10.25186/.v15i.1752.
[13] A. Septiarini, H. Hamdani, A. Rifani, Z. Arifin, N. Hidayat, and H. Ismanto, “Multi-class support vector machine for arabica coffee
bean roasting grade classification,” in ICOIACT 2022 - 5th International Conference on Information and Communications Technology:
A New Way to Make AI Useful for Everyone in the New Normal Era, 2022, pp. 407–411, doi: 10.1109/ICOIACT55506.2022.9971897.
[14] R. S. El-Sayed and M. N. El-Sayed, “Classification of vehicles’ types using histogram oriented gradients: comparative study and
modification,” IAES International Journal of Artificial Intelligence, vol. 9, no. 4, pp. 700–712, 2020, doi: 10.11591/ijai.v9.i4.pp700-712.
[15] M. Sharif, M. A. Khan, Z. Iqbal, M. F. Azam, M. I. U. Lali, and M. Y. Javed, “Detection and classification of citrus diseases in
agriculture based on optimized weighted segmentation and feature selection,” Computers and Electronics in Agriculture, vol. 150,
pp. 220–234, 2018, doi: 10.1016/j.compag.2018.04.023.
[16] A. Septiarini, H. Hamdani, S. U. Sari, H. Rahmania Hatta, N. Puspitasari, and W. Hadikurniawati, “Image processing techniques
for tomato segmentation applying k-means clustering and edge detection approach,” in 2021 International Seminar on Machine
Learning, Optimization, and Data Science, ISMODE 2021, IEEE, 2022, pp. 92–96, doi: 10.1109/ISMODE53584.2022.9742740.
[17] J. Lu et al., “Lightweight green citrus fruit detection method for practical environmental applications,” Computers and
Electronics in Agriculture, vol. 215, 2023, doi: 10.1016/j.compag.2023.108205.
[18] J. Liang, K. Huang, H. Lei, Z. Zhong, Y. Cai, and Z. Jiao, “Occlusion-aware fruit segmentation in complex natural environments
under shape prior,” Computers and Electronics in Agriculture, vol. 217, 2024, doi: 10.1016/j.compag.2024.108620.
[19] X. Yang, R. Zhang, Z. Zhai, Y. Pang, and Z. Jin, “Machine learning for cultivar classification of apricots (Prunus armeniaca L.)
based on shape features,” Scientia Horticulturae, vol. 256, 2019, doi: 10.1016/j.scienta.2019.05.051.
[20] H. Li, W. S. Lee, and K. Wang, “Identifying blueberry fruit of different growth stages using natural outdoor color images,”
Computers and Electronics in Agriculture, vol. 106, pp. 91–101, 2014, doi: 10.1016/j.compag.2014.05.015.
[21] García, C. Becerra, and Hoyos, “Quality and defect inspection of green coffee beans using a computer vision system,” Applied
Sciences, vol. 9, no. 19, Oct. 2019, doi: 10.3390/app9194195.
[22] H. C. Bazame, J. P. Molin, D. Althoff, and M. Martello, “Detection, classification, and mapping of coffee fruits during harvest
with computer vision,” Computers and Electronics in Agriculture, vol. 183, 2021, doi: 10.1016/j.compag.2021.106066.
[23] F. G.-Lamont, J. Cervantes, A. López, and L. Rodriguez, “Segmentation of images by color features: A survey,”
Neurocomputing, vol. 292, pp. 1–27, 2018, doi: 10.1016/j.neucom.2018.01.091.
[24] N. Dhanachandra, K. Manglem, and Y. J. Chanu, “Image segmentation using K-means clustering algorithm and subtractive
clustering algorithm,” Procedia Computer Science, vol. 54, pp. 764–771, 2015, doi: 10.1016/j.procs.2015.06.090.
[25] A. Septiarini, R. Saputra, A. Tedjawati, M. Wati, and H. Hamdani, “Pattern recognition of sarong fabric using machine learning
approach based on computer vision for cultural preservation,” International Journal of Intelligent Engineering and Systems, vol.
15, no. 5, pp. 284–295, 2022, doi: 10.22266/ijies2022.1031.26.
BIOGRAPHIES OF AUTHORS