ESIDE: A computationally intelligent method to identify earthworm species (E. fetida) from digital images: Application in taxonomy

Saiqa Andleeb; Wajid Arshad Abbasi; Rozina Ghulam Mustafa; Ghafoor ul Islam; Anum Naseer; Irsa Shafique; Asma Parween; Bushra Shaheen; Muhamad Shafiq; Muhammad Altaf; Syed Ali Abbas

doi:10.1371/journal.pone.0255674

Abstract

Earthworms (Crassiclitellata), being ecosystem engineers, significantly affect the physical, chemical, and biological properties of the soil by recycling organic material, increasing nutrient availability, and improving soil structure. The ecological efficiency of earthworms varies with species. Therefore, the role of taxonomy in earthworm study is significant. The taxonomy of earthworms cannot reliably be established through morphological characteristics because the small and simple body plan of the earthworm lacks anatomically complex and highly specialized structures. Recently, molecular techniques have been adopted to accurately classify earthworm species, but these techniques are time-consuming and costly. To address this issue, we propose a machine learning-based earthworm species identification model that uses digital images of earthworms. We performed a stringent performance evaluation not only through 10-fold cross-validation and on an external validation dataset but also in real settings by involving an experienced taxonomist. In all the evaluation settings, our proposed model has given state-of-the-art performance and justified its use to aid earthworm taxonomy studies. We made this model openly accessible through a cloud-based webserver and Python code available at https://sites.google.com/view/wajidarshad/software and https://github.com/wajidarshad/ESIDE.

Citation: Andleeb S, Abbasi WA, Ghulam Mustafa R, Islam Gu, Naseer A, Shafique I, et al. (2021) ESIDE: A computationally intelligent method to identify earthworm species (E. fetida) from digital images: Application in taxonomy. PLoS ONE 16(9): e0255674. https://doi.org/10.1371/journal.pone.0255674

Editor: Tunira Bhadauria, Feroze Gandhi Degree College, INDIA

Received: April 29, 2021; Accepted: July 21, 2021; Published: September 16, 2021

Copyright: © 2021 Andleeb et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: https://github.com/wajidarshad/ESIDE/tree/main/dataset.

Funding: Saiqa Andleeb acknowledges the support of the Higher Education Commission (HEC) of Pakistan for granting research projects under the National Research Program for Universities (NRPU) and Technology Development Fund (TDF)(Grant ids: NRPU-2907 & TDF-02006). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Background

Earthworms (Crassiclitellata), also known as rainworms, are terrestrial invertebrates habitually found in soil, where they eat a wide variety of organic matter [1]. Earthworms normally burrow during the day, consuming soil and extracting nutrients from decomposing organic matter such as leaves and roots [2]. They strongly affect soil health by transporting nutrients and minerals from below to the surface through their waste, and their passageways ventilate the ground. As ecosystem engineers, earthworms significantly affect the physical, chemical, and biological properties of the soil by recycling organic material, increasing nutrient availability, and improving soil structure [3].

Earthworms, with more than 6000 extant species, constitute a highly diverse group of burrowing annelids [4]. The ecological niche and life strategies of earthworms vary from species to species [4]. Moreover, the presence of more than one species in mixed cultures leads to lower reproduction rates and ineffective ecosystem engineering [4]. Many important activities performed by pharmacologists, farmers, taxonomists, foresters, conservation biologists, and technical personnel of environmental agencies, such as monitoring endangered species, studying biodiversity, and determining the impact of climate change, depend on accurate species identification. Therefore, the role of taxonomy in earthworm study is significant, as without a reliable taxonomy most ecological studies are irrelevant [5]. Based on feeding habits and soil profile, earthworms have been classified into three main categories: epigeic, anecic, and endogeic. These parameters are not sufficient to classify earthworms properly and therefore, for the vast majority of species, nothing is known about their biology and ecology [4, 5].

The taxonomy of earthworms is mostly established using different morphological characteristics such as prostomium shape, position, segment number and shape of clitellum, spermathecae, and the arrangements of setae [4, 5]. However, taxonomic classification based on these morphological characteristics is difficult in most species and requires a high degree of expertise because the small and simple body plan of earthworms lacks anatomically complex and highly specialized structures [6, 7]. Recent molecular techniques based on 16S rDNA, 18S rDNA, and COI sequences have been successfully used as an alternative approach for earthworm identification [6, 8, 9]. However, these technologies need a wide database of DNA sequences of earthworms and involve considerable time and budget. Therefore, there is a pressing need for a computational approach that can help to identify and correctly establish the taxonomy of different earthworm species.

In this study, we propose a machine learning-based earthworm species identification model that uses digital images of earthworms. Machine learning has successfully been used to classify different animal species in digital images [10, 11]. Currently, as a pilot study, we have focused only on Eisenia fetida (tiger worm) because of its wide range of applications in medicine, pharmaceuticals, and agriculture, and because of the limited availability of data in the form of digital images. E. fetida possesses anticoagulation and fibrinolytic activity [12], acts as an antitumor, antioxidant, wound healing, and antibacterial agent [12, 13], and is well suited to vermicomposting [14]. Here, we aim to develop a method that takes a digital image of an earthworm and predicts whether it is E. fetida or not. To the best of our knowledge, this is the first attempt to design such a method to identify earthworm species from digital images.

2. Methods

In this section, we describe the methodology adopted to design, develop, and evaluate a machine learning-based earthworm species identification system.

2.1. Dataset and preprocessing

For this study, we collected samples of various earthworm species, including E. fetida, from different localities of Azad Jammu and Kashmir, Pakistan. After careful washing, we took digital images of all the collected samples with a high-quality digital camera (Nikon D5300). We then sorted these images into two categories, E. fetida and others, by consulting taxonomy experts in the field. This gave a dataset of 1240 images of E. fetida and 772 images of other species.

We cropped and enhanced all the images in our dataset before using them in the proposed machine learning setting. Cropping removes the unwanted area of the image so that only the earthworm is emphasized. We cropped images in our dataset by bounding boxes using Adobe Photoshop (version 19). Different image enhancement techniques such as adaptive histogram equalization (AHE) were also applied to improve the quality and the local contrast of the images [15]. These enhancement techniques were applied using a Python-based tool called Scikit-Image (version: 0.17.2) [16].
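The idea behind histogram-based contrast enhancement can be illustrated with a minimal sketch. The code below implements plain global histogram equalization in NumPy, which is a simpler relative of AHE and not the adaptive variant itself; in scikit-image the adaptive version is available as `skimage.exposure.equalize_adapthist`. The function name `equalize_histogram` and the synthetic image are assumptions made only for illustration.

```python
import numpy as np

def equalize_histogram(gray):
    """Global histogram equalization for an 8-bit grayscale image (uint8 array)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]              # CDF value at the darkest occupied level
    denom = max(cdf[-1] - cdf_min, 1)      # guard against division by zero on flat images
    # Lookup table that stretches the occupied intensity range over 0..255
    lut = np.round((cdf - cdf_min) / denom * 255).clip(0, 255).astype(np.uint8)
    return lut[gray]

# Synthetic low-contrast image (hypothetical data): values confined to [100, 140)
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)
enhanced = equalize_histogram(low_contrast)
print(low_contrast.min(), low_contrast.max(), "->", enhanced.min(), enhanced.max())
```

AHE applies the same kind of remapping within local tiles of the image rather than once globally, which is why it improves local contrast.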

2.2. Proposed methodology

We propose a machine learning-based approach for the identification of earthworm species (E. fetida) from raw digital images. The steps involved in E. fetida identification using our proposed scheme are given in Fig 1 and discussed below (please also see S1 Video). Because of data scarcity, we used conventional (shallow) machine learning models such as support vector machines (SVMs) together with a transfer learning paradigm instead of training deep networks end to end.

Fig 1. A proposed methodology for the development of computer-aided identification of earthworm species (E. fetida) using machine learning and digital images.

https://doi.org/10.1371/journal.pone.0255674.g001

2.2.1. Feature extraction.

In image analysis, feature extraction is important as it involves obtaining the most relevant details from the image by reducing dimensionality. If we employ a better feature extraction technique, then it can be expected that the extracted features will better represent the relevant information to perform well over the desired task. In this study, we have used both hand-crafted and deep features extracted using different off-the-shelf CNN based pre-trained models on ImageNet [17]. All of these feature representations ϕ(⋅) have been extracted from individual earthworm images. In what follows, we describe the different types of feature representations used in this study.

Hand-Crafted Features

We used various handcrafted features in this study: Histogram of Oriented Gradients (HOG) [18], scale-invariant feature transform (SIFT) [19], DAISY [20], Grey Level Co-Occurrence matrix (GLCM) [21], HAAR features [22], and Local binary patterns (LBP) [23]. We extracted these features from all the images in our dataset using Scikit-image (version: 0.17.2) and OpenCV (version: 3.4.2) [16, 24].
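To make the notion of a handcrafted feature vector concrete, the following is a minimal NumPy sketch of the basic 3x3 local binary pattern descriptor. It is a simplified illustration and not the scikit-image or OpenCV implementation used in the study; the function name `lbp_histogram` and the random test image are hypothetical.

```python
import numpy as np

def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 3x3 local binary patterns."""
    g = gray.astype(np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Offsets of the 8 neighbours, clockwise from the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# Hypothetical grayscale image standing in for a cropped earthworm photo
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
features = lbp_histogram(image)
print(features.shape)
```

Each image is thus reduced to a fixed-length vector (here 256 values) that a shallow classifier can consume regardless of the original image size.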

Deep Feature Maps

We used different off-the-shelf CNN-based models pre-trained on ImageNet to extract useful feature maps from the raw digital images of earthworms in our datasets [17]. These pre-trained models include Resnet50 [25], InceptionV3 [26], Xception [27], VGG16 [28], NASNetLarge [29], and DenseNet121 [30]. The selection of these pre-trained CNN-based models was based on their reported accuracy. Preprocessing such as pixel scaling and resizing, as expected by each pre-trained model (it varies from model to model), was applied before extracting the required feature maps. We applied resizing with resampling using pixel area relation through OpenCV, a computer vision library for Python [24].
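The resizing and pixel-scaling step can be sketched as follows. This is a NumPy illustration of area-based downscaling for integer factors only; in OpenCV the corresponding call is `cv2.resize` with `interpolation=cv2.INTER_AREA`. The helper names, the 448 x 448 input, and the 224 x 224 target size are assumptions chosen for illustration, since the exact input size and scaling convention differ between pre-trained models.

```python
import numpy as np

def downscale_area(image, factor):
    """Downscale an H x W x C image by an integer factor using block averaging."""
    h, w, c = image.shape
    h2, w2 = h // factor, w // factor
    cropped = image[:h2 * factor, :w2 * factor].astype(np.float32)
    # Group pixels into factor x factor blocks and average each block
    blocks = cropped.reshape(h2, factor, w2, factor, c)
    return blocks.mean(axis=(1, 3))

def scale_pixels(image):
    """Map 8-bit pixel values to the range [0, 1]."""
    return image.astype(np.float32) / 255.0

# Hypothetical RGB photo; real inputs would be the cropped earthworm images
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(448, 448, 3), dtype=np.uint8)
model_input = scale_pixels(downscale_area(photo, 2))
print(model_input.shape)  # (224, 224, 3)
```

The resulting array would then be passed through the convolutional part of a pre-trained network, and the output feature map flattened or pooled into the vector used by the shallow classifiers.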

2.2.2. Classifiers for the identification of earthworm species.

In the proposed machine learning setting, we have posed E. fetida identification from digital images as a classification problem. For this purpose, we represent each digital image in our dataset as an example of the form (I_i, y_i) where I_i is an earthworm image and y_i ∈ {+1, −1} is its associated label that indicates whether I_i is E. fetida (+1) or not (-1). For a given image I_i in our dataset, we extract hand-crafted features and deep feature maps which can be denoted as a feature vector x_i. Our objective is to learn a function f(∙) using these feature vectors to identify whether an input image belongs to E. fetida or some other species. For this purpose, we have used three different classifiers: classical Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Machine (XGBoost) [31–33].

Support Vector Classification (SVC)

We used Support Vector Machines (SVMs) for the detection of earthworm species from a digital image by learning a function f(x) = 〈w,x〉, with w as the parameters to be learned from the training data {(x_i, y_i)|i = 1,2,…,N}, where x_i is the feature representation of an earthworm image I_i. The optimal value of w is obtained in SVM by solving the following optimization problem [32]:

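$$\min_{w,\,\xi}\; \frac{\lambda}{2}\lVert w\rVert^{2} + \sum_{i=1}^{N}\xi_i \quad \text{subject to} \quad y_i f(x_i) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,2,\ldots,N \tag{1}$$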

The objective function in Eq (1) maximizes the margin while minimizing margin violations (or slacks ξ) [32]. The hyper-parameter λ controls the tradeoff between margin maximization and margin violation. We used both linear and radial basis function (RBF) kernels and coarsely optimized the values of λ and the RBF kernel parameter γ using grid search with Scikit-learn (version: 0.23) [34, 35].
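A coarse grid search over linear and RBF kernels might look like the sketch below. It uses synthetic feature vectors in place of real image features, and the grid values, the 5-fold inner CV, and the `StandardScaler` step are assumptions rather than the settings used in the study. Note that scikit-learn exposes the margin/violation tradeoff through the parameter `C`, which weights the slack term, so it plays the inverse role of a weight on the margin term.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for image feature vectors; labels recoded to {+1, -1}
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)
y = np.where(y == 1, 1, -1)

pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": [0.001, 0.01, 0.1]},
]
search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="f1",  # F1 for the positive class (+1)
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the pipeline refit on all the data with the selected kernel and hyper-parameters.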

Random Forest Classification (RFC)

Random forest is a supervised learning algorithm that builds an ensemble of decision trees, usually trained with the “bagging” method. A random forest operates by constructing several decision trees in parallel during training and, for classification, outputs the class chosen by majority vote across all trees [31]. It usually performs well on problems whose features have non-linear relationships. Each classification tree in the RF is constructed on randomly sampled subsets of input features. In this study, we optimized RF for the number of decision trees in the forest, the maximum number of features considered for splitting a node, the maximum number of levels in each decision tree, and the minimum number of samples required to split a node. This machine learning technique has also been used effectively in many other studies [36–39].
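The four hyper-parameters listed above map directly onto arguments of scikit-learn's `RandomForestClassifier`. The sketch below shows one way such a search could be set up; the candidate values, the 3-fold CV, and the synthetic data are assumptions for illustration, not the grid used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for image feature vectors
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

param_grid = {
    "n_estimators": [25, 50],         # number of decision trees in the forest
    "max_features": ["sqrt", "log2"], # features considered when splitting a node
    "max_depth": [None, 5],           # maximum number of levels in each tree
    "min_samples_split": [2, 5],      # minimum samples required to split a node
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, scoring="f1", cv=3)
search.fit(X, y)
print(search.best_params_)
```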

XGBoost Classification (XGBC)

XGBoost is a boosting-based ensemble learning technique that combines several weak learners into a stronger one in an iterative way [33, 40]. At its core, boosting reduces bias by fitting each new model to the errors made by the previous models, and reduces variance by combining multiple models. In the XGBoost technique, each subsequent model is trained on the residuals (the differences between the predicted and actual values); models are fitted using an arbitrary differentiable loss function and a gradient descent optimization method, and the implementation is engineered to use computational resources efficiently. Here, we used trees as the default base learners and optimized XGBoost in terms of the number of boosting iterations, the learning rate, booster, maximum depth, and subsample ratio by employing the grid search technique and a Python-based package called XGBoost (version: 0.7) [35, 40].
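The residual-fitting idea can be shown with a deliberately simplified, from-scratch sketch: gradient boosting with decision stumps under squared loss on a one-dimensional regression problem. This is not the XGBoost library, which additionally uses regularization and second-order gradient information; all function names and the synthetic data below are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-threshold split on a 1-D feature under squared error."""
    best = None
    for t in np.unique(x)[:-1]:  # exclude the maximum so the right side is never empty
        left, right = residual[x <= t], residual[x > t]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return t, left_value, right_value

def predict_stump(stump, x):
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a damped copy of it."""
    prediction = np.full_like(y, y.mean(), dtype=float)
    errors = []
    for _ in range(n_rounds):
        residual = y - prediction  # negative gradient of the squared loss
        stump = fit_stump(x, residual)
        prediction += learning_rate * predict_stump(stump, x)
        errors.append(float(np.mean((y - prediction) ** 2)))
    return prediction, errors

rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(0, 0.1, size=200)
_, errors = boost(x, y)
print(round(errors[0], 4), "->", round(errors[-1], 4))
```

The training error falls from round to round because every new stump is fitted to what the current ensemble still gets wrong; the learning rate controls how much of each correction is applied.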

2.3. Experimental setup

To train a machine learning-based model and to evaluate how well it predicts the earthworm species from a digital image, we used the following experimental setup. We divided the preprocessed earthworm images into two subsets, a train-test set (80%) and a held-out validation set (20%), and report performance metrics on both. For the train-test set, we used stratified 10-fold cross-validation (CV): we shuffled the images and split them into 10 groups while preserving the percentage of samples for each class. Ten models were trained and evaluated, with each group held out once as the test set [41]. Average values of performance metrics across folds are reported in this study. Because hyperparameters were tuned on the same train-test set, the 10-fold CV estimates may be optimistically biased; to check the robustness of the generalization performance of our proposed technique, we therefore also used the held-out validation set. For the held-out validation set, we trained the classification models using the whole train-test set and tested them on the validation set. We used the area under the ROC curve (ROC), the area under the precision-recall curve (PR), and F-measure as performance measures for model evaluation and performance assessment [41–43]. We computed these metrics using Scikit-learn (version: 0.23) [34]. We used grid search over the training data to find the optimal values of the hyper-parameters of the different classification models, again with Scikit-learn [34, 35]. This automatic grid search was performed once using the train-test set, and the optimal hyperparameter values were then used during the whole cross-validation process.
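The evaluation protocol described above can be sketched with scikit-learn as follows. The data are synthetic, the classifier settings are fixed placeholders rather than tuned values, and `average_precision_score` is used as an approximation of the area under the precision-recall curve; class 1 stands in for E. fetida. The helper name `evaluate` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for image feature vectors with imbalanced classes
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           weights=[0.4, 0.6], random_state=0)

# 80% train-test set, 20% held-out validation set, preserving class proportions
X_tt, X_val, y_tt, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

def evaluate(model, X_eval, y_eval):
    """Return (ROC AUC, PR AUC approximation, F1) for a fitted classifier."""
    scores = model.decision_function(X_eval)
    return (roc_auc_score(y_eval, scores),
            average_precision_score(y_eval, scores),
            f1_score(y_eval, model.predict(X_eval)))

# Stratified 10-fold CV on the train-test set; each fold is held out once
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_metrics = []
for train_idx, test_idx in cv.split(X_tt, y_tt):
    model = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tt[train_idx], y_tt[train_idx])
    fold_metrics.append(evaluate(model, X_tt[test_idx], y_tt[test_idx]))
cv_roc, cv_pr, cv_f1 = np.mean(fold_metrics, axis=0)

# Final model: train on the whole train-test set, test on the held-out validation set
final_model = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tt, y_tt)
val_roc, val_pr, val_f1 = evaluate(final_model, X_val, y_val)
print(round(cv_f1, 3), round(val_f1, 3))
```

Keeping the validation set out of both hyper-parameter tuning and cross-validation is what makes its metrics a less biased estimate of generalization performance.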

2.4. Statistical analysis

We also checked the statistical significance of the obtained performance (F1 score) across different features and classifiers. For this purpose, we used the Wilcoxon test [44]. The null hypothesis is that the medians of the performance scores of the different models are equal; the alternative hypothesis is that they differ. We used a significance level of α = 0.05 (95% confidence level). We performed this analysis using an online webserver (URL: https://tec.citius.usc.es/stac/) [45].
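For readers who prefer a scripted alternative to the webserver, a paired Wilcoxon signed-rank test can be run with SciPy as sketched below. The per-fold F1 scores are made-up numbers for two hypothetical models, not results from this study, and SciPy's implementation is assumed here as a stand-in for the online tool.

```python
from scipy.stats import wilcoxon

# Hypothetical per-fold F1 scores for two models (made-up numbers, not from the paper)
f1_model_a = [0.78, 0.81, 0.79, 0.83, 0.80, 0.77, 0.82, 0.84, 0.79, 0.81]
f1_model_b = [0.70, 0.72, 0.69, 0.75, 0.71, 0.68, 0.74, 0.73, 0.70, 0.72]

# Paired two-sided test on the fold-wise differences
statistic, p_value = wilcoxon(f1_model_a, f1_model_b)

alpha = 0.05
reject_null = p_value < alpha  # True means the difference is significant at alpha
print(reject_null)
```

Because the test is paired, both score lists must come from the same folds in the same order.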

2.5. Webserver to identify E. fetida

We have developed and deployed a user-friendly cloud-based webserver that uses the optimal machine learning model for E. fetida identification. This webserver takes an earthworm digital image and predicts whether this image belongs to E. fetida or not. The webserver is available at https://sites.google.com/view/wajidarshad/software.

3. Results and discussion

In this study, we have proposed and developed a machine learning-based computational model to identify earthworm species. For this purpose, we have used a dataset of earthworm digital images, various machine learning algorithms, and different features. In what follows we present results showing the earthworm species identification performance of our proposed method using digital images across different evaluation schemes.

3.1. Earthworm species identification performance using handcrafted features

We trained various classical machine learning models for the classification of E. fetida versus other earthworm species with a range of handcrafted features and evaluated them using both 10-fold cross-validation (CV) and an external validation dataset. Results for both settings are shown in Tables 1 and 2. Using 10-fold CV, we observed a maximum F1-score of 0.71 (p<0.05), along with an area under the ROC curve of 0.75 and an area under the PR curve of 0.86, with the Support Vector Classifier and HAAR feature representation (Table 1). Because the F1 score is the harmonic mean of precision and recall, a value of 0.71 summarizes the balance between false positives and false negatives for the E. fetida class rather than the fraction of images classified correctly. To further confirm the generalization performance of our trained machine learning models with handcrafted features, we used an external validation dataset, on which we observed a maximum F1-score of 0.75, along with an area under the ROC curve of 0.77 and an area under the PR curve of 0.85, again with the Support Vector Classifier and HAAR feature representation (Table 2). The HAAR feature representation also performed consistently better than the other handcrafted features with the RF and XGB classifiers.

Table 1. Predictive performance for earthworm species prediction across different classification models and handcrafted features using 10-fold CV (E. fetida vs others).

https://doi.org/10.1371/journal.pone.0255674.t001

Table 2. Predictive performance for earthworm species prediction across different classification models and handcrafted features on external validation dataset (E. fetida vs others).

https://doi.org/10.1371/journal.pone.0255674.t002

3.2. Earthworm species identification performance using deep feature maps

We trained various shallow machine learning models for the classification of E. fetida versus other earthworm species with a range of deep learning-based feature maps and evaluated them using both 10-fold cross-validation (CV) and an external validation dataset. The results of our evaluation in both settings are shown in Tables 3 and 4 and Fig 2. Using 10-fold CV, we observed a maximum F1-score of 0.80 (p<0.05), along with an area under the ROC curve of 0.95 and an area under the PR curve of 0.98, with the Support Vector Classifier and DenseNet121 feature map (Table 3). A PR score of 0.98 indicates high precision and recall, that is, few false positives (classifying other species as E. fetida) and few false negatives (classifying E. fetida as other species). To further confirm the classification accuracy of our trained machine learning models with deep feature maps, we used an external validation dataset, on which we observed a maximum F1-score of 0.92, along with an area under the ROC curve of 0.96 and an area under the PR curve of 0.99, with the Support Vector Classifier and DenseNet121 feature map (Table 4; Fig 2). F1 score of 0.92 and PR score of 0.98 represent a consistently improved performance of our proposed machine learning model to predict the E. fetida class with high precision and recall (i.e. by producing fewer false positives and false negatives). Comparing these results with those obtained through handcrafted features, we conclude that deep feature maps perform consistently better across all the classification algorithms. This performance improvement of deep feature maps over handcrafted features has already been reported in a previous study on X-ray scans [46]. These results justify the use of the proposed earthworm species classification model in a real setting.

Fig 2. Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves showing predictive performance of our proposed model for the classification of digital images of earthworms across different classifiers (SVM, RF, XGB) and DenseNet feature map on an external validation dataset.

E. fetida vs others: ROC(A), PR(B).

https://doi.org/10.1371/journal.pone.0255674.g002

Table 3. Predictive performance for earthworm species prediction across different classification models and deep feature maps using 10-fold CV (E. fetida vs others).

https://doi.org/10.1371/journal.pone.0255674.t003

Table 4. Predictive performance for earthworm species prediction across different classification models and deep feature maps on external validation dataset (E. fetida vs others).

https://doi.org/10.1371/journal.pone.0255674.t004

3.3. Predictive performance of the proposed model under a real setting

We also checked the generalization performance of our best-trained model for earthworm species identification in a real setting under the supervision of an experienced taxonomist at the Vermi Tech Unit, University of Azad Jammu and Kashmir. For this purpose, we used 30 digital images of different classes (15 E. fetida and 15 other species). A subset of these images is shown in Fig 3. Results obtained through this evaluation are shown as a confusion matrix in Fig 4. Our proposed system (ESIDE) correctly classified 15 out of 15 provided images of E. fetida (Fig 4). For the provided images of other species, our system correctly classified 11 out of 15 images and misclassified 4 as E. fetida (Fig 4). These results show a reasonable performance of our proposed system and justify the use of this model in real settings.
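The counts reported above are enough to derive the usual summary metrics for the E. fetida class. The short calculation below is a worked example based only on those counts; the derived values themselves are not reported in the text and are computed here for illustration.

```python
# Counts from the real-setting evaluation, with E. fetida as the positive class
tp, fn = 15, 0   # E. fetida images: classified correctly / classified as other species
tn, fp = 11, 4   # other-species images: classified correctly / classified as E. fetida

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f} accuracy={accuracy:.3f}")
# prints: precision=0.789 recall=1.000 F1=0.882 accuracy=0.867
```

The example also shows why F1 and accuracy should not be read interchangeably: they are computed from different combinations of the same four counts.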

Fig 3. Some of the images of earthworm species (E. fetida and other) used to test ESIDE in real use under the supervision of a qualified taxonomist.

https://doi.org/10.1371/journal.pone.0255674.g003

Fig 4. Confusion matrices: Showing the performance of our proposed model for earthworm species identification in a real setting under the supervision of a qualified taxonomist.

https://doi.org/10.1371/journal.pone.0255674.g004

4. Conclusions and future work

In this study, we have proposed a machine learning-based model called ESIDE to classify earthworm species by using digital images. We have used both deep feature maps and handcrafted features in this study. Through a series of simulation experiments using both types of features and three different classification algorithms, we have shown that deep feature maps perform consistently better than handcrafted features when identifying earthworm species from digital images. The stringent performance evaluation through 10-fold CV, on an external validation dataset, and in use under real settings shows that our proposed system can effectively be used to identify E. fetida from a digital image. The use of our proposed model can aid biologists in taxonomical studies of earthworms. We have made our proposed system accessible through a publicly open cloud-based webserver and open-source code. In the future, we will try to develop a generic model that identifies as many earthworm species as possible by incorporating more data.

Supporting information

S1 Video. A short video showing the scientific significance, workflow and design of the current study.

https://doi.org/10.1371/journal.pone.0255674.s001

(M4V)

Acknowledgments

We thank the reviewers and the editor for their valuable feedback and suggestions to improve the presentation of this work.

References

1. Edwards CA, Hendrix PF, Arancon NQ, Dumanig F. Biology and Ecology of Earthworms. 4th ed. Springer US; 2021. Available: https://www.springer.com/gp/book/9780387749426
2. Bonkowski M, Griffiths BS, Ritz K. Food preferences of earthworms for soil fungi. Pedobiologia. 2000;44: 666–676.
- View Article
- Google Scholar
3. Domínguez J, Gómez-Brandón M. Vermicomposting: Composting with Earthworms to Recycle Organic Wastes. Manag Org Waste. 2012 [cited 20 Dec 2020].
- View Article
- Google Scholar
4. Earthworms and Vermicomposting. [cited 20 Dec 2020]. https://doi.org/10.5772/intechopen.76088
5. Velando A, Ferreiro A. Are Eisenia fetida (Savigny, 1826) and Eisenia andrei Bouche (1972) (Oligochaeta, Lumbricidae) different biological species? Pedobiologia. 2005;49: 81–87.
- View Article
- Google Scholar
6. Pop AA, Wink M, Pop VV. Use of 18S, 16S rDNA and cytochrome c oxidase sequences in earthworm taxonomy (Oligochaeta, Lumbricidae): The 7th international symposium on earthworm ecology · Cardiff · Wales · 2002. Pedobiologia. 2003;47: 428–433.
- View Article
- Google Scholar
7. Pérez-Losada M, Ricoy M, Marshall JC, Domínguez J. Phylogenetic assessment of the earthworm Aporrectodea caliginosa species complex (Oligochaeta: Lumbricidae) based on mitochondrial and nuclear DNA sequences. Mol Phylogenet Evol. 2009;52: 293–302. pmid:19364539
- View Article
- PubMed/NCBI
- Google Scholar
8. Boyer S, Wratten SD. Using molecular tools to identify New Zealand endemic earthworms in a mine restoration project. Zool Middle East. 2010;51: 31–40.
- View Article
- Google Scholar
9. Pop AA, Cech G, Wink M, Csuzdi C, Pop VV. Application of 16S, 18S rDNA and COI sequences in the molecular systematics of the earthworm family Lumbricidae (Annelida, Oligochaeta). Eur J Soil Biol. 2007;43: S43–S52.
- View Article
- Google Scholar
10. Wäldchen J, Mäder P. Machine learning for image based species identification. Methods Ecol Evol. 2018;9: 2216–2225. https://doi.org/10.1111/2041-210X.13075
- View Article
- Google Scholar
11. Tabak MA, Norouzzadeh MS, Wolfson DW, Sweeney SJ, Vercauteren KC, Snow NP, et al. Machine learning to classify animal species in camera trap images: Applications in ecology. Methods Ecol Evol. 2019;10: 585–590. https://doi.org/10.1111/2041-210X.13120
- View Article
- Google Scholar
12. Matausic-Pisl M, Tomicic M, Micek V, Grdisa M. Influences of earthworm extract G-90 on haematological and haemostatic parameters in Wistar rats. Eur Rev Med Pharmacol Sci. 2011;15: 71–78. pmid:21381501
- View Article
- PubMed/NCBI
- Google Scholar
13. Andleeb S, Ejaz M, Awan UA, Ali S, Kiyani A, Shafique I, et al. In vitro screening of mucus and solvent extracts of Eisenia foetida against human bacterial and fungal pathogens. Pak J Pharm Sci. 2016;29: 969–977. pmid:27166541
- View Article
- PubMed/NCBI
- Google Scholar
14. Bellitürk K, Arshad A. Vermicomposting Technology For Solid Waste Management in Sustainable Agricultural Production. 2016.
- View Article
- Google Scholar
15. Hummel R. Image enhancement by histogram transformation. Comput Graph Image Process. 1977;6: 184–195.
- View Article
- Google Scholar
16. Walt S van der, Schönberger JL, Nunez-Iglesias J, Boulogne F, Warner JD, Yager N, et al. scikit-image: image processing in Python. PeerJ. 2014;2: e453. pmid:25024921
- View Article
- PubMed/NCBI
- Google Scholar
17. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009. pp. 248–255.
- View Article
- Google Scholar
18. Dalal N, Triggs B. Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). 2005. pp. 886–893 vol. 1.
- View Article
- Google Scholar
19. Križaj J, Štruc V, Pavešić N. Adaptation of SIFT Features for Robust Face Recognition. In: Campilho A, Kamel M, editors. Image Analysis and Recognition. Berlin, Heidelberg: Springer; 2010. pp. 394–404. https://doi.org/10.1007/978-3-642-13772-3_40
20. Tola E, Lepetit V, Fua P. DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo. IEEE Trans Pattern Anal Mach Intell. 2010;32: 815–830. pmid:20299707
- View Article
- PubMed/NCBI
- Google Scholar
21. Singh S, Srivastava D, Agarwal S. GLCM and its application in pattern recognition. 2017 5th International Symposium on Computational and Business Intelligence (ISCBI). 2017. pp. 20–25.
- View Article
- Google Scholar
22. Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition CVPR 2001. 2001. p. I–I.
- View Article
- Google Scholar
23. Ojala T, Pietikäinen M, Harwood D. A comparative study of texture measures with classification based on featured distributions. Pattern Recognit. 1996;29: 51–59.
- View Article
- Google Scholar
24. Bradski G, Kaehler A. Learning openCV: computer vision with the openCV library. In: CERN Document Server [Internet]. O’Reilly; 2008 [cited 18 Dec 2020]. Available: https://cds.cern.ch/record/1158218
25. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. ArXiv151203385 Cs. 2015 [cited 29 Nov 2020]. Available: http://arxiv.org/abs/1512.03385
26. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception Architecture for Computer Vision. ArXiv151200567 Cs. 2015 [cited 29 Nov 2020]. Available: http://arxiv.org/abs/1512.00567
27. Chollet F. Xception: Deep Learning with Depthwise Separable Convolutions. ArXiv161002357 Cs. 2017 [cited 29 Nov 2020]. Available: http://arxiv.org/abs/1610.02357
28. Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. ArXiv14091556 Cs. 2015 [cited 29 Nov 2020]. Available: http://arxiv.org/abs/1409.1556
29. Zoph B, Vasudevan V, Shlens J, Le QV. Learning Transferable Architectures for Scalable Image Recognition. ArXiv170707012 Cs Stat. 2018 [cited 29 Nov 2020]. Available: http://arxiv.org/abs/1707.07012
30. Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely Connected Convolutional Networks. ArXiv160806993 Cs. 2018 [cited 29 Nov 2020]. Available: pmid:29997087
31. Breiman L. Random Forests. Mach Learn. 2001;45: 5–32.
32. Cortes C, Vapnik V. Support-Vector Networks. Mach Learn. 1995;20: 273–297. pmid:11099962
33. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Stat. 2001;29: 1189–1232.
34. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12: 2825–2830.
35. Bergstra J, Bengio Y. Random Search for Hyper-Parameter Optimization. J Mach Learn Res. 2012;13: 281–305.
36. Abbasi WA, Hassan FU, Yaseen A, Minhas FUAA. ISLAND: In-Silico Prediction of Proteins Binding Affinity Using Sequence Descriptors. 2017 [cited 8 Jan 2018]. Available: https://128.84.21.199/abs/1711.10540
37. Li H, Leung K-S, Wong M-H, Ballester PJ. Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study. BMC Bioinformatics. 2014;15: 291. pmid:25159129
38. Ballester PJ, Mitchell JBO. A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinforma Oxf Engl. 2010;26: 1169–1175. pmid:20236947
39. Moal IH, Agius R, Bates PA. Protein-protein binding affinity prediction on a diverse set of structures. Bioinformatics. 2011; btr513. pmid:21903632
40. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM; 2016. pp. 785–794. https://doi.org/10.1145/2939672.2939785
41. Abbasi WA, Minhas FUAA. Issues in performance evaluation for host–pathogen protein interaction prediction. J Bioinform Comput Biol. 2016;14: 1650011. pmid:26932275
42. Davis J, Goadrich M. The Relationship Between Precision-Recall and ROC Curves. Proceedings of the 23rd International Conference on Machine Learning. New York, NY, USA: ACM; 2006. pp. 233–240. https://doi.org/10.1145/1143844.1143874
43. Tharwat A. Classification assessment methods. Appl Comput Inform. 2020; ahead-of-print.
44. Chandra TB, Verma K, Singh BK, Jain D, Netam SS. Coronavirus disease (COVID-19) detection in Chest X-Ray images using majority voting based classifier ensemble. Expert Syst Appl. 2021;165: 113909. pmid:32868966
45. Rodriguez-Fdez I, Canosa A, Mucientes M, Bugarin A. STAC: A web platform for the comparison of algorithms using statistical tests. 2015 [cited 22 Jan 2021].
46. Chandra TB, Verma K, Singh BK, Jain D, Netam SS. Coronavirus disease (COVID-19) detection in Chest X-Ray images using majority voting based classifier ensemble. Expert Syst Appl. 2021;165: 113909. pmid:32868966

