Failure prediction in the refinery piping system using machine learning algorithms

Procedia Computer Science 232 (2024) 1663–1672

Yassine Kanoun a, Aynaz Mohammadi Aghbash a, Tikou Belem a, Bassem Zouari b, Hatem Mrad a,*

a School of Engineering, University of Quebec in Abitibi-Témiscamingue (UQAT), Rouyn-Noranda, Canada J9X 5E4
b National Engineering School of Sfax (ENIS), LA2MP, B.P., W3038 Sfax, Tunisia

Abstract
Pipelines play a pivotal role in transporting large volumes of oil and gas within refineries. However, over time, they are susceptible to deterioration, leading to potential failures. Effective monitoring is imperative to maintain their optimal performance and safety. This research introduces a machine learning (ML) approach to pinpoint failure sources in oil and gas pipelines. Analysing an industrial dataset, we compared six ML models to predict failures in refinery pipelines. Leakage sources are predicted based on three operational parameters: transported fluid, temperature, and pressure. The models are evaluated and compared in terms of precision, recall, F1-score, accuracy, and the ROC-AUC. Remarkably, the XGBoost classifier exhibited a 99.7% accuracy, outperforming other algorithms in predicting the failure source. Emphasizing the value of Industry 4.0 solutions, this study underscores the potential of advanced ML in enhancing pipeline monitoring. Such predictions empower operators to pre-empt failures, reinforcing industry safety and sustainability.
© 2024 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 5th International Conference on Industry 4.0 and Smart Manufacturing
Keywords: Piping; failure; machine learning; prediction.
1. Introduction

Pipelines play a vital role in transporting oil and gas in many countries. However, their failure in various circumstances can have severe consequences such as environmental damage, financial losses, and catastrophic outcomes. According to a report by CONCAWE [1], pipeline damage can
result from factors such as mechanical failure, operational malfunctions, natural hazards, corrosion/erosion, and
foreign interference.
To mitigate these risks, many companies have resorted to intrusive techniques to enhance safety measures, system
performance, and monitoring capabilities [2,3].
The fourth industrial revolution seeks to make industrial process control systems more autonomous and intelligent
by integrating data acquisition systems through sensors and instruments [2,3].
Data generated from these instruments is typically stored in local databases and processed using supervised and
unsupervised learning models to predict the system's state. To predict discrete or continuous variables, such as failures
in the oil and gas system or the corrosion rate of an asset, supervised learning methods use classification and
regression. The success of these algorithms depends on fundamental steps such as cleaning and preparing the database,
which involves analyzing missing values, outliers, and normalizing collected data. On the other hand, unsupervised
learning methods cluster similar data together without relying on the input-output concept [4].
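For illustration, the sketch below shows how the cleaning and preparation steps mentioned above (missing-value analysis, outlier filtering, and normalization) can be performed with pandas and scikit-learn. The file and column names are hypothetical placeholders, not the study's actual dataset:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical dataset; "temperature" and "pressure" stand in for the
# operational parameters discussed in this study.
df = pd.read_csv("piping_failures.csv")

# Analyze and then drop rows with missing values.
print(df.isna().sum())
df = df.dropna()

# Filter outliers in a numeric column using the 1.5*IQR rule.
q1, q3 = df["temperature"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["temperature"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Normalize the numeric features to [0, 1].
num_cols = ["temperature", "pressure"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
```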
In previous works, numerous studies have explored the application of machine learning techniques to predict
failures in petrochemical pipelines. For example, El-Abbasy et al. [5] developed regression-based models to evaluate
pipeline conditions, while Kimiya et al. [6] used artificial neural networks to predict piping failures. Liao et al. [7]
used a neural network algorithm to predict corrosion rates in gas pipelines, and Sumayah et al. [8] developed five
machine learning algorithms to predict leaks. Bersani et al. [9] developed a risk evaluation model to predict failures
caused by third-party activities, and De Kerf et al. [10] proposed a model for detecting oil leaks using IR thermal
cameras and unmanned aerial vehicles (UAVs).
This study advances the field of piping failure prediction by introducing an automated system capable of detecting
potential sources of piping failures. Utilizing industrial datasets, we compared six machine learning algorithms to
identify pipeline leakages, emphasizing metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. Through
optimization techniques, the performance of these models was enhanced, resulting in more accurate and efficient
anomaly detection. The authors aim to investigate the potential of each ML algorithm in predicting the source of failure in the piping system.
The paper is structured to initially present the methodology for pipeline leakage detection using machine learning
algorithms. It then delves into the results and evaluates the model's performance, concluding with remarks and
directions for future research. This research not only illuminates the potential of machine learning in pipeline failure
prediction but also showcases optimization techniques that can pave the way for future studies in the domain.
2. Methodology
Failure prediction is an important task in many fields, including engineering, manufacturing, transportation, and
healthcare, where the goal is to identify potential failures or issues before they occur so that appropriate preventive
measures can be taken [6]. The methodology adopted in this study to predict failures accurately includes several steps, as shown in Fig. 1. After the literature review presented in the previous section, the next steps involve gathering the relevant data about the system's design, including historical failure data, maintenance records, sensor data, and other relevant information. This is followed by the preprocessing phase. The final steps involve feature extraction, followed by model development and the evaluation of its predictive performance using ML metrics. In the subsequent subsections, each step is covered separately.
The pie chart depicted in Fig. 2 presents the proportion of the different failure sources in the studied refinery system. Based on this figure, corrosion has been identified as the most common type of failure for these assets, with sour water corrosion accounting for approximately a quarter of all failure sources. However, as the figure's labels indicate, these types of failure can arise from various underlying causes such as the type of service and the operating conditions (temperature and fluid state).
The dataset was divided into three subsets: training, validation, and test. The training set, comprising 60% of the data, is used to fit the models, while the validation set is used to evaluate the model's performance during training and tune its hyperparameters; 20% of the dataset is used for validation. The test set is held out from training and validation and is used to evaluate the final performance of the ML model. This set also represents 20% of the dataset and should be completely independent of the training and validation sets. The training process is performed using "supervised learning", in which the inputs (predictors or attributes) and outputs are known in the context of the specific problem (in our case, the target is the source of failure).
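A minimal sketch of this 60/20/20 split using scikit-learn is given below; the synthetic data merely stands in for the industrial dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the industrial dataset (failure-source labels).
X, y = make_classification(n_samples=1000, n_features=9, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=42)

# First carve out the 20% test set, then split the remaining 80% into
# training (60% overall) and validation (20% overall): 0.25 * 0.8 = 0.2.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=42)
```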
2.3. Classification algorithms

As mentioned previously, the developed model is designed to objectively predict the source of failure that could threaten the piping system. To achieve this objective, the authors employed six classification algorithms. These algorithms are an essential part of machine learning and are used to classify data into various categories or classes. Each algorithm has its strengths and weaknesses, and the choice of algorithm depends on the nature of the problem at hand.
2.3.1. Decision Tree
Decision trees are among the most widely used non-parametric supervised machine learning models for classification and regression. Their popularity stems, on the one hand, from their tree-like structure, which keeps the algorithm simple, and, on the other hand, from the ease of interpreting and explaining the generated results. Each node in the tree represents a decision based on the input variables, the branches represent the possible outcomes of that decision, and the leaf nodes carry the final prediction.
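As a brief illustration (not the study's exact configuration), a decision tree can be fitted and its node-level decisions inspected as follows:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# Each printed rule is a node's decision on an input variable;
# the leaves carry the final class predictions.
print(export_text(tree))
```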
2.3.2. Ensemble methods
Ensemble learning is a general meta-approach to machine learning that seeks to achieve better predictive
performance by combining the predictions of several single models (such as decision trees). These techniques are used to mitigate the errors of a single predictive model, particularly over/underfitting problems, and can be classified into two subcategories: parallel ensemble methods (bagging) and sequential ensemble methods (boosting) [12].
2.3.2.1. Parallel ensemble methods (Bagging)
Bagging, also called Bootstrap Aggregating, is one of the ensemble techniques in which many predictors (decision
trees) are generated independently and in parallel. In bagging, multiple models are trained on different subsets of the
training data, and their predictions are combined to make a final prediction. It combines multiple weak learners to
create a strong predictor. Specifically, each tree "votes" on the class of the instance, and the most common class across
all trees is selected as the final prediction. This majority voting approach ensures that the final predictions are robust
and less sensitive to noise and outliers in the data. The use of these kinds of techniques reduces the risk of overfitting.
An example of these methods is the random forest (RF) [12].
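The voting mechanism can be made explicit with scikit-learn's random forest, as in the sketch below (synthetic data; note that scikit-learn averages the trees' class probabilities, which generalizes hard majority voting):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each tree in the bagged ensemble "votes" on the class of the first sample.
votes = [int(tree.predict(X[:1])[0]) for tree in rf.estimators_]
print("first 10 tree votes:", votes[:10])
print("forest prediction  :", rf.predict(X[:1])[0])
```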
2.3.2.2. Sequential ensemble methods (boosting)
Boosting involves the generation of multiple weak models in a sequential and dependent manner. Each subsequent
model attempts to correct the errors of the previous models by identifying misclassified instances and increasing their
weight. This technique ensures that the next model provides specific attention to correct the errors [13]. As a result,
boosting is an effective approach for improving the accuracy of machine learning models. The most popular boosting algorithms are:
- Gradient Boosting (GBDT) [14]: it creates an ensemble of weak learners, such as decision trees, by sequentially constructing a sequence of trees to minimize the residual error between the predictions and the actual target values. Each iteration of new tree construction incrementally reduces the error, which improves the predictive accuracy of the ensemble.
- XGBoost (XGB) [14]: Extreme Gradient Boosting is an extension of GBDT that uses a gradient boosting framework in which decision trees are added iteratively, each subsequent tree being trained to correct the errors of the previous trees. The algorithm adds trees in a way that minimizes a specified loss function, and it includes features such as regularization, sparsity awareness, and cross-validation to improve the performance and stability of the model.
- AdaBoost (AdB) [15]: the adaptive boosting algorithm adjusts the weights of the training instances based on the performance of the previous weak classifier. The final model is a weighted combination of the weak learners.
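For illustration, the three boosting variants can be instantiated and compared as in the sketch below (synthetic data; assumes the xgboost package is installed, and the hyperparameters are placeholders, not the tuned values reported later in Table 2):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

models = {
    "GBDT": GradientBoostingClassifier(n_estimators=100, random_state=0),
    "XGB": XGBClassifier(n_estimators=100, eval_metric="mlogloss"),
    "AdB": AdaBoostClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    # 5-fold cross-validated accuracy for each boosting variant.
    print(name, cross_val_score(model, X, y, cv=5).mean())
```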
Some parameters must be tuned to reach optimal performance; these parameters are often called 'hyperparameters' [16]. Among these, we find:
- n_estimators: controls the number of trees inside the model.
- max_depth: one of the most important parameters; it governs the maximum depth to which the trees inside the forest can grow.
- max_features: the forest takes random subsets of features and tries to determine the best split. max_features can take four values: "auto", "sqrt", "log2" and None.
- learning_rate: the weighting applied to each classifier at each boosting iteration (for AdaBoost).
- algorithm: the technique used for the weight updates (for AdaBoost).
2.3.3. Support vector machine
Support vector machines (SVMs) [17] are a type of ML model used for classification, regression, and outlier detection. As single classifiers, SVMs are powerful tools that can be used to build models for both linear and nonlinear classification problems. The model tries to find the hyperplane that best separates the data into different classes, chosen such that the margin between the classes is maximized. The hyperparameters for this method are [17]:
- C: the regularization parameter of an SVM, which controls the trade-off between two objectives: maximizing the margin (i.e., the distance between the hyperplane and the closest data points) and minimizing the classification error.
- Kernel type: Support vector machines (SVMs) can utilize various types of kernel functions to transform
data into a higher-dimensional space where it can be separated more effectively. This is known as the
kernel trick, which avoids the computational burden of explicitly projecting the data into higher
dimensions. By choosing an appropriate kernel function, SVMs can improve their performance in
separating complex, nonlinear datasets.
- Gamma: This parameter is used in some kernel functions (e.g., RBF) to control the shape of the decision
boundary. This parameter can significantly affect the performance of SVMs, particularly when dealing with
non-linearly separable data.
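A minimal sketch of an RBF-kernel SVM using the C and gamma values later reported in Table 2 is shown below; feature scaling is included because the margin is distance-based (synthetic data for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# The RBF kernel implicitly maps the data to a higher-dimensional space
# (the kernel trick); C trades off margin width against training error.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma=0.1))
print(svm.fit(X, y).score(X, y))
```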
The choice of these parameters depends on the characteristics of the data and the problem at hand. Table 2 presents the optimal value of each parameter for the models applied in this study, obtained using the GridSearchCV method of sklearn [18]. This method enables grid-based searching: it generates candidates from a grid of predefined parameter values, evaluates all possible combinations on the dataset, and selects the best one [18].
Table 2. Hyperparameter search grids and optimal values.

| Model                        | Parameter     | Search grid                          | Optimal value |
|------------------------------|---------------|--------------------------------------|---------------|
| Random Forest Classifier     | Bootstrap     | [True, False]                        | False         |
|                              | Max_depth     | [80, 90, 100, 200]                   | 50            |
|                              | Max_features  | ["sqrt", "log2", "auto", None]       | auto          |
|                              | N_estimators  | [50, 60, 100, 200, 300]              | 100           |
| Support Vector Machine       | C             | float number                         | 10            |
|                              | Gamma         | ["scale", "auto"]                    | 0.1           |
|                              | Kernel        | ["linear", "poly", "rbf", "sigmoid"] | rbf           |
| Gradient Boosting Classifier | Max_depth     | [80, 90, 100, 200]                   | not specified |
|                              | Max_features  | ["sqrt", "log2", "auto", None]       | all           |
|                              | N_estimators  | [50, 60, 100, 200, 300]              | 10            |
| Decision Tree Classifier     | Criterion     | ["gini", "entropy", "log_loss"]      | entropy       |
|                              | Max_depth     | [10, 15, 20, 30, 100]                | 15            |
| AdaBoost Classifier          | N_estimators  | [50, 60, 100, 200, 300, 400]         | 400           |
|                              | Algorithm     | ["SAMME", "SAMME.R"]                 | SAMME.R       |
|                              | Learning_rate | [0.97 + x/100 for x in range(0, 8)]  | 1.04          |
| XGBoost Classifier           | Max_depth     | [80, 90, 100, 200]                   | 90            |
|                              | Max_features  | ["sqrt", "log2", "auto", None]       | auto          |
|                              | N_estimators  | [50, 60, 100, 200, 300]              | 100           |
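The sketch below illustrates how such a grid search can be run with scikit-learn's GridSearchCV, mirroring the random forest rows of Table 2 (synthetic data; note that recent scikit-learn versions no longer accept "auto" for max_features, so None is used here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Candidate grid mirroring the Random Forest search space of Table 2.
param_grid = {
    "bootstrap": [True, False],
    "max_depth": [80, 90, 100, 200],
    "max_features": ["sqrt", "log2", None],
    "n_estimators": [50, 60, 100, 200, 300],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X, y)  # evaluates every combination and keeps the best one
print(search.best_params_, search.best_score_)
```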
To better showcase the performance of the proposed algorithms, the evaluation of each model encompasses the
computation of the following indicators: Precision (P), Recall (R), F1 score and the area under the receiver operating
characteristic curve (AUC-ROC). These can be defined as: $F1 = \frac{2 \cdot P \cdot R}{P + R}$, $P = \frac{TP}{TP + FP}$, and $R = \frac{TP}{TP + FN}$, where $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives, respectively.
The AUC (Area Under the Curve) – ROC (Receiver Operating Characteristic) curve, also written as AUROC, is one of the most important evaluation measures for checking the ability of classification models to distinguish between classes [20]:
- ROC is a probability curve: a plot of the false positive rate (FPR) (X-axis) against the true positive rate (TPR) (Y-axis). The space at the top left of the curve must be evaluated to select the best model: the smaller the space (i.e., the closer the curve is to the top-left corner), the better the result.
- AUC represents the degree or measure of separability; it is a numerically quantified indicator. An excellent model has an AUC close to 1, which means that it has a good separability measure. A poor model has an AUC close to 0, which means that it has the worst separability measure: it predicts 0 as 1 and 1 as 0. When the AUC is 0.5, the model has no class separability. The model with the highest AUC is the one that performs best [20].
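These indicators can be computed with scikit-learn as in the sketch below (synthetic data; macro averaging extends the binary TP/FP/FN definitions above to the multi-class failure-source problem):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

print("accuracy :", accuracy_score(y_te, y_pred))
print("precision:", precision_score(y_te, y_pred, average="macro"))
print("recall   :", recall_score(y_te, y_pred, average="macro"))
print("F1-score :", f1_score(y_te, y_pred, average="macro"))
# Multi-class ROC-AUC uses predicted probabilities, one-vs-rest.
print("ROC-AUC  :", roc_auc_score(y_te, clf.predict_proba(X_te),
                                  multi_class="ovr"))
```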
3. Results and discussion
The evaluation results of the models on the furnished dataset, after the deployment of the optimal parameter, are
presented in Table 4.
As evident from Table 4, all the models performed well with an accuracy higher than 96%, except the AdaBoost
has the least accuracy which is equal to 73%. XGB has the highest metrics performance Accuracy (99.7%), Precision
(96.6), Recall (100%), and AUC (100%). AdB has the lowest accuracy (72.3%), Precision (72%), Recall (72%), and
AUC (69.5%) of all the classifiers. Ensemble classification algorithms (i.e., XGB, GDBT, and RF) generally provide
better results than single classifiers (SVM).
Ensemble techniques consistently outperform non-ensemble techniques across all evaluation metrics, according to research conducted by Zhang et al. [21]. Specifically, the RF, GBDT, and XGB classifiers demonstrate the highest classification accuracy across several datasets. This is not surprising, as ensemble learning improves a classifier's generalizability and robustness [22]. Additionally, the superiority of ensemble techniques may be due to the nature of the input features, as five out of nine are categorical. The ensemble classifiers used in this research are constructed based on information-based learning, which is better suited for handling categorical features [23]. However, the AdB ensemble technique performs the worst in all metric categories, making it the least preferred. Among GBDT, RF, and XGB, it is challenging to select a preferred classifier because of their marginal performance disparity, the lack of robustness of these performance differences, and their analogous performance dispersion. Nevertheless, XGB is slightly preferred because it performs better than RF in all evaluation metrics and has faster training and testing execution times [21].
The confusion matrix as well as the prediction error confirm the good performance of the algorithms in predicting the source of failure. Fig. 3 shows the ability of the models to distinguish the different degradation modes. The number of misclassified samples was negligible (at most 4 per class for the 6 applied models).
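For readers less familiar with this tool, the toy sketch below shows how such a matrix is read: rows are true failure sources, columns are predictions, and off-diagonal entries count misclassified samples (the class labels are hypothetical):

```python
from sklearn.metrics import confusion_matrix

# Toy example with three failure-source classes.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2, 2]
print(confusion_matrix(y_true, y_pred))
# [[2 0 0]
#  [0 1 1]   <- one class-1 sample misclassified as class 2
#  [0 0 3]]
```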
It is important to note that no common classifier is universally effective across all datasets [24]. Even when using
the same dataset, the performance of each classifier can vary depending on the specific data pre-processing and settings
applied, such as the selection of input features and how those features are transformed. To predict the source of failure
in the piping system, the choice of service type (H2S, GASOIL, NAPHTA, etc.), the operating temperature, and the
fluid state can all impact the selection and performance of the classifiers.
3.1. Factor importance

Understanding the data and constructing accurate predictive models in machine learning rely heavily on identifying the importance of features. Several key factors, such as data quality and quantity, model development, and interpretability, can influence the significance of these features in ML. In this study (Fig. 5), the importance of each factor clearly depends on the algorithm type and complexity, but it can be concluded that the type of service, fluid state, and operating temperature stand out as the most critical predictors that need to be controlled to ensure the safety of the process, whereas the type of circuit is the least important factor in this context.
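A sketch of how such importances can be extracted from a tree-based model is given below; the feature names are hypothetical stand-ins for the study's predictors:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

names = ["service_type", "fluid_state", "temperature", "pressure", "circuit_type"]
X, y = make_classification(n_samples=600, n_features=5, n_informative=3,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X, y)
# Impurity-based importances, sorted from most to least influential.
for name, imp in sorted(zip(names, rf.feature_importances_),
                        key=lambda t: t[1], reverse=True):
    print(f"{name}: {imp:.3f}")
```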
4. Conclusion
This paper focuses on the challenge of oil and gas leakage within piping systems, a common problem faced by many oil and gas companies. The authors reviewed previous studies to identify potential solutions and determine which machine learning (ML) algorithms could be used. After selecting an appropriate dataset, several predictive models, both single and ensemble, were constructed using various ML algorithms, and their performance was compared. The XGB model outperformed the others, with 99.7% accuracy and 99.6% precision, recall, F1-score, and ROC-AUC. The confusion matrix indicated the model's ability to detect and distinguish between the different sources of corrosion failure. Based on the results of the study, it was determined that the type of product being transported had a greater influence on the sources of failure in the piping system than the type of circuit of the pipeline itself. The developed models can help minimize the need for unnecessary pipeline inspections by enabling prioritization. The proposed model demonstrated good performance with the industrial data used, successfully meeting the study's goal for real-world applications. Additionally, these models offer a comprehensive assessment of the risks that threaten a pipeline, thereby enabling decision-makers to take appropriate action to mitigate those risks and maintain pipeline safety. As a result, the use of these models can enhance the efficiency of piping system management by giving decision-makers a clear understanding of which areas require more attention and resources, leading to a reduction in the overall cost of inspections. The main difficulty experienced in this study was collecting data from the available documents and resources. In future work, it would be interesting to investigate intelligent anomaly mitigation techniques for predicting the corrosion rate associated with each transported product and to employ these intelligent approaches for risk-based inspection (RBI) screening assessment in the refinery system.
Acknowledgements
The authors would like to acknowledge Joël Fortin, François Grégoire and Alexia Blanchard-Lapierre from the Norda Stelo company in Montreal for their supply and help during this project. This work was funded by Mitacs and the Norda Stelo company.
References
[1] Davis, P. M., Dubois, J., Gambardella, F., Sanchez-Garcia, E., Uhlig, F., Haan, K., and Larivé, J.-F., 2011,
“Performance of European Cross-Country Oil Pipelines – Statistical Summary of Reported Spillages in 2010
and since 1971.”
[2] Santos, M. Y., Oliveira E Sá, J., Andrade, C., Vale Lima, F., Costa, E., Costa, C., Martinho, B., and Galvão,
J., 2017, “A Big Data System Supporting Bosch Braga Industry 4.0 Strategy,” International Journal of
Information Management, 37(6), pp. 750–760.
[3] Springer India-New Delhi, 2016, “Bosch Looks to Enhance Competitiveness with Industry 4.0,” Auto Tech
Rev, 5(9), pp. 58–61.
[4] Singh, A., Thakur, N., and Sharma, A., 2016, “A Review of Supervised Machine Learning Algorithms,” 2016
3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 1310–
1315.
[5] Senouci, A., Elabbasy, M., Elwakil, E., Abdrabou, B., and Zayed, T., 2014, “A Model for Predicting Failure
of Oil Pipelines,” Structure and Infrastructure Engineering, 10(3), pp. 375–387.
[6] Zakikhani, K., Nasiri, F., and Zayed, T., 2020, “Availability-Based Reliability-Centered Maintenance
Planning for Gas Transmission Pipelines,” International Journal of Pressure Vessels and Piping, 183, p.
104105.
[7] Liao, K., Yao, Q., Wu, X., and Jia, W., 2012, “A Numerical Corrosion Rate Prediction Method for Direct
Assessment of Wet Gas Gathering Pipelines Internal Corrosion,” Energies, 5(10), pp. 3892–3907.
[8] Aljameel, S. S., Alomari, D. M., Alismail, S., Khawaher, F., Alkhudhair, A. A., Aljubran, F., and Alzannan,
R. M., 2022, “An Anomaly Detection Model for Oil and Gas Pipelines Using Machine Learning,”
Computation, 10(8), p. 138.
[9] Bersani, C., Citro, L., Gagliardi, R. V., Sacile, R., and Tomasoni, A. M., 2010,
“Accident Occurrence Evaluation in the Pipeline Transport of Dangerous Goods,” Chemical Engineering
Transactions, 19, pp. 249–254.
[10] De Kerf, T., Gladines, J., Sels, S., and Vanlanduit, S., 2020, “Oil Spill Detection Using Machine Learning and
Infrared Images,” Remote Sensing, 12(24), p. 4090.
[11] So, A., Hooshyar, D., Park, K. W., and Lim, H. S., 2017, “Early Diagnosis of Dementia from Clinical Data by
Machine Learning Techniques,” Applied Sciences, 7(7), p. 651.
[12] Ben Seghier, M. E. A., Höche, D., and Zheludkevich, M., 2022, “Prediction of the Internal Corrosion Rate for
Oil and Gas Pipeline: Implementation of Ensemble Learning Techniques,” Journal of Natural Gas Science and
Engineering, 99, p. 104425.
[13] Yang, P., Hwa Yang, Y., B. Zhou, B., and Y. Zomaya, A., 2010, “A Review of Ensemble Methods in
Bioinformatics,” CBIO, 5(4), pp. 296–308.
[14] Chen, T., and Guestrin, C., 2016, “XGBoost: A Scalable Tree Boosting System,” Proceedings of the 22nd
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco
California USA, pp. 785–794.
[15] Wu, P., and Zhao, H., 2011, “Some Analysis and Research of the AdaBoost Algorithm,” Intelligent
Computing and Information Science, R. Chen, ed., Springer, Berlin, Heidelberg, pp. 1–5.
[16] Gaikwad, C., 2021, “Hyperparameter Tuning for Tree Models,” ChiGa.
[17] Gandhi, R., 2018, “Support Vector Machine — Introduction to Machine Learning Algorithms,” Medium
[Online]. Available: https://towardsdatascience.com/support-vector-machine-introduction-to-machine-
learning-algorithms-934a444fca47. [Accessed: 04-Mar-2023].
[18] Unhelker, B., Pandey, H. M., and Raj, G., 2022, Applications of Artificial Intelligence and Machine Learning:
Select Proceedings of ICAAAIML 2021, Springer Nature.
[19] Handelman, G. S., Kok, H. K., Chandra, R. V., Razavi, A. H., Huang, S., Brooks, M., Lee, M. J., and Asadi,
H., 2019, “Peering Into the Black Box of Artificial Intelligence: Evaluation Metrics of Machine Learning
Methods,” American Journal of Roentgenology, 212(1), pp. 38–43.
[20] Hanley, J. A., and McNeil, B. J., 1982, “The Meaning and Use of the Area under a Receiver Operating
Characteristic (ROC) Curve.,” Radiology, 143(1), pp. 29–36.
[21] Al-Moubaraki, A. H., and Obot, I. B., 2021, “Corrosion Challenges in Petroleum Refinery Operations:
Sources, Mechanisms, Mitigation, and Future Outlook,” Journal of Saudi Chemical Society, 25(12), p.
101370.
[22] Kotsiantis, S. B., Zaharakis, I. D., and Pintelas, P. E., 2006, “Machine Learning: A Review of Classification
and Combining Techniques,” Artif Intell Rev, 26(3), pp. 159–190.
[23] Günther, W. A., Rezazade Mehrizi, M. H., Huysman, M., and Feldberg, F., 2017, “Debating Big Data: A
Literature Review on Realizing Value from Big Data,” The Journal of Strategic Information Systems, 26(3),
pp. 191–209.
[24] Sharma, R., Mithas, S., and Kankanhalli, A., 2014, “Transforming Decision-Making Processes: A Research
Agenda for Understanding the Impact of Business Analytics on Organisations,” European Journal of
Information Systems, 23, pp. 433–441.