Ensemble Auto Insurance Fraud Detection
ABSTRACT The prevalence of insurance fraud in the auto industry poses significant financial challenges
and undermines customer trust. Despite the application of machine learning methods to reduce these losses,
current literature lacks effective tuned algorithms for detecting fraud in insurance claims. To address this
gap, this study proposes an ensemble-based method with a weighted voting strategy for auto insurance fraud
detection. The study uses the Binary Quantum-Based Avian Navigation Optimizer Algorithm (BQANA)
to optimize the hyperparameters of Support Vector Machines (SVM), Random Forest (RF), and XGBoost
classifiers, which are combined into an ensemble. To address the dataset's imbalance, random undersampling
was applied to create five legitimate-to-fraudulent claim ratios: A:A (the original distribution, with no undersampling), 1:1, 2:1, 4:1, and 8:1. The performance
of BQANA was compared with Genetic Algorithms and Simulated Annealing for hyperparameter tuning.
The results indicate that the ensemble model with BQANA-optimized hyperparameters outperforms other
methods, particularly at a 1:1 ratio, achieving 99.94% Accuracy, 98.93% Precision, 100% Recall, and a
99.46% F1-score. These metrics surpass those obtained without optimization or with traditional tuning
methods. This research highlights the efficacy of the BQANA algorithm in optimizing hyperparameters for
classification models. By combining these optimized classifiers into an ensemble, the study significantly
enhances predictive accuracy in car insurance fraud detection, offering notable improvements over
conventional methods.
INDEX TERMS Insurance fraud detection, machine learning, ensemble learning, metaheuristic,
hyperparameter tuning.
insurance companies frequently pay illegitimate claims, and insurance fraud is rarely prosecuted [4]. Traditional manual verification methods are inadequate, often producing false alarms. Consequently, it is essential to employ precise and intelligent methods, such as data mining, to effectively detect and reduce fraudulent cases [10].

Various methods are employed to detect auto insurance fraud, including statistical techniques [11], machine learning (ML) [12], and deep learning (DL) [13]. Statistical methods, such as regression and hypothesis testing, are used to identify and analyze anomalies in auto insurance data. Moreover, ML methods use intelligent algorithms to detect fraudulent activities in real time by analyzing relevant data, while DL methods leverage artificial neural networks (ANN) [14] to automatically identify complex patterns and features in large datasets. Each of these methods has its own pros and cons when applied to different types of data, making the choice of method crucial for effective fraud detection [12].

In the field of auto insurance fraud detection, three main challenges are commonly encountered: unbalanced datasets, a high number of false positives, and ML models with untuned hyperparameters. The imbalance between fraudulent and non-fraudulent samples in datasets can lead to biased models and poor fraud detection [15]. To address this, two main methods are used: oversampling and undersampling. Oversampling techniques like the synthetic minority oversampling technique (SMOTE) [16] increase the number of fraudulent cases, while undersampling methods, such as random undersampling [17], reduce the number of non-fraudulent cases. Both methods aim to balance the dataset and improve the performance and reliability of predictive models [18].

Additionally, a high number of false positives can occur, meaning legitimate claims are incorrectly flagged as fraudulent. This not only leads to customer dissatisfaction but also increases operational costs for insurance companies. Advanced anomaly detection techniques and robust validation processes can help mitigate this issue by refining model accuracy [19]. Moreover, ML models with untuned hyperparameters often result in suboptimal performance. Hyperparameter tuning is essential for enhancing the accuracy and efficiency of fraud detection models [20]. Properly tuned models can better differentiate between fraudulent and non-fraudulent claims, thereby reducing both false positives and false negatives. Our motivation is to address these challenges by incorporating balancing techniques, an ensemble learning model, and a metaheuristic-based hyperparameter tuning algorithm to enhance the accuracy and robustness of auto insurance fraud detection systems. This approach aims to develop a comprehensive solution that effectively tackles the limitations of existing methods.

Using an undersampled dataset as the input for ensemble learning, which integrates multiple base classifiers, can effectively mitigate the challenges of unbalanced datasets and high false positives in auto insurance fraud detection [21]. By reducing the number of non-fraudulent cases through techniques like random undersampling, the dataset becomes more balanced, reducing bias and enhancing model reliability. Ensemble learning, which combines multiple base classifiers with distinct features, further improves detection accuracy [22]. When these base classifiers have their hyperparameters tuned with metaheuristic algorithms, their individual performances are optimized. This optimized performance, along with the ensemble method employing weighted voting between the results of these base classifiers, ensures that the most accurate predictions are prioritized. This method collectively addresses the three primary challenges (unbalanced datasets, high false positives, and untuned hyperparameters), resulting in a more effective fraud detection system.

This study introduces an ensemble learning framework designed for the detection of auto insurance fraud, utilizing a publicly available dataset of 15,420 car insurance claims from Kaggle, recorded between 1994 and 1996. The dataset includes 32 features, combining categorical (e.g., accident area, policy type) and numerical (e.g., age, deductible) data, offering a rich foundation for analysis. However, with only 6% of claims classified as fraudulent and 94% as legitimate, the dataset presents a significant class imbalance, posing a key challenge for fraud detection. To address this, the proposed framework incorporates three base classifiers, namely random forest (RF), extreme gradient boosting (XGBoost), and support vector machines (SVM), which are combined into an ensemble model with a weighted voting strategy, and leverages balancing techniques to mitigate the impact of the imbalance.

The hyperparameters of each base classifier are fine-tuned using a metaheuristic optimization algorithm, the binary quantum-based avian navigation optimizer algorithm (BQANA). For comparative analysis, additional hyperparameter tuning methods, including simulated annealing (SA) and genetic algorithms (GA), are employed, and their evaluation outcomes are assessed. To further explore the impact of data imbalance, the study generates four subsets with varying ratios of fraudulent to legitimate claims. Each classifier is assigned a specific weight based on its performance during classification, enhancing the accuracy of fraud identification in the final ensemble voting and prediction stages. By integrating these techniques, the framework effectively addresses the challenges of unbalanced datasets, high false positives, and untuned hyperparameters, providing a robust solution for fraud detection in the auto insurance sector. The contributions of this work are summarized as follows:

• Utilization of the BQANA metaheuristic optimization algorithm for hyperparameter tuning to enhance classifier performance.
• Development of an ensemble learning framework with RF, XGBoost, and SVM, using a weighted voting strategy to combine base classifier results, effectively reducing false positives in fraud detection.
• Application of balancing techniques, such as random undersampling, to mitigate the significant class imbalance present in the dataset.

The rest of this paper is organized into four sections. In Section II, we review previous studies on auto insurance fraud detection. Section III discusses the research method employed in this study. The obtained results and the model's effectiveness compared to other methods are presented in Section IV. Finally, in Section V, we provide practical conclusions and implications of the findings.

II. LITERATURE REVIEW
The detection of insurance fraud remains a complex and persistent challenge in contemporary society, requiring the continuous development of innovative algorithms and methods to address it effectively. As outlined in the study by [23], fraud can be defined as the misuse of professional positions for personal gain through the intentional misappropriation of organizational resources. Additionally, the study by [24] emphasizes the growing issue of financial fraud, particularly within the banking and finance sectors, where complex organizational structures and international capital flows are exploited for illegitimate gains. This manipulation undermines economic stability and violates legal frameworks and ethical standards, thus making the development of advanced fraud detection systems increasingly vital.

Given the significant economic impact of auto insurance, numerous researchers have investigated innovative techniques to detect fraud in this domain, underscoring the need for continued improvement. These include ML techniques such as Multi-Layer Perceptron (MLP), Decision Trees (DT), SVM [25], Logistic Regression (LR), ANN, AdaBoost, Stochastic Gradient Descent (SGD) methods [18], Bagging [26], and other ensemble approaches. The application of ML techniques in various aspects of the finance domain has been the subject of extensive research in recent years.

The study by [12] examines the performance of several ML models, including SVM, RF, DT, AdaBoost, K-Nearest Neighbor (KNN), LR, Naïve Bayes (NB), and MLP. Their findings reveal that the DT model improves the overall Accuracy of the fraud detection system. Similarly, [27] investigates the use of various ML models for insurance fraud detection, including RF, AdaBoost, XGBoost, KNN, and LR. Among the models tested, the RF algorithm demonstrated the highest Accuracy and F1-score in detecting insurance fraud, thereby highlighting its strong classification performance. Reference [28] investigated the application of various ML algorithms to detect fraudulent vehicle insurance claims. The research evaluated the performance of several models, including AdaBoost, XGBoost, NB, SVM, LR, DT, ANN, and RF, finding that AdaBoost and XGBoost outperformed the other models by achieving a classification Accuracy of 84.5%. In contrast, the LR classifiers showed poor performance with both balanced and unbalanced data. These insights emphasize the importance of selecting appropriate algorithms to enhance fraud detection systems. Furthermore, [29] examined the use of RF, LR, and ANN for fraud detection, revealing that the RF method demonstrated superior performance, achieving an Accuracy of 98.21%, Precision of 98.08%, Recall of 100%, and F1-score of 99.03%. In addition, [30] proposed an ensemble model combining basic ML algorithms (RF, DT, XGBoost, and LR) with a meta-heuristic method called Particle Swarm Optimization (PSO). After balancing the classes using SMOTE, the proposed ensemble model improved the overall Accuracy to 99%.

Several recent studies have also focused on hyperparameter tuning techniques to optimize ML models in fraud detection contexts. Researchers have adopted both exact and metaheuristic methods, each offering distinct advantages. Exact methods, such as Grid Search (GS), as discussed in [31], provide a systematic approach to hyperparameter optimization but can be computationally intensive. In contrast, [32] employed GA to optimize hyperparameters; their findings showed that incorporating the GA results into the LR model increased Accuracy to 94%.

Moreover, [33] offers a thorough exploration of GA and XGBoost, focusing on hyperparameter optimization to enhance fraud detection systems in smart grid environments. The experimental findings showed a significant boost in model performance, raising Accuracy from 0.82 to 0.978. Similarly, [34] compares the proposed PSO method with GS, demonstrating that PSO can produce superior solutions more rapidly. Incorporating PSO results into a Deep Neural Network (DNN) model led to an Accuracy of 94.93%. Building on these advancements, the recent study by [35] introduces a PSO-XGBoost framework tailored for automobile insurance fraud detection. This framework leverages the optimization capabilities of PSO to fine-tune XGBoost hyperparameters, achieving a notable 95% accuracy. By enhancing model precision and interpretability, this approach provides actionable insights for early fraud prevention, further demonstrating the effectiveness of PSO in optimizing machine learning models for complex fraud detection challenges.

Data preprocessing and class imbalance handling techniques have been critical in developing effective fraud detection methods. Specifically, [36], [37], and [38] emphasized the use of SMOTE, Random Under-Sampling (RUS), and Random Over-Sampling (ROS) to address imbalanced datasets. Building on these efforts, [39] propose methods to address imbalanced datasets and missing values, which are common challenges in real-world insurance fraud detection. Their framework integrates data imputation techniques, such as KNN and multivariate imputation, with ensemble learning methods, including Random Forest, XGBoost, and stacking classifiers. By incorporating advanced resampling techniques like SMOTE and ADASYN, the model achieved superior accuracy and F1-scores, demonstrating its effectiveness in detecting fraudulent claims and further highlighting the
VOLUME 13, 2025 42999
A. Gheysarbeigi et al.: Ensemble-Based Auto Insurance Fraud Detection
importance of robust preprocessing strategies. Moreover, feature selection methods, such as GA, Firefly Algorithm Optimization (FFA), PSO, ANOVA, and Chi-2 [40], [41], are frequently employed to identify the most relevant features for robust fraud detection. Additionally, some studies have utilized unsupervised learning techniques, including K-means and C-means clustering [42], [43], to enhance detection capability.

Despite these advances, several gaps remain. First, unbalanced datasets continue to pose a significant challenge [44], often leading to skewed performance metrics and overlooked fraudulent instances [45]. Second, a high number of false positives can undermine practitioner trust and inflate investigation costs [46]. Finally, many existing ML models are not rigorously tuned, resulting in suboptimal performance when dealing with complex fraud patterns [47].

To address these challenges, this study systematically analyzes different class ratios to identify the optimal approach for balancing our dataset, thereby mitigating skewed detection outcomes. We also reduce false positives through advanced feature selection, leading to more precise fraud detection. In addition, our proposed methodology adopts an ensemble learning algorithm that employs a weighted voting strategy to improve predictive performance in the insurance fraud detection field. Furthermore, we leverage BQANA to fine-tune the hyperparameters of each base classifier incorporated into the ensemble model, thereby enhancing overall detection accuracy. By tackling unbalanced data, high false-positive rates, and untuned model parameters simultaneously, our research fills a critical gap in the literature and establishes a more robust framework for detecting fraudulent activities in the auto insurance domain.

Finally, this research synthesizes key approaches in Table 1, specifically examining the BQANA technique for hyperparameter optimization in auto insurance fraud detection. Through this review, we aim to advance the field and offer valuable insights that inform future developments.

III. MATERIAL AND METHODS

A. DATASET
The current research utilizes a comprehensive dataset obtained from an insurance company's car claim records. This dataset, available on the Kaggle platform (Link to Dataset), contains a substantial collection of 15,420 insurance claims recorded from January 1994 to December 1996 [49]. For accessibility and reproducibility, details on data availability are provided in the Data and Code Availability section at the end of the paper.

The dataset contains 32 features, in addition to a target feature, as detailed in Table 2. Each sample in the dataset is labeled by a binary target feature, which is essential for classifying the claims into fraudulent or non-fraudulent categories. The dataset is characterized by its variety, containing 25 categorical and 8 numerical features, which provides a rich foundation for analysis.

The employed dataset shows a clear imbalance in the distribution of fraudulent cases, which account for only 6% (923 claims) of the total, in contrast to the 94% (14,497 claims) that are non-fraudulent. This imbalance represents a fundamental challenge in fraud detection and is a significant factor to consider when selecting appropriate modeling and evaluation techniques.

In preparation for the analysis, the dataset underwent a partitioning process to facilitate the model's training, testing, and validation phases. Specifically, the data was randomly divided into three subsets: 60% was allocated for training the model, enabling it to learn and adapt to the patterns within the data; 20% was reserved for testing, providing an evaluation of the model's predictive performance on unseen data; and the remaining 20% was allocated for validation purposes, allowing an additional layer of evaluation to fine-tune the model's hyperparameters. This partitioning strategy is essential for the valid evaluation of fraud detection models. All code was implemented in Python 3.6.15. The computations were performed on a system equipped with an Intel Core i3-3220 processor and 4 GB DDR3 RAM.

B. PREPROCESSING
The preprocessing step commenced with a meticulous examination of the initial dataset to detect and eliminate redundant features that could hinder the model's learning efficiency and jeopardize result Accuracy. The features named "Policy Number" and "Age" were removed from the dataset, as "Policy Number" served solely as a distinct identifier for each claim, and "Age" duplicated the information already provided by the "Age of policyholder" feature [4]. Upon a detailed review of the prior studies outlined in Table 3, it was observed that 10 features have no impact on the model's accuracy and efficiency. Consequently, these irrelevant features were methodically eliminated from the dataset to enhance the speed and Accuracy of fraud detection in auto insurance. The final set of features that we removed from the dataset in our study is shown in the last row of Table 3.

Additionally, to meet the algorithmic requirement for numerical input, the categorical features underwent a transformation process in which each category was assigned a unique integer value. This transformation step reduces the computational processing and eases the analysis of the data.

Another significant issue that researchers face is the challenge of data imbalance [15]. Previous scientific literature has documented various strategies, including both undersampling and oversampling techniques, to address the challenge of data imbalance in the context of fraud detection. Previous research has explored the effectiveness of methods such as SMOTE [58], the Adaptive Synthetic Sampling Approach (ADASYN) [59], and Tabular Generative Adversarial Networks (TGAN) [60] in decreasing the imbalance within the dataset. These techniques have been explored and evaluated for their ability to effectively handle the disproportionate representation of fraudulent and non-fraudulent cases, which is a fundamental issue in fraud detection.

In alignment with the findings of a previous study [27], we opted for RUS to rectify the balance in our dataset. Our objective was to systematically assess which undersampling ratio produces the most effective results during the training phase and, subsequently, leads to the best performance in prediction. Accordingly, the dataset was divided into training, testing, and validation subsets. Specifically, the entire dataset was split randomly, with 60% of the samples constituting the training set, 20% allocated to testing, and the remaining 20% reserved for validation.

Table 4 illustrates the five different undersampling ratios we applied to the training set. In Ratio A:A (no undersampling), all training samples remained intact, resulting in 8,699 normal and 553 fraudulent cases. By contrast, the 1:1 ratio reduces the number of normal samples to 553, creating a perfectly balanced subset. The 2:1 ratio allows for 1,106 normal and 553 fraudulent samples, while the 4:1 and 8:1 ratios further increase the number of normal samples to 2,212 and 4,424, respectively, against the same 553 fraud cases. Our intent in gradually modifying the class distribution is to identify the ratio that optimizes the detection of fraudulent activities while minimizing misclassification errors. Meanwhile, the testing and validation sets, each comprising 2,899 normal and 185 fraud samples, remain unchanged, ensuring an unbiased evaluation of the trained model.

Another significant aspect of data preprocessing is the identification of important features. Based on previous research, various feature selection techniques have been employed, as documented in Table 5. These include Boruta's algorithm [4] and meta-heuristic methods such as Ant Colony Optimization (ACO), PSO, and GA [57].

The results presented in Table 3 and Table 5 show that certain features, such as "Rep Number", "Deductible", and "Policy Type", have been recognized as significant factors in some research studies while being deemed less significant in others. When these features are recognized as important, the model shows increased Accuracy levels, prompting the researchers to categorize these three features as significant. Consequently, in the current study, we have utilized a set of 23 features after removing 10 less significant features identified through the aforementioned feature selection techniques. This feature engineering process aims to increase the speed and Accuracy of the fraud detection model by focusing on the most relevant features.

C. MODELING
After the data preprocessing steps on the primary dataset are completed, the focus is on developing a fraud detection model. Through a comprehensive review of the existing literature in the field of fraud detection, the researchers
TABLE 5. Methods used to select important features.

Afterward, we calculated the weight for all 5 ensembles:

migratory birds during extended aerial journeys. This algorithm, QANA, is designed with a multi-flock framework and quantum-driven navigation, incorporating two mutation techniques and a qubit-crossover operator to enhance efficient exploration of the search domain. The initial step involves dividing the migratory bird population into multiple flocks in a random manner. Subsequently, the algorithm imitates the flight formation of migratory birds to share acquired information among search agents through the utilization of a V-echelon communication structure. Here, V represents a collection of n individuals within the flock $f_q$, comprising a header (H) and two subgroups, known as the right-line (R) and left-line (L), arranged in a V-shaped configuration.

The flocks employ a quantum-based navigation method for search space exploration, incorporating a Success-based Population Distribution (SPD) strategy, two mutation methods known as "DE/quantum/I" and "DE/quantum/II," and a qubit-crossover operator. Each flock dynamically switches between these mutation techniques, with $f_m$ representing the flocks utilizing $M_m$ in iteration $t$ (as shown in Equation (5)). The variable $\tau_{ij}$ is set to 1 if $M_m$ enhances $a_j$ of the $i$-th flock in the set $f_m$; otherwise, it is assigned a value of 0.

$$SR_m(t) = \frac{\sum_{i \in f_m} \frac{\sum_{j=1}^{n} \tau_{ij}}{n}}{|f_m|} \times 100 \tag{5}$$

The quantum mutation strategies are defined by Equations (6) and (7). Here, $x_i(t)$ represents the current position of search agent $a_i$ in iteration $t$, $x_{Vechelon}(t)$ denotes the position of the subsequent search agent after $a_i$, and $x_{best}(t)$ indicates the best search agent's location. Random selections from short-term memory (STM) and long-term memory (LTM) are denoted by $x_{j \in STM}(t)$ and $x_{j \in LTM}(t)$, respectively. Equation (8) is utilized to compute the trial vector $v_H(t+1)$ as the leader in the V-echelon structure, where $L$ and $U$ represent the lower and upper boundaries of the search space. Additionally, $S_i$ denotes the quantum orientation of the bird $a_i$ and incorporates a parameter adaptation mechanism based on a historical record of successful parameters.

$$v_i(t+1) = x_{best}(t) + S_i(t) \times \big(x_{Vechelon}(t) - x_{j \in LTM}(t)\big) + S_i(t) \times \big(x_{Vechelon}(t) - x_{best}(t)\big) + S_i(t) \times \big(x_{j \in LTM}(t) - x_{j \in STM}(t)\big) \tag{6}$$

$$v_i(t+1) = S_i(t) \times \big(x_{best}(t) - x_{Vechelon}(t)\big) + S_i(t) \times \big(x_i(t) - x_{j \in LTM}(t) - x_{j \in STM}(t)\big) \tag{7}$$

$$v_H(t+1) = S_i(t) \times x_{best} + \big(L + (U - L) \times rand(0, 1)\big) \tag{8}$$

To generate the trial vector $u_i(t+1)$, the mutant vector $v_i(t+1)$ is combined with its parent $x_i(t)$ using Equation (9), with $|\phi_i\rangle_d$ representing the qubit-crossover probability for the $d$-th dimension. Each iteration involves the calculation of a qubit-crossover $|\phi_i\rangle_d$ for each dimension of the trial vector $u_i(t+1)$ through Equation (10), where the parameter $|\phi_R\rangle_d$ is a random integer acting as a coefficient for adjusting the length of the vector $|\phi_i\rangle_d$ within the Bloch sphere [61].

$$u_{id}(t+1) = \begin{cases} x_{id}(t), & |\phi_i|_d < rand \\ v_{id}(t+1), & |\phi_i|_d \ge rand \end{cases} \tag{9}$$

$$|\phi_i\rangle_d = |\phi_R\rangle_d \times \left(\cos\frac{\theta}{2}\,|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}\,|1\rangle\right), \qquad \theta, \varphi = rand \times \frac{\pi}{2} \tag{10}$$

As per the findings of a prior study [61], QANA demonstrates superior performance compared to other established optimizers across diverse continuous search space benchmark assessments. When compared to its rivals, QANA surpasses them in terms of both exploration and exploitation capabilities. Consequently, the foundational components of the conventional QANA are adapted to formulate its binary counterpart. During the binary QANA's formulation, the initial solutions are generated at random within the interval [0, 1]. Following this initialization, the iterative procedure is carried out until the predefined termination criterion, typically the maximum number of iterations, is met. Based on this study [62], using the threshold method for binary conversion of continuous solutions yields significantly improved outcomes compared to transfer functions such as the S-shaped, V-shaped, U-shaped, and Z-shaped functions.

The performance of the SVM, RF, and XGBoost models within an ensemble model relies on the precise selection of optimal hyperparameters, so in this study our objective is to employ BQANA to select the optimal hyperparameter set for each base classifier inside the ensemble. A list of the possible parameters associated with these three models is outlined in Table 6.

The SVM model includes the C, kernel, degree, gamma, shrinking, and tol hyperparameters [34]. In the case of the RF model, fine-tuning involved hyperparameters such as solver type, n_estimators, criterion, max_depth, min_samples_split, min_samples_leaf, max_features, and bootstrap [34]. The tuning of the XGBoost model centered on hyperparameters such as learning_rate, n_estimators, min_weight_fraction_leaf, max_depth, min_impurity_decrease, colsample_bytree, reg_alpha, reg_lambda, and subsample [33]. The range of each relevant hyperparameter was identified through a review of the documentation available on the Scikit-learn ([Link]) platform as well as related scientific literature.

D. FITNESS EVALUATION
To optimize the hyperparameters of the XGBoost, SVM, and RF classifiers, we employed the BQANA algorithm. In the first iteration, this algorithm generates proposed solutions for the model hyperparameters. Each solution's length is equal to the number of hyperparameters of the corresponding classifier. Subsequently, the fitness function calculates the mean of Accuracy, Precision, Recall, and F1-score for each of these solutions and selects the best solutions from the first iteration.
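The fitness computation just described (train a classifier built from one candidate solution, then average the four metrics) can be sketched as below. This is an illustrative reconstruction, not the authors' code: the BQANA search loop and candidate decoding are omitted, and the 0.5 binarization threshold is an assumption for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def fitness(model, X_train, y_train, X_val, y_val):
    """Mean of Accuracy, Precision, Recall, and F1-score on the
    validation set for one candidate hyperparameter configuration."""
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    return float(np.mean([
        accuracy_score(y_val, pred),
        precision_score(y_val, pred, zero_division=0),
        recall_score(y_val, pred, zero_division=0),
        f1_score(y_val, pred, zero_division=0),
    ]))

def binarize(solution, threshold=0.5):
    """Threshold conversion of a continuous BQANA solution in [0, 1]
    to a binary vector (threshold value assumed for illustration)."""
    return (np.asarray(solution, dtype=float) >= threshold).astype(int)
```

In a BQANA-style search, each candidate binary vector would be decoded into concrete hyperparameter values, scored with `fitness`, and the highest-scoring solutions carried forward to the next iteration.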
TABLE 7. Hyperparameters and values used in the three metaheuristic optimization algorithms BQANA, GA, and SA.

TABLE 8. Hyperparameter values for the SVM model tuned using BQANA, GA, and SA across different training set ratios.

TABLE 9. Hyperparameter values for the RF model tuned using BQANA, GA, and SA across different training set ratios.

TABLE 10. Hyperparameter values for the XGBoost model tuned using BQANA, GA, and SA across different training set ratios.
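For context on how the tuned base classifiers are combined, weighted voting in the spirit described in the introduction can be sketched as follows. The paper's exact weight formula is not shown in this excerpt, so the sketch simply accepts externally supplied, performance-derived weights; the 0.5 decision threshold is likewise an assumption.

```python
import numpy as np

def weighted_vote(probas, weights, threshold=0.5):
    """Weighted soft voting: combine each base classifier's fraud
    probabilities using per-model weights (e.g., proportional to
    validation performance) and threshold the weighted average."""
    probas = np.asarray(probas, dtype=float)    # shape: (n_models, n_samples)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                             # normalize so weights sum to 1
    combined = w @ probas                       # weighted average per claim
    return (combined >= threshold).astype(int)  # 1 = fraudulent, 0 = legitimate
```

For instance, given fraud probabilities from SVM, RF, and XGBoost for a batch of claims, a claim is flagged when its weighted average probability reaches the threshold, so better-performing models pull the ensemble decision toward their predictions.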
TABLE 14. Comparison of evaluation metrics with related studies. The practical implications of this study are significant,
especially for insurance companies. By accurately iden-
tifying fraudulent claims, the proposed framework has
the potential to reduce financial losses and administrative
burdens. However, challenges such as handling real-world
data inconsistencies and ensuring seamless integration with
existing IT infrastructures must be addressed. For instance,
incorporating preprocessing steps to handle noisy or missing
data and optimizing runtime for larger datasets will be crucial
for operational deployment. Additionally, the ensemble
approach requires compatibility with existing fraud detection
systems, which may necessitate customization or incremental
integration strategies. Despite these challenges, the demon-
strated effectiveness of the BQANA-tuned ensemble model
underscores its potential as a transformative tool for modern
insurance fraud detection systems.
Comparable approaches in the literature include neural networks optimized for imbalanced datasets [23]. Despite their effectiveness in specific scenarios, the results show that our ensemble model with BQANA tuning outperforms these approaches across multiple evaluation metrics (Accuracy, Precision, Recall, and F1-score), demonstrating its robustness and adaptability to the complex challenge of insurance fraud detection.

The advantages of using metaheuristic algorithms, particularly BQANA, extend beyond superior performance metrics to their potential real-world applicability. For instance, the ensemble model tuned with BQANA achieved an Accuracy of 99.94% and a perfect Recall of 100% at the 1:1 ratio, outperforming SA and GA by notable margins. The balanced dataset provided by the 1:1 ratio amplifies the effectiveness of hyperparameter optimization, enabling the model to accurately detect fraudulent claims while minimizing false positives. These results, supported visually by Figures 5, 6, and 7, highlight the consistent superiority of the BQANA-tuned ensemble method compared to other approaches. Furthermore, the computational efficiency of BQANA, as shown in Table 13, demonstrates its feasibility for practical deployment. While BQANA incurs slightly higher computational costs than SA and GA, this is justified by its substantial performance gains, making it a viable solution for real-world scenarios where accuracy is paramount.

The robustness of the proposed framework is particularly relevant in addressing challenges commonly faced in real-world settings, such as noisy or incomplete datasets. The adaptability of metaheuristic algorithms, demonstrated through evaluations across multiple training set ratios, ensures that the models maintain robust performance even in imbalanced or imperfect data conditions. Moreover, the automated nature of hyperparameter tuning minimizes reliance on manual intervention, making the framework scalable for integration into existing insurance fraud detection workflows.

V. CONCLUSION
Insurance fraud remains a critical challenge, necessitating sophisticated detection systems. This study introduced a robust ensemble learning framework with hyperparameter optimization using the BQANA algorithm, which outperformed GA and SA in enhancing model performance. Using the imbalanced Carclaims dataset, the ensemble model tuned with BQANA at the 1:1 ratio achieved 99.94% accuracy, demonstrating superior results across all evaluation metrics. These findings underscore the significance of addressing data imbalances, employing metaheuristic-based hyperparameter tuning, and leveraging ensemble techniques to improve fraud detection systems.

Despite its promising outcomes, this study is limited to a single dataset, which may affect the generalizability of the findings. Additionally, while BQANA achieved the best predictive performance, it incurred slightly higher computational costs compared to GA and SA. Future work should focus on validating the proposed framework on diverse datasets, enhancing computational efficiency, and integrating deep learning techniques to further advance fraud detection capabilities.

DATA AND CODE AVAILABILITY
The dataset and code used in this study are publicly available in the Figshare repository under the DOI: 10.6084/[Link].28207571.

REFERENCES
[1] E. W. T. Ngai, Y. Hu, Y. H. Wong, Y. Chen, and X. Sun, "The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature," Decis. Support Syst., vol. 50, no. 3, pp. 559–569, Feb. 2011.
[2] L. Maiano, A. Montuschi, M. Caserio, E. Ferri, F. Kieffer, C. Germanò, L. Baiocco, L. R. Celsi, I. Amerini, and A. Anagnostopoulos, "A deep-learning-based antifraud system for car-insurance claims," Expert Syst. Appl., vol. 231, Nov. 2023, Art. no. 120644.
[3] A. Singhal, N. Singhal, Divya, and K. Sharma, "Machine learning methods for detecting car insurance fraud: Comparative analysis," in Proc. 3rd Int. Conf. Intell. Technol. (CONIT), Jun. 2023, pp. 1–5.
[4] F. Aslam, A. I. Hunjra, Z. Ftiti, W. Louhichi, and T. Shams, "Insurance fraud detection: Evidence from artificial intelligence and machine learning," Res. Int. Bus. Finance, vol. 62, Dec. 2022, Art. no. 101744.
[5] F. Gao, J. W. Lien, and J. Zheng, "Exaggerating to break-even: Reference-dependent moral hazard in automobile insurance claims," Available SSRN, vol. 2021, pp. 1–51, Feb. 2021.
[6] A. Ahmed, A. F. M. Sadullah, and A. S. Yahya, "Errors in accident data, its types, causes and methods of rectification-analysis of the literature," Accident Anal. Prevention, vol. 130, pp. 3–21, Sep. 2019.
[7] H. Lando, "Optimal rules of negligent misrepresentation in insurance contract law," Int. Rev. Law Econ., vol. 46, pp. 70–77, Jun. 2016.
[8] A. M. Macedo, C. V. Cardoso, J. S. M. Neto, and C. A. da Costa Brás da Cunha, "Car insurance fraud: The role of vehicle repair workshops," Int. J. Law, Crime Justice, vol. 65, Jun. 2021, Art. no. 100456.
[9] J. Jung and B.-J. Kim, "Insurance fraud in Korea, its seriousness, and policy implications," Frontiers Public Health, vol. 9, Nov. 2021, Art. no. 791820.
[10] W. Hilal, S. A. Gadsden, and J. Yawney, "Financial fraud: A review of anomaly detection techniques and recent advances," Expert Syst. Appl., vol. 193, May 2022, Art. no. 116429.
[11] T. Badriyah, L. Rahmaniah, and I. Syarif, "Nearest neighbour and statistics method based for detecting fraud in auto insurance," in Proc. Int. Conf. Appl. Eng. (ICAE), Oct. 2018, pp. 1–5.
[12] L. Rukhsar, W. H. Bangyal, K. Nisar, and S. Nisar, "Prediction of insurance fraud detection using machine learning algorithms," Mehran Univ. Res. J. Eng. Technol., vol. 41, no. 1, pp. 33–40, Jan. 2022.
[13] C. Gomes, Z. Jin, and H. Yang, "Insurance fraud detection with unsupervised deep learning," J. Risk Insurance, vol. 88, no. 3, pp. 591–624, Sep. 2021.
[14] S. Mirjalili, "Evolutionary algorithms and neural networks," in Evolutionary Algorithms and Neural Networks: Theory and Applications (Studies in Computational Intelligence), vol. 780. Cham, Switzerland: Springer, 2019, pp. 43–53.
[15] M. Rakhshaninejad, M. Fathian, B. Amiri, and N. Yazdanjue, "An ensemble-based credit card fraud detection algorithm using an efficient voting strategy," Comput. J., vol. 65, no. 8, pp. 1998–2015, Aug. 2022.
[16] I. M. Nur Prasasti, A. Dhini, and E. Laoh, "Automobile insurance fraud detection using supervised classifiers," in Proc. Int. Workshop Big Data Inf. Secur. (IWBIS), Oct. 2020, pp. 47–52.
[17] D. Trisanto, N. Rismawati, M. Mulya, and F. Kurniadi, "Effectiveness undersampling method and feature reduction in credit card fraud detection," Int. J. Intell. Eng. Syst., vol. 13, no. 2, pp. 173–181, Apr. 2020.
[18] B. Itri, Y. Mohamed, B. Omar, and Q. Mohamed, "Empirical oversampling threshold strategy for machine learning performance optimisation in insurance fraud detection," Int. J. Adv. Comput. Sci. Appl., vol. 11, no. 10, pp. 1–6, 2020.
[19] X. Liang, Y. Gao, and S. Xu, "ASE: Anomaly scoring based ensemble learning for highly imbalanced datasets," Expert Syst. Appl., vol. 238, Mar. 2024, Art. no. 122049.
[20] S. Dalal, B. Seth, M. Radulescu, C. Secara, and C. Tolea, "Predicting fraud in financial payment services through optimized hyper-parameter-tuned XGBoost model," Mathematics, vol. 10, no. 24, p. 4679, Dec. 2022.
[21] A. Y. Kusdiyanto and Y. Pristyanto, "Machine learning models for classifying imbalanced class datasets using ensemble learning," in Proc. 5th Int. Seminar Res. Inf. Technol. Intell. Syst. (ISRITI), Dec. 2022, pp. 648–653.
[22] A. A. Feitosa-Neto, J. C. Xavier-Júnior, A. M. P. Canuto, and A. C. M. Oliveira, "A study of model and hyper-parameter selection strategies for classifier ensembles: A robust analysis on different optimization algorithms and extended results," Natural Comput., vol. 20, no. 4, pp. 805–819, Dec. 2021.
[23] M. A. Caruana and L. Grech, "Automobile insurance fraud detection," Commun. Statist., Case Stud., Data Anal. Appl., vol. 7, no. 4, pp. 520–535, Oct. 2021.
[24] A. Ali, S. Abd Razak, S. H. Othman, T. A. E. Eisa, A. Al-Dhaqm, M. Nasser, T. Elhassan, H. Elshafie, and A. Saif, "Financial fraud detection based on machine learning: A systematic literature review," Appl. Sci., vol. 12, no. 19, p. 9637, Sep. 2022.
[25] S. Subudhi and S. Panigrahi, "Use of optimized fuzzy C-means clustering and supervised classifiers for automobile insurance fraud detection," J. King Saud Univ.-Comput. Inf. Sci., vol. 32, no. 5, pp. 568–575, Jun. 2020.
[26] M. K. Severino and Y. Peng, "Machine learning algorithms for fraud prediction in property insurance: Empirical evidence using real-world microdata," Mach. Learn. Appl., vol. 5, Sep. 2021, Art. no. 100074.
[27] Z. S. Rubaidi, B. Ben Ammar, and M. Ben Aouicha, "Vehicle insurance fraud detection based on hybrid approach for data augmentation," J. Inf. Assurance Secur., vol. 18, no. 5, pp. 135–146, 2023.
[28] H. I. Okagbue and O. Oyewole, "Prediction of automobile insurance fraud claims using machine learning," Sci. Temper, vol. 14, no. 3, pp. 756–762, Sep. 2023.
[29] E. Nabrawi and A. Alanazi, "Fraud detection in healthcare insurance claims using machine learning," Risks, vol. 11, no. 9, p. 160, Sep. 2023.
[30] B. P. Verma, V. Verma, and A. Badholia, "Hyper-tuned ensemble machine learning model for credit card fraud detection," in Proc. Int. Conf. Inventive Comput. Technol. (ICICT), Jul. 2022, pp. 320–327.
[31] O. R. Sanchez, M. Repetto, A. Carrega, and R. Bolla, "Evaluating ML-based DDoS detection with grid search hyperparameter optimization," in Proc. IEEE 7th Int. Conf. Netw. Softwarization (NetSoft), Jun. 2021, pp. 402–408.
[32] M. Tayebi and S. E. Kafhali, "Hyperparameter optimization using genetic algorithms to detect frauds transactions," in Proc. Int. Conf. Artif. Intell. Comput. Vis. Springer, Jan. 2021, pp. 288–297.
[33] A. Mehdary, A. Chehri, A. Jakimi, and R. Saadane, "Hyperparameter optimization with genetic algorithms and XGBoost: A step forward in smart grid fraud detection," Sensors, vol. 24, no. 4, p. 1230, Feb. 2024.
[34] M. Tayebi and S. El Kafhali, "Performance analysis of metaheuristics based hyperparameters optimization for fraud transactions detection," Evol. Intell., vol. 17, no. 2, pp. 921–939, Apr. 2024.
[35] N. Ding, X. Ruan, H. Wang, and Y. Liu, "Automobile insurance fraud detection based on PSO-XGBoost model and interpretable machine learning method," Insurance, Math. Econ., vol. 120, pp. 51–60, Jan. 2025.
[36] P. Mrozek, J. Panneerselvam, and O. Bagdasar, "Efficient resampling for fraud detection during anonymised credit card transactions with unbalanced datasets," in Proc. IEEE/ACM 13th Int. Conf. Utility Cloud Comput. (UCC), Dec. 2020, pp. 426–433.
[37] C. G. Tekkali and K. Natarajan, "Smart fraud detection in e-transactions using synthetic minority oversampling and binary Harris hawks optimization," Comput., Mater. Continua, vol. 75, no. 2, pp. 3171–3187, Jan. 2023.
[38] M. Hanafy and R. Ming, "Using machine learning models to compare various resampling methods in predicting insurance fraud," J. Theor. Appl. Inf. Technol., vol. 99, no. 12, pp. 2819–2833, 2021.
[39] A. A. Khalil, Z. Liu, A. Fathalla, A. Ali, and A. Salah, "Machine learning based method for insurance fraud detection on class imbalance datasets with missing values," IEEE Access, vol. 12, pp. 155451–155468, 2024.
[40] X. Li, "Identifying the optimal machine learning model for predicting car insurance claims: A comparative study utilising advanced techniques," Academic J. Bus. Manag., vol. 5, no. 3, pp. 112–120, 2023.
[41] E. Ileberi, Y. Sun, and Z. Wang, "A machine learning based credit card fraud detection using the GA algorithm for feature selection," J. Big Data, vol. 9, no. 1, p. 24, Dec. 2022.
[42] A. Ghorbani and S. Farzai, "Fraud detection in automobile insurance using a data mining based approach," Int. J. Mechatronics, Electr. Comput. Technol., vol. 8, no. 27, pp. 3764–3771, 2018.
[43] H. Ahmad, B. Kasasbeh, B. Al-Dabaybah, and E. Rawashdeh, "EFN-SMOTE: An effective oversampling technique for credit card fraud detection by utilizing noise filtering and fuzzy c-means clustering," Int. J. Data Netw. Sci., vol. 7, no. 3, pp. 1025–1032, 2023.
[44] H. Kaur, H. S. Pannu, and A. K. Malhi, "A systematic review on imbalanced data challenges in machine learning: Applications and solutions," ACM Comput. Surv., vol. 52, no. 4, pp. 1–36, Jul. 2020.
[45] S. Das, S. Datta, and B. B. Chaudhuri, "Handling data irregularities in classification: Foundations, trends, and future challenges," Pattern Recognit., vol. 81, pp. 674–693, Sep. 2018.
[46] F. G. Rebitschek, G. Gigerenzer, and G. G. Wagner, "People underestimate the errors made by algorithms for credit scoring and recidivism prediction but accept even fewer errors," Sci. Rep., vol. 11, no. 1, p. 20171, Oct. 2021.
[47] M. E. Lokanan and V. Maddhesia, "Supply chain fraud prediction with machine learning and artificial intelligence," Int. J. Prod. Res., vol. 63, no. 1, pp. 286–313, Jan. 2025.
[48] S. Subudhi and S. Panigrahi, "Effect of class imbalanceness in detecting automobile insurance fraud," in Proc. 2nd Int. Conf. Data Sci. Bus. Anal. (ICDSBA), Sep. 2018, pp. 528–531.
[49] B. Itri, Y. Mohamed, Q. Mohammed, and B. Omar, "Performance comparative study of machine learning algorithms for automobile insurance fraud detection," in Proc. 3rd Int. Conf. Intell. Comput. Data Sci. (ICDS), Oct. 2019, pp. 1–4.
[50] S. Padhi and S. Panigrahi, "Decision templates based ensemble classifiers for automobile insurance fraud detection," in Proc. Global Conf. Advancement Technol. (GCAT), Oct. 2019, pp. 1–5.
[51] S. Harjai, S. K. Khatri, and G. Singh, "Detecting fraudulent insurance claims using random forests and synthetic minority oversampling technique," in Proc. 4th Int. Conf. Inf. Syst. Comput. Netw. (ISCON), Nov. 2019, pp. 123–128.
[52] N. S. Patil, S. Kamanavalli, S. Hiregoudar, S. Jadhav, S. Kanakraddi, and N. D. Hiremath, "Vehicle insurance fraud detection system using robotic process automation and machine learning," in Proc. Int. Conf. Intell. Technol. (CONIT), Jun. 2021, pp. 1–5.
[53] S.-Z. Shareh Nordin, Y. B. Wah, N. K. Haur, A. Hashim, N. Rambeli, and N. A. Jalil, "Predicting automobile insurance fraud using classical and machine learning models," Int. J. Electr. Comput. Eng., vol. 14, no. 1, p. 911, Feb. 2024.
[54] M. Zhu, Y. Zhang, Y. Gong, C. Xu, and Y. Xiang, "Enhancing credit card fraud detection: A neural network and SMOTE integrated approach," 2024, arXiv:2405.00026.
[55] M. Abdul Salam, K. M. Fouad, D. L. Elbably, and S. M. Elsayed, "Federated learning model for credit card fraud detection with data balancing techniques," Neural Comput. Appl., vol. 36, no. 11, pp. 6231–6256, Apr. 2024.
[56] S. K. Majhi, "Fuzzy clustering algorithm based on modified whale optimization algorithm for automobile insurance fraud detection," Evol. Intell., vol. 14, no. 1, pp. 35–46, Mar. 2021.
[57] S. Subudhi and S. Panigrahi, "Detection of automobile insurance fraud using feature selection and data mining techniques," Int. J. Rough Sets Data Anal., vol. 5, no. 3, pp. 1–20, Jul. 2018.
[58] E. Ileberi, Y. Sun, and Z. Wang, "Performance evaluation of machine learning methods for credit card fraud detection using SMOTE and AdaBoost," IEEE Access, vol. 9, pp. 165286–165294, 2021.
[59] M. Zakariah, S. A. AlQahtani, and M. S. Al-Rakhami, "Machine learning-based adaptive synthetic sampling technique for intrusion detection," Appl. Sci., vol. 13, no. 11, p. 6504, May 2023.
[60] X. Zhao and S. Guan, "CTCN: A novel credit card fraud detection method based on conditional tabular generative adversarial networks and temporal convolutional network," PeerJ Comput. Sci., vol. 9, Oct. 2023, Art. no. e1634.
[61] H. Zamani, M. H. Nadimi-Shahraki, and A. H. Gandomi, "QANA: Quantum-based avian navigation optimizer algorithm," Eng. Appl. Artif. Intell., vol. 104, Sep. 2021, Art. no. 104314.
[62] M. H. Nadimi-Shahraki, A. Fatahi, H. Zamani, and S. Mirjalili, "Binary approaches of quantum-based avian navigation optimizer to select effective features from high-dimensional medical data," Mathematics, vol. 10, no. 15, p. 2770, Aug. 2022.
[63] M. H. Nadimi-Shahraki, "An effective hybridization of quantum-based avian navigation and bonobo optimizers to solve numerical and mechanical engineering problems," J. Bionic Eng., vol. 20, no. 3, pp. 1361–1385, May 2023.
[64] R. R. Mostafa, O. Kisi, R. M. Adnan, T. Sadeghifar, and A. Kuriqi, "Modeling potential evapotranspiration by improved machine learning methods using limited climatic data," Water, vol. 15, no. 3, p. 486, Jan. 2023.
[65] A. Singh, A. Jain, and S. E. Biable, "Financial fraud detection approach based on firefly optimization algorithm and support vector machine," Appl. Comput. Intell. Soft Comput., vol. 2022, pp. 1–10, Jun. 2022.
[66] I.-S. Oh, J.-S. Lee, and B.-R. Moon, "Hybrid genetic algorithms for feature selection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1424–1437, Nov. 2004.
[67] A. Kuznetsov, M. Karpinski, R. Ziubina, S. Kandiy, E. Frontoni, O. Peliukh, O. Veselska, and R. Kozak, "Generation of nonlinear substitutions by simulated annealing algorithm," Information, vol. 14, no. 5, p. 259, Apr. 2023.
[68] Y. Abakarim, M. Lahby, and A. Attioui, "A bagged ensemble convolutional neural networks approach to recognize insurance claim frauds," Appl. Syst. Innov., vol. 6, no. 1, p. 20, Jan. 2023.

AFSANEH GHEYSARBEIGI received the M.S. degree in information technology engineering, majoring in e-commerce (machine learning in car insurance fraud detection), from Iran University of Science and Technology, Iran, in 2024. Her research interests include decision support systems, industrial engineering, artificial intelligence, machine learning, and deep learning.

MORTEZA RAKHSHANINEJAD received the M.S. degree in information technology engineering, majoring in e-commerce (machine learning in fraud detection systems), from Iran University of Science and Technology, Iran, in 2019. His research interests include applied machine learning, information systems, computational biology, bioinformatics, and big data.

MOHAMMAD FATHIAN received the M.S. and Ph.D. degrees in industrial engineering from Iran University of Science and Technology, Tehran. He is currently a Professor with the School of Industrial Engineering, Iran University of Science and Technology. He works in the areas of information technology and industrial engineering. He has more than 90 journal articles and five books in the areas of industrial engineering and information technology.

FARNAZ BARZINPOUR received the Ph.D. degree from Tarbiat Modares University, in 2004. She is currently an Associate Professor of industrial engineering with Iran University of Science and Technology. She teaches operations research, meta-heuristics algorithms, principles of logistics and supply chain engineering, research methodology, and integer programming. Her research interests include optimization and meta-heuristic algorithms, health systems engineering, and supply chain management.
The study's BQANA method was compared against other methodologies such as PSO-based optimization for XGBoost, stacking models with oversampling for class imbalance, and fuzzy clustering with advanced classifiers. BQANA was found to be superior in classification accuracy, especially in dealing with imbalanced datasets.
After applying hyperparameter optimization with algorithms like BQANA, the machine learning models showed marked improvements in predictive accuracy. For example, the ensemble model's accuracy increased from 99.77% without tuning to 99.94% with BQANA at a 1:1 ratio.
Dataset ratio variations influence model performance by altering the balance between legitimate and fraudulent claims in the training data. In this study, models achieved higher accuracy at a 1:1 ratio after hyperparameter tuning, as demonstrated by the BQANA algorithm's performance, which provided the most balanced fraud detection capability.
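As a concrete illustration of this resampling step, the sketch below randomly undersamples the legitimate (majority) class to a chosen legitimate-to-fraudulent ratio. It is a minimal stand-in for the paper's preprocessing, not its exact implementation; the function name, seed, and toy data are hypothetical.

```python
import random

def random_undersample(X, y, ratio=1.0, seed=42):
    """Keep all fraudulent claims (label 1) and a random subset of
    legitimate claims (label 0) so that #legitimate = ratio * #fraudulent.
    ratio=1.0 reproduces the 1:1 setting; 2.0, 4.0, and 8.0 give the others."""
    rng = random.Random(seed)
    fraud = [i for i, label in enumerate(y) if label == 1]
    legit = [i for i, label in enumerate(y) if label == 0]
    keep = sorted(rng.sample(legit, min(len(legit), int(ratio * len(fraud)))) + fraud)
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy dataset: 90 legitimate and 10 fraudulent claims.
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10
X_bal, y_bal = random_undersample(X, y, ratio=1.0)
print(len(y_bal), sum(y_bal))  # 20 samples, 10 of them fraudulent
```

Note that all fraudulent claims are retained and only the majority class is subsampled, which is what keeps the minority signal intact while reducing class bias.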
The study validated BQANA's effectiveness by comparing its tuned ensemble model at a 1:1 ratio against models from other advanced methodologies. The comparative metrics demonstrated its superiority in accuracy and recall, positioning BQANA as the most effective among the evaluated techniques for insurance fraud detection.
The study underscores the limitations of traditional methods like grid search or manual tuning, advocating instead for quantum-inspired algorithms like BQANA. These advanced methods not only reduce manual effort but also significantly boost model performance, enabling more accurate and efficient fraud detection processes.
With hyperparameter tuning using BQANA, the precision and recall metrics improved significantly compared to the untuned models. Precision rose to 98.93% and recall reached a perfect score of 100% in ensemble models at a 1:1 ratio, demonstrating the method's ability to correctly identify fraudulent claims while minimizing false positives.
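For clarity on how these two metrics trade off, the snippet below computes Precision, Recall, and F1 for the fraud class from raw predictions; it is a generic illustration of the metric definitions, with the example labels chosen to show perfect recall alongside imperfect precision.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, Recall, and F1 for the positive (fraud = 1) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false alarms
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # missed frauds
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Every fraud is caught (recall = 1.0), but one legitimate claim is flagged:
p, r, f = precision_recall_f1([1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0])
print(round(p, 3), r, round(f, 3))  # 0.667 1.0 0.8
```

This is why the paper reports both metrics: a model can reach 100% recall trivially by flagging everything, so the 98.93% precision alongside it is what makes the result meaningful.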
Metaheuristic algorithms like BQANA, SA, and GA optimize the hyperparameters of machine learning models, significantly improving their accuracy and robustness in fraud detection tasks by fine-tuning model parameters beyond default or manually set options.
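To make the idea concrete, here is a generic simulated-annealing-style search over a discrete hyperparameter grid. BQANA's quantum-inspired update rules are not reproduced here; the function name, cooling schedule, and toy scoring function are all illustrative assumptions, with `score_fn` standing in for validation accuracy.

```python
import math
import random

def anneal_tune(score_fn, space, iters=200, t0=0.5, seed=1):
    """SA-style hyperparameter search: `space` maps each hyperparameter to
    its candidate values; `score_fn(params)` is a score to maximize."""
    rng = random.Random(seed)
    current = {k: rng.choice(v) for k, v in space.items()}
    cur_score = score_fn(current)
    best, best_score = dict(current), cur_score
    for i in range(iters):
        temp = t0 * (1.0 - i / iters) + 1e-9       # linear cooling schedule
        cand = dict(current)
        key = rng.choice(list(space))              # perturb one hyperparameter
        cand[key] = rng.choice(space[key])
        s = score_fn(cand)
        # Accept improvements always; accept worse moves with Boltzmann probability.
        if s >= cur_score or rng.random() < math.exp((s - cur_score) / temp):
            current, cur_score = cand, s
            if s > best_score:
                best, best_score = dict(cand), s
    return best, best_score

# Toy objective peaking at C=10, gamma=0.1 (stand-ins for SVM hyperparameters):
space = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
score = lambda p: 1.0 - 0.1 * (abs(math.log10(p["C"] / 10)) + abs(math.log10(p["gamma"] / 0.1)))
best, best_score = anneal_tune(score, space)
print(best, round(best_score, 2))
```

In practice `score_fn` would train and cross-validate a classifier for each candidate configuration, which is what makes these searches expensive and why the runtime comparison in Table 13 matters.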
BQANA achieved the highest accuracy for hyperparameter tuning in insurance fraud detection models, outperforming SA and GA in most metrics. Specifically, the ensemble method with BQANA at a 1:1 ratio recorded an accuracy of 99.94%, surpassing SA's 99.87% and GA's 99.84% .
Using BQANA offers superior accuracy and recall in model performance but incurs a higher computational cost. For instance, BQANA's runtime for SVM at a 1:1 ratio is approximately 512 seconds, compared to GA's 398 seconds and SA's 312 seconds, reflecting a trade-off between computational efficiency and performance.
Ensemble models, which combine multiple classifiers, enhance detection capabilities by leveraging diverse algorithms such as SVM, RF, and XGBoost. This study found that an ensemble model tuned with BQANA improved performance across accuracy, precision, recall, and F1-score, particularly when the individually optimized classifiers were combined.
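A minimal sketch of such a weighted-voting ensemble using scikit-learn is shown below, with `GradientBoostingClassifier` standing in for XGBoost to avoid an extra dependency. The synthetic data, hyperparameter values, and voting weights are illustrative assumptions, not the paper's BQANA-tuned values or its exact weighting scheme.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a balanced (1:1) claims dataset.
X, y = make_classification(n_samples=600, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(C=10.0, gamma="scale", probability=True, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
    ],
    voting="soft",        # average the predicted class probabilities
    weights=[1, 2, 2],    # hypothetical weights; the paper derives its own
)
ensemble.fit(X_tr, y_tr)
print(round(ensemble.score(X_te, y_te), 3))
```

Soft voting with per-classifier weights lets stronger base models dominate the averaged probabilities, which is the same intuition behind the paper's weighted voting strategy.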