Enhancing Performance of Financial Fraud
(2023) 07
DOI: 10.47991/2996-4954/JCETAI-101
Abstract
Despite attempts to reduce it, financial fraud continues to be a major problem in many industries, including healthcare, banking,
and insurance. Traditional fraud detection techniques, which are often manual, are inefficient, time-consuming, and costly. As
a result, methods that use AI and ML have been implemented to improve fraud detection procedures. This study examines the
application of ML algorithms for credit card fraud detection using a dataset consisting of 284,807 transactions made by
European cardholders in 2013, out of which 492 were fraudulent. Preprocessing steps, including Label Encoding, SMOTE for
handling class imbalance, and PCA for feature reduction, were applied to the dataset. ML-based classification models, namely
Decision Trees (DT), SVM, and ANNs, were then trained on the training dataset and their performance was evaluated. The models were assessed
using accuracy, precision, and recall as key metrics. The ANN model emerged as the best-performing model, achieving
98.41% precision, 98.69% accuracy, and 98.98% recall, outperforming both Decision Trees and SVM. This study highlights the
effectiveness of ML models, particularly ANNs, in improving financial fraud detection.
Keywords: Financial Fraud, Machine Learning, Credit Card Transaction Dataset, Detection.
findings on model performance and effectiveness in detecting fraudulent activities.
• It demonstrates how preprocessing techniques like SMOTE for handling class imbalance and PCA for feature reduction can improve model performance in fraud detection.
• The study shows that Artificial Neural Networks (ANNs) outperform other models according to accuracy, precision, and recall, making them a highly effective technique for financial fraud detection.

Structure of the paper
This research is organised in the following way: Section 2 summarises current approaches to financial fraud detection. The approach, including data management and model application, is described in Section 3. The outcomes of the experiments are detailed and discussed in Section 4. Section 5 presents the important findings and suggests areas for further research.

Literature Review
This section reviews key machine learning research on similar datasets and challenges, highlighting influential methods and studies. Table 1 summarizes the relevant literature for financial fraud detection.

Rai and Dwivedi (2020) propose a way to detect fraudulent activity in credit card data using an NN-based unsupervised learning methodology. Compared with the existing AE, LOF, IF, and K-Means clustering approaches, the proposed NN-based fraud detection system performs better, achieving an accuracy of 99.87% [6].

Hidayattullah, Surjandari and Laoh (2020) use a variety of ML methods grounded on meta-heuristic optimisation to construct reliable financial statement fraud prediction models. Two types of classification algorithms were employed: SVMs and Back Propagation Neural Networks. The study's top classifier is an SVM, with 96.15% accuracy achieved by optimising its parameters using a Genetic Algorithm [7].

Mubalaike and Adali (2018) seek to understand how DL models may help in accurately identifying fraudulent transactions. The preprocessed data is subjected to ML and DL methods, including ensembles of decision trees (EDTs), stacked auto-encoders (SAEs), and restricted Boltzmann machine (RBM) classifiers. The best accuracy values are 90.49%, 80.52%, and 91.53%, respectively. A closer look at the findings shows that RBM outperforms the alternatives [8].

Gardner et al. (2019) stress the need for a system that can identify anomalies in financial transactions using three different components. To build the system, several random forest classifiers (RFC) with distinct fitness functions are fine-tuned. The RF parameters are optimised to meet each fitness function using a randomised grid search. Once all of the models are trained, they are combined to create three levels of detected frauds, with a different degree of accuracy at each level. Categorising detected frauds into multiple levels improves recall and precision. Using this method, the authors accurately classify 96% of frauds while detecting 85% with an accuracy above 90%. According to their research, the tiered random forest achieves a recall of 72% and a precision of 85%, making it the most effective algorithm compared to SVM and logistic regression [9].

Erfani, Shoeleh and Ghorbani (2020) provide a streamlined system for identifying fraudulent activities. To identify fraud, their methodology employs deep support vector data description after a dedicated preprocessing and subsampling phase. They offer a trend analysis that takes into account the sizes of the training and test datasets as well as the model's performance as measured by ROC-AUC and AP. Their method beats the advanced binary classifiers RF and SVM in several tests. The best values are 90% for AP and 93% for ROC-AUC [10].

Arun and Venkatachalapathy (2020) present a new DL-based C-LSTM model for detecting credit card fraud. The suggested C-LSTM model involves two steps: preprocessing and classification. The performance of the C-LSTM model is verified on the German Credit and Kaggle CCFD datasets. The experimental outcomes show that the C-LSTM model performed well on the German Credit and CCFD datasets, with accuracies of 94% and 94.65%, respectively [11].

Table 1: Comparative research of Financial Fraud Detection using machine and deep learning techniques

Ref | Methodology | Dataset | Result | Limitation and future work
----|-------------|---------|--------|---------------------------
[6] | Neural Network (NN) based unsupervised learning technique | Credit card data | 99.87% accuracy | May not generalize to different types of fraud or data variations
[7] | Meta-heuristic optimization with Back Propagation Neural Networks and SVM | Financial statements | SVM with Genetic Algorithm: 96.15% accuracy | May be limited by the quality and representativeness of financial data used
[8] | Ensemble of Decision Tree (EDT), Stacked Auto-Encoders (SAE), RBM | One month of financial logs from a mobile money service | EDT: 90.49%, SAE: 80.52%, RBM: 91.53% | Performance might vary with different datasets and the extensive computation required
[9] | Three-tiered anomaly detection system with randomized grid search and multiple [...] | Not specified | 96% correct fraud classification; Precision over [...] | High complexity of tuning and parameter optimization; may not scale well
Research gaps
While existing studies on fraud detection have demonstrated significant advancements with various methodologies (such as NN-based unsupervised learning, meta-heuristic optimization, and deep learning techniques), there remains a notable research gap in generalizing these methods across diverse datasets and real-world scenarios. Many approaches are optimized for specific datasets, such as mobile money transactions or financial statements, limiting their applicability to broader contexts. Additionally, the complexity of tuning and preprocessing methods, along with the variability in performance metrics like accuracy and precision, indicates a need for more robust and adaptable frameworks. Future research should focus on developing universal models that integrate advanced techniques and improve generalization, while also addressing the resource-intensive nature of current optimization processes.

Research Methodology
This research aims to provide an efficient ML-based financial fraud detection system. The methodology follows a structured approach, beginning with the gathering of a dataset consisting of 284,807 transactions by European cardholders, including 492 instances of fraud. The data is carefully prepared for analysis by filling in missing values, eliminating duplicates, standardising it, and encoding categorical variables using Label Encoding. SMOTE is used to address class imbalance, and PCA is used to reduce the number of features. The dataset is then divided into training (80%) and testing (20%) subsets. An assortment of classification models, such as DT, SVM, and ANNs, are tested and assessed for their accuracy and efficacy in differentiating between genuine and fraudulent transactions. Figure 1 shows a flow diagram of the system that detects financial fraud.

Figure 1: Proposed flowchart of financial fraud detection system.

The steps of the financial fraud detection system are shown in the flowchart in Figure 1. Each level of data processing in the system is explained in depth below.

1) Data collection
The process of data collection involves collecting relevant information from many sources, such as sensors and databases, in order to construct a large and representative dataset for analysis. The dataset used here contains 284,807 credit card transactions recorded for European cardholders in September 2013, of which only 492 are fraudulent, so the two classes are highly imbalanced.

2) Data preprocessing
The term "data preprocessing" refers to the steps used to make raw data more suitable for analysis. This includes resolving missing data, removing duplicates, and encoding categorical variables. Data preparation for modelling involves checking for quality and consistency in order to boost the efficiency of later analyses and ML algorithms. The most important preprocessing steps are:
• Label Encoding: Label Encoding converts categorical data into numerical values. Each category is given its own distinct integer, so algorithms that work with numerical input can handle categorical data.
• SMOTE: SMOTE remedies class imbalance by generating synthetic samples for the minority class. It creates additional data points by interpolating between existing minority-class points, which enhances the performance of ML models on unbalanced datasets.
• Feature Selection: Principal Component Analysis (PCA) converts a dataset's features into a new set of variables known as principal components, thereby lowering the number of features in the dataset. These components capture most of the variance in the data, enabling a more condensed representation without sacrificing crucial details.
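The three preprocessing steps listed above can be illustrated with a minimal, standard-library-only sketch. It is not the pipeline used in this study: the function names (`label_encode`, `smote_like`, `first_principal_component`), the neighbour count `k=3`, and the toy rows are all assumptions made for illustration, and the PCA part recovers only the first principal direction by power iteration.

```python
import random


def label_encode(values):
    """Give each distinct category its own integer (sorted order is an assumption)."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows, each lying on the segment between a
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        # Sort the remaining minority rows by squared distance to the base row.
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def first_principal_component(rows, iters=100):
    """Estimate the top principal direction by power iteration on the
    sample covariance matrix of the centred data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Hypothetical toy data, not taken from the credit card dataset.
codes, mapping = label_encode(["online", "pos", "online", "atm"])

fraud_rows = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 3.0]]
new_rows = smote_like(fraud_rows, n_new=4)

line_rows = [[x, 2.0 * x] for x in (0.0, 1.0, 2.0, 3.0)]
direction = first_principal_component(line_rows)

print(codes, mapping)
print(len(fraud_rows) + len(new_rows), "minority rows after oversampling")
print([round(c, 3) for c in direction])
```

Because each synthetic row is an interpolation between two existing minority rows, it always stays inside the region already occupied by the minority class. In practice a library implementation of SMOTE and PCA would normally be preferred over hand-written versions like these.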
• True Positive (TP): This occurs when the predicted and actual classes of a data point are both 1.
• True Negative (TN): This occurs when the predicted and actual classes of a data point are both 0.
• False Positive (FP): This occurs when a data point has a predicted class of 1 but an actual class of 0.
• False Negative (FN): This occurs when a data point has an actual class of 1 but a predicted class of 0.

Figure 4: Financial Fraud Detection performance with ANN (bar chart of the performance measures Accuracy, Precision, and Recall, in %).
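The TP, TN, FP, and FN counts defined above combine into the three metrics used in this study through the standard formulas accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), and recall = TP / (TP + FN). The sketch below is an illustration under assumptions: the helper names and the toy label lists are hypothetical. Its last part uses the dataset's class counts (492 fraudulent out of 284,807 transactions) to show why accuracy alone can mislead on imbalanced data.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall); undefined ratios become 0.0."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
toy_accuracy, toy_precision, toy_recall = metrics(*counts)

# A classifier that labels every transaction as genuine, evaluated on the
# class counts of the dataset: no fraud is caught, yet accuracy looks high.
total, fraud = 284_807, 492
naive_accuracy, naive_precision, naive_recall = metrics(
    tp=0, tn=total - fraud, fp=0, fn=fraud
)

print(counts, toy_accuracy, toy_precision, toy_recall)
print(round(naive_accuracy, 4), naive_precision, naive_recall)
```

The second case is the reason precision and recall are reported alongside accuracy: a model that never flags fraud would still be right on the overwhelming majority of transactions.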
Comparative analysis
Table 2 below provides a comparative analysis of the ML and DL models
used for financial fraud detection on the Credit Card Transaction
dataset. It details their performance indicators to allow a
thorough evaluation of the models' efficacy in identifying
fraudulent behaviour in this domain.
Model | Accuracy (%) | Precision (%) | Recall (%)
------|--------------|---------------|-----------
Decision Tree | 88 | 95 | 81
SVM [15] | 72.3 | 60 | 96.4
ANN | 98.69 | 98.41 | 98.98

Figure 6: Comparison of different models of Accuracy (bar chart, in %: Decision Tree 88, SVM 72.3, ANN 98.69).

Figure 6 is a bar graph comparing the accuracy of the models. The ANN achieves the highest accuracy, with a score of 98.69%, while the SVM model exhibits the lowest accuracy, scoring 72.3%.

Figure 7: Comparison of different models of Precision (bar chart, in %: Decision Tree 95, SVM 60, ANN 98.41).

Figure 7 illustrates the precision comparison among the models. The bar graph shows that the SVM model has the lowest precision score at 60%, while the ANN attains the highest precision at 98.41%.

The three models' recall scores are compared in Figure 8. The ANN performs best, with the greatest recall score of 98.98%. The DT model, on the other hand, has the lowest recall, at 81%.

This study's results show that, compared with more conventional ML models like DT and SVM, the ANN model performs far better in detecting financial fraud. The confusion matrix shows that the ANN achieves an accuracy of 98.69%, a precision of 98.41%, and a recall of 98.98% in recognising both fraudulent and non-fraudulent transactions. The comparative analysis highlights the ANN's significant advantages over the SVM, which exhibited the lowest accuracy (72.3%) and precision (60%), and over the Decision Tree model, which, despite better precision (95%), lagged in recall (81%). These findings indicate that the ANN not only minimizes false positives and false negatives effectively but also delivers balanced performance across the key metrics. This suggests that deep learning approaches are more adept at handling imbalanced datasets, making them particularly suitable for financial fraud detection applications.

Conclusion and Future Work
Frauds are said to be dynamic and lacking in patterns, making them difficult to identify. Fraudsters profit from new technological developments and manage to circumvent security measures, which leads to massive financial losses. One way to keep tabs on fraudulent transactions is to use data mining techniques to analyze and detect unusual behaviors. Credit card transaction analysis has been the primary focus of this study's investigation into ML methods for financial fraud detection. The efficacy of ML in differentiating between genuine and fraudulent transactions was demonstrated by applying classification models such as DT, SVM, and ANN. The most robust model for fraud detection was the ANN, which had the best performance among the models, with an accuracy of 98.69%, a precision of 98.41%, and a recall of 98.98%. The findings point to the value of ML for detecting financial crime and provide a way forward for creating better fraud prevention systems.
exploring methods for reducing false positives, which are critical in practical implementations. Model generalizability may be further improved by using larger and more varied datasets.

References
1. A. Mousa, “Detecting Financial Fraud Using Data Mining Techniques: A Decade Review from 2004 to 2015,” J. Data Sci., vol. 14, no. 3, pp. 553–570, 2016, doi: 10.6339/jds.201607_14(3).0010.
2. N. F. Ryman-Tubb, P. Krause, and W. Garn, “How Artificial Intelligence and machine learning research impacts payment card fraud detection: A survey and industry benchmark,” Engineering Applications of Artificial Intelligence, 2018, doi: 10.1016/j.engappai.2018.07.008.
3. M. R. Kishore Mullangi, Vamsi Krishna Yarlagadda, Niravkumar Dhameliya, “Integrating AI and Reciprocal Symmetry in Financial Management: A Pathway to Enhanced Decision-Making,” Int. J. Reciprocal Symmetry Theor. Phys., vol. 5, no. 1, pp. 42–52, 2018.
4. S. K. R. A. Sai Charan Reddy Vennapusa, Takudzwa Fadziso, Dipakkumar Kanubhai Sachani, Vamsi Krishna Yarlagadda, “Cryptocurrency-Based Loyalty Programs for Enhanced Customer Engagement,” Technol. Manag. Rev., vol. 3, no. 1, pp. 46–62, 2018.
5. H. Zhou et al., “A distributed approach of big data mining for financial fraud detection in a supply chain,” Comput. Mater. Contin., 2020, doi: 10.32604/CMC.2020.09834.
6. A. K. Rai and R. K. Dwivedi, “Fraud Detection in Credit Card Data using Unsupervised Machine Learning Based Scheme,” in Proceedings of the International Conference on Electronics and Sustainable Communication Systems, ICESC 2020, 2020, doi: 10.1109/ICESC48915.2020.9155615.
7. S. Hidayattullah, I. Surjandari, and E. Laoh, “Financial statement fraud detection in indonesia listed companies using machine learning based on meta-heuristic optimization,” in 2020 International Workshop on Big Data and Information Security, IWBIS 2020, 2020, doi: 10.1109/IWBIS50925.2020.9255563.
8. A. M. Mubalaike and E. Adali, “Deep Learning Approach for Intelligent Financial Fraud Detection System,” in UBMK 2018 - 3rd International Conference on Computer Science and Engineering, 2018, doi: 10.1109/UBMK.2018.8566574.
9. C. Gardner, D. C.-T. Lo, J.-C. Chern, P. Paschos, and C. Ng, “Tiered Financial Fraud Detection Utilizing Precision Stratified Random Forest Assembly,” in 2019 IEEE 5th International Conference on Big Data Intelligence and Computing (DATACOM), 2019, pp. 254–257, doi: 10.1109/DataCom.2019.00047.
10. M. Erfani, F. Shoeleh, and A. A. Ghorbani, “Financial Fraud Detection using Deep Support Vector Data Description,” in Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020, 2020, doi: 10.1109/BigData50022.2020.9378256.
11. G. K. Arun and K. Venkatachalapathy, “Convolutional Long Short Term Memory Model for Credit Card Detection,” in Proceedings of the 4th International Conference on Electronics, Communication and Aerospace Technology, ICECA 2020, 2020, doi: 10.1109/ICECA49313.2020.9297606.
12. A. Khashman and K. Dimililer, “Neural networks arbitration for optimum DCT image compression,” in EUROCON 2007 - The International Conference on Computer as a Tool, 2007, doi: 10.1109/EURCON.2007.4400236.
13. H. M and S. M.N, “A Review on Evaluation Metrics for Data Classification Evaluations,” Int. J. Data Min. Knowl. Manag. Process, 2015, doi: 10.5121/ijdkp.2015.5201.
14. K. Kumain, “Analysis of Fraud Detection on Credit Cards using Data Mining Techniques,” Turkish J. Comput. Math. Educ., 2020, doi: 10.17762/turcomat.v11i1.13590.
15. D. Zhang, B. Bhandari, and D. Black, “Credit Card Fraud Detection Using Weighted Support Vector Machine,” Appl. Math., vol. 11, no. 12, pp. 1275–1291, 2020, doi: 10.4236/am.2020.1112087.