Credit Card Fraud Detection Using State-Of-The-Art Machine Learning and Deep Learning Algorithms
ABSTRACT Credit cards are widely used for online transactions because they are efficient and easy to use.
As credit card usage has grown, so has the scope for credit card misuse. Credit card fraud causes
significant financial losses for both cardholders and financial companies. The main aim of this research
study is to detect such fraud while addressing the associated challenges, including the limited accessibility
of public data, highly class-imbalanced data, the changing nature of fraud, and high false alarm rates. The
relevant literature presents many machine learning based approaches for credit card fraud detection, such as
Extreme Learning Method, Decision Tree, Random Forest, Support Vector Machine, Logistic Regression and
XGBoost. However, due to low accuracy, there is still a need to apply state-of-the-art deep learning
algorithms to reduce fraud losses. The main focus has been to apply recent developments in deep learning
algorithms for this purpose. A comparative analysis of machine learning and deep learning algorithms was
performed to find efficient outcomes. The detailed empirical analysis is carried out using the European card
benchmark dataset for fraud detection. A machine learning algorithm was first applied to the dataset, which
improved the accuracy of fraud detection to some extent. Later, three architectures based on a convolutional
neural network were applied to improve fraud detection performance. Adding further layers increased the
detection accuracy further. A comprehensive empirical analysis has been carried out by varying the number of
hidden layers and epochs and by applying the latest models. The evaluation shows improved results, with
accuracy, F1-score, precision and AUC reaching optimized values of 99.9%, 85.71%, 93%, and 98%,
respectively. The proposed model outperforms the state-of-the-art machine learning and deep learning
algorithms for credit card fraud detection problems. In addition, we have performed experiments by balancing
the data and applying deep learning algorithms to minimize the false negative rate. The proposed approaches
can be implemented effectively for the real-world detection of credit card fraud.
INDEX TERMS Fraud detection, deep learning, machine learning, online fraud, credit card frauds,
transaction data analysis.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
39700 VOLUME 10, 2022
F. K. Alarfaj et al.: CCF Detection Using State-of-the-Art ML and DL Algorithms
that the future is heading towards a cashless culture. As a result, typical payment methods will no longer
be used in the future, and therefore they will not be helpful for expanding a business. Customers will not
always visit the business with cash in their pockets. They are now placing a premium on debit and credit
card payments. As a result, companies will need to update their environment to ensure that they can take all
types of payments. In the coming years, this trend is expected to become much more pronounced [1].
In 2020, there were 393,207 cases of CCF out of approximately 1.4 million total reports of identity theft
[4]. CCF is now the second most prevalent sort of identity theft recorded as of this year, only following
government documents and benefits fraud [5]. In 2020, there were 365,597 incidences of fraud perpetrated
using new credit card accounts [10]. The number of identity theft complaints climbed by 113% from 2019 to
2020, with credit card identity theft reports increasing by 44.6% [14]. Payment card theft cost the global
economy $24.26 billion last year. With 38.6% of reported card fraud losses in 2018, the United States is
the most vulnerable country to credit theft.
As a result, financial institutions should prioritize equipping themselves with an automated fraud detection
system. The goal of supervised CCF detection is to create a machine learning (ML) model based on existing
transactional credit card payment data. The model should distinguish between fraudulent and nonfraudulent
transactions, and use this information to decide whether an incoming transaction is fraudulent or not. The
issue involves a variety of fundamental problems, including the system's quick reaction time, cost
sensitivity, and feature pre-processing. ML is a field of artificial intelligence that uses a computer to
make predictions based on prior data trends [1].
ML models have been used in many studies to solve numerous challenges. Deep learning (DL) algorithms have
been applied in computer networks, intrusion detection, banking, insurance, mobile cellular networks, health
care fraud detection, medical and malware detection, detection for video surveillance, location tracking,
Android malware detection, home automation, and heart disease prediction. In this paper, we explore the
practical application of ML, particularly DL algorithms, to identify credit card thefts in the banking
industry. The support vector machine (SVM) is a supervised ML technique for data classification problems.
It is employed in a variety of domains, including image recognition [25], credit rating [5], and public
safety [16]. SVM can tackle linear and nonlinear binary classification problems, and it finds a hyperplane,
defined by the support vectors, that separates the input data, which can make it superior to other
classifiers. Neural networks were the first method used to identify credit card theft in the past [4]. As a
result, current research in this branch of ML is focused on DL approaches.
In recent years, deep learning approaches have received significant attention due to substantial and
promising outcomes in various applications, such as computer vision, natural language processing, and voice.
However, only a few studies have examined the application of deep neural networks in identifying CCF [3].
The work in [3] uses a number of deep learning algorithms for detecting CCF. However, in this study, we
choose the CNN model and its layers to determine whether a transaction in the labelled datasets is
fraudulent or normal. Some transactions that are common in the datasets have been labelled fraudulent and
demonstrate questionable transaction behaviour. As a result, we focus on supervised and unsupervised
learning in this research paper.
Class imbalance is the problem in ML where the total number of samples of one class (positive) is far
smaller than the total number of samples of another class (negative). The classification challenge of the
unbalanced dataset has been the subject of several studies, and this extensive collection of studies
provides several answers. Nevertheless, to the best of our knowledge, the problem of class imbalance has not
yet been solved. We propose to alter the DL algorithm of the CNN model by adding additional layers for
feature extraction and for the classification of credit card transactions as fraudulent or otherwise. The
top attributes from the prepared dataset are ranked using feature selection techniques. After that, CCF is
classified using several supervised machine learning and deep learning models.
In this study, the main aim is to detect fraudulent credit card transactions with the help of ML algorithms
and deep learning algorithms. This study makes the following contributions:
• Feature selection algorithms are used to rank the top features from the CCF transaction dataset, which
help in class label predictions.
• A deep learning model is proposed by adding a number of additional layers that are then used to extract
features and perform classification on the credit card fraud detection dataset.
• To analyse the performance of the CNN model, different architectures of CNN layers are applied.
• A comparative analysis is performed between ML and DL algorithms and between the proposed CNN and a
baseline model; the results show that the proposed approach outperforms existing approaches.
• To assess the classifiers, the performance evaluation measures accuracy, precision, and recall are used.
Experiments are performed on the latest credit card dataset.
The rest of the paper is structured as follows: Section 2 examines the related works. The proposed model and
its methodology are described in depth in Section 3. The dataset and evaluation measures are described in
Section 4, which also shows the outcomes of our tests on a real dataset, as well as the analysis. Finally,
Section 5 concludes the paper.
II. RELATED WORK
In the field of CCF detection, several research studies have been carried out. This section presents
different research studies revolving around CCF detection. Moreover, we strongly emphasise the research that
reported fraud detection in the
regards the size of neural networks, and it is considered the backpropagation model [8], [16]. The
efficiency of the backpropagation algorithm decreases greatly as the depth of the neural network increases,
which can cause problems such as poor local optima and the dilution of errors. Deep designs should be
considered to be an achievement. They can theoretically address the optimisation struggle in a profound
manner within the training parameters [17], [18].
The training technique of the deep belief network is often considered the first effective case of deep
architecture training. Traditional ML algorithms, such as SVM, DT and LR, have been extensively proposed for
CCF detection [3]. These traditional algorithms are not very well suited for large datasets. A CNN is a DL
method that is well suited to three-dimensional data, such as in image processing. This method is similar to
the ANN; the CNN has the same hidden layer structure and a different number of channels in each layer, in
addition to special convolution layers. The word convolution refers to the idea of moving filters across the
data, which can be used to capture the key information and automatically performs feature reduction. Thus,
the CNN is widely used in image processing. The CNN does not require heavy data pre-processing for training.
For image processing, the purpose of using a CNN is to minimise processing without losing key features by
reducing the image to make predictions [4], [6]. The main terms in the CNN are feature maps, channels,
pooling, stride, and padding. For text, image and video processing, CNN models are conventionally used and
take two-dimensional data as input, which is called the 2DCNN. The feature mapping process is used to learn
the internal representation from the input data. The location of features is not relevant, and the same
procedure can be used for one-dimensional data. Natural language processing is a very popular example of a
1DCNN application, where the problem becomes one of sequence classification. In a 1DCNN, the kernel filter
moves top to bottom along the sequence of a data sample, rather than moving left to right and top to bottom
as in the 2DCNN [17], [18].
Raghavan [16] defined an autoencoder as an actual neural network. An autoencoder encodes the data and then
decodes it in the same way. In this method, the autoencoders are trained on non-anomalous points. Based on
the reconstruction error, each point is then classified as 'fraud' or 'no fraud': data of a kind that the
system has not been trained on is expected to have a higher reconstruction error [19], [20]. However, even a
value slightly above the upper bound, or threshold, is considered an anomaly. This technique is also used in
[8] for autoencoder-based network anomaly detection. A generative adversarial network (GAN) is an ML model in
which two neural networks compete to improve their prediction accuracy. GANs are often unsupervised and
learn using a zero-sum game framework. The GAN is a fundamental category of deep learning model [11], [21],
and it is regarded as one of the most promising directions for DL progress. A GAN comprises two main modules.
In training, each of the modules is a DL model, that is, a neural network.
The two main modules are a generator (G) and a discriminator (D). The generator network generates simulated
data, and the discriminator judges the difference between the simulated data and the target data, yielding a
true-or-false determination about the simulated data. Finally, the model can generate higher-quality
simulated data to finish the data creation process [22], [23]. A variational autoencoder (VAE) is an
autoencoder whose training distribution is regularised to guarantee that its latent space has adequate
properties, allowing us to generate new data. A VAE is obtained by introducing variation on the basis of the
autoencoder. The VAE and the GAN are extremely similar. Once again, the goal is to transform and match the
data distribution to generate virtual data that is near the target [8], [22].
Usually, the samples are assumed to follow a normal distribution. If all examples are found, the work can be
very successful. Consequently, investigators frequently use neural networks to approximate the mean and
variance of the normal distribution. Long short-term memory (LSTM) is an artificial recurrent neural network
(RNN) architecture used in DL models [24], [25]. The LSTM network is well suited to classifying, processing
and making predictions based on time-series data. The most common type of RNN is the LSTM. An ordinary neural
network (NN) cannot keep track of preceding information each time it has to perform a learning task. In very
simple words, the RNN is a neural network with memory [26], [27]. RNNs tend to have short-term memory because
of the vanishing gradient problem. Backpropagation is the backbone of neural network training, as it reduces
the loss by adjusting the network weights using the computed gradients. In RNNs, as the gradient propagates
backwards through the network, it shrinks, so the weight updates become very small. These small updates
mainly affect the earlier layers in the network. As a result, those layers stop learning, and the RNN loses
the ability to recall early examples in long sequences, making it a short-term memory network [28].
The use of DL methods for CCF detection is still very limited, although methods such as CNN, LSTM and RBM are
encouraged for image classification and natural language processing (NLP) because of their ability to handle
massive datasets. The way these DL methods perform CCF classification is the major focus of this study [29].
In addition, data pre-processing is an important stage in the ML process. How the classification performance
is affected by data pre-processing when detecting credit card fraud is another question that needs to be
answered. Table 2 presents the summary of deep learning algorithms.
III. RESEARCH METHODOLOGY
Research is said to be methodical, and research methodology is predicated by the applied research method.
Applied research is carried out to solve practical problems. Before real-world experimentation, the research
covers all fundamentals by performing these steps:
TABLE 2. Accuracy based results of deep learning algorithms.
TABLE 3. The list of features available in the CCF dataset.
1) EXPERIMENTAL SETUP
We discuss the dataset to be used and the performance evaluation measures to be applied.
a: DESCRIPTION OF DATASET
The credit card dataset is accessible for research purposes. The dataset [11] holds transactions made by
cardholders over a two-day period, i.e., September 2018. There were 284,807 transactions in total, of which
492, or 0.172 percent, were fraudulent. Because disclosing a consumer's transaction details raises
confidentiality concerns, the majority of the dataset's features have been transformed using principal
component analysis (PCA). PCA is a standard and widely used technique in the relevant literature for
reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing
information loss [2], [4], [19]. It does so by creating new uncorrelated variables that successively
maximize variance. Table 4 presents the detail of the dataset containing 31 columns, including time, V1, V2,
V3, ..., V28 as PCA applied features, amount, and class labels.
b: APPLIED MACHINE LEARNING & ENSEMBLE LEARNING TECHNIQUES
We use and apply the following machine and ensemble learning algorithms.
i) EXTREME LEARNING METHOD
The extreme learning method (ELM) is a neural network for classification, clustering, regression and feature
learning. It can be used with one or multiple layers of hidden nodes. The parameters of the hidden nodes need
not be tuned. The output weights of the hidden nodes are learned in a single step, which essentially amounts
to learning a linear model. Given a single hidden layer of ELM, we assume that the output function of the
jth hidden node is h_j(z) = G(p_j, q_j, z), where p_j and q_j are the parameters of the jth node. The output
function is as follows:
    f_L(z) = \sum_{i=1}^{L} \gamma_i h_i(z)    (1)
where \gamma_i is the output weight of the ith hidden node, and
    h(z) = [h_1(z), \ldots, h_L(z)]    (2)
ii) DECISION TREE
We first build a model using the decision tree classifier. We set 'max_depth' to 4 in the algorithm, which
means that the tree is allowed to split up to four levels deep, and 'criterion' to 'entropy', which selects
the measure used to evaluate the quality of a split. Finally, the model is fitted and the predicted values
are stored.
iii) K-NEAREST NEIGHBOURS (KNN)
In supervised learning, the value or result that we want to predict is present in the training data
(labelled data), and this value is known as the target or the dependent variable. Next, for K-nearest
neighbours (KNN), we build the model using the 'KNeighborsClassifier' model and set the number of nearest
neighbours, k, to 5. The value of 'n_neighbors' is selected arbitrarily here, but it can be chosen more
systematically by iterating over a range of values. We then fit the model and store the predicted values in
the 'knn_yhat' variable.
    \sum_{i=1}^{n} \alpha_i y_i = 0; \quad 0 \le \alpha_i \le C    (5)
vi) LOGISTIC REGRESSION
Logistic regression is a simple algorithm that estimates the association between one dependent binary
variable and independent variables, computing the probability of the occurrence of an event. The
regularisation parameter C controls the trade-off between increasing complexity (overfitting) and keeping
the model simple (underfitting). For large values of C, the strength of regularisation is reduced, and the
model increases its complexity, thus overfitting the data. The parameter 'C' is tuned using
RandomizedSearchCV() for the different datasets: the original, the standardised and the dataset with the
most important features. Once the parameter 'C' is defined for each dataset, the logistic regression model
is initialised and then fitted to the training data, as described in the methodology. The logistic
regression hypothesis function can be seen below, where the function g(z) is also shown as follows:
    h_\theta(x) = g(\theta^T x)    (6)
The logistic regression hypothesis can be seen as follows:
    h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}    (7)
Here θ (theta) is a vector of parameters that our model learns in order to fit our classifier.
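The logistic hypothesis in equations (6) and (7) can be illustrated with a minimal Python sketch. The parameter vector `theta`, the feature vector `x` and the 0.5 decision threshold are hypothetical values chosen for illustration only; they are not taken from the experiments reported here.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x): estimated probability of the positive class."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


# Hypothetical parameters and features; the leading 1.0 in x is the intercept term.
theta = [-4.0, 2.0, 0.5]
x = [1.0, 3.0, 2.0]

p = hypothesis(theta, x)   # theta^T x = -4 + 6 + 1 = 3
label = int(p >= 0.5)      # assumed threshold: 1 = fraud, 0 = non-fraud
print(f"P(fraud) = {p:.4f}, predicted label = {label}")
```

In practice the threshold need not be 0.5; on a heavily imbalanced dataset it is often lowered to trade extra false positives for fewer missed frauds.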
five batches (20/2 = 10). All batches are run through the
algorithm; then, we have five iterations per epoch. This
method is often an improvement over the sequential model.
The most modification comes from the Stalk group and a few
slight changes within the module of the sequential model.
d: PERFORMANCE-EVALUATION MEASURES
Traditional methods of evaluating ML classifiers use a confusion matrix, which relates the ground truth of
the dataset to the model's prediction, where TP, TN, FP, and FN denote true positive, true negative, false
positive and false negative, respectively.
i) ACCURACY
Accuracy is used to measure performance in the domain of information retrieval and data processing. It is
the fraction of the results that are correctly classified, as given by equation (9):
    Accuracy = \frac{TP + TN}{TP + FP + TN + FN}    (9)
ii) PRECISION
Precision is a performance measure that gives the ratio of correctly identified positives to the total
number of identified positives. This can be seen as follows:
    Precision = \frac{TP}{TP + FP}    (10)
iii) F-MEASURE/F1-SCORE
The F-measure considers both the precision and the recall. It can be regarded as a weighted average (the
harmonic mean) of the two values, which can be seen as follows:
    F = \frac{2 \times Precision \times Recall}{Precision + Recall}    (11)
iv) RECALL
The recall is also referred to as the sensitivity, which is the ratio of relevant instances retrieved to the
total number of relevant instances and can be seen as follows:
    Recall = \frac{TP}{TP + FN}    (12)
IV. RESULTS AND DISCUSSIONS
A. DATA VISUALISATION
The dataset covers credit card transactions made in October 2018 by European cardholders. The dataset
includes transactions that happened in two days, and it includes 492 frauds out of 284,807 transactions. It
covers only numerical input variables, which are the outcome of a PCA transformation. Due to confidentiality
issues, we cannot provide the original features or more background information about the data. The feature
'Time' covers the seconds elapsed between the first transaction in the dataset and each transaction.
Figure 5 shows the class distribution of the CCF dataset into fraudulent and nonfraud transactions.
FIGURE 5. Class distribution of fraudulent and nonfraud transactions.
Another insight about the data is that there are no null values; hence, there is no need to fill in missing
values.
B. TOP 10 ALGORITHMS IN MACHINE LEARNING FOR FRAUD DETECTION
In the study [3], the top ten ML algorithms are incorporated for the detection of credit card frauds. The
list of these algorithms is given below:
1. Linear Regression
2. Logistic Regression
3. Decision Tree
4. SVM
5. Naïve Bayes
6. CNN
7. K-Means
8. Random Forest
9. Dimensionality Reduction Algorithms
10. Gradient Boosting Algorithms
These algorithms can also encompass association analysis, clustering, classification, statistical learning,
and link mining. These are among the critical topics covered by ML research and development.
1) THE CONFUSION MATRIX FOR MODELS
A confusion matrix is a visualisation of a classification model that displays how well the model's
predictions fit the actual results. Frequently, the predicted results are stored in a variable that is then
converted into a correlation table. The confusion matrix can be plotted by displaying the correlation table
in the form of a heatmap. Even though there are numerous built-in methods to visualise a confusion matrix,
we can define and visualise it from scratch to allow for better understanding. Figure 6 depicts the
confusion matrices of machine learning algorithms.
2) THE ACCURACY OF MACHINE LEARNING ALGORITHMS
In this phase, we build six distinct kinds of classification models. We could use numerous other models to
solve classification problems; however, these are the most popular models in use. All these models can be
built conveniently using the algorithms provided by the scikit-learn package. The results of applied ML
algorithms are presented in Table 5.
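The evaluation measures in equations (9) to (12) can be computed directly from the four confusion-matrix counts. The sketch below is illustrative only: the class totals (492 frauds out of 284,807 transactions) come from the dataset description, while the counts of the second, "hypothetical" classifier are invented for the example and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


N_TOTAL = 284_807   # transactions in the dataset
N_FRAUD = 492       # fraudulent transactions
N_LEGIT = N_TOTAL - N_FRAUD

# A trivial model that labels every transaction as non-fraud.
always_negative = classification_metrics(tp=0, tn=N_LEGIT, fp=0, fn=N_FRAUD)

# Hypothetical classifier: catches 400 frauds and raises 50 false alarms.
hypothetical = classification_metrics(tp=400, tn=N_LEGIT - 50, fp=50, fn=92)

for name, metrics in [("always negative", always_negative),
                      ("hypothetical", hypothetical)]:
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

The trivial model reaches an accuracy above 99.8% while detecting no fraud at all, which is why precision, recall and the F-measure are reported alongside accuracy on such an imbalanced dataset.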
FIGURE 7. The case count statistics for fraud and non-fraud transactions.
TABLE 6. The result of CNN model using epoch size as 35 and 14.
4. Baseline (BL)
5. Generative adversarial networks (GAN)
6. Radial basis function network (RBFN)
7. Multilayer perceptron (MLP)
8. Self-organising map (SOM)
9. Deep belief network (DBN)
10. Restricted Boltzmann machine (RBM)
11. Autoencoders
1) THE EVALUATION METRICS
We can use a confusion matrix to summarise the actual vs. predicted labels, where the X-axis is the
predicted label and the Y-axis is the actual label.
If the model had predicted everything correctly, this would be a diagonal matrix, in which the values off
the main diagonal, which indicate incorrect predictions, would be zero. In this case, the matrix shows
comparatively few false positives, meaning that only a few legitimate transactions were flagged incorrectly.
Accepting more false positives in exchange for fewer false negatives might be desirable because false
negatives would permit more fraudulent transactions to go through.
2) THE ACCURACY OF DEEP LEARNING ALGORITHMS
Table 7 shows the training and validation accuracy of proposed CNN and baseline CNN algorithms. The CNN model
is applied by varying the layers from 11 to 20 and comparing the result with baseline 5-layer architecture.
3) THE SUMMARY OF THE CNN MODEL
Once a model is built, the summary() method can be called to show its details, as shown in Table 8. It can
also be beneficial to call it when constructing a sequential model incrementally, to show the summary of the
model thus far with the current output.
The total number of parameters is 119,457 and the total number of trainable parameters is 119,265. Finally,
the number of nontrainable parameters is 192.
4) THE SUMMARY OF THE BASELINE MODEL
By using the function, we now develop and train the previously defined model. Note that the model is best
suited to using a batch size larger than 2048; this is important for confirming that each batch has a decent
chance of comprising
6) VARIATION OF EPOCHS
We train the model for 20 and 30 epochs, with and without careful initialisation, and compare the losses.
The figure clearly shows that careful initialisation gives a clear advantage in regard to validation loss.
Figure 13 shows the validation loss using zero bias and careful bias.
7) RECORD OF THE TRAINING DATASET
In this section, we produce plots of the model's accuracy and loss on the training and validation sets. We
check for overfitting; these measurements are valuable too, as they can help us learn more about the
overfitting and underfitting of the model. Figure 14 depicts the training and validation loss,
precision-recall curve (prc), precision and recall over 35 epochs.
Table 10 presents the training and validation results of baseline deep learning model using 35 and 14
epochs.
8) THE DIAGNOSIS MODEL BEHAVIOUR
The shape and dynamics of a learning curve can be used to diagnose the behaviour of an ML or DL model and to
possibly recommend the best configuration changes for improving performance and learning. There are four
learning curves: Underfit, Overfit, Good Fit, Epoch. The learning curve is used to plot the model for
training and validation accuracy and
FIGURE 13. Validation loss using zero bias and careful bias.
FIGURE 15. Training and validation history of accuracy and loss of CNN
model using 100 epochs.
FIGURE 16. Model accuracy when epoch sizes are 20 and 50.
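Figure 13 contrasts a zero output bias with a "careful" one. The text does not spell out how the careful bias is chosen, so the sketch below assumes a common choice for imbalanced binary classification: setting the bias of the sigmoid output unit to the log-odds of the positive class, log(pos/neg), so that the untrained model already predicts the base fraud rate. The class counts come from the dataset description; the function names are hypothetical.

```python
import math

N_TOTAL = 284_807            # transactions in the dataset
N_FRAUD = 492                # fraudulent transactions
N_LEGIT = N_TOTAL - N_FRAUD  # legitimate transactions


def sigmoid(z):
    """Logistic function used by the output unit."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_initial_loss(bias, fraud_rate):
    """Mean binary cross-entropy if every transaction is scored sigmoid(bias)."""
    p = sigmoid(bias)
    return -(fraud_rate * math.log(p) + (1.0 - fraud_rate) * math.log(1.0 - p))


fraud_rate = N_FRAUD / N_TOTAL

# Assumed "careful" initialisation: log-odds of the positive class.
careful_bias = math.log(N_FRAUD / N_LEGIT)

loss_zero = expected_initial_loss(0.0, fraud_rate)
loss_careful = expected_initial_loss(careful_bias, fraud_rate)

print(f"careful bias          = {careful_bias:.4f}")
print(f"loss with zero bias   = {loss_zero:.4f}")
print(f"loss with careful bias = {loss_careful:.4f}")
```

With a zero bias the untrained network outputs 0.5 for every transaction, so the first epochs are spent mostly learning that fraud is rare; starting from the log-odds bias removes that phase, which is consistent with the lower validation loss attributed to careful initialisation above.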
10) PLOT TRAINING & VALIDATION ACCURACY VALUE
Figure 16 depicts the training and validation accuracy of proposed model over 20 and 50 epochs.
11) RESULT OF THE CNN LAYERS IMPLEMENTATION
Our proposed sequential model has a convolutional layer with 32 filters of size 3 and a ReLU activation
function, which is followed by a batch normalisation layer and a dropout layer with a dropout rate of 0.25.
Figure 17 depicts the accuracy of CNN model using different layers architecture. The architectures of our
proposed model are as follows.
a: ARCHITECTURE OF 14 LAYERS
Our proposed model has 14 layers: a convolutional layer with a kernel size of 32 × 2 and a ReLU activation
function, followed by a batch normalisation layer and a dropout layer with a dropout rate of 0.2. Then, we
add another convolutional layer with a kernel size of 64 × 2 and a ReLU activation function, followed by a
batch normalisation layer and a dropout layer with a dropout rate of 0.5. Then, we add a flatten layer with
a kernel size of 64 × 2 and a ReLU activation function, followed by a dense layer and a dropout layer with a
dropout rate of 0.5, followed by 3 dense layers. The first dense layer has a ReLU activation function of
(100). The second dense layer has a ReLU activation function of (50). The third dense layer has a ReLU
activation function of (25). Finally, we add a dense layer for classification with a sigmoid activation
function. At 100 epochs, the accuracy is 96.34%.
b: ARCHITECTURE OF 17 LAYERS
Our proposed model has 17 layers: a convolutional layer with a kernel size of 32 × 2 and a ReLU activation
function, followed by a batch normalisation layer and a dropout layer with a dropout rate of 0.2. Then, we
add another convolutional layer with a kernel size of 64 × 2 and a ReLU activation function, followed by a
batch normalisation layer and a dropout layer with a dropout rate of 0.5. Then, we add another convolutional
layer with a kernel size of 64 × 2 and a ReLU activation function, followed by a batch normalisation layer
and a dropout layer with a dropout rate of 0.25. Then, we add a flatten layer with a kernel size of 64 × 2
and a ReLU activation function, followed by a dense layer and a dropout layer with a dropout rate of 0.5,
followed by 3 dense layers. The first dense layer has a ReLU activation function of (100). The second dense
layer has a ReLU activation function of (50). The third dense layer has a ReLU activation function
to increase the performance of existing examples, but they [18] N. Kousika, G. Vishali, S. Sunandhana, and M. A. Vijay,
significantly decrease on the unseen data. The performance ‘‘Machine learning based fraud analysis and detection system,’’
J. Phys., Conf., vol. 1916, no. 1, May 2021, Art. no. 012115,
on unseen data increased as the class imbalance increased. doi: 10.1088/1742-6596/1916/1/012115.
Future work associated may explore the use of more state of [19] R. F. Lima and A. Pereira, ‘‘Feature selection approaches to fraud detection
art deep learning methods to improve the performance of the in e-payment systems,’’ in E-Commerce and Web Technologies, vol. 278,
D. Bridge and H. Stuckenschmidt, Eds. Springer, 2017, pp. 111–126, doi:
model proposed in this study. 10.1007/978-3-319-53676-7_9.
[20] Y. Lucas and J. Jurgovsky, ‘‘Credit card fraud detection using machine
learning: A survey,’’ 2020, arXiv:2010.06479.
REFERENCES
[21] H. Zhou, H.-F. Chai, and M.-L. Qiu, ‘‘Fraud detection within bankcard
[1] Y. Abakarim, M. Lahby, and A. Attioui, ‘‘An efficient real time model enrollment on mobile device based payment using machine learning,’’
for credit card fraud detection based on deep learning,’’ in Proc. 12th Frontiers Inf. Technol. Electron. Eng., vol. 19, no. 12, pp. 1537–1545,
Int. Conf. Intell. Systems: Theories Appl., Oct. 2018, pp. 1–7, doi: Dec. 2018, doi: 10.1631/FITEE.1800580.
10.1145/3289402.3289530. [22] S. Makki, Z. Assaghir, Y. Taher, R. Haque, M.-S. Hacid, and H. Zeineddine,
[2] H. Abdi and L. J. Williams, ‘‘Principal component analysis,’’ Wiley Inter- ‘‘An experimental study with imbalanced classification approaches for
discipl. Rev., Comput. Statist., vol. 2, no. 4, pp. 433–459, Jul. 2010, doi: credit card fraud detection,’’ IEEE Access, vol. 7, pp. 93010–93022, 2019,
10.1002/wics.101. doi: 10.1109/ACCESS.2019.2927266.
[3] V. Arora, R. S. Leekha, K. Lee, and A. Kataria, ‘‘Facilitating user [23] I. Matloob, S. A. Khan, and H. U. Rahman, ‘‘Sequence mining and
authorization from imbalanced data logs of credit cards using artificial prediction-based healthcare fraud detection methodology,’’ IEEE Access,
intelligence,’’ Mobile Inf. Syst., vol. 2020, pp. 1–13, Oct. 2020, doi: vol. 8, pp. 143256–143273, 2020, doi: 10.1109/ACCESS.2020.3013962.
10.1155/2020/8885269. [24] I. Mekterović, M. Karan, D. Pintar, and L. Brkić, ‘‘Credit card fraud
[4] A. O. Balogun, S. Basri, S. J. Abdulkadir, and A. S. Hashim, ‘‘Performance detection in card-not-present transactions: Where to invest?’’ Appl. Sci.,
analysis of feature selection methods in software defect prediction: A vol. 11, no. 15, p. 6766, Jul. 2021, doi: 10.3390/app11156766.
search method approach,’’ Appl. Sci., vol. 9, no. 13, p. 2764, Jul. 2019, [25] D. Molina, A. LaTorre, and F. Herrera, ‘‘SHADE with iterative local search
doi: 10.3390/app9132764. for large-scale global optimization,’’ in Proc. IEEE Congr. Evol. Comput.
[5] B. Bandaranayake, ‘‘Fraud and corruption control at education system (CEC), Jul. 2018, pp. 1–8, doi: 10.1109/CEC.2018.8477755.
level: A case study of the Victorian department of education and early [26] M. Muhsin, M. Kardoyo, S. Arief, A. Nurkhin, and H. Pramus-
childhood development in Australia,’’ J. Cases Educ. Leadership, vol. 17, into, ‘‘An analyis of student’s academic fraud behavior,’’ in Proc.
no. 4, pp. 34–53, Dec. 2014, doi: 10.1177/1555458914549669. Int. Conf. Learn. Innov. (ICLI), Malang, Indonesia, 2018, pp. 34–38,
[6] J. Błaszczyński, A. T. de Almeida Filho, A. Matuszyk, M. Szelģ, and doi: 10.2991/icli-17.2018.7.
R. Słowiński, ‘‘Auto loan fraud detection using dominance-based rough set [27] H. Najadat, O. Altiti, A. A. Aqouleh, and M. Younes, ‘‘Credit card
approach versus machine learning methods,’’ Expert Syst. Appl., vol. 163, fraud detection based on machine and deep learning,’’ in Proc. 11th
Jan. 2021, Art. no. 113740, doi: 10.1016/j.eswa.2020.113740. Int. Conf. Inf. Commun. Syst. (ICICS), Apr. 2020, pp. 204–208, doi:
[7] B. Branco, P. Abreu, A. S. Gomes, M. S. C. Almeida, J. T. Ascensão, 10.1109/ICICS49469.2020.239524.
and P. Bizarro, ‘‘Interleaved sequence RNNs for fraud detection,’’ in Proc. [28] A. Pumsirirat and L. Yan, ‘‘Credit card fraud detection using deep
26th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2020, learning based on auto-encoder and restricted Boltzmann machine,’’
pp. 3101–3109, doi: 10.1145/3394486.3403361. Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 1, pp. 18–25, 2018, doi:
[8] F. Cartella, O. Anunciacao, Y. Funabiki, D. Yamaguchi, T. Akishita, and 10.14569/IJACSA.2018.090103.
O. Elshocht, ‘‘Adversarial attacks for tabular data: Application to fraud [29] P. Raghavan and N. E. Gayar, ‘‘Fraud detection using machine
detection and imbalanced data,’’ 2021, arXiv:2101.08030. learning and deep learning,’’ in Proc. Int. Conf. Comput. Intell.
Knowl. Economy (ICCIKE), Dec. 2019, pp. 334–339, doi:
[9] S. S. Lad, I. Dept. of CSERajarambapu Institute of TechnologyRa-
10.1109/ICCIKE47802.2019.9004231.
jaramnagarSangliMaharashtra, and A. C. Adamuthe, ‘‘Malware clas-
[30] M. Ramzan, A. Abid, H. U. Khan, S. M. Awan, A. Ismail, M. Ahmed,
sification with improved convolutional neural network model,’’ Int.
M. Ilyas, and A. Mahmood, ‘‘A review on State-of-the-Art violence detec-
J. Comput. Netw. Inf. Secur., vol. 12, no. 6, pp. 30–43, Dec. 2021,
tion techniques,’’ IEEE Access, vol. 7, pp. 107560–107575, 2019, doi:
doi: 10.5815/ijcnis.2020.06.03.
10.1109/ACCESS.2019.2932114.
[10] V. N. Dornadula and S. Geetha, ‘‘Credit card fraud detection using machine
[31] M. Ramzan, H. U. Khan, S. M. Awan, A. Ismail, M. Ilyas, and
learning algorithms,’’ Proc. Comput. Sci., vol. 165, pp. 631–641, Jan. 2019,
A. Mahmood, ‘‘A survey on state-of-the-art drowsiness detection
doi: 10.1016/j.procs.2020.01.057.
techniques,’’ IEEE Access, vol. 7, pp. 61904–61919, 2019, doi:
[11] I. Benchaji, S. Douzi, and B. E. Ouahidi, ‘‘Credit card fraud detection 10.1109/ACCESS.2019.2914373.
model based on LSTM recurrent neural networks,’’ J. Adv. Inf. Technol., [32] A. Rb and S. K. Kr, ‘‘Credit card fraud detection using artificial neural
vol. 12, no. 2, pp. 113–118, 2021, doi: 10.12720/jait.12.2.113-118. network,’’ Global Transitions Proc., vol. 2, no. 1, pp. 35–41, Jun. 2021,
[12] Y. Fang, Y. Zhang, and C. Huang, ‘‘Credit card fraud detection based on doi: 10.1016/j.gltp.2021.01.006.
machine learning,’’ Comput., Mater. Continua, vol. 61, no. 1, pp. 185–195, [33] N. F. Ryman-Tubb, P. Krause, and W. Garn, ‘‘How artificial intelligence
2019, doi: 10.32604/cmc.2019.06144. and machine learning research impacts payment card fraud detection:
[13] J. Forough and S. Momtazi, ‘‘Ensemble of deep sequential models for A survey and industry benchmark,’’ Eng. Appl. Artif. Intell., vol. 76,
credit card fraud detection,’’ Appl. Soft Comput., vol. 99, Feb. 2021, pp. 130–157, Nov. 2018, doi: 10.1016/j.engappai.2018.07.008.
Art. no. 106883, doi: 10.1016/j.asoc.2020.106883. [34] I. Sadgali, N. Sael, and F. Benabbou, ‘‘Adaptive model for credit card fraud
[14] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image detection,’’ Int. J. Interact. Mobile Technol., vol. 14, no. 3, p. 54, Feb. 2020,
recognition,’’ 2015, arXiv:1512.03385. doi: 10.3991/ijim.v14i03.11763.
[15] X. Hu, H. Chen, and R. Zhang, ‘‘Short paper: Credit card fraud detec- [35] Y. Sahin and E. Duman, ‘‘Detecting credit card fraud by ANN and logis-
tion using LightGBM with asymmetric error control,’’ in Proc. 2nd tic regression,’’ in Proc. Int. Symp. Innov. Intell. Syst. Appl., Jun. 2011,
Int. Conf. Artif. Intell. for Industries (AII), Sep. 2019, pp. 91–94, doi: pp. 315–319, doi: 10.1109/INISTA.2011.5946108.
10.1109/AI4I46381.2019.00030. [36] I. Sohony, R. Pratap, and U. Nambiar, ‘‘Ensemble learning for credit card
[16] J. Kim, H.-J. Kim, and H. Kim, ‘‘Fraud detection for job placement fraud detection,’’ in Proc. ACM India Joint Int. Conf. Data Sci. Manage.
using hierarchical clusters-based deep neural networks,’’ Int. Data, Jan. 2018, pp. 289–294, doi: 10.1145/3152494.3156815.
J. Speech Technol., vol. 49, no. 8, pp. 2842–2861, Aug. 2019, [37] B. Stojanović, J. Božić, K. Hofer-Schmitz, K. Nahrgang, A. Weber,
doi: 10.1007/s10489-019-01419-2. A. Badii, M. Sundaram, E. Jordan, and J. Runevic, ‘‘Follow the trail:
[17] M.-J. Kim and T.-S. Kim, ‘‘A neural classifier with fraud density map for Machine learning for fraud detection in fintech applications,’’ Sensors,
effective credit card fraud detection,’’ in Intelligent Data Engineering and vol. 21, no. 5, p. 1594, Feb. 2021, doi: 10.3390/s21051594.
Automated Learning, vol. 2412, H. Yin, N. Allinson, R. Freeman, J. Keane, [38] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, ‘‘Inception-v4,
and S. Hubbard, Eds. Berlin, Germany: Springer, 2002, pp. 378–383, doi: inception-ResNet and the impact of residual connections on learning,’’
10.1007/3-540-45675-9_56. 2016, arXiv:1602.07261.
HIKMAT ULLAH KHAN received the master’s and Ph.D. degrees in computer science from International Islamic University, Islamabad. He has been an active researcher for the last ten years. He is currently an Assistant Professor with the Department of Computer Science, COMSATS University Islamabad, Wah Cantt, Pakistan. He has authored more than 50 papers in top peer-reviewed journals and international conferences. His research interests include social web mining, semantic web, data science, information retrieval, and scientometrics. He is an editorial board member of a number of prestigious impact factor journals.
NAIF ALMUSALLAM received the B.S. degree in computer science from King Faisal University, Hofuf, Saudi Arabia, in 2009, the M.S. degree from Monash University, Melbourne, VIC, Australia, in 2013, and the Ph.D. degree in computer science from RMIT University, Melbourne, in 2019. He is currently an Assistant Professor with Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia. His research interests include machine learning, data science, and security.
FAWAZ KHALED ALARFAJ received the M.Sc. and Ph.D. degrees in computer science from Essex University, U.K. He is currently an Assistant Professor with the Computer and Information Sciences Department, Imam Muhammad Ibn Saud Islamic University (IMSIU). His research interests include information retrieval, natural language processing, machine learning, big data, and cloud computing.
MUHAMMAD RAMZAN is currently pursuing the Ph.D. degree with the University of Management and Technology, Lahore, Pakistan. He is also a Lecturer with the University of Sargodha, Pakistan. He has authored several research articles published in well-reputed peer-reviewed journals. His research interests include algorithms, machine learning, software engineering, and computer vision.