Credit Card Fraud Detection (Book) 15
Faculty of Engineering
Department of Computer and Software
Final year project
Credit Card Fraud Detection Using Machine Learning
Submitted by:
NAME ID
Mohamed Ebrahim 76946
Ahmed Ebrahim 76953
Mahmoud Mohamed 80506
Mohamed Hassan 80834
Supervised by:
Prof. Dr. Heba Elnemr
2022-2023
Declaration
We hereby declare that the work presented in this thesis has not been
submitted for any other degree or professional qualification, and that it is
the result of our own independent work.
Names:
Date:
Acknowledgement
This endeavor would not have been possible without
Abstract
Credit card fraud is a serious problem that can cause substantial financial losses
for both individuals and institutions. In recent years, machine learning algorithms
have proved to be an effective solution for detecting fraudulent transactions. In this
project, we investigate the use of various machine learning techniques such as
logistic regression, random forests, and support vector machines to identify
fraudulent credit card transactions. We evaluate the performance and accuracy of
these techniques using a publicly available credit card fraud dataset. The results
reveal that the Random Forest algorithm outperforms the other algorithms when applied
to imbalanced data. Furthermore, three sampling techniques were applied: random
undersampling, random oversampling, and a hybrid technique that merges
oversampling and undersampling, to enhance the system's performance.
This study demonstrates that machine learning, along with the utilization of
appropriate data sampling techniques, can be effective in detecting credit card fraud,
emphasizing the importance of developing and deploying such systems to protect
against fraudulent activities in financial institutions.
Contents
1 Chapter 1 Introduction
1.1 Introduction
1.1.1 What is credit card fraud
1.1.2 Types of credit card fraud
1.1.3 What is credit card fraud detection
1.1.4 Anomaly detection
1.1.5 Data used to create the user profile includes
1.2 Importance of machine learning in credit card fraud detection
2 Chapter 2 Literature Review
2.1 First paper [4]
2.2 Second paper [5]
2.3 Third paper [6]
2.4 Fourth paper [7]
2.5 Fifth paper [8]
3 Chapter 3 Methodology
3.1 Data set
3.2 Data Preprocessing
3.2.1 Normalization
3.3 Machine Learning
3.3.1 Types of Machine learning methods
3.3.2 supervised learning techniques
3.3.3 Regression
3.4 Machine learning techniques
3.4.1 Logistic regression:
3.4.2 Decision tree:
3.4.3 Random forest:
3.4.4 K-nearest neighbor:
3.4.5 Naïve Bayesian Classifier:
3.4.6 Support Vector Machines:
3.5 Data Balancing
4 Chapter 4 Experimental results
4.1 Evaluation Metrics:
4.1.1 Confusion Matrix
4.1.2 Accuracy
4.1.3 Precision
4.1.4 Recall
4.1.5 F1-score
4.2 Training
4.3 Evaluation Strategy
4.3.1 Evaluation of imbalanced Dataset
4.3.2 Evaluation of Undersampling Dataset
4.3.3 Evaluation of Oversampling Dataset
4.3.4 Evaluation of Hybrid Sampling Dataset
5 Conclusion and Future work
5.1 Conclusion
5.2 Recommendation
5.3 Future work
6 References
Appendix A:
Appendix B:
Appendix C:
Appendix D:
Appendix E:
Appendix F:
Appendix G:
Appendix H:
Appendix I:
Table of Abbreviation
CCF Credit Card Fraud
CCFD Credit Card Fraud Detection
ML Machine Learning
LR Logistic Regression
DT Decision Tree
RF Random Forest
KNN K-Nearest Neighbor
SVM Support Vector Machine
List of Tables
Table 4-1 Confusion Matrix
Table 4-2 Training performance for imbalanced dataset
Table 4-3 Testing performance for imbalanced dataset
Table 4-4 Training performance for Under sampling Dataset
Table 4-5 Testing performance for Under sampling Dataset
Table 4-6 Training performance for Oversampling Dataset
Table 4-7 Testing performance for Oversampling Dataset
Table 4-8 Training performance for Hybrid Sampling Dataset
Table 4-9 Testing performance for Hybrid Sampling Dataset
List Of Figures
Figure 3-1 a snapshot of the utilized features
Figure 3-2 Types of Machine learning
Figure 4-1 Computer Properties
Figure 4-2 Imbalance Data
Figure 4-3 confusion matrix of imbalanced data
Figure 4-4 Random Forest confusion matrix of imbalanced data
Figure 4-5 SVM confusion matrix of imbalanced data
Figure 4-6 Under sampling Data
Figure 4-7 Logistic Regression confusion matrix of under sampling data
Figure 4-8 Random Forest confusion matrix of undersampling data
Figure 4-9 SVM confusion matrix of undersampling data
Figure 4-10 Over sampling
Figure 4-11 Logistic Regression confusion matrix of Oversampling
Figure 4-12 Random Forest confusion matrix of Oversampling
Figure 4-13 SVM confusion matrix of Oversampling
Figure 4-14 Hybrid sampling
Figure 4-15 Logistic Regression confusion matrix of Hybrid sampling
Figure 4-16 Random Forest confusion matrix of Hybrid sampling
Figure 4-17 SVM confusion matrix of Hybrid sampling
Figure 4-18 comparison of under sampling results
Figure 4-19 comparison of oversampling results
Figure 4-20 comparison of hybrid sampling results
1 CHAPTER 1 INTRODUCTION
1.1 Introduction
In the last decade, there has been an exponential growth of the Internet. This has
sparked the proliferation and increase in the use of services such as e-commerce,
tap-and-pay systems, online bill payment systems, etc. However, fraudsters have
also increased activities to attack transactions that are made using credit cards. As a
result, various protection mechanisms, such as credit card data encryption and
tokenization, have been implemented to protect credit card transactions [1].
E-commerce has come a long way since its inception. It has become an essential
tool for most organizations, companies, and government agencies to increase their
productivity in global trade. One of the main reasons for the success of e-commerce
is the easy online credit card transaction. Whenever we talk about monetary
transactions, we also must consider financial fraud. Recently, credit card
transactions are believed to be the most common payment method. Consequently,
fraud activities have increased rapidly.
Losses related to credit card fraud will grow to $43 billion within five years and
climb to $408.5 billion globally within the next decade, according to a recent Nilson
Report [2], meaning that credit card fraud detection has become more crucial than
ever.
All parties involved in the payment lifecycle will experience the impact of these
increasing costs, from banks and credit card companies who foot the bill of such
fraud, the consumers who pay higher fees or receive lower credit scores, to
merchants and small businesses who are slapped with chargeback fees.
With digital crime and online fraud of all kinds on the rise, it’s more important
than ever for organizations to take firm and unambiguous steps to prevent payment
card fraud through advanced technology and strong security measures.
1.1.1 What is credit card fraud
Credit card fraud is the act of using another person’s credit card to make
purchases or request cash advances without the cardholder’s knowledge or consent.
These criminals may obtain the card itself through physical theft, though nowadays,
they are more often using digital methods to steal both the credit card number and
personal details to carry out fraudulent transactions.
There is some overlap between identity theft and credit card theft. Credit card
theft is one of the most common forms of identity theft. In such cases, a fraudster
uses an individual’s personal information, which is often stolen as part of a
cyberattack or data breach, to open a new account that the victim does not know
about. This activity is considered both identity fraud and credit card fraud.
1.1.2.2 Card-not-present fraud
Card-not-present fraud is when the criminal uses the details associated with
the card, such as the card number, accountholder name, and CVV code, without
having the card in their possession.
In some cases, card-not-present crime is accompanied by account takeover
techniques. This is when fraudsters contact a credit card issuer and purport to be a
legitimate cardholder to change information associated with the account, such as a
phone number or address. This will allow them to verify purchases and authenticate
activity, evading many fraud detection tools [3].
1.1.4 Anomaly detection
Anomaly detection is the process of analyzing large volumes of data from
both internal and external sources to identify patterns or data points that
deviate from the norm or expected behavior. It builds a framework of “normal”
activity for each individual user by establishing the regular patterns in their
activity. It is used to identify outliers, anomalies, or suspicious
events that may indicate fraudulent or abnormal behavior, system malfunctions, or
other issues. Anomaly detection techniques are commonly used in various fields
such as cyber security, fraud detection, medical diagnosis, and predictive
maintenance.
When a transaction falls outside the scope of normal activity, the anomaly
detection tool will then alert the card issuer and, in some cases, the user. Depending
on the transaction details and risk score assigned to the action, these fraud detection
systems may flag the purchase for review or put a hold on the transaction until the
user verifies their activity.
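The idea of comparing a new transaction against a user's "normal" activity can be illustrated with a very small sketch. The z-score rule, the function name `is_anomalous`, the threshold of 3, and the sample amounts below are all assumptions made for illustration; production systems build far richer user profiles than a single spending average.

```python
import statistics

def is_anomalous(history, amount, z_threshold=3.0):
    """Flag an amount that lies far from a user's usual spending.

    history needs at least two past amounts (statistics.stdev requirement).
    """
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # All past amounts are identical: any different amount is unusual.
        return amount != mean
    # Distance from the mean, measured in standard deviations.
    return abs(amount - mean) / spread > z_threshold

# Hypothetical purchase history for one cardholder.
past_amounts = [20.0, 35.5, 18.2, 42.0, 27.9, 31.4]
typical = is_anomalous(past_amounts, 30.0)    # close to the usual range
unusual = is_anomalous(past_amounts, 900.0)   # far outside the usual range
```

Here the 900.0 purchase is flagged while the 30.0 purchase is not; in practice a flagged transaction would be sent for review rather than rejected outright.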
Credit card fraud detection is an important problem in the financial industry, and
machine learning techniques can be used to help identify fraudulent transactions.
Machine learning algorithms can analyze patterns in transaction data and
automatically detect anomalies that may be indicative of fraud. This approach is
particularly useful for detecting fraud in real-time, allowing financial institutions to
quickly respond to suspicious activity and protect their customers.
2 CHAPTER 2 LITERATURE REVIEW
In this section, we review several studies and advancements in the use
of machine learning algorithms for credit card fraud detection.
2.3 Third paper [6]
This study explores fifteen techniques for detecting credit card
fraud: Neural Networks, Decision Trees, Genetic Algorithms, Case-Based
Reasoning, Bayesian Networks, SVM, KNN, Artificial Immune Systems,
Hidden Markov Models, fuzzy neural networks, fuzzy Darwinian systems, Inductive
Logic Programming, Clustering Techniques, Logistic Regression, and Outlier
Detection. The study investigates the relative effectiveness of these
methods in detecting credit card fraud, along
with the advantages and disadvantages of each technique.
3 CHAPTER 3 METHODOLOGY
All models were developed using Visual Studio Code and Anaconda.
This chapter outlines the approach we used to develop a credit card fraud
detection model. First, we selected a dataset consisting of credit card transactions,
which served as the foundation for our analysis. Subsequently, data preprocessing
techniques were applied. We then utilized several machine learning algorithms,
including Logistic Regression, Random Forest, and SVM, to develop our model. To
address the issue of data imbalance, a common challenge in fraud detection, we
experimented with various sampling techniques, namely oversampling,
undersampling, and a hybrid method that combines the two, to
enhance the performance of our models. Throughout the process, we analyzed and
compared the performance of different models, assessed their strengths and
weaknesses, and ultimately selected the most suitable model for credit card fraud
detection. In the following sections, we provide a detailed account of the research
methodology we employed to develop our credit card fraud detection framework.
3.3.1 Types of Machine learning methods
The learning algorithms can be categorized into three major types:
supervised, unsupervised, and reinforcement learning [14]. Figure 3-2 displays the
basic types of machine learning approaches.
Supervised learning feeds historical input and output data into machine learning
algorithms. After each input/output pair is processed, the algorithm adjusts
the model so that its outputs align as closely as possible with the desired
result. Common algorithms used in supervised learning include
linear regression, SVM, Decision Tree, Random Forest, and KNN.
Machine learning techniques proceed by partitioning the input data into
two sets, a training set and a test set. Afterward, the input features of the
training data are extracted. These features are used to build and train the model using a suitable
machine learning algorithm. Training is the process through which the model learns
the patterns in the given data in order to make suitable predictions. The test
set contains examples whose correct outputs are already known but are withheld
during training. Hence, the model is trained on the training set, and its
predictions are then checked against the known outputs of the test set [15].
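The partitioning step can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code (which is listed in the appendices); the function name `stratified_split`, the 70/30 ratio, and the toy labels are assumptions chosen for the example. Splitting each class separately keeps the proportion of fraudulent examples similar in both sets, which matters when one class is rare.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])    # held out for evaluation
        train.extend(indices[n_test:])   # used to fit the model
    return sorted(train), sorted(test)

# Toy labels: 0 = genuine, 1 = fraudulent (hypothetical data).
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels)
```

With these toy labels the split yields 70 training rows and 30 test rows, with 7 and 3 fraudulent rows respectively, so both sets keep the original 10% fraud share.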
3.3.2.1 Classification
Classification refers to the problem of identifying the category to which an
input belongs among a possible set of categories. The possible categories
are labelled, and models are generally learned from training data. Classification
models can be created using simple thresholds, regression techniques, or other
machine learning techniques such as Neural Networks, Random Forests, or Markov
models. Classification is a supervised learning task in which a training set of
correctly labelled data is available. The model learned from the training
data to identify the category or class of an input is called a classifier.
Types of classifiers
➢ Binary classifier that identifies the input as belonging to one of the two output
categories.
➢ Multi-class classification has at least two mutually exclusive class labels,
where the goal is to predict to which class a given input example belongs.
➢ Multi-label classification can predict more than one class for each input
example. In this case, there is no mutual exclusion because the input example
can have more than one label.
3.3.3 Regression
Example: Suppose we want to forecast the weather; for this, we would use a
regression algorithm. In weather prediction, the model is trained on past data,
and once training is complete, it can predict the weather for future days
[19].
There are three types of logistic regression models, which are defined based
on categorical response:
regression, this is the most used approach, and more generally, it is one of the
most common classifiers for binary classification.
2- Multinomial logistic regression: In this type of logistic regression model, the
dependent variable has three or more outcomes; however, these values have
no specified order. For example, movie studios want to predict what genre of
film a moviegoer is likely to see to market films more effectively. A
multinomial logistic regression model can help the studio to determine the
strength of influence a person's age, gender, and dating status may have on the
type of film that they prefer. The studio can then orient an advertising
campaign for a specific movie toward a group of people likely to see it.
3- Ordinal logistic regression: This type of logistic regression model is
leveraged when the response variable has three or more possible outcomes,
but in this case, these values have a defined order. Examples of ordinal
responses include grading scales from A to F or rating scales from 1 to 5.
One advantage of using Random Forest for credit card fraud detection is that
it can handle high-dimensional data with many features, which is often the case in
fraud detection. Additionally, Random Forest is a powerful ensemble algorithm that
can reduce the risk of overfitting and improve the accuracy of the model.
To use Random Forest for credit card fraud detection, the algorithm is trained
on a dataset of historical credit card transactions, where each transaction is labeled
as either fraudulent or non-fraudulent. The algorithm learns to classify new
transactions based on the patterns and relationships found in the historical data.
Once the Random Forest model is trained, it can be used to classify new
transactions as either fraudulent or non-fraudulent in real-time. The model will
analyze the features of each transaction and predict whether it is likely to be
fraudulent or not. If the model predicts a transaction as fraudulent, the transaction
can be flagged for further investigation by the relevant authorities.
Overall, Random Forest is a powerful algorithm that can be used to improve
the accuracy of credit card fraud detection. By analyzing the features of credit card
transactions, the algorithm can learn to identify patterns and relationships that are
indicative of fraud and make accurate predictions in real-time.
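The train-then-vote workflow described above can be illustrated with a deliberately simplified stand-in for Random Forest. In this sketch each "tree" is only a depth-one decision stump, trained on a bootstrap sample of the rows and on one randomly chosen feature, and the ensemble predicts by majority vote. A real Random Forest grows much deeper trees and considers several candidate features at every split; the function names and the toy data here are assumptions made for illustration only.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for above_label in (0, 1):
            errors = sum(
                (above_label if row[feature] >= threshold else 1 - above_label) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, above_label)
    return best[1:]  # (feature, threshold, label predicted at or above threshold)

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample of the rows, plus one randomly chosen feature.
        picks = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(n_features)
        forest.append(
            fit_stump([rows[i] for i in picks], [labels[i] for i in picks], feature)
        )
    return forest

def predict(forest, row):
    """Each stump votes 0 (genuine) or 1 (fraudulent); the majority wins."""
    votes = Counter(
        above if row[feature] >= threshold else 1 - above
        for feature, threshold, above in forest
    )
    return votes.most_common(1)[0][0]

# Toy, well-separated data: two hypothetical numeric features per transaction.
rows = [[a, a + 1] for a in range(20)] + [[a, a + 1] for a in range(100, 120)]
labels = [0] * 20 + [1] * 20
forest = fit_forest(rows, labels)
```

Calling `predict(forest, [5, 6])` gives the genuine class and `predict(forest, [150, 151])` gives the fraudulent class. Because every stump sees a different bootstrap sample, an occasional poorly fitted stump is outvoted by the rest, which is the intuition behind the reduced risk of overfitting mentioned above.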
3.4.6 Support Vector Machines:
Support Vector Machine (SVM) is a supervised machine learning algorithm
that can be used for both classification and regression tasks. However, it is
mostly used in classification problems. The main idea behind SVM is to find the best
line (or hyperplane) that separates the data into different classes in a
high-dimensional space. The line that provides the largest margin between the classes is
considered the best. SVMs have been used, for example, to detect cancerous cells based on
millions of images, and can also be used to predict future driving routes with a well-fitted
regression model [25].
maps the data into a higher-dimensional space where a linear separation can be
achieved. Common kernels used in SVM include the radial basis function (RBF)
kernel, the polynomial kernel, and the sigmoid kernel.
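As an illustration of the kernel idea, the RBF kernel scores the similarity of two feature vectors x and y as exp(-gamma * ||x - y||^2): identical points score 1 and distant points score close to 0. The plain-Python sketch below is for intuition only; the function name `rbf_similarity` and the value of `gamma` are assumptions, not settings taken from this project.

```python
import math

def rbf_similarity(x, y, gamma=0.5):
    """RBF (Gaussian) kernel value for two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

same = rbf_similarity([1.0, 2.0], [1.0, 2.0])   # identical points
far = rbf_similarity([1.0, 2.0], [4.0, 6.0])    # squared distance of 25
```

A larger `gamma` makes the similarity fall off faster with distance, which lets the resulting decision boundary bend more tightly around individual training points.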
Based on the previous review, logistic regression, random forest, and SVM
classifiers were adopted to develop the proposed credit card fraud detection system.
4 CHAPTER 4 EXPERIMENTAL RESULTS
4.1 Evaluation Metrics:
The proposed credit card fraud detection models are assessed using the following
metrics:
4.1.2 Accuracy
Accuracy is the proportion of correctly classified samples. It tells us how
often the classifier is right. It is the number of true positives and true
negatives divided by the total number of samples.
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
4.1.3 Precision
\[ \text{Precision} = \frac{TP}{TP + FP} \]
4.1.4 Recall
It is used to calculate the model's ability to predict positive values. "How often
does the model predict the correct positive values?". It is the true positives divided
by the total number of actual positive values.
\[ \text{Recall} = \frac{TP}{TP + FN} \]
4.1.5 F1-score
It is the harmonic mean of Recall and Precision. It is useful when you need to take
both Precision and Recall into account.
\[ \text{F1-Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Recall} + \text{Precision}} \]
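The four metrics can be computed directly from the confusion-matrix counts. The helper below is a minimal sketch; the function name `classification_metrics` and the example counts are hypothetical and are not results from this project.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 80 frauds caught, 20 missed, 10 false alarms.
metrics = classification_metrics(tp=80, tn=900, fp=10, fn=20)
```

For these counts precision is 80/90, recall is 80/100, and accuracy is 980/1010, which shows how accuracy can stay high even when a fifth of the fraudulent cases are missed.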
4.2 Training
To prepare the dataset for training, validation, and testing, it was divided into
roughly 70% for training and 30% for validation and testing. Based on this split, the
numbers of genuine and fraudulent transactions in the training set are 192622 and
311, respectively. In the validation set, there are 48143 genuine and 91 fraudulent
transactions. Finally, in the test set, there are 42488 genuine and 71 fraudulent
transactions [Appendix E].
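A quick arithmetic check on the counts reported above shows the share of the data in each subset and how rare fraud is in each. The counts are taken from the text; the variable names are illustrative.

```python
# (genuine, fraudulent) counts as reported for each subset.
splits = {
    "training": (192622, 311),
    "validation": (48143, 91),
    "test": (42488, 71),
}

total = sum(genuine + fraud for genuine, fraud in splits.values())
total_fraud = sum(fraud for _, fraud in splits.values())

for name, (genuine, fraud) in splits.items():
    size = genuine + fraud
    print(f"{name}: {size} rows, {size / total:.1%} of data, "
          f"fraud rate {fraud / size:.3%}")
print(f"total: {total} rows, {total_fraud} fraudulent")
```

With these counts the three subsets hold about 68%, 17%, and 15% of the 283726 transactions, and fraud makes up less than 0.2% of each subset.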
Figure 4-2 Imbalance Data
Model                Precision  Recall  F1 Score  Accuracy
Logistic Regression  0.999      0.563   0.720     0.781
Random Forest        0.999      0.788   0.88      0.894
SVM                  0.999      0.661   0.796     0.830
Figure 4-4 Random Forest confusion matrix of imbalanced data
The results show that none of the three models performs adequately on the
imbalanced data. Random Forest has the best outcomes for precision, recall, F1 score, and
accuracy, at 99.9%, 78.8%, 88%, and 89.4%, respectively, indicating that it is the
best-performing model in this experiment. The low recall of all three models
suggests that the classifiers are biased toward the majority (genuine) class, a
consequence of the imbalanced nature of the data.
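To see why accuracy alone can mislead on data this skewed, consider a hypothetical baseline that labels every transaction as genuine. Using the test-set counts reported in Section 4.2 (42488 genuine, 71 fraudulent), a short calculation gives its accuracy and recall. The baseline is an illustrative assumption and is not one of the models evaluated here.

```python
genuine, fraud = 42488, 71  # test-set counts reported in Section 4.2

# A baseline that always predicts "genuine" never flags any fraud.
tp, fn = 0, fraud      # every fraudulent transaction is missed
tn, fp = genuine, 0    # every genuine transaction is labelled correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.4f}, recall = {recall:.1f}")
```

The baseline is right more than 99.8% of the time yet catches no fraud at all, which is why recall and F1 score are the more informative figures when the classes are this imbalanced.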
Model                Precision  Recall  F1 Score  Accuracy
Logistic Regression  0.966      0.916   0.940     0.942
Random Forest        0.981      1.0     0.990     0.990
SVM                  0.985      0.871   0.924     0.929
Figure 4-7 Logistic Regression confusion matrix of under sampling data
Figure 4-9 SVM confusion matrix of undersampling data
Figure 4-10 Over sampling
Model                Precision  Recall  F1 Score  Accuracy
Logistic Regression  0.976      0.943   0.959     0.960
Random Forest        0.999      0.774   0.872     0.887
SVM                  0.944      0.901   0.942     0.944
Figure 4-12 Random Forest confusion matrix of Oversampling
4.3.4 Evaluation of Hybrid Sampling Dataset
In this section, we present the outcomes of applying a hybrid technique
that combines undersampling and oversampling, with the fraud and non-fraud
classes each containing 134835 transactions. The hybrid sampling dataset is
visualized in Figure 4-14. Table 4-8 and Table 4-9 present the results of applying the logistic regression,
random forest, and SVM classifiers to the hybrid sampling dataset for the training
and testing stages, respectively. These tables report precision, recall, F1
score, and accuracy. Additionally, the confusion matrices for the three
machine-learning models are shown in Figure 4-15, Figure 4-16, and Figure 4-17
[Appendix I].
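A hybrid resampling step of this kind can be sketched with the standard library alone. The sketch assumes plain random sampling: the majority class is reduced by drawing without replacement and the minority class is enlarged by drawing with replacement, until both reach a common target size. The function name `hybrid_resample`, the toy row identifiers, and the target of 700 are hypothetical; the project's actual implementation is the one referenced in Appendix I and may differ.

```python
import random

def hybrid_resample(majority, minority, target_size, seed=0):
    """Bring both classes to target_size rows.

    Assumes len(minority) <= target_size <= len(majority).
    """
    rng = random.Random(seed)
    # Undersampling: draw without replacement from the larger class.
    reduced_majority = rng.sample(majority, target_size)
    # Oversampling: draw with replacement from the smaller class.
    enlarged_minority = rng.choices(minority, k=target_size)
    return reduced_majority, enlarged_minority

# Toy data: 1000 genuine rows and 10 fraudulent rows (hypothetical identifiers).
genuine_rows = [f"g{i}" for i in range(1000)]
fraud_rows = [f"f{i}" for i in range(10)]
genuine_sample, fraud_sample = hybrid_resample(genuine_rows, fraud_rows, target_size=700)
```

Both returned lists have 700 rows: the genuine sample contains no repeats, while the fraud sample necessarily repeats its ten originals many times. Resampling of this kind should be applied to the training set only, so that the test set keeps the real class ratio.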
Model                Precision  Recall  F1 Score  Accuracy
Logistic Regression  0.973      0.919   0.945     0.947
Random Forest        0.999      1.0     0.999     0.999
SVM                  0.987      0.964   0.975     0.976
Figure 4-15 Logistic Regression confusion matrix of Hybrid sampling
Figure 4-17 SVM confusion matrix of Hybrid sampling
Figure 4-18 compares the three models on the undersampled data. The logistic
regression model has a precision of 0.967, meaning that 96.7% of the
transactions it flags as fraudulent are actually fraudulent. Its recall is
0.943, indicating that it correctly identifies 94.3% of the actual positive
cases. Its F1 score, which balances precision and recall, is 95.5%, and its
accuracy is 95.6%. The random forest model has a precision of 0.969 and a recall
of 0.929, with an F1 score of 94.9% and an accuracy of 95%. The SVM model has a
precision of 0.981 and a recall of 0.887, with an F1 score of 93.2% and an
accuracy of 93.5%.
According to the results, all three models perform
satisfactorily; however, Logistic Regression achieves the highest recall, F1
score, and accuracy, at 94.3%, 95.5%, and 95.6%, respectively, making it
the top-performing model in this experiment. SVM has the highest precision
(98.1%), a small margin of 1.4 percentage points over
that of Logistic Regression.
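As a consistency check, the F1 scores quoted for Figure 4-18 can be recomputed from the quoted precision and recall values using the formula from Section 4.1.5. The numbers below are copied from the text; the dictionary layout and function name are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) as quoted for the undersampling experiment.
reported = {
    "Logistic Regression": (0.967, 0.943, 0.955),
    "Random Forest": (0.969, 0.929, 0.949),
    "SVM": (0.981, 0.887, 0.932),
}

recomputed = {
    name: round(f1_score(precision, recall), 3)
    for name, (precision, recall, _) in reported.items()
}
```

The recomputed values agree with the reported F1 scores to three decimal places for all three models.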
Figure 4-19 comparison of oversampling results
Figure 4-19 compares the three models on the oversampled data. The logistic
regression model has a precision of 0.976, meaning that 97.6% of the
transactions it flags as fraudulent are actually fraudulent. Its recall is
0.943, indicating that it correctly identifies 94.3% of the actual positive
cases. Its F1 score is 95.9% and its accuracy is 96.0%. The random forest model
has a precision of 99.9%, so almost every transaction it flags is truly
fraudulent. However, its recall is the lowest of the three models, at 77.4%,
indicating that it misses a sizeable share of the positive cases. Its F1 score
of 87.2% is also the lowest of the three. The accuracy of this model is not
given in the comparison; the oversampling results table lists it as 0.887. The
SVM model has a precision of 0.944 and a recall of 0.901, with an F1 score of
94.2% and an accuracy of 94.4%.
Based on the results, all three models demonstrate strong performance.
However, Logistic Regression outperforms the others in terms of recall, F1 score,
and accuracy, achieving 94.3%, 95.9%, and 96%, respectively. This indicates that
Logistic Regression is the top-performing model in this experiment. On the
other hand, Random Forest has the highest precision, 99.9%, which is
2.3 percentage points above that of Logistic Regression.
Figure 4-20 comparison of hybrid sampling results
From Figure 4-20 For the logistic regression model, we can see that it has a
precision of 0.974, meaning that it correctly identifies 97.4% of the positive cases.
The recall for this model is 0.943, indicating that it correctly identifies 94.3% of the
actual positive cases. The F1 score for this model is 0.958, which is a balanced
measure of precision and recall. The accuracy of this model is 0.959, indicating that
it is overall effective at making correct predictions. The random forest model has a
precision of 99.9%, which is a very high value, indicating that it correctly identifies
almost all positive cases. However, the recall for this model is lower than the other
two models, at 77.4%, indicating that it may miss some positive cases. The F1 score
for this model is 88.7%, which is lower than the other two models. Despite the lower
F1 score, the model has a very high accuracy of unknown value. The SVM model
has a precision of 98.5%, indicating that it correctly identifies 98.5% of the positive
cases. The recall for this model is 90.1%, indicating that it correctly identifies 90.1%
of the actual positive cases. The F1 score for this model is 94.1%, which is a balanced
measure of precision and recall. The accuracy of this model is 94.4%, indicating that
it is overall effective at making correct predictions.
The results indicate that all three models exhibit robust performance.
However, Logistic Regression surpasses the others in terms of recall, F1 score, and
accuracy, achieving 94.3%, 95.8%, and 95.9%, respectively. This suggests that
Logistic Regression is the top-performing model in this specific experiment.
Conversely, Random Forest achieves the highest precision, 99.9%, which is only 2.5
percentage points above that of Logistic Regression.
In addition, the results show that oversampling and hybrid sampling yield the
best performance for Logistic Regression and SVM, with Logistic Regression
outperforming SVM.
Logistic regression is a probabilistic model that estimates the probability of a
particular outcome. In the case of class imbalance, oversampling can help balance
the dataset by increasing the number of instances in the minority class. Logistic
regression can better utilize this information to estimate the probability and make
predictions. Additionally, Logistic Regression is computationally less expensive
compared to SVM, especially for large datasets. Oversampling can increase the
number of samples, making the dataset even larger. This can impact the performance
of SVM due to increased training time and memory requirements. As a result,
logistic regression may outperform SVM in terms of speed. SVM performs best
when the classes are well separated and the number of samples is small.
Oversampling can create overlapping regions between the classes, making the
decision boundary more complex. Logistic regression is more flexible in handling
overlapping classes and can adapt to the increased complexity of the dataset.
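To make the effect of random oversampling concrete, the following minimal sketch duplicates randomly chosen minority rows until the minority-to-majority ratio reaches 0.9, the ratio passed as sampling_strategy in Appendix H. It uses only the Python standard library and a toy dataset; the function `random_oversample` and the data are illustrative assumptions, not the project's implementation (which relies on RandomOverSampler from imbalanced-learn).

```python
import random
from collections import Counter


def random_oversample(X, y, minority_label=1, ratio=0.9, seed=0):
    """Duplicate random minority rows until minority/majority reaches `ratio`."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    n_majority = len(y) - len(minority_idx)
    target = round(ratio * n_majority)
    n_extra = max(0, target - len(minority_idx))
    # Sampling with replacement: the new rows are exact copies of existing ones.
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (not the thesis dataset): 1000 legitimate and 10 fraudulent rows.
X = [[float(i)] for i in range(1010)]
y = [0] * 1000 + [1] * 10

X_over, y_over = random_oversample(X, y, ratio=0.9)
print(Counter(y), "->", Counter(y_over))

# No new distinct points are created, only repeated copies of the 10 originals.
distinct_minority = {tuple(row) for row, label in zip(X_over, y_over) if label == 1}
print("distinct minority rows:", len(distinct_minority))
```

The second printout shows why oversampling enlarges the training set without adding new information about the minority class: all added rows are duplicates.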
In contrast, Random Forest performs best with the undersampling technique;
its performance is comparatively weaker with oversampling and hybrid sampling.
This is because oversampling may lead to an over-representation of the minority
class and cause the Random Forest algorithm to be biased towards this class.
Moreover, introducing duplicate samples that may be similar or identical to existing
instances in the minority class can lead to overfitting, where the model memorizes
the training instances and performs poorly on unseen data. Random Forests, which
inherently have the potential to overfit, can be particularly prone to this issue.
5 CONCLUSION AND FUTURE WORK
5.1 Conclusion
This work applies three supervised machine learning techniques: Logistic
Regression, Random Forest, and SVM. When comparing these models for credit card
fraud detection, it is necessary to consider the effectiveness of different sampling
strategies such as oversampling, undersampling, and hybrid sampling.
Oversampling, which involves replicating the minority class instances, can help
improve the performance of Logistic Regression and SVM models. By increasing
the presence of fraudulent cases, oversampling provides more information for the
models to learn from and hence improves their ability to accurately detect fraud.
However, in the case of Random Forest, oversampling degrades performance, which
may be due to overfitting and a bias toward the minority class.
Undersampling, on the other hand, reduces the number of majority class
instances, which can help mitigate bias in heavily imbalanced datasets. This
technique ensures a more balanced training set, allowing the models to better
understand and detect the minority class. However, undersampling can lead to loss
of information due to the removal of majority class instances, potentially negatively
impacting the model's generalization and overall performance.
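The information loss caused by undersampling can be illustrated with a minimal standard-library sketch: all minority rows are kept, and the majority class is cut down to the same size, which is how sampling_strategy='majority' is used in Appendix G. The function `random_undersample` and the toy data are illustrative assumptions, not the project's code.

```python
import random
from collections import Counter


def random_undersample(X, y, majority_label=0, seed=0):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    # Sampling without replacement: every kept majority row is a distinct original.
    kept_majority = rng.sample(majority_idx, k=len(minority_idx))
    keep = sorted(kept_majority + minority_idx)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (not the thesis dataset): 1000 legitimate and 10 fraudulent rows.
X = [[float(i)] for i in range(1010)]
y = [0] * 1000 + [1] * 10

X_under, y_under = random_undersample(X, y)
discarded = len(y) - len(y_under)
print(Counter(y_under), "discarded majority rows:", discarded)
```

On this toy example 990 of the 1000 majority rows are discarded, which is the loss of information referred to above.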
Hybrid sampling combines oversampling and undersampling techniques to
address the limitations of each method. It tries to strike a balance by oversampling
the minority class and undersampling the majority class, which can result in a more
robust and accurate model. Hybrid sampling provides an opportunity to capture the
essence of both classes and reduce biases while maintaining the integrity of the data.
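A minimal sketch of the hybrid scheme follows, assuming the same two steps as Appendix I: the minority class is first oversampled to 0.7 of the majority size, and the majority class is then undersampled to match it. The function `hybrid_resample` works on row indices of a toy label vector and is an illustrative assumption, not the project's implementation.

```python
import random
from collections import Counter


def hybrid_resample(y, over_ratio=0.7, seed=0):
    """Return row indices after oversampling class 1, then undersampling class 0."""
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == 0]
    minority = [i for i, label in enumerate(y) if label == 1]
    # Step 1: duplicate minority rows until they reach over_ratio * |majority|.
    target_minority = max(len(minority), round(over_ratio * len(majority)))
    n_extra = target_minority - len(minority)
    minority_over = minority + [rng.choice(minority) for _ in range(n_extra)]
    # Step 2: randomly drop majority rows until both classes have the same size.
    n_keep = min(len(majority), len(minority_over))
    majority_under = rng.sample(majority, k=n_keep)
    return majority_under + minority_over


# Toy labels (not the thesis dataset): 1000 legitimate and 10 fraudulent rows.
y = [0] * 1000 + [1] * 10

idx = hybrid_resample(y)
y_comb = [y[i] for i in idx]
print(Counter(y_comb))
```

Compared with pure oversampling, fewer duplicates are created, and compared with pure undersampling, fewer majority rows are discarded (300 instead of 990 in this toy case).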
In terms of the models themselves, Logistic Regression is a simple yet widely
used algorithm that tends to perform well when the classes are well separated.
Its performance improves with undersampling, oversampling, and hybrid sampling.
Random Forest is an ensemble classifier that combines multiple decision trees,
leading to better generalization and robustness. It handles imbalanced datasets
reasonably well. Random Forest benefits from the undersampling technique,
whereas its performance is inadequate with oversampling and hybrid sampling.
SVM, with its ability to classify data into different classes based on a
hyperplane, can also be effective in credit card fraud detection. It can handle
complex relationships and nonlinearities while minimizing the influence of outliers.
SVM can be improved through oversampling, undersampling, or hybrid sampling,
which provide a broader representation of both classes, enabling the model to learn
better decision boundaries.
In conclusion, the choice of model and sampling technique depends on the
characteristics of the dataset, the level of imbalance, and the desired trade-off
between accuracy and computational complexity. Implementing oversampling,
undersampling, or hybrid sampling can significantly enhance the performance of
Logistic Regression, Random Forest, and SVM in credit card fraud detection tasks.
This leads us to believe that supervised machine learning techniques will help
decrease the amount of credit card fraud and increase customer satisfaction by
providing a better and more secure experience.
After the comparative analysis of the supervised learning models, we can infer
that Logistic Regression is the best approach for detecting credit card fraud.
5.2 Recommendation
There are many ways to improve the model, such as applying it to different
datasets of various sizes and data types, changing the data splitting ratio, or
approaching the problem with different algorithms. Another example is merging
telecom data to estimate the location of the card owner while his or her credit card
is being used. This would ease detection: if the card owner is in New Valley and a
transaction on the card is made in Cairo, the transaction can easily be flagged as
fraud.
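The location check described above could be sketched as a simple distance rule. The example below is a hypothetical illustration only: the function names, the 100 km threshold, and the approximate coordinates are our own assumptions, and a real system would have to obtain both locations from telecom and transaction data.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in km."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def location_mismatch(owner_pos, txn_pos, max_km=100.0):
    """Flag a transaction made farther than max_km from the card owner."""
    return haversine_km(*owner_pos, *txn_pos) > max_km


# Approximate coordinates, for illustration only.
kharga_new_valley = (25.44, 30.55)  # card owner's estimated position
cairo = (30.04, 31.24)              # transaction position

distance = haversine_km(*kharga_new_valley, *cairo)
flagged = location_mismatch(kharga_new_valley, cairo)
print(flagged)  # True
```

Such a rule would be used as an additional feature or filter alongside the trained classifier rather than as a replacement for it.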
5.3 Future work
• Explore advanced anomaly detection techniques: Investigate unsupervised
methods like autoencoders, clustering, and one-class SVMs to detect fraud
patterns without labeled data.
• Explore deep learning approaches: Deep learning techniques, such as
recurrent neural networks (RNNs) or convolutional neural networks (CNNs),
have shown promise in various fields. Future work could investigate the
application of deep learning models to credit card fraud detection, considering
their ability to capture complex patterns and dependencies in data.
• Real-time fraud detection: The report primarily focused on offline fraud
detection using pre-processed datasets. A valuable extension would be to
develop a real-time fraud detection system that can analyze transactions as
they occur, providing immediate alerts for suspicious activities. This could
involve the use of streaming data processing frameworks and adaptive
learning algorithms.
• Incorporate domain knowledge: Collaborate with industry experts or fraud
analysts to incorporate their insights and domain-specific knowledge into the
fraud detection process.
• Research and development: Continuously enhance and refine the fraud
detection model by staying updated with the latest techniques and
advancements in machine learning. Publish research papers and contribute to
academic conferences to establish credibility and attract potential partnerships
or funding opportunities.
• Collaboration with financial institutions: Collaborate with banks, credit card
companies, or other financial institutions to integrate our fraud detection
model into their systems. This partnership could involve licensing
agreements, revenue-sharing models, or other mutually beneficial
arrangements.
• Customization and integration: Provide customization options to businesses
based on their specific needs. Offer additional features or modules that can be
integrated into their existing fraud detection systems, allowing them to adapt
the model to their unique requirements. Charge an additional fee for
customization and integration services.
6 REFERENCES
[1] G. Babatunde Iwasokun, “Encryption and Tokenization-Based System for
Credit Card Information Security,” International Journal of Cyber-Security
and Digital Forensics, vol. 7, no. 3, pp. 283–293, 2018, doi:
10.17781/P002462.
[2] N. Report, “Payment Card Fraud Losses Reach $27.85 Billion.” Cision PR
Newswire, 2019.
[3] Credit card fraud detection: Everything you need to know, “Credit card fraud
detection: Everything you need to know.” https://www.inscribe.ai/fraud-
detection/credit-fraud-detection (accessed Feb. 14, 2023).
[4] Z. Faraji, “A Review of Machine Learning Applications for Credit Card Fraud
Detection with A Case study,” SEISENSE Journal of Management, vol. 5, no.
1, pp. 49–59, 2022.
[5] J. Gao, Z. Zhou, J. Ai, B. Xia, and S. Coggeshall, “Predicting Credit Card
Transaction Fraud Using Machine Learning Algorithms,” Journal of
Intelligent Learning Systems and Applications, vol. 11, no. 03, pp. 33–63,
2019, doi: 10.4236/jilsa.2019.113003.
[6] H. E. M. Abd El-Hamid, A. Abdou, W. Khalifa, M. I. Roushdy, and A.-B. M.
Salem, “Future Computing and Informatics Journa l,” Computing and
Informatics, vol. 4, no. 2, p. 5.
[7] S. Khatri, A. Arora, and A. P. Agrawal, “Supervised machine learning
algorithms for credit card fraud detection: a comparison,” in 2020 10th
International Conference on Cloud Computing, Data Science & Engineering
(Confluence), IEEE, 2020, pp. 680–683.
[8] P. Gupta, A. Varshney, M. R. Khan, R. Ahmed, M. Shuaib, and S. Alam,
“Unbalanced Credit Card Fraud Detection Data: A Machine Learning-
Oriented Comparative Study of Balancing Techniques,” Procedia Comput Sci,
vol. 218, pp. 2575–2584, 2023.
[9] Kaggle, “Dataset.” https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
(accessed Feb. 14, 2023).
[10] “Credit Card Fraud Detection - Great Learning Blog.”
https://www.mygreatlearning.com/blog/credit-card-fraud-detection/ (accessed
Jul. 03, 2023).
57
[11] “What is Normalization in Machine Learning | Deepchecks.”
https://deepchecks.com/glossary/normalization-in-machine-learning/
(accessed Jun. 29, 2023).
[12] “Normalize Data: Component Reference - Azure Machine Learning |
Microsoft Learn.” https://learn.microsoft.com/en-us/azure/machine-
learning/component-reference/normalize-data?view=azureml-api-2 (accessed
Jun. 29, 2023).
[13] Editorial Board: A. Bundy J. G. Carbonell M. Pinkal H. Uszkoreit M. Veloso
W. Wahlster M. J. Wooldridge, “Machine Learning Techniques for
Multimedia”.
[14] E. F. Morales and H. J. Escalante, “A brief introduction to supervised,
unsupervised, and reinforcement learning,” in Biosignal Processing and
Classification Using Computational Learning and Intelligence, Elsevier,
2022, pp. 111–129.
[15] W. Zhang, X. Gu, L. Tang, Y. Yin, D. Liu, and Y. Zhang, “Application of
machine learning, deep learning and optimization algorithms in
geoengineering and geoscience: Comprehensive review and future challenge,”
Gondwana Research, 2022.
[16] E. F. Morales and H. J. Escalante, “A brief introduction to supervised,
unsupervised, and reinforcement learning,” in Biosignal Processing and
Classification Using Computational Learning and Intelligence, Elsevier,
2022, pp. 111–129.
[17] E. F. Morales and H. J. Escalante, “A brief introduction to supervised,
unsupervised, and reinforcement learning,” in Biosignal Processing and
Classification Using Computational Learning and Intelligence, Elsevier,
2022, pp. 111–129.
[18] L. P. Coelho and W. Richert, Building machine learning systems with Python.
Packt Publishing Ltd, 2015.
[19] M. Bkassiny, Y. Li, and S. K. Jayaweera, “A survey on machine-learning
techniques in cognitive radios,” IEEE Communications Surveys & Tutorials,
vol. 15, no. 3, pp. 1136–1159, 2012.
[20] H. Paruchuri, “Credit Card Fraud Detection using Machine Learning: A
Systematic Literature Review,” ABC Journal of Advanced Research, vol. 6,
no. 2, pp. 113–120, 2017.
58
[21] P. Tiwari, S. Mehta, N. Sakhuja, J. Kumar, and A. K. Singh, “Credit card fraud
detection using machine learning: a study,” arXiv preprint arXiv:2108.10005,
2021.
[22] “What is Random Forest? | IBM.” https://www.ibm.com/topics/random-forest
(accessed Jul. 03, 2023).
[23] M. Kuhkan, “A method to improve the accuracy of k-nearest neighbor
algorithm,” International Journal of Computer Engineering and Information
Technology, vol. 8, no. 6, p. 90, 2016.
[24] K. M. Leung, “Naive bayesian classifier,” Polytechnic University Department
of Computer Science/Finance and Risk Engineering, vol. 2007, pp. 123–156,
2007.
[25] Understanding Support Vector Machine(SVM) algorithm from examples
(along with code), “Support Vector Machine.”
https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-
machine-example-code/ (accessed Feb. 14, 2023).
[26] SVM Machine Learning Tutorial, “freecodecamp,” SVM Machine Learning
Tutorial – What is the Support Vector Machine Algorithm.
https://www.freecodecamp.org/news/svm-machine-learning-tutorial-what-is-
the-support-vector-machine-algorithm-explained-with-code-examples/
(accessed Feb. 14, 2023).
[27] “Random Oversampling and Undersampling for Imbalanced Classification -
MachineLearningMastery.com.”
https://machinelearningmastery.com/random-oversampling-and-
undersampling-for-imbalanced-classification/ (accessed Jun. 24, 2023).
59
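Appendix A:
### Imports
import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, classification_report,
    confusion_matrix, f1_score, precision_score, recall_score)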
Appendix B:
### Exploratory Data Analysis
data = pd.read_csv('creditcard.csv')
pd.options.display.max_columns = None
data
data.shape
data.info()
Appendix C:
#### Data Cleaning
data.isnull().sum()
data.duplicated().any()
data = data.drop_duplicates()
Appendix D:
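#### Data Normalization
# Histograms of all columns before scaling
data.hist(bins=30, figsize=(20, 20))
# Standardize 'Amount' and 'Time' to zero mean and unit variance
sc = StandardScaler()
data['Amount'] = sc.fit_transform(pd.DataFrame(data['Amount']))
sc = StandardScaler()
data['Time'] = sc.fit_transform(pd.DataFrame(data['Time']))
# Histograms of all columns after scaling
data.hist(bins=30, figsize=(20, 20))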
Appendix E:
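### Train Test Split
# X and y are not defined in the printed listing; this split into features and
# the 'Class' label column is an assumption.
X = data.drop('Class', axis=1)
y = data['Class']
# Hold out 15% for testing, then 20% of the remainder for validation
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.15, random_state=22)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.20, random_state=42)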
Appendix F:
#### Pure DataSet
# Lines marked (*) are missing from the printed listing and are filled in by
# analogy with Appendix I. save_model, Lr_model_pure and lr_pure are assumed to
# be defined elsewhere in the project code.
def resultOfPureDataset(model, X_train, y_train, X_val, y_val, X_test,
                        y_test, Model_Path):
    x = model()
    x.fit(X_train, y_train)
    y_predtrain = x.predict(X_train)

    # Validation set: raw and row-normalized confusion matrices
    y_predval = x.predict(X_val)  # (*)
    cm = confusion_matrix(y_val, y_predval)
    sns.heatmap(cm, annot=True, fmt='d')
    plt.title('Unnormalized validation confusion matrix')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()
    cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    sns.heatmap(cm_normalized, fmt='.11g', annot=True, linewidths=0.01)
    plt.title('normalized validation confusion matrix')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()
    tn_val, fp_val, fn_val, tp_val = np.ravel(cm_normalized)  # (*)
    precision_val = tp_val / (tp_val + fp_val)
    print("precision_val", precision_val)
    recall_val = tp_val / (tp_val + fn_val)
    False_positive_rate_val = fp_val / (fp_val + tn_val)
    print("False_positive_rate_val", False_positive_rate_val)
    False_negative_rate_val = fn_val / (fn_val + tp_val)
    print("False_negative_rate_val", False_negative_rate_val)

    # Test set
    y_predtest = x.predict(X_test)  # (*)
    cm_test = confusion_matrix(y_test, y_predtest)  # (*)
    sns.heatmap(cm_test, annot=True, fmt='d')  # (*)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()
    cm_normalized_test = (cm_test.astype('float')
                          / cm_test.sum(axis=1)[:, np.newaxis])  # (*)
    tn_test, fp_test, fn_test, tp_test = np.ravel(cm_normalized_test)  # (*)
    precision_test = tp_test / (tp_test + fp_test)
    specificity_test = tn_test / (tn_test + fp_test)
    False_positive_rate_test = fp_test / (fp_test + tn_test)
    print("False_positive_rate_test", False_positive_rate_test)
    False_negative_rate_test = fn_test / (fn_test + tp_test)
    save_model(x, Model_Path)

print("Evaluation of Logistic regression Classifier")
resultOfPureDataset(Lr_model_pure, X_train, y_train, X_val, y_val, X_test,
                    y_test, lr_pure)
Appendix G:
#### UnderSampling DataSet
# Lines marked (*) are missing from the printed listing and are filled in by
# analogy with Appendix I. metrics, save_model, X_under / y_under,
# Lr_model_under and lr_under are assumed to be defined elsewhere in the
# project code.
from imblearn.under_sampling import RandomUnderSampler

undersample = RandomUnderSampler(sampling_strategy='majority')
X_train_under, y_train_under = undersample.fit_resample(X_under, y_under)

def resultOfUndersamplingDataset(model, X_train_under, y_train_under,
                                 X_val_under, y_val_under, X_test_under,
                                 y_test_under, Model_Path):
    x = model()
    x.fit(X_train_under, y_train_under)
    print('\t\tTrain Classification Report:')
    y_predtrain_under = x.predict(X_train_under)
    print(classification_report(y_train_under, y_predtrain_under))
    print('Train_accuracy_score',
          accuracy_score(y_train_under, y_predtrain_under))
    print('Train_precision_score',
          precision_score(y_train_under, y_predtrain_under))
    print('Train_recall_score',
          recall_score(y_train_under, y_predtrain_under))

    # Validation set
    y_predval_under = x.predict(X_val_under)
    print('\t\tValidation Classification Report:')
    metrics(y_val_under, y_predval_under)
    cm_val_under = confusion_matrix(y_val_under, y_predval_under)
    sns.heatmap(cm_val_under, annot=True, fmt='d')
    plt.ylabel('Actual')
    plt.show()
    cm_normalized_val_under = (cm_val_under.astype('float')
                               / cm_val_under.sum(axis=1)[:, np.newaxis])
    sns.heatmap(cm_normalized_val_under, fmt='.11g', annot=True,
                linewidths=0.01)
    plt.title('normalized validation confusion matrix')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()
    tn_val_under, fp_val_under, fn_val_under, tp_val_under = np.ravel(
        cm_normalized_val_under)  # (*)
    precision_val_under = tp_val_under / (tp_val_under + fp_val_under)  # (*)
    recall_val_under = tp_val_under / (tp_val_under + fn_val_under)
    print("recall_val_under", recall_val_under)
    F1_score_val_under = ((2 * precision_val_under * recall_val_under)
                          / (precision_val_under + recall_val_under))
    print("F1-score_val_under", F1_score_val_under)
    specificity_val_under = tn_val_under / (tn_val_under + fp_val_under)
    False_negative_rate_val_under = fn_val_under / (fn_val_under + tp_val_under)
    print("False_negative_rate_val_under", False_negative_rate_val_under)

    # Test set
    y_predtest_under = x.predict(X_test_under)  # (*)
    metrics(y_test_under, y_predtest_under)
    cm_test_under = confusion_matrix(y_test_under, y_predtest_under)  # (*)
    sns.heatmap(cm_test_under, annot=True, fmt='d')  # (*)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()
    cm_normalized_test_under = (cm_test_under.astype('float')
                                / cm_test_under.sum(axis=1)[:, np.newaxis])  # (*)
    tn_test_under, fp_test_under, fn_test_under, tp_test_under = np.ravel(
        cm_normalized_test_under)  # (*)
    precision_test_under = tp_test_under / (tp_test_under + fp_test_under)  # (*)
    recall_test_under = tp_test_under / (tp_test_under + fn_test_under)
    print("recall_test_under", recall_test_under)
    F1_score_test_under = ((2 * precision_test_under * recall_test_under)
                           / (precision_test_under + recall_test_under))
    False_positive_rate_test_under = (fp_test_under
                                      / (fp_test_under + tn_test_under))  # (*)
    print("False_positive_rate_test_under", False_positive_rate_test_under)
    False_negative_rate_test_under = (fn_test_under
                                      / (fn_test_under + tp_test_under))
    print("False_negative_rate_test_under", False_negative_rate_test_under)
    save_model(x, Model_Path)

print("Evaluation of Logistic regression Classifier")
resultOfUndersamplingDataset(Lr_model_under, X_train_under, y_train_under,
                             X_val_under, y_val_under, X_test_under,
                             y_test_under, lr_under)
Appendix H:
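#### OverSampling DataSet
# Lines marked (*) are missing from the printed listing and are filled in by
# analogy with Appendices G and I; the function name and the X_over / y_over
# inputs are assumptions. metrics and save_model are assumed to be defined
# elsewhere in the project code.
from imblearn.over_sampling import RandomOverSampler

oversample = RandomOverSampler(sampling_strategy=0.9)
X_train_over, y_train_over = oversample.fit_resample(X_over, y_over)  # (*)

def resultOfOversamplingDataset(model, X_train_over, y_train_over,  # (*)
                                X_val_over, y_val_over, X_test_over,
                                y_test_over, Model_Path):
    x = model()  # (*)
    x.fit(X_train_over, y_train_over)  # (*)
    y_predtrain_over = x.predict(X_train_over)  # (*)
    print(classification_report(y_train_over, y_predtrain_over))
    print('Train_accuracy_score',
          accuracy_score(y_train_over, y_predtrain_over))
    print('Train_precision_score',
          precision_score(y_train_over, y_predtrain_over))
    print('Train_recall_score', recall_score(y_train_over, y_predtrain_over))
    print('Train_f1_score', f1_score(y_train_over, y_predtrain_over))

    # Validation set
    y_predval_over = x.predict(X_val_over)
    cm_val_over = confusion_matrix(y_val_over, y_predval_over)  # (*)
    sns.heatmap(cm_val_over, annot=True, fmt='d')  # (*)
    plt.ylabel('Actual')
    plt.show()
    cm_normalized_val_over = (cm_val_over.astype('float')
                              / cm_val_over.sum(axis=1)[:, np.newaxis])
    sns.heatmap(cm_normalized_val_over, fmt='.11g', annot=True,
                linewidths=0.01)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()
    tn_val_over, fp_val_over, fn_val_over, tp_val_over = np.ravel(
        cm_normalized_val_over)  # (*)
    precision_val_over = tp_val_over / (tp_val_over + fp_val_over)  # (*)
    recall_val_over = tp_val_over / (tp_val_over + fn_val_over)
    print("recall_val_over", recall_val_over)
    F1_score_val_over = ((2 * precision_val_over * recall_val_over)
                         / (precision_val_over + recall_val_over))
    print("F1-score_val_over", F1_score_val_over)
    specificity_val_over = tn_val_over / (tn_val_over + fp_val_over)
    print("specificity_val_over", specificity_val_over)
    False_positive_rate_val_over = fp_val_over / (fp_val_over + tn_val_over)

    # Test set
    y_predtest_over = x.predict(X_test_over)
    metrics(y_test_over, y_predtest_over)
    cm_test_over = confusion_matrix(y_test_over, y_predtest_over)  # (*)
    sns.heatmap(cm_test_over, annot=True, fmt='d')  # (*)
    plt.ylabel('Actual')
    plt.show()
    cm_normalized_test_over = (cm_test_over.astype('float')
                               / cm_test_over.sum(axis=1)[:, np.newaxis])
    sns.heatmap(cm_normalized_test_over, fmt='.11g', annot=True,
                linewidths=0.01)
    plt.ylabel('Actual')
    plt.show()
    # The unpacked values are row-normalized rates, not counts, so precision
    # computed from them matches count-based precision only when both classes
    # are the same size.
    tn_test_over, fp_test_over, fn_test_over, tp_test_over = np.ravel(
        cm_normalized_test_over)
    print("True Negatives (TN):", tn_test_over)
    precision_test_over = tp_test_over / (tp_test_over + fp_test_over)  # (*)
    recall_test_over = tp_test_over / (tp_test_over + fn_test_over)  # (*)
    F1_score_test_over = ((2 * precision_test_over * recall_test_over)
                          / (precision_test_over + recall_test_over))
    print("F1-score_test_over", F1_score_test_over)
    specificity_test_over = tn_test_over / (tn_test_over + fp_test_over)
    save_model(x, Model_Path)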
Appendix I:
#### Hybrid Sampling DataSet
# Lines marked (*) are missing from the printed listing and are filled in by
# analogy with the test-set code further down. metrics, save_model, X_comb /
# y_comb and the model and path arguments are assumed to be defined elsewhere
# in the project code; undersample is the RandomUnderSampler of Appendix G.
from imblearn.over_sampling import RandomOverSampler

# Oversample the minority class to 0.7 of the majority, then undersample
oversample2 = RandomOverSampler(sampling_strategy=0.7)
Xover, yover = oversample2.fit_resample(X_comb, y_comb)
X_train_comb, y_train_comb = undersample.fit_resample(Xover, yover)

def resultOfCombDataset(model, X_train_comb, y_train_comb, X_val_comb,
                        y_val_comb, X_test_comb, y_test_comb, Model_Path):
    x = model  # the passed object is fitted directly
    x.fit(X_train_comb, y_train_comb)
    y_predtrain_comb = x.predict(X_train_comb)  # (*)
    print('Train_accuracy_score',
          accuracy_score(y_train_comb, y_predtrain_comb))
    print('Train_precision_score',
          precision_score(y_train_comb, y_predtrain_comb))
    print('Train_recall_score ', recall_score(y_train_comb, y_predtrain_comb))

    # Validation set
    y_predval_comb = x.predict(X_val_comb)
    print('\t\tValidation Classification Report:')
    metrics(y_val_comb, y_predval_comb)
    cm_val_comb = confusion_matrix(y_val_comb, y_predval_comb)  # (*)
    sns.heatmap(cm_val_comb, annot=True, fmt='d')  # (*)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()
    cm_normalized_val_comb = (cm_val_comb.astype('float')
                              / cm_val_comb.sum(axis=1)[:, np.newaxis])
    sns.heatmap(cm_normalized_val_comb, fmt='.11g', annot=True,
                linewidths=0.01)  # (*)
    plt.ylabel('Actual')
    plt.show()
    tn_val_comb, fp_val_comb, fn_val_comb, tp_val_comb = np.ravel(
        cm_normalized_val_comb)  # (*)
    precision_val_comb = tp_val_comb / (tp_val_comb + fp_val_comb)
    print("precision_val_comb", precision_val_comb)
    recall_val_comb = tp_val_comb / (tp_val_comb + fn_val_comb)
    print("recall_val_comb", recall_val_comb)
    F1_score_val_comb = ((2 * precision_val_comb * recall_val_comb)
                         / (precision_val_comb + recall_val_comb))
    print("F1-score_val_comb", F1_score_val_comb)
    specificity_val_comb = tn_val_comb / (tn_val_comb + fp_val_comb)
    print("specificity_val_comb", specificity_val_comb)
    False_positive_rate_val_comb = fp_val_comb / (fp_val_comb + tn_val_comb)
    print("False_positive_rate_val_comb", False_positive_rate_val_comb)
    False_negative_rate_val_comb = fn_val_comb / (fn_val_comb + tp_val_comb)
    print("False_negative_rate_val_comb", False_negative_rate_val_comb)

    # Test set
    y_predtest_comb = x.predict(X_test_comb)
    metrics(y_test_comb, y_predtest_comb)
    cm_test_comb = confusion_matrix(y_test_comb, y_predtest_comb)
    sns.heatmap(cm_test_comb, annot=True, fmt='d')
    plt.ylabel('Actual')
    plt.show()
    cm_normalized_test_comb = (cm_test_comb.astype('float')
                               / cm_test_comb.sum(axis=1)[:, np.newaxis])
    sns.heatmap(cm_normalized_test_comb, fmt='.11g', annot=True,
                linewidths=0.01)
    plt.show()
    # The unpacked values are row-normalized rates, not counts, so precision
    # computed from them matches count-based precision only when both classes
    # are the same size.
    tn_test_comb, fp_test_comb, fn_test_comb, tp_test_comb = np.ravel(
        cm_normalized_test_comb)
    precision_test_comb = tp_test_comb / (tp_test_comb + fp_test_comb)
    False_positive_rate_test_comb = fp_test_comb / (fp_test_comb + tn_test_comb)
    print("False_positive_rate_test_comb", False_positive_rate_test_comb)
    False_negative_rate_test_comb = fn_test_comb / (fn_test_comb + tp_test_comb)
    save_model(x, Model_Path)

print("Evaluation of Logistic regression Classifier")
resultOfCombDataset(Lr_model_Comb, X_train_comb, y_train_comb, X_val_comb,
                    y_val_comb, X_test_comb, y_test_comb, lr_Comb)
print("Evaluation of Random Forest Classifier")
resultOfCombDataset(RF_model_Comb, X_train_comb, y_train_comb, X_val_comb,
                    y_val_comb, X_test_comb, y_test_comb, RF_Comb)
print("Evaluation of Support Vector Machine")
resultOfCombDataset(SVC_model_Comb, X_train_comb, y_train_comb, X_val_comb,
                    y_val_comb, X_test_comb, y_test_comb, SVC_Comb)