Credit Card Fraud Detection Based on Machine Learning and Deep Learning
A PROJECT REPORT
Submitted by
DHARSHINI M (960121243013)
MONISHA M (960121243030)
SAJITHA C (960121243039)
SALINI I (960121243040)
BACHELOR OF TECHNOLOGY
IN
MAY 2025
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
Azhagappapuram, Azhagappapuram,
ABSTRACT
With the rapid evolution of technology, people are increasingly using credit
cards instead of cash in daily life, which opens the door to many new forms of
credit card fraud. The Federal Trade Commission estimates that 10 million
people are victimized by credit card theft each year, and credit card companies
lose close to $50 billion per year because of fraud. This is a highly relevant
problem that demands the attention of communities such as machine learning and
data science, where the solution can be automated. The main objective of this
paper is to predict the likelihood of fraudulent activity, to improve
prediction accuracy, and to achieve a self-learning ability. We focus on
obtaining deep feature representations of legitimate and fraudulent
transactions from the aspect of the loss function of a deep neural network,
using Squirrel Optimization and an Advanced DLMNN classifier. The purpose is
to obtain better separability and discrimination of features, which improves
the performance of the fraud detection model and keeps it stable.
ACKNOWLEDGEMENT
TABLE OF CONTENTS
1 INTRODUCTION
1.1 OVERVIEW
1.2 EXISTING CHALLENGES
1.3 SIGNIFICANCE OF RESEARCH
1.4 APPLICATIONS
1.5 OBJECTIVE
2 LITERATURE REVIEW
2.1 Existing system
2.2 Drawback
2.3 Proposed system
3 SYSTEM SPECIFICATION
3.1 HARDWARE REQUIREMENTS
3.2 SOFTWARE REQUIREMENTS
3.3 OPERATING SYSTEM
3.4 PYTHON
3.5 JUPYTER NOTEBOOK
4 SYSTEM DESIGN
4.1 ARCHITECTURE DESIGN
5 PROJECT DESCRIPTION
5.1 DATA COLLECTION AND PRE PROCESSING
5.2 FEATURE SELECTION
5.3 FRAUD CLASSIFICATION
5.4 MODEL TRAINING AND OPTIMIZATION
5.5 REAL TIME FRAUD DETECTION SYSTEM
5.6 PERFORMANCE EVALUATION AND MONITORING
5.7 USER INTERFACE AND REPORTING
6 SYSTEM TESTING
6.1 SYSTEM TESTING
7 SYSTEM IMPLEMENTATION
7.1 DATASET
7.2 FEATURE ENGINEERING
7.3 DATA BALANCING
7.4 MODEL TRAINING
7.5 MODEL EVALUATION
8 FUTURE ENHANCEMENT
9 CONCLUSION
10 BIBLIOGRAPHY
10.1 JOURNAL REFERENCES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
In this paper, the aim is to build a credit card fraud detection model based
on deep representation learning methods that can learn effective representations
of transaction behaviors, while also keeping the model's performance stable.
Many methods exist for handling the class imbalance problem; this paper instead
pays more attention to learning a better representation that can both enhance
fraud detection performance and keep that performance stable. As described in
the literature, a representation learning method learns representations of the
data that make it easier to extract useful information when building
classifiers or other predictors. Representation learning has been applied
widely, for example in person re-identification and face recognition.
instant fraud classification, leading to potential delays or missed fraudulent
activities.
detection by analyzing transaction patterns and identifying fraudulent activities
with higher accuracy. By incorporating Squirrel Optimization and the
Advanced DLMNN classifier, the model aims to improve feature
representation and achieve superior discrimination between fraudulent and
legitimate transactions. This advancement not only enhances the prediction
accuracy but also ensures adaptability and self-learning capabilities, allowing
the system to dynamically evolve with emerging fraud patterns. Furthermore,
an efficient fraud detection system helps financial institutions reduce financial
losses, improve customer trust, and strengthen overall security in the digital
payment ecosystem. The broader significance extends beyond just economic
benefits; it contributes to cybersecurity advancements, minimizes identity theft,
and protects consumers from the distress caused by fraudulent transactions.
Ultimately, this project provides a cutting-edge approach to fraud detection,
leveraging deep learning methodologies to create a more robust and reliable
system for safeguarding financial transactions in an increasingly digital world.
1.4 APPLICATIONS
1.5 OBJECTIVE
CHAPTER 2
LITERATURE REVIEW
2.1. Transaction fraud detection based on total order relation and behavior
diversity, Zheng, G. Liu, C. Yan, and C. Jiang [2018]:
2.2 Credit card fraud detection: A realistic modeling and a novel learning
strategy, A. D. Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G.
Bontempi [2018]:
The authors make three major contributions. First, they propose, with the help
of their industrial partner, a formalization of the fraud-detection problem that
realistically describes the operating conditions of fraud-detection systems
(FDSs) that analyze massive streams of credit card transactions every day. They
also illustrate the most appropriate performance measures to be used for
fraud-detection purposes. Second, they design and assess a novel learning
strategy that effectively addresses class imbalance, concept drift, and
verification latency. Third, in their experiments, they demonstrate the impact
of class imbalance and concept drift in a real-world data stream containing
more than 75 million transactions, authorized over a time window of three years.
2.4. Representation learning: A review and new perspectives, Y. Bengio,
A. Courville, and P. Vincent [2013]:
2.5. Deep representation learning with part loss for person re-
identification, H. Yao, S. Zhang, R. Hong, Y. Zhang, Q. Tian [2019]:
2.6. A light CNN for deep face representation with noisy labels, X. Wu, R.
He, Z. Sun, and T. Tan [2018]:
and not on historic databases of past cardholder activities. Among the main
characteristics of credit card traffic are the great imbalance between proper and
fraudulent operations, and a great degree of mixing between both. To ensure
proper model construction, a nonlinear version of Fisher's discriminant
analysis, which adequately separates a good proportion of fraudulent operations
away from other closer to normal traffic, has been used.
relevant global issue. Recently, there has been major interest in applying
machine learning algorithms as a data mining technique for credit card fraud
detection. However, a number of challenges arise, such as the lack of publicly
available data sets, highly imbalanced class sizes, and varying fraudulent
behavior. In this paper, the authors compare the performance of three machine
learning algorithms (Random Forest, Support Vector Machine, and Logistic
Regression) in detecting fraud on real-life data containing credit card
transactions. To mitigate imbalanced class sizes, they use the SMOTE sampling
method. The problem of ever-changing fraud patterns is addressed by employing
incremental learning of the selected ML algorithms in the experiments. The
performance of the techniques is evaluated using commonly accepted metrics:
precision and recall.
2.1 EXISTING SYSTEM
2.1.1 Introduction
In the existing method, both supervised and unsupervised learning face
challenging issues in fraud detection. Machine learning (ML) techniques have
been employed to classify transactions as suspicious or non-suspicious
automatically by using classifiers. Combining machine learning and data
mining techniques makes it possible to identify genuine and non-genuine
transactions by learning the patterns in an accurately labeled dataset. One of
the most commonly used techniques for fraud detection is the K-Nearest
Neighbor (KNN) algorithm. This technique can be used alone or in combination
with ensemble or meta-learning techniques to build classifiers.
2.1.2 Theoretical Background
2.1.3 Methodology
The first step is to prepare the data by cleaning it, normalizing it, and
splitting it into training and testing datasets. The training dataset is used to train
the KNN model, while the testing dataset is used to evaluate its performance.
The next step is to choose the value of K, which determines the number
of neighbors that are considered in the classification. A small value of K can
lead to overfitting, while a large value can lead to underfitting. A common
approach is to use cross-validation to choose the optimal value of K that
maximizes the accuracy of the model.
Compute distances:
For each new transaction in the testing dataset, the distances to all
transactions in the training dataset are computed using a distance metric such
as Euclidean distance or Manhattan distance.
The K transactions with the smallest distances are selected as the nearest
neighbors of the new transaction.
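The steps above can be sketched in plain Python. This is a minimal illustration rather than the implementation used in this project: the function names `euclidean` and `knn_predict`, the toy feature vectors, and the final majority-vote step (the usual way KNN assigns a label, assumed here) are all illustrative.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Compute the distance from the query to every training transaction.
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k transactions with the smallest distances.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Majority vote over the neighbours' labels (assumed final step).
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy, already-normalized data: 0 = genuine, 1 = fraud (hypothetical values).
train_X = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.10], [0.90, 0.80], [0.85, 0.95]]
train_y = [0, 0, 0, 1, 1]

print(knn_predict(train_X, train_y, [0.12, 0.22], k=3))
print(knn_predict(train_X, train_y, [0.88, 0.90], k=3))
```

Because every query is compared against every training transaction, the cost of a single prediction grows linearly with the size of the training set.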
Evaluate the performance:
The advantages of using KNN algorithm for credit card fraud detection
are: It is a simple and intuitive algorithm that does not require extensive
training or parameter tuning. It can be effective in detecting fraud patterns that
are similar to past frauds and have been previously labeled. It can handle
imbalanced datasets where the number of fraudulent transactions is much
smaller than the number of legitimate transactions.
2.3 DRAWBACK
This system does not perform well when the data set is noisy, i.e., when the
target classes overlap.
This method may require a large amount of historical transaction data to
learn the user behavior and detect abnormal patterns, which can be a challenge
in situations where data is scarce or incomplete.
activities cause a financial loss for both the company and the customer. Thus,
the challenges posed by fraudulent activities have increased the demand for
systems that detect credit card fraud. Researchers build fraud detection
systems using machine learning, deep learning, and data mining techniques to
determine whether a transaction is fraudulent or genuine, based on datasets
that include information about the transactions. However, credit card fraud
detection is becoming more complex because fraudulent transactions
increasingly resemble legitimate ones.
This paper tackles the problem of credit card fraud using machine learning and
deep learning models applied to the fraud detection dataset provided by
Kaggle. The main contribution of this work is a fraud detection model built on
a deep learning modified neural network (DLMNN) classifier. Finally, the
performance of the proposed system is compared with existing methods, using
evaluation metrics such as precision, recall, and F1-score.
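As a reminder of how these three metrics are computed, the following is a minimal sketch in plain Python. The function name `precision_recall_f1` and the toy label lists are illustrative and are not results from this project; fraud is treated as the positive class (label 1).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical labels: 3 frauds, of which 2 are caught, plus 1 false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Precision penalizes false alarms on genuine transactions, recall penalizes missed frauds, and F1 is their harmonic mean.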
transactions are greatly affected by the sampling approach applied to the
dataset, the selection of variables, and the detection techniques used. The
dataset of credit card transactions is collected from Kaggle and contains a
total of 284,807 credit card transactions from a European bank. Fraudulent
transactions are treated as the “positive class” and genuine ones as the
“negative class”.
CHAPTER 3
SYSTEM SPECIFICATIONS
3.1 HARDWARE REQUIREMENTS
Windows 11:
from the taskbar as a group, and new gaming technologies inherited from Xbox
Series X and Series S such as Auto HDR and DirectStorage on compatible
hardware. Internet Explorer (IE) has been replaced by the Chromium-based
Microsoft Edge as the default web browser, as in its predecessor, Windows 10,
and Microsoft Teams is integrated into the Windows shell. Microsoft also
announced plans to allow more flexibility in the software that can be
distributed via the Microsoft Store and to support Android apps on Windows 11
(including a partnership with Amazon to make its app store available for the
function).
Windows 11 comes with cutting-edge features that help protect you from
malware. While staying vigilant is the most important protective measure you
can take, security features in Windows 11 also help provide real-time detection
and protection.
3.4 PYTHON
Python's syntax and dynamic typing, together with its interpreted nature, make
it an ideal language for scripting and rapid application development.
Python is not restricted to one special area such as web programming. It is
known as a multipurpose language because it can be used for web, enterprise,
3D CAD, and other applications.
We do not need to declare a variable's data type because Python is dynamically
typed, so we can write a = 10 to assign an integer value to a variable.
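A short illustration of this point follows. The variable names are illustrative; the snippet only shows that the same name can be bound to values of different types without any declaration.

```python
a = 10                            # no type declaration: a refers to an int
first_type = type(a).__name__

a = "ten"                         # the same name can later refer to a str
second_type = type(a).__name__

print(first_type, second_type)    # int str
```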
Python Features
2) Expressive Language
3) Interpreted Language
4) Cross-platform Language
5) Free and Open Source
6) Object-Oriented Language
7) Extensible
It implies that code written in other languages such as C/C++ can be compiled
and then used from our Python code.
Python has a large standard library and provides a rich set of modules and
functions for rapid application development.
10) Integrated
It can be easily integrated with languages such as C, C++, and Java.
interactive computing products Jupyter Notebook, JupyterHub, and
JupyterLab. Jupyter is financially sponsored by NumFOCUS.
CHAPTER 4
SYSTEM DESIGN
4.1.1 Introduction
[Figure: System architecture. Training phase: data collection; pre-processing
(data cleaning, data transformation with min-max scaler, data normalization);
feature selection (correlation, Squirrel Optimization); DLMNN-based prediction
model training. Testing phase: input data; pre-processing (data transformation
with min-max scaler, data normalization); DLMNN-based prediction model.]
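The min-max scaling step named in the architecture can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: the function name `min_max_scale`, the default target range of 0 to 1, and the sample amounts are hypothetical and not taken from this project's code.

```python
def min_max_scale(column, feature_range=(0.0, 1.0)):
    """Linearly rescale a list of numbers into `feature_range`."""
    low, high = feature_range
    col_min, col_max = min(column), max(column)
    if col_max == col_min:
        # A constant column carries no spread; map every value to the lower bound.
        return [low for _ in column]
    scale = (high - low) / (col_max - col_min)
    return [low + (value - col_min) * scale for value in column]

amounts = [10.0, 20.0, 60.0, 110.0]   # hypothetical transaction amounts
scaled = min_max_scale(amounts)
print(scaled)
```

In a train/test setting, the minimum and maximum would normally be computed on the training data and reused for the test data; this sketch scales a single column for brevity.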
5.4 MODEL TRAINING AND OPTIMIZATION
Once the feature selection and classification model are defined, the
system undergoes extensive training using historical transactional data. This
module optimizes the model using techniques such as backpropagation,
dropout regularization, and hyperparameter tuning. Additionally, the model is
continuously updated with new fraud patterns to improve adaptability and
maintain high performance over time.
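Of the techniques listed above, dropout regularization is the easiest to show in isolation. The sketch below uses the common "inverted dropout" formulation in plain Python; this report does not state which variant or framework was used, so the function name `dropout`, the rate of 0.5, and the sample activations are assumptions for illustration only.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability `rate` during
    training and rescale the survivors so the expected value is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)      # inference: pass values through untouched
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

hidden = [0.2, 1.5, 0.7, 0.9]         # hypothetical hidden-layer activations
train_out = dropout(hidden, rate=0.5, rng=random.Random(42))
eval_out = dropout(hidden, rate=0.5, training=False)
print(train_out, eval_out)
```

Dropping units at random during training discourages the network from relying on any single feature, which is why it is used as a regularizer.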
CHAPTER 6
SYSTEM TESTING
INTRODUCTION
In this section, system testing is performed for each class in the dataset
acquired from Kaggle. The input dataset contains transactions made over two
days in September 2013 by European cardholders.
The dataset contains 492 frauds out of 284,807 transactions. Thus, it is highly
unbalanced, with the positive class (frauds) accounting for only 0.17% of all
transactions.
1. Nominal - 5%
2. Outer Race - 5%
3. Inner Race - 5%
Validation split
Cross-Validation Testing
To ensure the model's performance is not biased due to the specific train-
test split, k-fold cross-validation was applied. This technique divides the dataset
into k subsets and iteratively trains and tests the model on different
combinations, providing a more reliable estimate of generalization accuracy.
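The splitting logic behind k-fold cross-validation can be sketched as below. This is a minimal plain-Python illustration: the function name `k_fold_indices` and the sizes used are hypothetical, and the folds are contiguous and unshuffled for clarity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print(test_idx)
```

With a highly imbalanced dataset, shuffled and stratified folds (each fold keeping roughly the same fraud ratio) are a common choice, so that no test fold ends up without any fraud cases.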
Hyperparameter Sensitivity Testing
Different hyperparameters such as learning rate, number of layers, number
of neurons, and activation functions were varied to observe their effects on
model accuracy. This testing helped identify optimal hyperparameter
configurations that enhance model performance without overfitting.
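One simple way to vary hyperparameters systematically is an exhaustive grid search, sketched below. The search space, the function names, and especially `toy_score` are hypothetical: `toy_score` is only a stand-in for "train the model with this configuration and return its validation score", and the values tried in this project are not listed in the report.

```python
from itertools import product

# Hypothetical search space (assumed values, for illustration only).
grid = {
    "learning_rate": [0.001, 0.01],
    "hidden_layers": [1, 2],
    "activation": ["relu", "tanh"],
}

def iter_configs(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))

def best_config(grid, score_fn):
    """Return the configuration with the highest score."""
    return max(iter_configs(grid), key=score_fn)

def toy_score(config):
    """Stand-in for training a model and returning a validation score."""
    return config["hidden_layers"] - abs(config["learning_rate"] - 0.01)

configs = list(iter_configs(grid))
best = best_config(grid, toy_score)
print(len(configs), best)
```

The number of configurations is the product of the list lengths, so the grid grows quickly as more hyperparameters are added.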
Noise Robustness Testing
To simulate real-world conditions, random noise was introduced into the
dataset. This test evaluated how well the proposed approach could maintain
accuracy when presented with noisy or imperfect input data, indicating its
reliability in practical applications.
Scalability Testing
The system was tested using datasets of increasing size to assess how well
it scales. Performance metrics like training time, memory usage, and accuracy
were monitored to understand how the model behaves with larger volumes of
data.
Class Imbalance Testing
To determine the system's robustness in handling imbalanced datasets,
experiments were conducted where certain classes had significantly fewer
samples than others. Techniques such as oversampling, undersampling, and
weighted loss functions were used to mitigate imbalance and evaluate the
system’s fairness and precision.
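Two of the techniques mentioned above, class weights for a weighted loss and random oversampling, can be sketched in plain Python. The function names, the inverse-frequency weighting formula n / (k * count), and the toy data are assumptions for illustration; the report does not specify the exact scheme used.

```python
import random
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority samples until all classes match
    the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, count in counts.items():
        indices = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - count):
            i = rng.choice(indices)
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out

# Toy data: 8 genuine (0) and 2 fraudulent (1) transactions.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

weights = class_weights(y)
X_bal, y_bal = random_oversample(X, y)
print(weights, Counter(y_bal))
```

Oversampling is normally applied to the training portion only, so that duplicated fraud cases do not leak into the data used for evaluation.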
CHAPTER 7
SYSTEM IMPLEMENTATION
visualizing fraud reports, monitoring system performance, and assisting fraud
analysts in decision-making. The system is further evaluated using key
performance metrics, ensuring its reliability, adaptability to evolving fraud
patterns, and effectiveness in minimizing financial losses due to fraud.
7.1 DATASET
The dataset is the Kaggle Credit Card Fraud Detection dataset. It contains
transactions made over two days in September 2013 by European cardholders. The
dataset contains 492 frauds out of 284,807 transactions. Thus, it is highly
unbalanced, with the positive class (frauds) accounting for only 0.17% of all
transactions.
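The degree of imbalance follows directly from the two counts quoted above. The short calculation below also shows the accuracy of a trivial baseline that labels every transaction as genuine; the variable names are illustrative.

```python
total = 284_807    # transactions in the dataset
frauds = 492       # fraudulent transactions (positive class)

fraud_rate = frauds / total
# Accuracy of a classifier that always predicts "genuine".
baseline_accuracy = (total - frauds) / total

print(f"fraud rate: {fraud_rate:.4%}")
print(f"all-genuine baseline accuracy: {baseline_accuracy:.4%}")
```

The fraud rate is about 0.17%, and the all-genuine baseline reaches about 99.83% accuracy without detecting a single fraud, which is why precision, recall, and F1-score are reported alongside accuracy on this dataset.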
Each neuron in the hidden layer receives weighted input plus bias from
each neuron in the previous layer:

$$\sum_{k=1}^{N_{j-1}} W_{k,j} X_k + b_i$$

where $X_k$ denotes the input from the k-th node in the j-th layer, $W_{k,j}$
is the weight of the link between node k and all the nodes in the previous
layers, $b_i$ is the bias to the node, and $N_{j-1}$ is the number of nodes in
layer j-1.
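The weighted input plus bias described above can be sketched for one layer in plain Python. This is an illustration under assumptions: the function name `layer_forward`, the layout `weights[i][k]` (weight from input k to node i), the tanh activation, and all numeric values are hypothetical and not taken from the DLMNN implementation.

```python
import math

def layer_forward(inputs, weights, biases, activation=math.tanh):
    """Compute one layer's outputs.

    weights[i][k] is the weight from input k to node i (assumed layout);
    biases[i] is the bias of node i.
    """
    outputs = []
    for weight_row, bias in zip(weights, biases):
        # Weighted sum of all inputs from the previous layer, plus the bias.
        z = sum(w * x for w, x in zip(weight_row, inputs)) + bias
        outputs.append(activation(z))
    return outputs

inputs = [1.0, 2.0]                       # outputs of the previous layer
weights = [[0.5, -0.25], [0.1, 0.3]]      # two nodes, two inputs each
biases = [0.1, -0.2]

print(layer_forward(inputs, weights, biases))
```

Stacking several such layers, each feeding its outputs to the next, gives the forward pass of a multilayer network.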
and false positives differ significantly. Using these metrics together ensures that the
proposed approach is not only accurate but also reliable and effective in real-world scenarios.
CHAPTER 8
FUTURE ENHANCEMENT
Another advancement lies in the application of blockchain technology,
which can offer tamper-proof and transparent records of transactions through
decentralized ledgers. This can enhance the integrity of transaction data and
reduce the chances of manipulation by malicious actors. Additionally, the
adoption of Explainable AI (XAI) allows fraud detection systems to produce
interpretable decisions, fostering greater trust and facilitating regulatory
compliance. As data privacy laws become more stringent, implementing
privacy-preserving methods such as differential privacy and federated learning
will be essential. These technologies enable collaborative model training across
institutions without exposing sensitive customer data, thereby strengthening
fraud prevention while maintaining user privacy.
OUTPUT:
CHAPTER 9
CONCLUSION
Credit card fraud is one of the most important problems that financial
institutions currently face. In this paper, a deep representation learning
model is proposed for credit card fraud detection that achieves good and
stable performance. The model detects whether an online transaction is
legitimate or fraudulent, using Squirrel Optimization and an Advanced DLMNN
classifier. The work was carried out in Jupyter Notebook on the Kaggle dataset
of credit card transactions and achieves a maximum accuracy of 99.66%.
Although the proposed method obtains good results on a small dataset, some
problems remain, such as imbalanced data. Future work will focus on solving
these problems and improving the algorithm.
terms of precision, recall, and overall accuracy. However, to enhance its
practicality, future work should explore the deployment of the model in real-
time transaction environments and test it against dynamic, evolving fraud
patterns. Incorporating real-time data streams, developing adaptive learning
mechanisms, and employing more extensive datasets with class-balancing
strategies will further improve the model’s applicability and resilience in real-
world scenarios.
CHAPTER 10
BIBLIOGRAPHY
7. Y. Wen, K. Zhang, Z. Li, and Y. Qiao, “A discriminative feature learning
approach for deep face recognition,” in Proc. Eur. Conf. Comput. Vis.
(ECCV). Cham, Switzerland: Springer, 2016, pp. 499–515.
8. J. Dorronsoro, F. Ginel, C. Sánchez, and C. Cruz, “Neural fraud
detection in credit card operations,” IEEE Trans. Neural Netw., vol. 8,
no. 4, pp. 827–834, Jul. 1997.
9. D. Dighe, S. Patil, and S. Kokate, “Detection of credit card fraud
transactions using machine learning algorithms and neural networks: A
comparative study,” in 2018 Fourth International Conference on
Computing Communication Control and Automation (ICCUBEA).
IEEE, 2018, pp. 1–6.
10. M. Puh and L. Brkić, “Detecting credit card fraud using selected
machine learning algorithms,” in 2019 42nd International Convention on
Information and Communication Technology, Electronics and
Microelectronics (MIPRO). IEEE, 2019, pp. 1250–1255.