Miniproject Group E 1

Uploaded by

bhavanamath09
Copyright
© All Rights Reserved

Credit card fraud detection 2023-2024

DECLARATION

I hereby declare that this project report titled "Credit Card Fraud Detection" is my original work
and has not been submitted previously for any degree or diploma at any other educational institution.
This report is a result of my own research and the information provided is accurate to the best of my
knowledge.
Credit card fraud detection leverages a range of sophisticated algorithms to accurately identify
and prevent fraudulent transactions. It includes logistic regression, which provides probabilistic
classification by estimating the likelihood of fraud; decision trees, which offer a clear decision-making
process through hierarchical splitting of data; and ensemble methods like Random Forest and Gradient
Boosting, which combine multiple models to improve prediction accuracy. Machine learning
techniques such as Support Vector Machines (SVM) and Neural Networks are also employed to
capture complex patterns in transaction data. Unsupervised learning methods, including clustering
algorithms like k-means and anomaly detection techniques, help identify outliers without pre-labeled
data.
The output of these algorithms is typically evaluated by accuracy, with many systems achieving
rates above 90%. For example, models like Random Forest score in the range of 85-95%, reflecting
their effectiveness in balancing the detection of fraudulent transactions while minimizing false
positives. The continuous tuning of these models, along with feature engineering and behavioral
analytics, ensures that fraud detection systems remain robust and adaptive to emerging threats.

Our system also provides a thorough comparative analysis of different machine learning
algorithms. We benchmark a range of models, including Support Vector Machines (SVM), Random
Forests, Gradient Boosting, and Neural Networks. By evaluating these algorithms based on multiple
performance metrics—such as accuracy, precision, recall, F1-score, and ROC-AUC—we establish
comprehensive performance benchmarks. Systematic hyperparameter tuning is employed to optimize
each model’s configuration, ensuring that we achieve the best possible performance for fraud detection.

I would like to extend my gratitude to my project supervisor, Dr. S. Krishna Anand, for their
guidance and support throughout this project. Their expertise and feedback have been invaluable in
shaping this work. I also thank my peers and family for their encouragement and assistance.

ACKNOWLEDGEMENT

I would like to express my sincere gratitude to all those who have supported and guided me
throughout the course of this project on "Credit Card Fraud Detection." This project would not
have been possible without the help, encouragement, and expertise of many individuals.

We would like to extend our heartfelt gratitude to everyone who contributed to the
development and success of the Credit Card Fraud Detection System. This innovative project
would not have been possible without the collective efforts of a dedicated team of professionals,
researchers, and collaborators.

Firstly, we acknowledge the invaluable contributions of our machine learning experts and
data scientists. Their expertise in developing and fine-tuning state-of-the-art algorithms, including
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), was
instrumental in creating a system capable of accurate and efficient fraud predictions. Their
dedication to feature engineering, ensemble methods, and hyperparameter tuning has greatly
enhanced the system's performance.

Our thanks extend to the data providers that contributed valuable datasets for training and
validating the system. The integration of diverse and comprehensive data sources was vital for
developing a robust and reliable predictive model. We appreciate their collaboration and support
in making this project a success.

This project has been a significant learning experience, and I am grateful for the
opportunity to work on such a relevant and impactful topic. The knowledge and skills gained
during this project will undoubtedly benefit my future endeavors in the field of cloud computing
and monitoring.

Dept of AI&DS, SIET



NOMENCLATURE

ANN : Artificial Neural Network
ATM : Automated Teller Machine
CNP : Card Not Present
CRISP-DM : Cross-Industry Standard Process for Data Mining
DT : Decision Tree
FN : False Negative
FP : False Positive
KNN : K-Nearest Neighbor
LR : Logistic Regression
NB : Naive Bayes
NN : Neural Network
PCA : Principal Component Analysis
RF : Random Forest
SVM : Support Vector Machine
TN : True Negative
TP : True Positive


Abstract
The craze for money has led to a sharp increase in anonymous criminals exploiting the
vulnerabilities of innocent people and swindling them out of their hard-earned money through
fraudulent transactions carried out by means such as tapping of ATMs (Automated Teller
Machines), forged signatures, unauthorized use of credit cards, and blatant theft. This work
focuses exclusively on detecting fraudulent transactions made with credit cards, and takes a
small step toward offsetting a few of the difficulties faced by customers. To make the
system more realistic and robust, four models with different parameters have been
experimented with: KNN (k-Nearest Neighbors), Logistic Regression, Naive Bayes, and SVM
(Support Vector Machine). The accuracy of these techniques is compared and the most
appropriate model is chosen.


Table of Contents
CHAPTER NO TITLE

ACKNOWLEDGEMENT
NOMENCLATURE
ABSTRACT
LIST OF FIGURES
LIST OF TABLES

1. INTRODUCTION
1.1 GENERAL INTRODUCTION
1.2 OBJECTIVES
1.3 SCOPE OF THE PROJECT
1.4 CHARACTERISTICS / ADVANTAGES
1.5 HARDWARE AND SOFTWARE SPECIFICATIONS

2. LITERATURE REVIEW

3. CREDIT CARD FRAUD DETECTION
3.1 INTRODUCTION TO CONCEPT OF CREDIT CARDS
3.2 TYPES OF CREDIT CARDS
3.3 COMMON METHODS OF CREDIT CARD FRAUD
3.4 IMPACT OF CREDIT CARD FRAUD
3.5 PREVENTION AND DETECTION

4. ALGORITHM APPROACH
4.1 INTRODUCTION
4.2 DATA SOURCE
4.3 DATA PREPARATION
4.4 DATA PRE-PROCESSING
4.5 DATA MODELING
4.6 KNN
4.7 NAÏVE BAYES
4.8 LOGISTIC REGRESSION
4.9 SUPPORT VECTOR MACHINE
4.10 EVALUATION AND DEPLOYMENT

5. CONCLUSION
5.1 CONCLUSION
5.2 FUTURE WORK
5.3 APPENDIX A: SOURCE CODE
5.4 APPENDIX B: DATASET OF CREDIT CARD FRAUD DETECTION

6. REFERENCES


List of Figures

Fig No Names

1 Dataset Structure
2 Class Distribution
3 Correlations
4 Variable 18
5 Variable 28
6 Weka K=3
7 RStudio K=3
8 RStudio K=7
9 Weka K=7
10 Weka Naïve Bayes
11 RStudio Naïve Bayes
12 Weka Logistic Regression
13 RStudio Logistic Regression
14 Support Vector Machine

List of Tables

Table no Title

1 Confusion Matrix
2 Table of Accuracies


Chapter 1: Introduction
1.1 General Introduction

With more people using credit cards in their daily lives, credit card companies must take
special care of the security and safety of their customers. According to the World Bank,
2.8 billion people around the world used credit cards in 2019, and 70% of those users own
at least one card.

Reports of credit card fraud in the US rose by 44.6%, from 271,927 in 2019 to 393,207
in 2020. There are two kinds of credit card fraud. In the first, an identity thief opens a credit
card account under your name; reports of this type of fraud increased 48% from 2019 to 2020.
In the second, an identity thief uses an existing account that you created, usually by stealing
the credit card's information; reports of this type of fraud increased 9% from 2019 to 2020.
These statistics caught the attention of researchers because the numbers are rising drastically
and rapidly over the years, which motivated them to tackle the issue analytically, using
different machine learning methods to detect fraudulent credit card transactions among
numerous legitimate ones.

1.2 Objectives
The primary objective of this work is the accurate detection of fraudulent transactions
made with credit cards. Choosing a machine learning technique that detects fraud accurately
goes a long way toward increasing customer trust. This work is a step in that direction.


1.3 Scope of the project

Enhancing a fraud detection system involves several key strategies. First, integrating
advanced techniques such as deep learning, reinforcement learning, and ensemble methods can
significantly improve detection accuracy and reduce false positives. Real-time adaptation
mechanisms allow the model to learn continuously from new transaction data and stay ahead of
emerging fraud patterns. Expanding the data sources to include social media activity, device
information, and biometric data enriches the model's ability to detect fraud, while incorporating
global transaction data ensures effectiveness across different regions and fraud types.

System integration should be enhanced to connect seamlessly with broader financial
platforms, such as customer service and risk management tools, ensuring a holistic approach to
fraud prevention. Cross-platform compatibility further allows the system to work with various
payment processing systems. A more intuitive interface and customizable alerts help fraud
analysts and administrators interact efficiently with the system. Performance optimization is
essential, focusing on scalability to handle larger transaction volumes and on reducing latency
for near-instantaneous response times.

The system should also be updated regularly to comply with evolving regulatory
requirements, with robust audit trails to ensure transparency. Enhanced fraud detection
capabilities, including advanced behavioral analytics and improved anomaly detection, are crucial
for identifying subtle and new types of fraud. User education and training programs will ensure
that end users and analysts are well equipped to use the system effectively, with best practices
guiding the interpretation of alerts. Finally, fostering partnerships with other financial
institutions and adopting industry standards will enable shared insights and strategies,
strengthening the overall approach to fraud prevention.


1.4 Characteristics and Advantages

1.4.1 Characteristics:

Accuracy: The ability to correctly identify fraudulent transactions while minimizing false
positives (legitimate transactions flagged as fraud) and false negatives (fraudulent transactions not
detected).

Real-time Processing: The capability to analyze and flag transactions instantly as they occur,
enabling immediate intervention to prevent fraudulent activities.

Scalability: The system must handle large volumes of transactions efficiently, especially
during peak times, without compromising performance or accuracy.

Adaptability: The ability to update and adapt to new fraud patterns and techniques as
fraudsters continuously evolve their strategies.

Machine Learning Integration: Utilization of advanced machine learning algorithms to learn
from historical data and improve detection rates over time. Techniques include Neural Networks,
Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and ensemble methods.

Feature Engineering: Extraction and selection of relevant features from transaction data,
such as transaction amount, location, time, frequency, and user behavior patterns, to enhance model
performance.

Anomaly Detection: Identifying deviations from normal transaction behavior that could
indicate potential fraud, such as unusual spending patterns or transactions from unexpected
locations.

Behavioral Analysis: Analyzing the spending habits and behaviors of individual
cardholders to detect inconsistencies that may suggest fraudulent activity.

Multi-layered Security: Incorporating multiple layers of security measures, such as
biometric authentication, device fingerprinting, and two-factor authentication, to enhance
fraud prevention.

Ensemble Methods: Combining multiple models to improve overall detection accuracy,
leveraging the strengths of different algorithms to produce a more robust system.
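
The accuracy characteristic listed above can be made concrete with the TP, FP, TN and FN counts defined in the nomenclature. The following is a minimal Python sketch; the function name and the example counts are illustrative assumptions and are not results from this project.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard metrics from confusion-matrix counts.

    tp: fraudulent transactions correctly flagged
    fp: legitimate transactions wrongly flagged as fraud
    tn: legitimate transactions correctly passed
    fn: fraudulent transactions that were missed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = classification_metrics(tp=80, fp=20, tn=9880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is much higher than the precision or recall, which suggests why accuracy alone can be a weak indicator when fraudulent transactions are rare.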

1.4.2 Advantages:

Increased Security: Fraud detection systems help to identify and prevent fraudulent
transactions in real-time, significantly reducing the risk of financial losses for both consumers and
financial institutions.

Financial Savings: By detecting and preventing fraudulent activities, these systems save
financial institutions millions of dollars annually that would otherwise be lost to fraud. This also
minimizes the financial impact on customers who might otherwise be liable for fraudulent charges.

Enhanced Customer Trust: Customers feel more secure knowing that their transactions are
being monitored and protected against fraud, which enhances their trust and confidence in the
financial institution and its services.

Real-time Fraud Detection: Modern fraud detection systems can analyze transactions in real-
time, allowing for immediate action to be taken to prevent fraud, such as blocking a suspicious
transaction or alerting the customer.

Reduction in False Positives: Advanced machine learning algorithms improve the accuracy
of fraud detection, reducing the number of legitimate transactions that are incorrectly flagged as
fraudulent, which in turn reduces customer inconvenience and dissatisfaction.


1.5 Hardware and Software Specifications

Hardware requirements:
CPU : Intel i7 or AMD Ryzen 7 and above
GPU : NVIDIA GTX 1080 Ti, RTX 2080, or higher
RAM : At least 32GB
Storage : SSD with at least 1TB
Network : High-speed internet connection
Backup and Redundancy : External Hard Drives, NAS, Cloud Storage
Peripherals : Dual Monitors, Ergonomic Keyboard and Mouse

Software requirements:
Operating System : Windows 11
Programming Languages and Libraries : Python, NumPy, Pandas, Scikit-learn, Matplotlib/Seaborn
Integrated Development Environment (IDE) : Jupyter Notebook


Chapter 2: Literature Review


The journey of credit card fraud detection has been a remarkable evolution, beginning with
the rise of credit card usage in the mid-20th century and continuing to the present day with
sophisticated artificial intelligence systems.

In the bustling 1960s, as credit cards began to revolutionize consumer behavior, fraud
emerged as a significant concern. Early attempts to combat fraud were rudimentary and labor-
intensive. Merchants and banks relied on lists of stolen card numbers, manually scrutinizing
transactions to spot potential fraud. This manual process was inefficient and often inadequate.

By the 1980s, as computers became more integrated into business operations, the landscape
of fraud detection began to change. Simple automated systems emerged, employing rule-based
algorithms to flag suspicious activities. For example, if a card was used in New York and then
suddenly in Los Angeles within an hour, the system would raise an alert for potential fraud. These
early automated systems marked the beginning of a more structured approach to fraud prevention.
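
The New York and Los Angeles rule described above can be sketched as a simple "impossible travel" check. This is only an illustration of the rule-based idea: the `Swipe` class, the function names, the 900 km/h speed threshold and the coordinates are all assumptions made for the example, not details of any historical system.

```python
import math
from dataclasses import dataclass


@dataclass
class Swipe:
    lat: float          # latitude in degrees
    lon: float          # longitude in degrees
    timestamp_s: float  # seconds since an arbitrary reference time


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def is_impossible_travel(prev: Swipe, curr: Swipe, max_speed_kmh: float = 900.0) -> bool:
    """Flag two consecutive uses of a card that imply an implausible travel speed."""
    distance_km = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    hours = (curr.timestamp_s - prev.timestamp_s) / 3600.0
    if hours <= 0:
        # Simultaneous or out-of-order use: flag only if the locations clearly differ.
        return distance_km > 1.0
    return distance_km / hours > max_speed_kmh


# Approximate city coordinates, used only for illustration.
new_york = Swipe(lat=40.7128, lon=-74.0060, timestamp_s=0)
los_angeles = Swipe(lat=34.0522, lon=-118.2437, timestamp_s=3600)    # one hour later
new_york_later = Swipe(lat=40.7130, lon=-74.0062, timestamp_s=3600)  # same city

print(is_impossible_travel(new_york, los_angeles))     # flagged as suspicious
print(is_impossible_travel(new_york, new_york_later))  # not flagged
```

A fixed threshold like this is easy to explain but rigid, which is one reason later systems moved toward statistical and machine learning methods.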

The 1990s brought significant advancements with the rise of more sophisticated algorithms
and databases. Statistical methods and anomaly detection techniques were developed to better
analyze transaction data. These methods allowed for the identification of unusual spending patterns
that could indicate fraud, laying the groundwork for more advanced fraud detection mechanisms.

With the dawn of the 2000s, the internet era brought e-commerce to the forefront, presenting
new challenges and opportunities for fraud detection. Machine learning and data mining techniques
started to take center stage. Neural networks and decision trees were among the early machine
learning methods used, significantly enhancing the ability to detect fraudulent activities by learning
from vast amounts of transaction data.

The 2010s saw an explosion of big data and advancements in computing power, leading to a
quantum leap in fraud detection capabilities. More complex machine learning models, such as
Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and ensemble methods, became
commonplace. These models could analyze large datasets in real-time, identifying even the subtlest
of fraudulent patterns with remarkable accuracy.


As the third decade of the 21st century began, fraud detection systems became marvels of
modern technology. They now leverage deep learning, real-time analytics, and artificial intelligence
to stay ahead of increasingly sophisticated fraud tactics. Techniques such as deep neural networks,
convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are employed to
achieve unprecedented levels of detection accuracy. Moreover, innovations in biometric
authentication, behavioral analytics, and blockchain technology are being integrated into these
systems, providing a multi-faceted defense against fraud.

The evolution of credit card fraud detection, from manual checks in the 1960s to today's
cutting-edge AI systems, is a testament to human ingenuity and the relentless pursuit of security in
the digital age. It remains a dynamic field, continually evolving to outpace the ever-advancing
strategies of fraudsters, ensuring that the convenience of credit cards remains safe and secure for
users worldwide.

Suraya Nurain Kalid and his team proposed a model for credit card fraud detection in 2002. In
the proposed methodology, the researchers used various machine learning algorithms such as
support vector machine (SVM), artificial neural network (ANN), Bayesian networks, K-Nearest
Neighbors (KNN), fuzzy logic systems, and decision trees. They observed that k-nearest neighbor,
decision tree, and SVM give a medium level of accuracy, while fuzzy logic and logistic
regression give the lowest accuracy of all the algorithms. Neural networks, naive Bayes, fuzzy
systems, and KNN offer a high detection rate; logistic regression, SVM, and decision trees
offer a detection rate at the medium level. Two algorithms, ANN and naive Bayesian networks,
perform better on all parameters, but they are very expensive to train. A major drawback of
all the algorithms is that they do not give the same results in all types of environments: they
give improved results with one type of data set and poor results with another. Algorithms like
KNN and SVM give excellent results with small data sets, while algorithms like logistic
regression and fuzzy logic systems give good accuracy with raw and unsampled data.

In 2001, Fraud A. Ghaleb and his team used decision tree, random forest, SVM, and
logistic regression algorithms for credit card fraud detection. The researchers worked with a
highly skewed data set. They found that random forest provides the best results and good
accuracy compared to the other algorithms, and also concluded that the SVM algorithm has a
data imbalance problem.


Chapter 3: Credit Card Fraud Detection

3.1 Introduction to concept of credit cards


Credit card fraud is the unauthorized use of a credit card or its details to make
transactions or gain benefits without the card holder's consent. This type of fraud can occur in
several ways, including card-not-present (CNP) fraud, where stolen credit card details are used
for online or phone transactions; card-present fraud, which involves using a physical stolen or
counterfeit card at a point-of-sale terminal; and account takeover, where a fraudster gains access
to a card holder's account by stealing personal information and making unauthorized changes or
purchases. Other methods include application fraud, where false information is used to open new
credit card accounts, and identity theft, which involves using stolen personal information to
access or create credit accounts. The impact of credit card fraud can be severe, leading to
financial losses, damage to credit scores, emotional stress for the victims, and increased security
costs for financial institutions. Prevention and detection efforts include monitoring account
activity, using secure practices, and implementing advanced fraud detection systems to identify
and prevent fraudulent transactions.

3.2 Types of Credit Cards


Different types of credit cards are currently in use. They are described in this
section.
1. Standard Credit Cards
These are basic credit cards that offer a line of credit with no additional perks. They
typically come with a standard interest rate and basic features such as a grace period for
paying off the balance to avoid interest charges. Typical examples are general-purpose credit
cards issued by major banks or financial institutions.

2. Rewards Credit Cards


These cards offer rewards such as cash back, points, or miles for every dollar spent. They
often have higher annual fees but provide valuable benefits for frequent spenders.


3. Cash Back Credit Cards


A type of rewards card that offers cash back on purchases, either as a percentage of the
transaction or through rotating categories. Cash back can be redeemed as statement credits,
deposits, or checks.

4. Travel Credit Cards


These cards are designed for frequent travelers, offering rewards in the form of travel
points or miles, which can be redeemed for flights, hotel stays, or other travel-related expenses.
They often come with travel-related benefits such as priority boarding and travel insurance.

5. Business Credit Cards


Designed for business owners, these cards offer features suited to business expenses, such
as higher credit limits, expense tracking tools, and rewards tailored to business spending. They
can help manage business finances and build business credit.

6. Secured Credit Cards


Secured cards require a cash deposit that serves as collateral and sets the credit limit. They
are typically used by individuals with poor credit or no credit history to build or rebuild their
credit score.

7. Premium Credit Cards


These cards offer high-end benefits such as access to exclusive events, concierge services,
higher rewards rates, and comprehensive travel insurance. They often come with a high annual
fee but provide significant perks and privileges.

8. Student Credit Cards


Tailored for college students or young adults with limited credit history, these cards often
have lower credit limits and fewer features but provide a way to build credit responsibly.


9. Store Credit Cards


Issued by retail stores, these cards offer rewards and discounts on purchases made at the
store or chain. They often come with high interest rates but can provide store-specific benefits.

10. Balance Transfer Credit Cards


These cards are designed to help individuals transfer high-interest credit card balances to a
new card with a lower interest rate or 0% introductory APR on balance transfers. They are useful
for managing and reducing debt.

Each type of credit card has its own set of features, benefits, and terms, making it important to
choose one that aligns with your financial goals and spending habits.

3.3 Common methods of credit card fraud


Credit card fraud encompasses various methods used by criminals to illegally access or
misuse credit card information. Here are some common methods:

1. Phishing
Fraudsters use fake emails, texts, or websites that appear legitimate to trick individuals into
revealing their credit card details or personal information. For instance, an email might claim to
be from a trusted financial institution, prompting the recipient to click on a link and enter their
credit card information on a fraudulent website.

2. Skimming
Skimming involves using a small device, called a skimmer, to capture credit card
information from the magnetic stripe of a card during legitimate transactions. The skimmer is
often placed on ATMs or gas station card readers, where it reads and stores card data while the
transaction appears normal.


3. Cloning
Cloning occurs when a fraudster copies the data from a stolen credit card and encodes it
onto a blank card, which can then be used to make fraudulent transactions. The fraudster uses
skimming or other methods to obtain the card data, then creates a counterfeit card using a card
reader/writer.

4. Account Takeover
This method involves gaining access to a person’s credit card account by stealing personal
information and using it to make unauthorized changes or transactions. Fraudsters may use stolen
login credentials or personal information obtained through social engineering to log into the
account, change account details, or make purchases.

5. Card Not Present (CNP) Fraud


In CNP fraud, the fraudster uses stolen credit card details to make online or phone
transactions where the physical card is not required. The fraudster may purchase card details from
the dark web or use information obtained through phishing or data breaches to make online
purchases.

6. Application Fraud
This type of fraud involves applying for a credit card using stolen or falsified information
to receive a new card. The fraudster uses fake or stolen identities to fill out credit card
applications, often with the intention of making fraudulent purchases or obtaining cash advances.

7. Identity Theft
Identity theft involves stealing someone’s personal information, such as Social Security
numbers or bank account details, to open credit card accounts or commit other forms of fraud.
The stolen identity information is used to apply for credit cards or loans in the victim’s name,
leading to unauthorized charges and damage to their credit.


8. Data Breaches
Data breaches occur when cyber criminals hack into databases of companies or financial
institutions to steal large amounts of credit card information. The stolen data is often sold on the
dark web or used directly by the criminals to make fraudulent transactions.

9. Social Engineering
Social engineering involves manipulating individuals into divulging confidential
information through deception or coercion. Fraudsters might pose as bank representatives or tech
support to trick victims into providing their credit card information over the phone or through
email.

10. Mail Theft


This method involves stealing credit card statements, pre-approved credit card offers, or
new credit cards from the victim’s mail. The stolen information is used to access accounts or
activate new cards without the victim’s knowledge.

Understanding these methods can help individuals and businesses take proactive measures
to protect against credit card fraud. Regular monitoring, secure practices, and awareness of
common fraud tactics are essential in safeguarding credit card information.

3.4 Impact of Credit Card Fraud


Credit card fraud has several serious consequences for both consumers and financial
institutions. Financially, it can lead to substantial losses for all parties involved, with fraudulent
transactions draining resources and profits. Legally, those responsible for such activities may face
severe repercussions, including fines and imprisonment. The reputational damage to organizations
affected by credit card fraud can be significant, as it undermines customer trust and credibility. For
consumers, the inconvenience is considerable, as they must deal with the aftermath of fraud,
including the process of disputing charges, obtaining new cards, and restoring their accounts to
normal. These combined impacts highlight the importance of robust fraud prevention measures.

3.5 Prevention and Detection


Preventive measures against credit card fraud are essential for protecting both consumers
and financial institutions. Secure transactions, ensured through encryption and the use of secure
payment gateways, are fundamental to safeguarding sensitive information. Implementing advanced
fraud detection systems that leverage machine learning and AI helps identify suspicious activities
early on. Regular monitoring of account statements and transaction histories allows for the timely
detection of unauthorized transactions. Educating consumers about safe online practices and how to
recognize phishing attempts is also crucial in preventing fraud. Overall, credit card fraud is a
serious issue that demands continuous vigilance and proactive strategies from all parties involved to
effectively prevent and detect fraudulent activities.


Chapter 4: Algorithmic Approaches

4.1 Introduction
To accomplish the goal of the project, which is to find the most suitable model for
detecting fraud in credit card transactions, several steps need to be taken. First, the most
suitable data set is identified and preprocessed. Then a series of algorithms is applied:
K-Nearest Neighbor (KNN), Naive Bayes, SVM, and Logistic Regression. For the KNN model, two
values of K were chosen, K=3 and K=7. All models were created using both R and Weka, except
SVM, for which only the Weka model has been taken into account. All visualizations have been
taken from these applications.
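
The comparison described above was carried out in R and Weka. As a rough equivalent, the same set of models can be sketched with scikit-learn, which is listed among the software requirements. Everything below is an assumption made for illustration: the data are synthetic (generated with `make_classification`), the hyperparameters are library defaults apart from K, and the printed accuracies say nothing about the results reported in this work.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the transaction data (about 5% positives).
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    weights=[0.95, 0.05],
    random_state=0,
)

# Stratify so that both splits keep roughly the same share of the minority class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "KNN (K=3)": KNeighborsClassifier(n_neighbors=3),
    "KNN (K=7)": KNeighborsClassifier(n_neighbors=7),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: accuracy = {acc:.4f}")
```

Keeping the models in one dictionary makes it straightforward to add further metrics, such as precision and recall, to the same loop.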

4.2 Data Source


The data set was retrieved from an open-source website, Kaggle.com. It contains
transactions made by credit card users in Europe over two days in 2013. The data set
consists of 31 attributes and 284,807 rows; sample content is shown in Appendix B.
28 of the attributes are numeric variables that have been transformed using PCA
(Principal Component Analysis). The three remaining attributes are “Time”, which contains
the seconds elapsed between each transaction and the first transaction in the data set;
“Amount”, which is the amount of each transaction; and “Class”, a binary variable where
“1” indicates a fraudulent transaction and “0” a non-fraudulent one.
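
The layout described above can be checked quickly with pandas. The sketch below builds a small synthetic frame with the same 31 columns instead of loading the real file, so it runs on its own; the row count, the random values and the file name mentioned in the comment are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# With the real data one would load the downloaded CSV instead, for example
# pd.read_csv("creditcard.csv"); that file name is an assumption.
rng = np.random.default_rng(0)
n_rows = 1000  # small stand-in for the full data set

columns = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]
data = {
    # Two days of transactions span 2 * 24 * 3600 = 172,800 seconds.
    "Time": np.sort(rng.uniform(0, 172_800, n_rows)),
    **{f"V{i}": rng.normal(size=n_rows) for i in range(1, 29)},
    "Amount": rng.exponential(scale=88.0, size=n_rows).round(2),  # arbitrary scale
    "Class": (rng.random(n_rows) < 0.01).astype(int),             # 1 = fraud, 0 = not
}
df = pd.DataFrame(data, columns=columns)

print(df.shape)                    # rows and the 31 attributes
print(df.dtypes.value_counts())    # attribute types
print(df["Class"].value_counts())  # class distribution
print(df.isna().sum().sum())       # number of missing values
```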


4.3 Data Preparation

Fig 4.3.1 shows the structure of the data set. All attributes are listed along with their
types, together with a glimpse of the values within each attribute. The Class attribute is of
integer type.

Fig 4.3.1 - Data set Structure

Fig 4.3.2 shows the distribution of the class. The red bar, which contains
284,315 instances, represents the non-fraudulent transactions. The blue bar, with 492
instances, represents the fraudulent transactions.
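
The degree of class imbalance follows directly from the two counts above. The short Python sketch below only redoes that arithmetic; the variable names are illustrative and are not taken from the project code.

```python
# Class counts as reported for the data set in this chapter
non_fraud = 284_315
fraud = 492

total = non_fraud + fraud
fraud_share = fraud / total          # fraction of all transactions that are fraudulent
imbalance_ratio = non_fraud / fraud  # legitimate transactions per fraudulent one

print(f"Total transactions: {total}")
print(f"Fraud share: {fraud_share:.4%}")
print(f"Non-fraudulent transactions per fraudulent one: {imbalance_ratio:.0f}")
```

With fewer than two fraudulent transactions in every thousand, a model that labels every transaction as non-fraudulent would already reach a very high accuracy, which is worth keeping in mind when reading the accuracy figures reported later.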


Fig 4.3.2 - Class Distribution

Fig 4.3.3 (image from R) shows the correlation between the attributes. It is necessary to
examine the values of the various principal components in the data set, and the plot depicts
these principal components for each class.

Fig 4.3.3 - Correlations


Attribute with the most fraud

Fig 4.3.4 shows attribute V18, the eighteenth principal component, which was already
included in Fig 4.3.3. This is the attribute associated with the most fraudulent credit card
transactions. The blue line represents class 1, which indicates the fraudulent transactions.

Fig 4.3.4 – Variable 18

Attribute with the least fraud

Fig 4.3.5 shows the attribute with the lowest number of fraudulent transactions, V28.
As mentioned earlier, the blue line represents the fraudulent instances within the data set.

Fig 4.3.5 - Variable 28


4.4 Data Pre-processing

As the data set contains no missing or duplicated values, preparing it was simple. The
first alteration, needed to open the data set in Weka, was to change the type of the Class
attribute from numeric to nominal and to declare its values as {1,0}. This was done with the
text editor Sublime Text. A corresponding type change was made in R so that the models and
the visualizations could be created.
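
The manual edit described above can also be scripted. The following is a minimal Python sketch, assuming the data set has been saved in Weka's ARFF format and that the header declares the target as `@attribute Class numeric`; the function name `make_class_nominal` and the sample header are hypothetical and were not part of the project.

```python
def make_class_nominal(arff_text, attribute="Class", values=("1", "0")):
    """Rewrite '@attribute <name> numeric' as a nominal declaration such as {1,0}."""
    nominal = "{" + ",".join(values) + "}"
    out = []
    for line in arff_text.splitlines():
        parts = line.split()
        # Only touch the declaration of the chosen attribute
        is_target = (
            len(parts) == 3
            and parts[0].lower() == "@attribute"
            and parts[1] == attribute
            and parts[2].lower() in ("numeric", "real")
        )
        out.append(f"@attribute {attribute} {nominal}" if is_target else line)
    return "\n".join(out)


# Hypothetical, shortened ARFF header used only for illustration
header = "@relation creditcard\n@attribute Amount numeric\n@attribute Class numeric\n@data"
print(make_class_nominal(header))
```

Only the declaration of the target attribute is changed, so the data rows and the other attribute declarations are passed through untouched.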

4.5 Data Modeling


After making sure that the data was ready to be modeled, four models were created using
Weka and R: SVM, KNN, Logistic Regression and Naive Bayes.

4.6 KNN

K Nearest Neighbor (KNN) is a supervised machine learning algorithm. It is popular for
its simplicity, and it requires no parameter estimation and no likelihood calculations.

The choice of k is important. k must be a positive integer; it represents the number of
neighbors considered. For a known data set, the data must be preprocessed first. The KNN
process can then be split into three main steps:

1. Evaluation of the distance. Distances are calculated between the test data and the training
data. The most common distance metrics are Euclidean distance, Manhattan distance and Hamming
distance. Euclidean distance is the most frequently used.

2. Identification of the nearest neighbors according to the distance information. The distances
are sorted in ascending order, and the top k neighbors are selected.

3. Prediction according to the nearest neighbors. The k smallest distances are taken from the
sorted list, and the test point is assigned the most frequent class among those k neighbors.

The Euclidean distance between two points can be calculated using the equation (3.3).

$$d(\mathbf{x}, \mathbf{x}') = \sqrt{(x_1 - x'_1)^2 + (x_2 - x'_2)^2} \tag{3.3}$$


where d represents the distance between a test element x and a training element x'. Once the
Euclidean distance has been evaluated for each element, the classes can be represented
graphically. Fig 4.6.1 illustrates three classes in the graph, each separated from the others
by a particular Euclidean distance. Each class is represented by circles of a specific color:
yellow, blue and green.
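
The three distance metrics named in step 1 can be written in a few lines each. This is an illustrative Python sketch rather than the R or Weka implementation used in the project; the function names and sample points are assumptions.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which the two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))                # 5.0
print(manhattan(p, q))                # 7.0
print(hamming((1, 0, 1), (1, 1, 0)))  # 2
```

Euclidean and Manhattan distance suit numeric attributes such as the principal components in this data set, while Hamming distance is intended for categorical or binary attributes.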

Fig .4.6.1 Graphical representation of three different classes

The second step of the K Nearest Neighbor algorithm is to identify the nearest neighbors.
The distance values are sorted in ascending order and the k smallest distances are selected.

The last step in the K Nearest Neighbor algorithm is to perform the classification, which
is based on the frequency of the classes among the k nearest neighbors in the training
data set. The test point is assigned the most frequent class among those neighbors.

In other words, an entry of the test set is assigned to the class to which it is most
similar in terms of distance. Fig 4.6.2 shows the process of the K Nearest Neighbor
algorithm from the beginning until the decision is made.
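
The three steps can be sketched end to end in plain Python. The example below is a minimal illustration on made-up two-dimensional points, not the R or Weka model evaluated in this chapter; `knn_predict` and the sample data are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: sort in ascending order and keep the k nearest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: assign the most frequent class among those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training data: class 0 near the origin, class 1 near (5, 5)
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
train_labels = [0, 0, 0, 1, 1]

print(knn_predict(train_points, train_labels, (0.3, 0.3), k=3))  # 0
print(knn_predict(train_points, train_labels, (4.8, 5.2), k=3))  # 1
```

An odd value of k, such as the 3 and 7 used in this work, avoids tied votes when there are only two classes.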


Fig 4.6.2 K nearest neighbors algorithm flow diagram.

• K=3

The K-Nearest Neighbor (KNN) algorithm is a supervised ML technique that can be applied
to both classification and regression problems. In this work, two different values of K, namely
3 and 7, have been considered so that the analysis does not depend on a single choice of K.
For K=3, the R model scored an accuracy of 99.83%, correctly identifying 91,719 transactions
and misclassifying 155. In Weka, the model scored an accuracy of 99.94% and misclassified 52
transactions. As the level of accuracy changes with the type and number of transactions
evaluated, the average value has been used; the average of the two accuracies is 99.89%. The
Weka output is shown in Fig 4.6.3 and the RStudio output in Fig 4.6.4.
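
The figures quoted for K=3 can be checked with a few lines of arithmetic. The sketch below assumes that the 91,719 correctly identified and 155 missed transactions belong to the same test set, and that the two accuracies being averaged are 99.83% and 99.94%; the variable names are illustrative.

```python
# Counts reported for the first K=3 model
correct = 91_719
missed = 155

# Accuracy is the share of correctly classified transactions
accuracy_pct = 100 * correct / (correct + missed)

# Average of the two reported accuracies
average_pct = (99.83 + 99.94) / 2

print(f"Accuracy from counts: {accuracy_pct:.2f}%")
print(f"Average of the two accuracies: {average_pct:.3f}%")
```

The accuracy recomputed from the counts agrees with the reported 99.83%, and the average of 99.885% corresponds to the 99.89% quoted above once rounded to two decimal places.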


Fig 4.6.3- Weka K=3

Fig 4.6.4 - RStudio K=3


• K=7

For K=7, the R model scored an accuracy of 99.82%, correctly identifying 91,719
transactions and misclassifying 52. In Weka, the model scored an accuracy of 99.88% and
misclassified 52 transactions. As the level of accuracy changes with the type and number of
transactions evaluated, the average value has been used; the average of the accuracies was
found to be 99.88%. The Weka output is shown in Fig 4.6.5 and the RStudio output in
Fig 4.6.6.

Fig 4.6.5 - Weka K=7

Fig 4.6.6 - RStudio K=7


4.7 Naive Bayes

Naive Bayes is a machine learning approach that works on the basis of probability. It is
regarded as one of the most effective simple classification techniques, and it is known for
treating the features of the data as independent of one another. It is fast to train and can
also be applied to imbalanced data sets. The algorithm calculates the probability of each class
for a record and assigns the record to the class with the highest probability. A difficulty
arises when a feature value appears in the test data set but not in the training data set: its
estimated probability is zero, so the algorithm cannot make a sensible prediction. This
situation is called “zero frequency”. Smoothing techniques such as Laplace estimation are
described in the literature to solve this problem.
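
Laplace estimation can be illustrated with a short Python sketch. It assumes a categorical feature with a known number of possible values and uses add-one smoothing by default; the function name and the counts are hypothetical and do not come from the project.

```python
def laplace_probability(count, class_total, n_values, alpha=1.0):
    """Smoothed estimate of P(value | class).

    count       -- times the value was seen together with the class in training
    class_total -- number of training records in the class
    n_values    -- number of distinct values the feature can take
    alpha       -- smoothing strength (1.0 gives classic Laplace smoothing)
    """
    return (count + alpha) / (class_total + alpha * n_values)


# A value never seen with this class still receives a small non-zero probability
unseen = laplace_probability(count=0, class_total=10, n_values=2)
seen = laplace_probability(count=10, class_total=10, n_values=2)
print(unseen, seen)
```

Because every count is raised by the same amount, the smoothed probabilities over all values of the feature still sum to one.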

The concept of this algorithm can be derived using the equation (3.2).

$$P(L \mid S) = \frac{P(S \mid L)\, P(L)}{P(S)} \tag{3.2}$$

The basic idea is that the next event can be predicted from previous events. In other
words, the logic works on the basis of prior experience, which is why it is regarded as a
learning algorithm. To apply the Naive Bayes algorithm to a real-life problem, the first step is
to identify the data set. The classes in the data set must be clearly defined so that they can
be separated easily. The main quantity calculated by the Naive Bayes algorithm is P(L|S), the
posterior probability of the factor L given that the event S has been observed. The other terms
of Bayes' law are defined as follows:

P(S): the probability of observing the event S independent of anything else (the evidence).

P(L): the prior probability of observing the factor L independent of any other factor of the
event.

P(S|L): the likelihood, that is, the probability of observing the event S given the factor L.

The algorithm evaluates the parameters described above and multiplies and divides them
to obtain the required probability. The class with the highest probability is always taken as
the prediction. To apply this concept to the data set, the classes must first be visible and
clearly identified. The class frequency is the number of times each class occurs, so building
the class frequency table is the first important step in the Naive Bayes algorithm.
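
As a small worked example, the class frequency table and Bayes' rule, P(L|S) = P(S|L) P(L) / P(S), can be combined as follows. The labels and likelihood values are made up for illustration, and the helper names are assumptions rather than part of the project code.

```python
from collections import Counter


def class_priors(labels):
    """Build the class frequency table and turn it into prior probabilities P(L)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def posterior(prior, likelihood, evidence):
    """Bayes' rule: P(L|S) = P(S|L) * P(L) / P(S)."""
    return likelihood * prior / evidence


# Made-up training labels: eight records of class 0 and two of class 1
labels = [0] * 8 + [1] * 2
priors = class_priors(labels)

# Hypothetical likelihoods P(S|L) of the observed event S under each class
likelihoods = {0: 0.1, 1: 0.7}

# P(S) is the total probability of S over both classes
evidence = sum(likelihoods[c] * priors[c] for c in priors)

posteriors = {c: posterior(priors[c], likelihoods[c], evidence) for c in priors}
prediction = max(posteriors, key=posteriors.get)
print(posteriors, prediction)
```

Although class 1 is the rarer class here, its much higher likelihood outweighs the prior, so it is selected as the prediction.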


Naive Bayes is a classification algorithm that assumes the presence of a particular
feature in a class is unrelated to the presence of any other feature. It is used mainly for
classification, based on conditional probability.
The second model created in R is Naive Bayes. Fig 4.7.3 shows the performance of the
model: it scored an accuracy of 97.77% and misclassified a total of 2,051 transactions, 33
fraudulent as non-fraudulent and 2,018 non-fraudulent as fraudulent. The accuracy of the Naive
Bayes model created in Weka differs slightly, at 97.73% with 1,938 misclassified instances
(Fig 4.7.2).

Fig 4.7.1 Naive Bayes Algorithm flow diagram.

Compared with the other models, the Naive Bayes model showed a noticeably lower level
of accuracy. The Weka and RStudio outputs are shown in Fig 4.7.2 and Fig 4.7.3 respectively.

Fig 4.7.2 - Weka Naive Bayes

Fig 4.7.3 - RStudio Naive Bayes


4.8 Logistic Regression

Logistic regression is a statistical method used for classification tasks where the goal is to
predict the probability of an instance belonging to a particular category. Unlike linear regression,
which predicts continuous values, logistic regression outputs probabilities ranging from 0 to 1. It
employs a sigmoid function to map linear combinations of input features to these probabilities.
By setting a threshold (typically 0.5), these probabilities can be converted into binary
predictions. Logistic regression is widely used in various domains, such as finance, healthcare,
and marketing, to model the relationship between predictors and categorical outcomes. Its
simplicity, interpretability, and efficiency make it a valuable tool in the machine learning arsenal.

The logistic function is defined as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where e is the base of the natural logarithm. Logistic regression estimates the parameters of
a logistic model, which can be used to determine the relationship between the independent
variables and the log-odds of the dependent variable. The model's parameters are typically
estimated using maximum likelihood estimation. In practice, logistic regression is widely used in
various fields such as medicine, finance, and social sciences for predicting binary outcomes.
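
The logistic function and the 0.5 threshold described above can be sketched in Python as follows. The weights, bias and feature values are made up for illustration; in practice they would be estimated by maximum likelihood, as noted above.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(features, weights, bias, threshold=0.5):
    """Return the predicted probability and the resulting binary label."""
    # Linear combination of the input features
    z = bias + sum(w * f for w, f in zip(weights, features))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


print(sigmoid(0.0))  # 0.5
print(predict([2.0, -1.0], weights=[1.5, 0.5], bias=-1.0))
print(predict([0.0, 0.0], weights=[1.5, 0.5], bias=-1.0))
```

Lowering the threshold below 0.5 flags more transactions as positive, which trades additional false alarms for fewer missed cases.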


Fig 4.8.1. Logistic Regression algorithm flow diagram.

The last model created using both R and Weka is Logistic Regression. The model scored
an accuracy of 99.92% in R with 70 misclassified instances (Fig 4.8.3), while it scored
99.91% in Weka with 77 misclassified instances (Fig 4.8.2).


Fig 4.8.2 - Weka Logistic Regression

Fig 4.8.3 - RStudio Logistic Regression


4.9 Support Vector Machine

Support Vector Machine (SVM) is a supervised ML technique, with associated learning
algorithms, that analyzes data for both classification and regression. It performs linear as
well as non-linear classification by creating margins between the classes. The margins are
constructed so that the distance between the margin and the classes is as large as possible,
which minimizes the classification error.

The aim of a support vector machine algorithm is to find the best possible line, or decision
boundary, that separates the data points of different data classes. This boundary is called
a hyperplane when working in high-dimensional feature spaces. The idea is to maximize the
margin, which is the distance between the hyperplane and the closest data points of each
category, thus making it easy to distinguish data classes.
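
To make the idea of the margin concrete, the sketch below computes the margin of two candidate hyperplanes for a handful of made-up points. It does not train an SVM; it only evaluates |w·x + b| / ||w|| for given values of w and b, which are assumptions chosen for illustration.

```python
import math


def signed_distance(point, w, b):
    """Signed distance of `point` from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


def margin(points, labels, w, b):
    """Distance from the hyperplane to the closest point.

    Labels must be +1 or -1. The result is negative if any point lies on the
    wrong side of the hyperplane.
    """
    return min(y * signed_distance(x, w, b) for x, y in zip(points, labels))


# Made-up, linearly separable points
points = [(1.0, 1.0), (2.0, 3.0), (-1.0, -1.0), (-2.0, 0.0)]
labels = [1, 1, -1, -1]

diagonal = margin(points, labels, w=(1.0, 1.0), b=0.0)
vertical = margin(points, labels, w=(1.0, 0.0), b=0.0)
print(diagonal, vertical)
```

Both hyperplanes separate the two classes, but the diagonal one leaves a wider margin, so it is the one a maximum-margin classifier would prefer among these two candidates.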

Fig 4.9.1 Support vector machine


The Support Vector Machine model, as shown in Fig 4.9.2, scored an accuracy of
99.94% and misclassified 51 instances.

Fig 4.9.2 - Support Vector Machine


4.10 Evaluation and Deployment

To ensure the proper and appropriate usage of the model, its accuracy needs to
be computed. Accuracy represents the proportion of instances that are predicted
correctly. It is derived from the confusion matrix, which shows the True Positives (TP),
True Negatives (TN), False Positives (FP) and False Negatives (FN). True Positives are
fraudulent transactions that were correctly classified by the model as fraudulent. True
Negatives are non-fraudulent transactions that were correctly predicted by the model as
non-fraudulent. False Positives are non-fraudulent transactions that were misclassified
as fraudulent. False Negatives are fraudulent transactions that were misclassified as
non-fraudulent. The confusion matrix representing these parameters is illustrated in
Table 1.

Actual/Predicted    Positive    Negative

Positive            TP          FN

Negative            FP          TN

Table 1 - Confusion Matrix

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
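
The four confusion matrix counts and the accuracy can be derived from lists of actual and predicted labels. The following Python sketch uses a small made-up example in which 1 denotes a fraudulent transaction and 0 a non-fraudulent one; the function names are illustrative.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN, treating `positive` as the fraud class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # fraud correctly flagged
        elif a != positive and p != positive:
            tn += 1  # legitimate transaction correctly passed
        elif a != positive and p == positive:
            fp += 1  # legitimate transaction wrongly flagged
        else:
            fn += 1  # fraud that was missed
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Made-up labels for eight transactions
actual = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)            # 2 4 1 1
print(accuracy(tp, tn, fp, fn))  # 0.75
```

Keeping the four counts separate, rather than reporting only the accuracy, shows how many of the errors are missed frauds and how many are false alarms.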


Model                     Accuracy

KNN (K=3)                 99.89%

KNN (K=7)                 99.88%

Naive Bayes               97.76%

Logistic Regression       99.92%

Support Vector Machine    99.94%

Table 2 - Table of Accuracies

The final stages of the CRISP-DM (Cross-Industry Standard Process for Data Mining)
model are evaluation and deployment. Table 2 shows the accuracies of all
the models that were created in the project. All models performed well in detecting
fraudulent transactions and scored high accuracies. Support Vector Machine marginally
exceeded the other models in accuracy. The Naive Bayes model had the lowest accuracy of
the four, which at 97.76% is still a sizable score.


Chapter 5: Conclusion

5.1 Conclusion
A number of models were designed to identify fraud in credit card transactions. Apart
from Naive Bayes, the other three models detected fraudulent transactions with an accuracy
of more than 99%. Among them, the SVM technique performed marginally better than the other
models, with an accuracy of 99.94% and only 51 misclassified instances, compared with 2,051
for the Naive Bayes model in R. The high levels of accuracy in each of the models indicate
that these models could be explored for different kinds of applications covering a wide range
of domains.

5.2 Future work

Future work in credit card fraud detection using machine learning holds several exciting
opportunities for advancement. One promising area is the incorporation of heterogeneous data
sources to build more comprehensive models. For example, integrating contextual information
about transactions, such as geographic location and merchant details, along with behavioral data
like spending habits and device usage patterns, could improve the detection of sophisticated
fraud tactics. Additionally, experimenting with state-of-the-art machine learning techniques, such
as deep learning architectures and reinforcement learning, might enhance the system's ability to
recognize subtle and evolving fraud patterns.

Another critical aspect is improving the model's interpretability and reducing false
positives. Techniques such as explainable AI (XAI) could make it easier for practitioners to
understand and trust the model's predictions, while adaptive algorithms and feedback loops can
help fine-tune models in response to new fraud trends and minimize disruptions to legitimate
transactions. Real-time fraud detection is another area of focus, with research aimed at optimizing
the speed and efficiency of processing transactions without sacrificing accuracy.

Exploring the scalability of these models to handle large volumes of data and diverse
transaction types is also essential. Collaborations with financial institutions and other
stakeholders can facilitate access to extensive and varied datasets, aiding in the development and
validation of more robust models. Furthermore, addressing ethical and privacy concerns, such as
ensuring data security and transparency in model decisions, will be crucial for maintaining
consumer trust and compliance with regulations.

Overall, future work should aim to enhance the effectiveness, efficiency, and fairness of
credit card fraud detection systems, paving the way for more secure and reliable financial
transactions.


5.3 Appendix A: Source Code

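import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import BatchNormalization, Conv1D, Dense, Dropout, Flatten
from tensorflow.keras.models import Sequential

# %matplotlib inline   (Jupyter only: uncomment when running in a notebook)

warnings.filterwarnings('ignore')
sns.set(style="whitegrid")

# Load the Kaggle credit card data set
data = pd.read_csv("creditcard.csv")
data.head(5)

# Separate the features from the target column before splitting
X = data.drop(columns=['Class'])
y = data['Class']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=99)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

# Fit the scaler on the training data only, then apply it to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

y_train = y_train.to_numpy()
y_test = y_test.to_numpy()

# Conv1D expects input of shape (samples, steps, channels)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# NOTE: the model definition is missing from the original listing. The small
# network below is an assumed placeholder built from the imported layer types.
model = Sequential([
    tf.keras.Input(shape=X_train.shape[1:]),
    Conv1D(32, 2, activation='relu'),
    BatchNormalization(),
    Dropout(0.2),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.summary()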


model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train for 20 epochs, validating on the held-out test split
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_test, y_test))

def plot_learningcurve(history, epochs):
    """Plot training and validation accuracy and loss for each epoch."""
    epoch = range(1, epochs + 1)

    plt.plot(epoch, history.history['accuracy'])
    plt.plot(epoch, history.history['val_accuracy'])
    plt.title('Model accuracy')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend(['train', 'val'], loc='upper left')
    plt.show()

    plt.plot(epoch, history.history['loss'])
    plt.plot(epoch, history.history['val_loss'])
    plt.title('Model loss')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['train', 'val'], loc='upper left')
    plt.show()

plot_learningcurve(history, 20)

# Density of transaction amounts per class (Class 0 = non-fraud, Class 1 = fraud).
# fill=True replaces the deprecated shade=True argument.
sns.kdeplot(data.Amount[data.Class == 0], label='NonFraud', fill=True)
sns.kdeplot(data.Amount[data.Class == 1], label='Fraud', fill=True)
plt.xlabel('Amount')
plt.legend()
plt.show()

# Density of transaction times per class
sns.kdeplot(data.Time[data.Class == 0], label='NonFraud', fill=True)
sns.kdeplot(data.Time[data.Class == 1], label='Fraud', fill=True)
plt.xlabel('Time')
plt.legend()
plt.show()

# Feature matrix X (all columns except Class) and target vector Y
columns = data.columns.tolist()
columns = [c for c in columns if c not in ['Class']]
target = 'Class'
X = data[columns]
Y = data[target]
print(X.shape)
print(Y.shape)


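from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

fraud = data[data['Class'] == 1]
valid = data[data['Class'] == 0]

# Ratio of fraudulent to valid transactions, used as the contamination level
outlier_fraction = len(fraud) / float(len(valid))
print(outlier_fraction)

print('Fraud Cases: {}'.format(len(fraud)))
print('Valid Cases: {}'.format(len(valid)))

n_outliers = len(fraud)

# NOTE: the classifiers dictionary is missing from the original listing. The two
# detectors below match the imports above; their hyperparameters are assumed.
classifiers = {
    'Isolation Forest': IsolationForest(contamination=outlier_fraction, random_state=1),
    'Local Outlier Factor': LocalOutlierFactor(n_neighbors=20, contamination=outlier_fraction),
}

for i, (clf_name, clf) in enumerate(classifiers.items()):
    # fit the data and tag outliers
    if clf_name == 'Local Outlier Factor':
        y_pred = clf.fit_predict(X)
        scores_pred = clf.negative_outlier_factor_
    else:
        clf.fit(X)
        scores_pred = clf.decision_function(X)
        y_pred = clf.predict(X)

    # Map scikit-learn's labels (1 = inlier, -1 = outlier) to 0 = valid, 1 = fraud
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1

    n_errors = (y_pred != Y).sum()

    print('{}: {}'.format(clf_name, n_errors))
    print(accuracy_score(Y, y_pred))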


5.4 Appendix B: Data Set of Credit Card Fraud Detection

Time V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
0 -1.35981 -0.07278 2.536347 1.378155 -0.33832 0.462388 0.239599 0.098698 0.363787 0.090794
0 1.191857 0.266151 0.16648 0.448154 0.060018 -0.08236 -0.0788 0.085102 -0.25543 -0.16697
1 -1.35835 -1.34016 1.773209 0.37978 -0.5032 1.800499 0.791461 0.247676 -1.51465 0.207643
1 -0.96627 -0.18523 1.792993 -0.86329 -0.01031 1.247203 0.237609 0.377436 -1.38702 -0.05495
2 -1.15823 0.877737 1.548718 0.403034 -0.40719 0.095921 0.592941 -0.27053 0.817739 0.753074
2 -0.42597 0.960523 1.141109 -0.16825 0.420987 -0.02973 0.476201 0.260314 -0.56867 -0.37141
4 1.229658 0.141004 0.045371 1.202613 0.191881 0.272708 -0.00516 0.081213 0.46496 -0.09925
7 -0.64427 1.417964 1.07438 -0.4922 0.948934 0.428118 1.120631 -3.80786 0.615375 1.249376
7 -0.89429 0.286157 -0.11319 -0.27153 2.669599 3.721818 0.370145 0.851084 -0.39205 -0.41043
9 -0.33826 1.119593 1.044367 -0.22219 0.499361 -0.24676 0.651583 0.069539 -0.73673 -0.36685
10 1.449044 -1.17634 0.91386 -1.37567 -1.97138 -0.62915 -1.42324 0.048456 -1.72041 1.626659
10 0.384978 0.616109 -0.8743 -0.09402 2.924584 3.317027 0.470455 0.538247 -0.55889 0.309755
10 1.249999 -1.22164 0.38393 -1.2349 -1.48542 -0.75323 -0.6894 -0.22749 -2.09401 1.323729
11 1.069374 0.287722 0.828613 2.71252 -0.1784 0.337544 -0.09672 0.115982 -0.22108 0.46023
12 -2.79185 -0.32777 1.64175 1.767473 -0.13659 0.807596 -0.42291 -1.90711 0.755713 1.151087
12 -0.75242 0.345485 2.057323 -1.46864 -1.15839 -0.07785 -0.60858 0.003603 -0.43617 0.747731
12 1.103215 -0.0403 1.267332 1.289091 -0.736 0.288069 -0.58606 0.18938 0.782333 -0.26798
13 -0.43691 0.918966 0.924591 -0.72722 0.915679 -0.12787 0.707642 0.087962 -0.66527 -0.73798
14 -5.40126 -5.45015 1.186305 1.736239 3.049106 -1.76341 -1.55974 0.160842 1.23309 0.345173
15 1.492936 -1.02935 0.454795 -1.43803 -1.55543 -0.72096 -1.08066 -0.05313 -1.97868 1.638076
16 0.694885 -1.36182 1.029221 0.834159 -1.19121 1.309109 -0.87859 0.44529 -0.4462 0.568521
17 0.962496 0.328461 -0.17148 2.109204 1.129566 1.696038 0.107712 0.521502 -1.19131 0.724396
18 1.166616 0.50212 -0.0673 2.261569 0.428804 0.089474 0.241147 0.138082 -0.98916 0.922175
18 0.247491 0.277666 1.185471 -0.0926 -1.31439 -0.15012 -0.94636 -1.61794 1.544071 -0.82988
22 -1.94653 -0.0449 -0.40557 -1.01306 2.941968 2.955053 -0.06306 0.855546 0.049967 0.573743
22 -2.07429 -0.12148 1.322021 0.410008 0.295198 -0.95954 0.543985 -0.10463 0.475664 0.149451
23 1.173285 0.353498 0.283905 1.133563 -0.17258 -0.91605 0.369025 -0.32726 -0.24665 -0.04614
23 1.322707 -0.17404 0.434555 0.576038 -0.83676 -0.83108 -0.2649 -0.22098 -1.07142 0.868559
23 -0.41429 0.905437 1.727453 1.473471 0.007443 -0.20033 0.740228 -0.02925 -0.59339 -0.34619
23 1.059387 -0.17532 1.26613 1.18611 -0.786 0.578435 -0.76708 0.401046 0.6995 -0.06474
24 1.237429 0.061043 0.380526 0.761564 -0.35977 -0.49408 0.006494 -0.13386 0.43881 -0.20736
25 1.114009 0.085546 0.493702 1.33576 -0.30019 -0.01075 -0.11876 0.188617 0.205687 0.082262
26 -0.52991 0.873892 1.347247 0.145457 0.414209 0.100223 0.711206 0.176066 -0.28672 -0.48469


V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 Amount
0.403993 0.251412 -0.01831 0.277838 -0.11047 0.066928 0.128539 -0.18911 0.133558 -0.02105 149.62
-0.14578 -0.06908 -0.22578 -0.63867 0.101288 -0.33985 0.16717 0.125895 -0.00898 0.014724 2.69
-2.26186 0.52498 0.247998 0.771679 0.909412 -0.68928 -0.32764 -0.1391 -0.05535 -0.05975 378.66
-1.23262 -0.20804 -0.1083 0.005274 -0.19032 -1.17558 0.647376 -0.22193 0.062723 0.061458 123.5
0.803487 0.408542 -0.00943 0.798278 -0.13746 0.141267 -0.20601 0.502292 0.219422 0.215153 69.99
-0.03319 0.084968 -0.20825 -0.55982 -0.0264 -0.37143 -0.23279 0.105915 0.253844 0.08108 3.67
-0.04558 -0.21963 -0.16772 -0.27071 -0.1541 -0.78006 0.750137 -0.25724 0.034507 0.005168 4.99
0.324505 -0.15674 1.943465 -1.01545 0.057504 -0.64971 -0.41527 -0.05163 -1.20692 -1.08534 40.8
0.570328 0.052736 -0.07343 -0.26809 -0.20423 1.011592 0.373205 -0.38416 0.011747 0.142404 93.2
0.451773 0.203711 -0.24691 -0.63375 -0.12079 -0.38505 -0.06973 0.094199 0.246219 0.083076 3.68
-0.22137 -0.38723 -0.0093 0.313894 0.02774 0.500512 0.251367 -0.12948 0.04285 0.016253 7.8
0.707664 0.125992 0.049924 0.238422 0.00913 0.99671 -0.76731 -0.49221 0.042472 -0.05434 9.99
-0.68319 -0.10276 -0.23181 -0.48329 0.084668 0.392831 0.161135 -0.35499 0.026416 0.042422 121.5
-0.98292 -0.1532 -0.03688 0.074412 -0.07141 0.104744 0.548265 0.104094 0.021491 0.021293 27.5
2.221868 -1.58212 1.151663 0.222182 1.020586 0.028317 -0.23275 -0.23556 -0.16478 -0.03015 58.8
0.432535 0.263451 0.499625 1.35365 -0.25657 -0.06508 -0.03912 -0.08709 -0.181 0.129394 15.99
-0.57568 -0.11391 -0.02461 0.196002 0.013802 0.103758 0.364298 -0.38226 0.092809 0.037051 12.99
0.025436 -0.04702 -0.1948 -0.67264 -0.15686 -0.88839 -0.34241 -0.04903 0.079692 0.131024 0.89
-0.40687 -2.19685 -0.5036 0.98446 2.458589 0.042119 -0.48163 -0.62127 0.392053 0.949594 46.8
0.05423 -0.38791 -0.17765 -0.17507 0.040002 0.295814 0.332931 -0.22038 0.022298 0.007602 5
-1.30041 -0.13833 -0.29558 -0.57196 -0.05088 -0.30421 0.072001 -0.42223 0.086553 0.063499 231.71
-2.02761 -0.26932 0.143997 0.402492 -0.04851 -1.37187 0.390814 0.199964 0.016371 -0.01461 34.09
-0.8166 -0.30717 0.018702 -0.06197 -0.10385 -0.37042 0.6032 0.108556 -0.04052 -0.01142 2.28
2.177807 -0.23098 1.65018 0.200454 -0.18535 0.423073 0.820591 -0.22763 0.336634 0.250475 22.75
0.488603 -0.21672 -0.57953 -0.79923 0.8703 0.983421 0.321201 0.14965 0.707519 0.0146 0.89
0.505751 -0.38669 -0.40364 -0.2274 0.742435 0.398535 0.249212 0.274404 0.359969 0.243232 26.43
-0.39093 0.027878 0.067003 0.227812 -0.15049 0.435045 0.724825 -0.33708 0.016368 0.030041 41.88
-1.24062 -0.52295 -0.28438 -0.32336 -0.03771 0.347151 0.559639 -0.28016 0.042335 0.028822 16
0.543969 0.097308 0.077237 0.457331 -0.0385 0.642522 -0.18389 -0.27746 0.182687 0.152665 33
-0.27783 -0.17802 0.013676 0.213734 0.014462 0.002951 0.294638 -0.39507 0.081461 0.02422 12.99
0.348416 -0.06635 -0.24568 -0.5309 -0.04427 0.079168 0.509136 0.288858 -0.0227 0.011836 17.28
-0.14571 -0.27383 -0.05323 -0.00476 -0.03147 0.198054 0.565007 -0.33772 0.029057 0.004453 4.45
-0.82337 -0.29035 0.046949 0.208105 -0.18555 0.001031 0.098816 -0.5529 -0.07329 0.023307 6.14

