SYNOPSIS

This project synopsis presents a study on fraud detection in banking data using machine learning techniques, specifically focusing on credit card fraud. The authors propose a weight-tuning hyperparameter approach and utilize algorithms like CatBoost, XGBoost, and LightGBM, achieving high performance metrics such as ROC-AUC = 0.95. The research emphasizes the importance of addressing unbalanced datasets and improving detection accuracy through advanced methodologies, including deep learning and Bayesian optimization.
Copyright © All Rights Reserved

VISVESVARAYA TECHNOLOGICAL UNIVERSITY,

BELAGAVI KARNATAKA

A Project Synopsis

On

FRAUD DETECTION IN BANKING DATA BY MACHINE LEARNING TECHNIQUES
Submitted in partial fulfillment of the requirements for the award of the Bachelor's degree in
Computer Science and Engineering during the year 2022-2023.

Submitted by

ABHILASHA H R - 4CA20CS001

CHINNU CHAITANYA T S - 4CA20CS012

KIRAN MAYE K P - 4CA20CS022

VAIDESHWARI N - 4CA20CS046

Under the Guidance of

PROF. AKSHATHA T M
Head of the Department, Dept. of CSE,

CIT, MANDYA

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


CAUVERY INSTITUTE OF TECHNOLOGY MANDYA

SIDDAIAHNAKOPPALU GATE, SUNDAHALLI, KARNATAKA 571401

2022-2023

BRIEF ABSTRACT.

As technology advanced and e-commerce services expanded, credit cards became one of the most
popular payment methods, resulting in an increase in the volume of banking transactions.
At the same time, the significant increase in fraud imposes high costs on banking transactions. As a result,
detecting fraudulent activities has become an important research topic.

In this study, we consider the use of class weight-tuning hyperparameters to control the weight of
fraudulent and legitimate transactions. In particular, we use Bayesian optimization to optimize the
hyperparameters while accounting for practical issues such as unbalanced data. We propose weight-tuning
as a pre-processing step for unbalanced data, and we combine CatBoost and XGBoost with LightGBM
through a voting mechanism to improve on the performance of LightGBM alone. Finally, to improve
performance even further, we use deep learning to fine-tune the hyperparameters, particularly our
proposed weight-tuning one. We perform experiments on real-world data to test the proposed methods.
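To make the weight-tuning idea concrete, here is a minimal pure-Python sketch of a class-weighted binary cross-entropy. The function name `weighted_log_loss` and the parameter `fraud_weight` are illustrative assumptions, not the project's code: each fraudulent sample's loss is multiplied by `fraud_weight`, so a larger value makes a missed fraud costlier during training. In the approach described above, this weight would be one of the hyperparameters searched by Bayesian optimization.

```python
import math


def weighted_log_loss(y_true, p_fraud, fraud_weight):
    """Class-weighted mean binary cross-entropy.

    Labels are 1 for fraud and 0 for legitimate. Every fraudulent sample
    contributes fraud_weight times its usual loss; legitimate samples keep
    weight 1. The result is normalised by the total weight.
    """
    eps = 1e-12  # clip probabilities so math.log never receives 0
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_fraud):
        p = min(max(p, eps), 1.0 - eps)
        w = fraud_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


# Hypothetical batch: three legitimate transactions and one missed fraud.
y = [0, 0, 0, 1]
p = [0.1, 0.1, 0.1, 0.1]
for w in (1.0, 10.0):
    print(w, round(weighted_log_loss(y, p, w), 4))
```

With this batch, raising `fraud_weight` from 1 to 10 raises the loss from about 0.655 to about 1.796, which pushes training toward catching the fraud. Gradient-boosting libraries expose the same idea through parameters such as `scale_pos_weight` in XGBoost and LightGBM.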

To better cover unbalanced datasets, we use recall and precision metrics in addition to the standard
ROC-AUC. CatBoost, LightGBM, and XGBoost are evaluated separately using 5-fold cross-validation.
Furthermore, the majority voting ensemble learning method is used to assess the performance of the
combined algorithms. According to the results, LightGBM and XGBoost achieve the best scores:
ROC-AUC = 0.95, precision = 0.79, recall = 0.80, F1-score = 0.79, and MCC = 0.79. By using deep
learning and Bayesian optimization to tune the hyperparameters, we also obtain ROC-AUC = 0.94,
precision = 0.80, recall = 0.82, F1-score = 0.81, and MCC = 0.81. This is a significant improvement
over the state-of-the-art methods we compared against.
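Because the fraud class is tiny, accuracy alone can be misleading, which is why the metrics above are used. The following sketch, with the hypothetical helper name `fraud_metrics`, computes precision, recall, F1-score and MCC for the fraud class (label 1) directly from confusion-matrix counts; it is an illustration, not the project's evaluation code.

```python
import math


def fraud_metrics(y_true, y_pred):
    """Precision, recall, F1-score and MCC for the fraud class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four cells, so it stays informative when classes are skewed.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical example: 10 frauds among 1,000 transactions, and a model
# that labels every transaction as legitimate.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, fraud_metrics(y_true, y_pred))
```

In this example the model reaches 99% accuracy while its recall and MCC are both 0, which is exactly the failure that precision, recall and MCC expose. Returning 0 for undefined ratios is a convention chosen here.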
INTRODUCTION.

In recent years, there has been a significant increase in the volume of financial transactions due to
the expansion of financial institutions and the popularity of web-based e-commerce. Fraudulent
transactions have become a growing problem in online banking, and fraud detection has always
been challenging. As credit cards have developed, the patterns of credit card fraud have kept
changing. Fraudsters do their best to make fraudulent transactions look legitimate. They try to learn
how fraud detection systems work and keep adapting to evade them, which makes fraud detection more
complicated. Therefore, researchers are constantly trying to find new methods or to improve the
performance of existing ones.

In this project, we propose an efficient approach for detecting credit card fraud that is evaluated
on publicly available datasets and uses the optimized algorithms LightGBM, XGBoost, CatBoost, and
logistic regression individually, as well as majority-voting combinations of them, deep learning,
and hyperparameter tuning. An ideal fraud detection system should detect as many fraudulent cases
as possible with high precision, i.e., the transactions it flags should really be fraudulent. This
builds customers' trust in the bank and, at the same time, prevents the bank from suffering losses
due to incorrect detection.

The main contributions of this project are summarized as follows:

• We adopt Bayesian optimization for fraud detection and propose to use the weight-tuning
hyperparameter to solve the unbalanced data issue as a pre-processing step. We also suggest
using CatBoost and XGBoost alongside LightGBM to improve performance. We use the
XGBoost algorithm because it trains quickly on big data and its regularization term, which
penalizes tree complexity, helps prevent overfitting; it also does not require much time to set
the hyperparameters. We use the CatBoost algorithm because it needs little hyperparameter
adjustment to control overfitting and obtains good results with its default hyperparameters
compared to other machine learning algorithms.
• We propose a majority-voting ensemble learning approach to combine CatBoost, XGBoost,
and LightGBM, and we examine the effect of the combined methods on fraud detection
performance on real, unbalanced data. We also propose to use deep learning for adjusting and
fine-tuning the hyperparameters.
• To evaluate the performance of the proposed methods, we perform extensive experiments
on real-world data. To better cover the unbalanced datasets, we use recall and precision in
addition to the commonly used ROC-AUC. We also evaluate the performance using the F1-score
and MCC. According to the results, the proposed methods outperform the existing
baseline methods.
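The majority-voting combination of the three boosted models can be sketched as follows. The function `majority_vote` and the prediction lists are hypothetical; each argument stands for one model's 0/1 predictions on the same transactions. The synopsis does not say whether its voting uses labels (hard) or probabilities (soft), so hard voting here is an assumption.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Hard majority vote over several models' 0/1 predictions.

    Each argument is one model's list of labels for the same transactions.
    An odd number of voters guarantees there is never a tie for two classes.
    """
    if len(model_predictions) % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    combined = []
    for votes in zip(*model_predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from the three boosted models.
catboost_pred = [1, 0, 1, 0]
xgboost_pred = [1, 1, 0, 0]
lightgbm_pred = [0, 1, 1, 0]
print(majority_vote(catboost_pred, xgboost_pred, lightgbm_pred))
```

Each transaction is labelled fraud when at least two of the three models agree, so here the vote returns `[1, 1, 1, 0]`. For fitted scikit-learn-compatible estimators, `VotingClassifier(voting="hard")` provides the same behaviour.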
LITERATURE SURVEY.

OVERVIEW :
A literature survey or literature review in a project report presents the analyses and research
already carried out in the field of interest and the results already published, taking into account
the parameters and scope of the project. A literature survey is mainly carried out to analyze the
background of the current project, which helps to find flaws in the existing system and shows
which unsolved problems can be worked on. The following topics therefore not only illustrate the
background of the project but also uncover the problems and flaws that motivated this project and
its proposed solutions.
A literature survey is the part of a scholarly paper that presents current knowledge, including
substantive findings as well as theoretical and methodological contributions to a particular topic.
Literature reviews use secondary sources and do not report new or original experimental work.
Most often associated with academic literature, such as a thesis, dissertation or peer-reviewed
journal article, a literature review usually precedes the methodology and results sections, though
this is not always the case. Literature reviews are also common in a research proposal or
prospectus (the document that is approved before a student formally begins a dissertation or thesis).
Their main goals are to situate the current study within the body of literature and to provide context
for the reader. Literature reviews are a basis for research in nearly every academic field.

• A literature survey includes the following:
• Existing theories about the topic which are accepted universally.
• Books written on the topic, both generic and specific.
• Research done in the field, usually in order from oldest to latest.
• Challenges being faced and ongoing work, if available.

A literature survey describes the existing work related to the given project. It deals with the
problems associated with the existing system and gives the reader a clear understanding of how to
deal with those problems and how to provide solutions to them.
• Objectives of Literature Survey :
• Learning the definitions of the concepts.
• Accessing the latest approaches, methods and theories.
• Discovering research topics based on the existing research.
• Concentrating on your own field of expertise: even if another field uses the same words,
they usually mean something completely different.
• Excluding sidetracks, which improves the quality of the literature survey: remember to
state explicitly what is excluded.

Before building our application, the following systems were taken into consideration:

1. TITLE : Credit Card Fraud Detection Using State-of-the-Art Machine Learning and
Deep Learning Algorithms

AUTHOR : Fawaz Khaled Alarfaj; Iqra Malik; Hikmat Ullah Khan; Naif Almusallam;
Year: 2022

Abstract: People can use credit cards for online transactions as it provides an efficient and easy-
to-use facility. With the increase in usage of credit cards, the capacity of credit card misuse has
also enhanced. Credit card frauds cause significant financial losses for both credit card holders and
financial companies. In this research study, the main aim is to detect such frauds, including the
accessibility of public data, high-class imbalance data, the changes in fraud nature, and high rates
of false alarm. The relevant literature presents many machine learning based approaches for credit
card detection, such as Extreme Learning Method, Decision Tree, Random Forest, Support Vector
Machine, Logistic Regression and XG Boost. However, due to low accuracy, there is still a need
to apply state of the art deep learning algorithms to reduce fraud losses. The main focus has been
to apply the recent development of deep learning algorithms for this purpose. Comparative analysis
of both machine learning and deep learning algorithms was performed to find efficient outcomes.
The detailed empirical analysis is carried out using the European card benchmark dataset for fraud
detection. A machine learning algorithm was first applied to the dataset, which improved
the accuracy of detection of the frauds to some extent. Later, three architectures based on
a convolutional neural network are applied to improve fraud detection performance. Further
addition of layers further increased the accuracy of detection. A comprehensive empirical analysis
has been carried out by applying variations in the number of hidden layers, epochs and applying
the latest models. The evaluation of research work shows the improved results achieved, such as
accuracy, F1-score, precision and AUC curves having optimized values of 99.9%, 85.71%, 93%,
and 98%, respectively. The proposed model outperforms the state-of-the-art machine learning and
deep learning algorithms for credit card detection problems. In addition, we have performed
experiments by balancing the data and applying deep learning algorithms to minimize the false
negative rate. The proposed approaches can be implemented effectively for the real-world
detection of credit card fraud.

• METHODOLOGY USED :
Extreme Learning Method, Decision Tree, Random Forest, Support Vector Machine,
Logistic Regression and XG Boost.
• MERITS :
It reports improved accuracy, F1-score, precision and AUC.
• LIMITATIONS :
• It works only for a specific dataset.
• It is not suitable for real-world data.

2. TITLE : A Neural Network Ensemble With Feature Engineering for Improved Credit
Card Fraud Detection.

AUTHOR : Ebenezer Esenogho; Ibomoiye Domor Mienye; Theo G. Swart

Year: 2021

Abstract: Recent advancements in electronic commerce and communication systems have
significantly increased the use of credit cards for both online and regular transactions. However,
there has been a steady rise in fraudulent credit card transactions, costing financial companies huge
losses every year. The development of effective fraud detection algorithms is vital in minimizing
these losses, but it is challenging because most credit card datasets are highly imbalanced. Also,
using conventional machine learning algorithms for credit card fraud detection is inefficient due
to their design, which involves a static mapping of the input vector to output vectors. Therefore,
they cannot adapt to the dynamic shopping behavior of credit card clients. This paper proposes an
efficient approach to detect credit card fraud using a neural network ensemble classifier and a
hybrid data resampling method. The ensemble classifier is obtained using a long short-term
memory (LSTM) neural network as the base learner in the adaptive boosting (AdaBoost)
technique. Meanwhile, the hybrid resampling is achieved using the synthetic minority
oversampling technique and edited nearest neighbor (SMOTE-ENN) method. The effectiveness
of the proposed method is demonstrated using publicly available real-world credit card transaction
datasets. The performance of the proposed approach is benchmarked against the following
algorithms: support vector machine (SVM), multilayer perceptron (MLP), decision tree, traditional
AdaBoost, and LSTM. The experimental results show that the classifiers performed better when
trained with the resampled data, and the proposed LSTM ensemble outperformed the other
algorithms by obtaining a sensitivity and specificity of 0.996 and 0.998, respectively.

• METHODOLOGY USED :
An AdaBoost ensemble with an LSTM base learner and SMOTE-ENN resampling, benchmarked against
support vector machine (SVM), multilayer perceptron (MLP), decision tree, traditional
AdaBoost, and LSTM.
• MERITS :
The classifiers performed better when trained with the resampled data, and the proposed LSTM
ensemble outperformed the other algorithms with a sensitivity of 0.996 and a specificity of 0.998.
• LIMITATIONS :
• It works only for a specific dataset.
• It is not suitable for real-world data.
3. TITLE : Ensemble Synthesized Minority Oversampling-Based Generative
Adversarial Networks and Random Forest Algorithm for Credit Card Fraud Detection.

AUTHOR : Fuad A. Ghaleb; Faisal Saeed; Mohammed Al-Sarem; Sultan Noman Qasem.
Year : 2023.

Abstract: The recent rapid increase in credit card fraud has caused huge monetary losses for
individuals and financial institutions. Most credit card frauds are conducted online by illegally
obtaining payment credentials through data breaches, phishing, or scamming. Many solutions have
been suggested to address the credit card fraud problem for online transactions. However, the high-
class imbalance is the major challenge that faces the existing solutions to construct an effective
detection model. Most of the existing techniques used for class imbalance overestimate the
distribution of the minority class, resulting in highly overlapped or noisy and unrepresentative
features, which cause either overfitting or imprecise learning. In this study, a credit card fraud
detection model (CCFDM) is proposed based on ensemble learning and a generative adversarial
network (GAN) assisted by Ensemble Synthesized Minority Oversampling techniques (ESMOTE-
GAN). Multiple subsets were extracted using under-sampling and SMOTE was applied to generate
less skewed sets to prevent the GAN from modeling the noise. These subsets were used to train
diverse sets of GAN models to generate the synthesized subsets. A set of Random Forest classifiers
was then trained based on the proposed ESMOTE-GAN technique. The probabilistic outputs of
the trained classifiers were combined using a weighted voting scheme for decision-making. The
results show that the proposed model achieved 1.9%, and 3.2% improvements in overall
performance and the detection rate, respectively, with a 0% false alarm rate. Due to the massive
number of transactions, even a tiny false positive rate can overwhelm the analysis team. Thus, the
proposed model has improved the detection performance and reduced the cost needed for manual
analysis.

• METHODOLOGY USED :
A set of Random Forest classifiers was then trained based on the proposed ESMOTE-GAN
technique.
• MERITS :
The proposed model has improved the detection performance and reduced the cost needed
for manual analysis.
• LIMITATIONS :
• It works only for a specific dataset.
• It is not suitable for real-world data.
• It is time-consuming.

4. TITLE : AMWSPL Adaboost Credit Card Fraud Detection Method Based on
Enhanced Base Classifier Diversity.

AUTHOR : Wang Ning; Siliang Chen; Songyi Lei.

Year: 2023.

Abstract: With the popularity of online transactions, credit card fraud incidents are occurring
more and more frequently, and adaptive enhancement (Adaboost) models are most often used in
credit card fraud detection, so how to improve the robustness of the traditional Adaboost algorithm
has become a hot issue. A large part of the reason for the poor robustness of the traditional
Adaboost algorithm is that the base classifier is selected in a way that is uniquely oriented to the
error rate. Therefore, this paper uses an adaptive hybrid weighted self-paced learning method to
improve the objective function of the Adaboost algorithm, thus changing the strategy of base
learner selection in the Adaboost algorithm. The self-adaptive threshold-finding algorithm used in
the self-paced learning mitigates the influence of human experience on model training. This paper
also selects a double-fault measure to calculate the degree of diversity among base classifiers
from the perspective of generalization error, adds
the influence coefficient of diversity to the weight calculation of weak learners, and gives the
optimal range of influence coefficients through experiments. Finally, the proposed improved
algorithm is applied to credit card fraud scenario, and the experiments are compared with several
effective Adaboost improvement algorithms, which show that the combined performance of the
proposed improved algorithm is better than other algorithms in terms of AUC value and F1 value.
• METHODOLOGY USED :
An adaptive hybrid weighted self-paced learning method that improves the objective function
of the Adaboost algorithm.
• MERITS :
The combined performance of the proposed algorithm is better than that of other algorithms in
terms of AUC value and F1 value.
• LIMITATIONS :
• It works only for a specific dataset.
• It is not suitable for real-world data.
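The second and third surveyed papers both rely on SMOTE-based resampling of the minority (fraud) class. A minimal sketch of the SMOTE oversampling step follows; the ENN cleaning step of SMOTE-ENN and the GAN stage of ESMOTE-GAN are omitted, and the function name `smote` and its parameters are illustrative assumptions. Each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE interpolation.

    For a random minority sample x, one of its k nearest minority
    neighbours nb is chosen and the new point is x + u * (nb - x),
    with u drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: sq_dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Hypothetical 2-D fraud samples (e.g. two scaled features).
fraud_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote(fraud_samples, n_new=3, k=3))
```

The brute-force neighbour search is quadratic and only meant for illustration; the imbalanced-learn library provides tested implementations (`SMOTE`, `SMOTEENN`).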

EXISTING SYSTEM - GENERALIZED FROM THE LITERATURE SURVEY.
For credit card fraud detection, Random Forest (RF), Support Vector Machine (SVM) and
Logistic Regression (LOR) were examined on a data set of one year of transactions. Data
under-sampling was used to examine the algorithm performances, with RF performing better
than SVM and LOR. An Artificial Immune Recognition System (AIRS) for credit card fraud
detection was also proposed. AIRS is an improvement over the standard AIS model, in which
negative selection was used to achieve higher precision. This increased accuracy by 25% and
reduced system response time by 40%.

Another credit card fraud detection system consisted of a rule-based filter, a Dempster–Shafer
adder, a transaction history database, and a Bayesian learner. Dempster–Shafer theory combined
the various pieces of evidential information into an initial belief, which was used to classify a
transaction as normal, suspicious, or abnormal. If a transaction was suspicious, the belief was
further evaluated using the transaction history through Bayesian learning. Simulation results
indicated a 98% true positive rate.

A modified Fisher discriminant function was also used for credit card fraud detection. The
modification made the traditional function more sensitive to important instances. A weighted
average was used to calculate variances, which allowed learning of profitable transactions. The
results confirm that the modified function can yield more profit.

Finally, an approach that detects fraud by understanding spending patterns to uncover potential
fraud cases was described. It used a self-organizing map (SOM) to interpret, filter, and analyze
fraud behaviors. Clustering was used to identify hidden patterns in the input data, and filters were
then used to reduce the total cost and processing time. By setting appropriate numbers of neurons
and iteration steps, the SOM was able to converge quickly. The resulting model appeared to be an
efficient and cost-effective method.
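The Dempster–Shafer step in the rule-based system above fuses evidence from separate sources into one belief. The sketch below implements Dempster's rule of combination for two mass functions; the frame {fraud, legit}, the source names and the mass values are hypothetical, chosen only to show the arithmetic, and are not taken from the surveyed system.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Masses are dicts mapping frozensets (subsets of the frame of
    discernment) to values summing to 1. Mass that falls on disjoint
    pairs is the conflict K; the rest is renormalised by 1 - K.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict")
    return {subset: m / (1.0 - conflict) for subset, m in combined.items()}


# Hypothetical frame {fraud, legit}; THETA (the whole frame) means "unknown".
FRAUD, LEGIT = frozenset({"fraud"}), frozenset({"legit"})
THETA = FRAUD | LEGIT
rule_filter = {FRAUD: 0.6, THETA: 0.4}
history = {FRAUD: 0.5, LEGIT: 0.3, THETA: 0.2}
belief = dempster_combine(rule_filter, history)
print(round(belief[FRAUD], 3))
```

Here the conflicting mass is K = 0.6 × 0.3 = 0.18, and the combined mass on fraud is 0.62 / 0.82 ≈ 0.756. A system like the one surveyed would then compare such a belief with thresholds to label the transaction normal, suspicious or abnormal.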

➔ DISADVANTAGES OF EXISTING SYSTEM :
• Accuracy is less than 95%.
• Some of the deep learning approaches consume more training time.

PROPOSED SYSTEM :
The proposed framework for fraud detection is presented in Fig. 1. As the figure shows, we first
apply the required pre-processing to the data and divide it into two sections, training and
testing, and then perform Bayesian optimization on the training data to find the hyperparameters
that give the best performance. We use cross-validation to compare performance on the unbalanced
set and then evaluate the algorithms using different metrics, including accuracy, precision,
recall, the Matthews correlation coefficient (MCC), the F1-score, and AUC.
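The abstract specifies 5-fold cross-validation. With only a small fraction of fraud cases, plain random folds can end up with very few frauds, so a stratified split is a natural choice; stratification is an assumption here, since the synopsis does not state it. `stratified_k_fold` is a hypothetical helper, and scikit-learn's `StratifiedKFold` offers equivalent behaviour.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices of each class are shuffled and dealt round-robin into the k
    folds, so every test fold keeps roughly the original fraud ratio.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for fold in folds:
        test = sorted(fold)
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


# Hypothetical labels: 10 frauds among 1,000 transactions.
labels = [1] * 10 + [0] * 990
for train, test in stratified_k_fold(labels, k=5):
    print(len(train), len(test), sum(labels[i] for i in test))
```

With 10 frauds among 1,000 labels, each of the five test folds receives exactly 2 frauds and 198 legitimate transactions, so every fold can report meaningful precision and recall.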

Fig. 1 Flow chart.

• FLOWCHART OF PROPOSED SYSTEM :
Start → Select data → Preprocess → Extract the features → Apply AdaBoost → Predict the fraud transactions → Stop

SYSTEM REQUIREMENTS
• SOFTWARE REQUIREMENTS :
• Operating system : Windows 7 64-bit
• Coding Language : Python

• HARDWARE REQUIREMENTS :
• System : Pentium i3.
• Hard Disk : 120GB.
• Monitor : 15'' LED.
• Input Device : Keyboard, Mouse.
• RAM : 4GB.

FUNCTIONAL REQUIREMENTS :

This section describes the functional requirements of the system, expressed in natural language.

1. Create a desktop application using wxPython.
2. The user should load the dataset.
3. The system will preprocess the data and extract features.
4. The system will train on the data using the AdaBoost classifier.
5. The application should accurately predict fraudulent banking transactions.
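Requirement 4 and the flowchart name AdaBoost as the classifier. The following is a minimal discrete AdaBoost with one-feature decision stumps, written in pure Python for illustration only; the function names and the label convention (+1 fraud, -1 legitimate) are assumptions, and a real application would more likely use scikit-learn's `AdaBoostClassifier`.

```python
import math


def stump_predict(row, j, thr, sign):
    """A decision stump: predict `sign` if row[j] >= thr, else -sign."""
    return sign if row[j] >= thr else -sign


def train_stump(X, y, w):
    """Return (error, feature, threshold, sign) of the lowest-weighted-error stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, j, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost; y uses +1 (fraud) and -1 (legitimate)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1.0 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, j, thr, sign))
        # Misclassified samples (yi * h = -1) gain weight; correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, sign))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted sum of stump votes."""
    score = sum(alpha * stump_predict(row, j, thr, sign)
                for alpha, j, thr, sign in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical one-feature data, e.g. a scaled transaction amount.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost_fit(X, y, rounds=5)
print([adaboost_predict(model, row) for row in X])
```

Each round reweights the samples so that the next stump concentrates on the transactions the previous stumps got wrong; the exhaustive threshold search is only suitable for small examples like this one.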

NON-FUNCTIONAL REQUIREMENTS :

These are requirements that are not functional in nature; that is, they are constraints within
which the system must work.

• The program must be self-contained so that it can easily be moved from one computer to
another. It is assumed that a network connection will be available on the computer on which
the program resides.
• Capacity, scalability and availability:
The system shall achieve 100 per cent availability at all times.
The system shall be scalable to support additional clients and volunteers.

• MAINTAINABILITY :
The system should be optimized for supportability, or ease of maintenance, as far as
possible. This may be achieved through documentation, coding standards, naming
conventions, class libraries and abstraction.

• RANDOMNESS, VERIFIABILITY AND LOAD BALANCING :
The system should have randomness to check the nodes and should be load balanced.

CONCLUSION.
• In this project, we studied the credit card fraud detection problem on real, unbalanced datasets.
We proposed a machine learning approach to improve the performance of fraud detection.
• We used a publicly available "credit card" dataset with 28 features in which 0.17 percent of the
transactions are fraudulent. We proposed two methods. In the proposed LightGBM, we used class
weight tuning to choose the proper hyperparameters. We used the common evaluation metrics,
including accuracy, precision, recall, F1-score, and AUC. Our experimental results showed that the
proposed LightGBM method improved the fraud detection cases by 50% and the F1-score by
20% compared with the recently presented method.
• We improved the performance of the algorithm with the help of the majority voting algorithm,
and we improved the criteria further by using the deep learning method. The MCC results
confirmed that, for unbalanced data, MCC is a more reliable criterion than the other evaluation
metrics. By combining the LightGBM and XGBoost methods, we obtained an MCC of 0.79, and 0.81
with the deep learning method.
• Using hyperparameters to address data imbalance, compared with sampling methods, reduces
the memory and time needed to evaluate the algorithms and also gives better results.
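Weighting classes through hyperparameters avoids creating extra rows, which is where the memory and time savings over sampling come from. Two common ways to pick a starting fraud weight are sketched below: scikit-learn's "balanced" heuristic, n_samples / (n_classes × count(class)), and the ratio of negative to positive samples that XGBoost's documentation suggests as a typical `scale_pos_weight` value. The helper names are hypothetical, and using these values as the centre of the Bayesian search range is an assumption, not something the synopsis states.

```python
def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count(class))."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, n_classes = len(labels), len(counts)
    return {cls: n / (n_classes * c) for cls, c in counts.items()}


def neg_pos_ratio(labels):
    """Number of legitimate (0) samples divided by fraudulent (1) samples."""
    pos = sum(1 for y in labels if y == 1)
    if pos == 0:
        raise ValueError("no fraudulent samples")
    return (len(labels) - pos) / pos


# Hypothetical data: 100,000 transactions, 170 of them fraudulent (0.17%).
labels = [1] * 170 + [0] * 99830
weights = balanced_class_weights(labels)
print(round(weights[1], 1), round(weights[0], 2), round(neg_pos_ratio(labels), 1))
```

For this hypothetical set, the balanced weights are about 294.1 for fraud and 0.50 for legitimate transactions, and the negative-to-positive ratio is about 587.2; both express the same roughly 587:1 relative emphasis on the fraud class.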

