SYNOPSIS
BELAGAVI KARNATAKA
A Project Synopsis
On
Submitted by
ABHILASHA H R - 4CA20CS001.
VAIDESHWARI N – 4CA20CS046.
PROF. AKSHATHA T M
Head of the Department, Dept. of CSE,
CIT, MANDYA
2022-2023
BRIEF ABSTRACT.
As technology advanced and e-commerce services expanded, credit cards became one of the most
popular payment methods, resulting in an increase in the volume of banking transactions.
Furthermore, the significant increase in fraud leads to high banking transaction costs. As a result,
detecting fraudulent activities has become an active research topic.
In this study, we consider the use of class weight-tuning hyperparameters to control the weight of
fraudulent and legitimate transactions. In particular, we use Bayesian optimization to optimize the
hyperparameters while accounting for practical issues such as unbalanced data. We propose
weight-tuning as a pre-processing step for unbalanced data, and we use CatBoost and XGBoost
alongside LightGBM, combined through a voting mechanism, to improve performance. Finally, to
improve performance even further, we use deep learning to fine-tune the hyperparameters,
particularly our proposed weight-tuning one. We perform experiments on real-world data to
test the proposed methods.
To better cover unbalanced datasets, we use precision-recall metrics in addition to the standard
ROC-AUC. CatBoost, LightGBM, and XGBoost are evaluated separately using 5-fold
cross-validation. Furthermore, the majority voting ensemble learning method is used to assess
the performance of the combined algorithms. According to the results, the combination of
LightGBM and XGBoost achieves the best scores: ROC-AUC = 0.95, precision = 0.79, recall = 0.80,
F1 score = 0.79, and MCC = 0.79. By using deep learning and Bayesian optimization to tune the
hyperparameters, we obtain ROC-AUC = 0.94, precision = 0.80, recall = 0.82, F1 score = 0.81,
and MCC = 0.81. This is a significant improvement over the state-of-the-art methods we
compared against.
INTRODUCTION.
In recent years, there has been a significant increase in the volume of financial transactions due to
the expansion of financial institutions and the popularity of web-based e-commerce. Fraudulent
transactions have become a growing problem in online banking, and fraud detection has always
been challenging. As credit cards have developed, the patterns of credit card fraud have
continually changed. Fraudsters do their best to make fraudulent transactions look legitimate.
They try to learn how fraud detection systems work and keep probing these systems, making fraud
detection more complicated. Therefore, researchers are constantly trying to find new methods or
improve the performance of the existing ones.
In this paper, we propose an efficient approach for detecting credit card fraud, evaluated on
publicly available datasets. It uses the optimised algorithms LightGBM, XGBoost, CatBoost, and
logistic regression individually, combines them through majority voting, and applies deep
learning with tuned hyperparameter settings. An ideal fraud detection system should detect as
many fraudulent cases as possible, and its precision should be high, i.e., the transactions it
flags should actually be fraudulent. This builds customers' trust in the bank and, at the same
time, spares the bank losses caused by incorrect detection.
• We adopt Bayesian optimization for fraud detection and propose to use the weight-tuning
hyperparameter to solve the unbalanced data issue as a pre-process step. We also suggest
using CatBoost and XGBoost alongside LightGBM to improve performance. We use the
XGBoost algorithm because of its high training speed on large datasets and its regularization
term, which counters overfitting by penalizing the complexity of the trees; it also does not
require much time to set the hyperparameters. We also use the CatBoost algorithm because
there is no need to adjust hyperparameters for overfitting control, and it obtains good
results without changing hyperparameters compared to other machine learning algorithms.
• We propose a majority-voting ensemble learning approach to combine CatBoost, XGBoost,
and LightGBM and review the effect of the combined methods on the performance of fraud
detection on real, unbalanced data. We also propose to use deep learning for adjusting and
fine-tuning the hyperparameters.
• To evaluate the performance of the proposed methods, we perform extensive experiments
on real-world data. To better cover the unbalanced datasets, we use precision-recall metrics in
addition to the typically used ROC-AUC. We also evaluate the performance using the F1-score
and MCC metrics. According to the results, the proposed methods outperform the existing
and baseline methods.
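The majority-voting step described above can be sketched as follows. This is a minimal illustration only: the function name `majority_vote` and the three prediction lists are hypothetical, and the predictions are made-up values rather than outputs of trained CatBoost, XGBoost, or LightGBM models.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(*model_predictions: Sequence[int]) -> List[int]:
    """Combine per-model class labels by hard majority voting."""
    if not model_predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(model_predictions[0])
    if any(len(p) != n_samples for p in model_predictions):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for votes in zip(*model_predictions):
        # most_common(1) returns the label that received the most votes
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions (1 = fraud, 0 = legitimate) for five transactions.
catboost_pred = [0, 1, 0, 1, 0]
xgboost_pred = [0, 1, 1, 0, 0]
lightgbm_pred = [0, 0, 1, 1, 0]

ensemble_pred = majority_vote(catboost_pred, xgboost_pred, lightgbm_pred)
print(ensemble_pred)  # prints [0, 1, 1, 1, 0]
```

With an odd number of binary classifiers no ties can occur, which is one reason three models are a convenient choice for hard voting.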
LITERATURE SURVEY.
OVERVIEW :
A literature survey, or literature review, in a project report presents the analyses and research
already carried out in the field of interest and the results already published, taking into account
the parameters and the extent of the project. A literature survey is mainly carried out to
analyze the background of the current project, which helps to find flaws in the existing
system and indicates which unsolved problems we can work on. The following topics therefore not
only illustrate the background of the project but also uncover the problems and flaws that
motivated us to propose solutions and work on this project.
A literature survey is the part of a scholarly paper that presents the current knowledge on a
particular topic, including substantive findings as well as theoretical and methodological
contributions. Literature reviews use secondary sources and do not report new or original
experimental work. Most often associated with academic-oriented literature, such as a thesis,
dissertation or peer-reviewed journal article, a literature review usually precedes the
methodology and results sections, although this is not always the case. Literature reviews are
also common in a research proposal or prospectus (the document that is approved before a student
formally begins a dissertation or thesis). Its main goals are to situate the current study within
the body of literature and to provide context for the particular reader. Literature reviews are a
basis for research in nearly every academic field.
A literature survey describes the existing work related to the given project. It deals with the
problems associated with the existing system and gives the reader a clear understanding of how to
address those problems and provide solutions to them.
• Objectives of Literature Survey :
Before building our application, the following system is taken into consideration:
1. TITLE : Credit Card Fraud Detection Using State-of-the-Art Machine Learning and
Deep Learning Algorithms
AUTHOR : Fawaz Khaled Alarfaj; Iqra Malik; Hikmat Ullah Khan; Naif Almusallam;
Year: 2022
Abstract: People can use credit cards for online transactions as they provide an efficient and
easy-to-use facility. With the increase in usage of credit cards, the scope for credit card misuse
has also grown. Credit card frauds cause significant financial losses for both credit card holders
and financial companies. In this research study, the main aim is to detect such frauds while
addressing challenges including the accessibility of public data, high class imbalance, changes in
the nature of fraud, and high rates of false alarm. The relevant literature presents many machine
learning-based approaches for credit card fraud detection, such as Extreme Learning Method,
Decision Tree, Random Forest, Support Vector Machine, Logistic Regression and XGBoost. However,
due to low accuracy, there is still a need to apply state-of-the-art deep learning algorithms to
reduce fraud losses. The main focus has been to apply recent developments in deep learning
algorithms for this purpose. A comparative analysis of both machine learning and deep learning
algorithms was performed to find efficient outcomes. The detailed empirical analysis was carried
out using the European card benchmark dataset for fraud detection. A machine learning algorithm
was first applied to the dataset, which improved the accuracy of fraud detection to some extent.
Later, three architectures based on a convolutional neural network were applied to improve fraud
detection performance. Adding further layers increased the detection accuracy again. A
comprehensive empirical analysis was carried out by varying the number of hidden layers and
epochs and by applying the latest models. The evaluation of the research work shows improved
results, with accuracy, F1-score, precision and AUC reaching optimized values of 99.9%, 85.71%,
93%, and 98%, respectively. The proposed model outperforms the state-of-the-art machine learning
and deep learning algorithms for credit card fraud detection problems. In addition, the authors
performed experiments by balancing the data and applying deep learning algorithms to minimize the
false negative rate. The proposed approaches can be implemented effectively for the real-world
detection of credit card fraud.
• METHODOLOGY USED :
Extreme Learning Method, Decision Tree, Random Forest, Support Vector Machine,
Logistic Regression and XGBoost.
• MERITS :
It shows improved results in terms of accuracy, F1-score, precision and AUC.
• LIMITATIONS :
• It works only for a specific dataset.
• It is not suitable for real-world data.
2. TITLE : A Neural Network Ensemble With Feature Engineering for Improved Credit
Card Fraud Detection.
• METHODOLOGY USED :
Support vector machine (SVM), multilayer perceptron (MLP), decision tree, traditional
AdaBoost, and LSTM.
• MERITS :
The experimental results show that the classifiers performed better when trained with the
resampled data, and the proposed LSTM ensemble outperformed the other algorithms by
obtaining a sensitivity and specificity of 0.996 and 0.998, respectively.
• LIMITATIONS :
• It works only for a specific dataset.
• It is not suitable for real-world data.
3. TITLE : Ensemble Synthesized Minority Oversampling-Based Generative
Adversarial Networks and Random Forest Algorithm for Credit Card Fraud Detection.
Abstract: The recent rapid increase in credit card fraud has caused huge monetary losses for
individuals and financial institutions. Most credit card frauds are conducted online by illegally
obtaining payment credentials through data breaches, phishing, or scamming. Many solutions have
been suggested to address the credit card fraud problem for online transactions. However, high
class imbalance is the major challenge that existing solutions face in constructing an effective
detection model. Most of the existing techniques used for class imbalance overestimate the
distribution of the minority class, resulting in highly overlapped or noisy and unrepresentative
features, which cause either overfitting or imprecise learning. In this study, a credit card fraud
detection model (CCFDM) is proposed based on ensemble learning and a generative adversarial
network (GAN) assisted by Ensemble Synthesized Minority Oversampling techniques (ESMOTE-
GAN). Multiple subsets were extracted using under-sampling and SMOTE was applied to generate
less skewed sets to prevent the GAN from modeling the noise. These subsets were used to train
diverse sets of GAN models to generate the synthesized subsets. A set of Random Forest classifiers
was then trained based on the proposed ESMOTE-GAN technique. The probabilistic outputs of
the trained classifiers were combined using a weighted voting scheme for decision-making. The
results show that the proposed model achieved 1.9% and 3.2% improvements in overall
performance and the detection rate, respectively, with a 0% false alarm rate. Due to the massive
number of transactions, even a tiny false positive rate can overwhelm the analysis team. Thus, the
proposed model has improved the detection performance and reduced the cost needed for manual
analysis.
• METHODOLOGY USED :
A set of Random Forest classifiers was then trained based on the proposed ESMOTE-GAN
technique.
• MERITS :
The proposed model has improved the detection performance and reduced the cost needed
for manual analysis.
• LIMITATIONS :
• It works only for a specific dataset.
• It is not suitable for real-world data.
• It consumes more time.
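To make the oversampling idea in this entry concrete, the sketch below shows only the basic SMOTE interpolation step: a synthetic minority sample is placed on the line segment between a minority point and one of its nearest minority neighbours. It does not reproduce the ESMOTE-GAN ensemble of the surveyed paper. The function name `smote_like_samples`, its parameters, and the sample points are hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote_like_samples(
    minority: Sequence[Sequence[float]], n_new: int, k: int = 2, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic minority points by interpolating towards near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen point (excluding itself)
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda q: math.dist(base, q))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature fraud samples.
fraud_points = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like_samples(fraud_points, n_new=3)
print(len(new_points))  # prints 3
```

Because every synthetic point is an interpolation between two real minority points, it always lies inside the region spanned by the existing fraud samples, which is also why plain oversampling can produce overlapping or noisy samples as the abstract notes.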
Abstract: With the popularity of online transactions, credit card fraud incidents are occurring
more and more frequently, and adaptive enhancement (Adaboost) models are most often used in
credit card fraud detection, so how to improve the robustness of the traditional Adaboost algorithm
has become a hot issue. A large part of the reason for the poor robustness of the traditional
Adaboost algorithm is that the base classifier is selected in a way that is uniquely oriented to the
error rate. Therefore, this paper uses an adaptive hybrid weighted self-paced learning method to
improve the objective function of the Adaboost algorithm, thus changing the strategy of base
learner selection in the Adaboost algorithm, while the self-adaptive threshold-finding algorithm
selected in this paper mitigates the influence of human experience on model training. This paper
also selects a double-fault measure to calculate the degree of diversity among base classifiers
from the perspective of generalization error, adds the influence coefficient of diversity to the
weight calculation of weak learners, and gives the optimal range of influence coefficients
through experiments. Finally, the proposed improved algorithm is applied to a credit card fraud
scenario and compared experimentally with several effective improved Adaboost algorithms. The
results show that the combined performance of the proposed improved algorithm is better than
that of the other algorithms in terms of AUC value and F1 value.
• METHODOLOGY USED :
An adaptive hybrid weighted self-paced learning method to improve the objective function
of the Adaboost algorithm.
• MERITS :
The performance of the proposed improved algorithm is better than other algorithms in terms
of AUC value and F1 value.
• LIMITATIONS :
• It works only for a specific dataset.
• It is not suitable for real-world data.
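The double-fault measure mentioned in this entry can be illustrated with a short sketch. It computes the fraction of samples that two classifiers both misclassify; a lower value suggests the two classifiers make different mistakes and are therefore more diverse. How the surveyed paper folds this value into the weak-learner weights is not specified here, so only the measure itself is shown. The function name `double_fault` and the example labels are hypothetical.

```python
from typing import Sequence


def double_fault(
    y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]
) -> float:
    """Fraction of samples misclassified by both classifier A and classifier B."""
    if not (len(y_true) == len(pred_a) == len(pred_b)) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    both_wrong = sum(
        1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b != t
    )
    return both_wrong / len(y_true)


# Hypothetical labels and predictions for six transactions.
y_true = [1, 0, 1, 0, 1, 0]
pred_a = [1, 0, 0, 0, 0, 1]  # wrong on samples 2, 4, 5
pred_b = [1, 1, 0, 0, 1, 1]  # wrong on samples 1, 2, 5

df_value = double_fault(y_true, pred_a, pred_b)  # both wrong on samples 2 and 5
print(round(df_value, 3))  # prints 0.333
```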
PROPOSED SYSTEM :
The proposed framework for fraud detection is presented in Fig. 1. As this figure shows, we first
apply the desired pre-processing on the data and further divide the data into two sections: training
and testing, followed by performing Bayesian optimization on the training data to find the best
hyperparameters that lead to the improvement of the performance. We use the cross-validation
method to obtain performance comparison in an unbalanced set and then examine the algorithms
using different evaluation metrics, including accuracy, precision, recall, the Matthews correlation
coefficient (MCC), the F1-score, and AUC diagrams.
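The evaluation metrics listed above can all be derived from the confusion matrix. The sketch below is a minimal illustration using the standard definitions; the function name `fraud_metrics` and the example labels are hypothetical and are not results from our experiments.

```python
import math
from typing import Dict, Sequence


def fraud_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1 and MCC for binary labels (1 = fraud)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical example: 4 fraud and 6 legitimate transactions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = fraud_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

On heavily unbalanced data, accuracy alone can look high even when most fraud cases are missed, which is why precision, recall, F1 and MCC are reported alongside it.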
Fig. 1: Flowchart of the proposed framework (Select data -> Preprocess -> Apply AdaBoost -> Stop)
SYSTEM REQUIREMENTS
• SOFTWARE REQUIREMENTS :
• Operating system : Windows 7 64-bit
• Coding Language : Python
• HARDWARE REQUIREMENTS :
• System : Pentium i3.
• Hard Disk : 120GB.
• Monitor : 15'' LED.
• Input Device : Keyboard, Mouse.
• Ram : 4GB.
FUNCTIONAL REQUIREMENTS :
This section describes the functional requirements of the system, expressed in
natural language.
• The program must be self-contained so that it can easily be moved from one computer to
another. It is assumed that a network connection will be available on the computer on which
the program resides.
• Capacity, scalability and availability.
The system shall achieve 100 per cent availability at all times.
CONCLUSION.
• In this project, we studied the credit card fraud detection problem in real unbalanced datasets.
We proposed a machine learning approach to improve the performance of fraud detection.
• We used a publicly available "credit card" dataset with 28 features, in which only 0.17 percent
of the transactions are fraudulent. We proposed two methods. In the proposed LightGBM method, we
used class weight tuning to choose the proper hyperparameters. We used the common evaluation
metrics, including accuracy, precision, recall, F1-score, and AUC. Our experimental results
showed that the proposed LightGBM method improved the detection of fraud cases by 50% and the
F1-score by 20% compared with the recently presented method.
• We improved the performance of the algorithm with the help of majority voting. We also
improved the metrics by using the deep learning method. The MCC results confirmed that, for
unbalanced data, MCC is a more reliable criterion than the other evaluation metrics. By combining
the LightGBM and XGBoost methods, we obtained an MCC of 0.79, and 0.81 with the deep learning
method.
• Using hyperparameters to address data imbalance, rather than sampling methods, not only
reduces the memory and time needed to evaluate the algorithms but also gives better results.
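The effect of a class-weight hyperparameter can be illustrated with a weighted log loss, in which errors on fraud cases are multiplied by a weight instead of resampling the data. This is a sketch under stated assumptions, not the tuned setup used in the project: the negative-to-positive ratio is only a common starting value (libraries such as XGBoost and LightGBM expose a `scale_pos_weight` parameter for this purpose), and in the proposed approach the weight would be searched by Bayesian optimization. The function names and toy data are hypothetical.

```python
import math
from typing import Sequence


def scale_pos_weight(y: Sequence[int]) -> float:
    """Heuristic starting weight: number of negatives divided by number of positives."""
    n_pos = sum(y)
    if n_pos == 0:
        raise ValueError("no positive (fraud) samples in y")
    return (len(y) - n_pos) / n_pos


def weighted_log_loss(
    y_true: Sequence[int], p_pred: Sequence[float], pos_weight: float
) -> float:
    """Binary log loss where each fraud sample counts pos_weight times."""
    eps = 1e-12
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# Toy data: 9 legitimate transactions and 1 fraud; the model predicts 0.1 everywhere.
labels = [0] * 9 + [1]
probs = [0.1] * 10

weight = scale_pos_weight(labels)
unweighted = weighted_log_loss(labels, probs, pos_weight=1.0)
weighted = weighted_log_loss(labels, probs, pos_weight=weight)
print(weight, round(unweighted, 3), round(weighted, 3))
```

A model that ignores the fraud class looks acceptable under the unweighted loss but is penalized much more heavily once the fraud sample is up-weighted, and no extra rows need to be generated or stored as they would be with oversampling.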