Credit_Card_Fraud_Detection_Framework_A

This paper discusses a framework for credit card fraud detection using various machine learning algorithms, including K-Nearest Neighbor, Decision Trees, Support Vector Machine, Logistic Regression, Random Forest, and XGBoost. The study utilizes a dataset of 284,808 credit card transactions from an EU financial institution, revealing that KNN yields the highest accuracy and F1 score for fraud detection. The findings emphasize the importance of feature selection and dataset balancing in improving predictive performance.


International Journal of Scientific Research in Science and Technology

Print ISSN: 2395-6011 | Online ISSN: 2395-602X (www.ijsrst.com)


doi : https://doi.org/10.32628/IJSRST207671

Credit Card Fraud Detection Framework – A Machine Learning Perspective
Jasmin Parmar*1, Dr. Achyut C. Patel2, Dr. Mayur Savsani3
*1Saurashtra University, Rajkot, Gujarat, India
2SMT. M. T. Dhamsania College of Commerce, Rajkot, Gujarat, India

3Symbiosis Statistical Institute, Symbiosis International (Deemed University), Pune, Maharashtra, India

ABSTRACT

The rapid growth of the e-commerce industry has led to a dramatic increase in the use of credit cards for online purchases, and with it a surge in the associated fraud. In recent years it has become extremely difficult for banks to identify fraud in the credit card system. Machine learning plays an essential role in detecting credit card fraud in transactions. To predict these transactions, banks use various machine learning methodologies: past data is collected and new features are used to improve predictive power. The performance of fraud detection in credit card transactions is strongly influenced by the sampling approach applied to the dataset, the selection of variables, and the detection techniques used. This paper investigates the performance of K-Nearest Neighbor, Decision Trees, Support Vector Machine (SVM), Logistic Regression, Random Forest, and XGBoost for credit card fraud detection. The dataset of credit card transactions is sourced from Kaggle and contains a total of 284,808 credit card transactions from an EU financial institution. It treats suspicious transactions as fraud and labels them as the "positive class", and genuine ones as the "negative class". The dataset is highly imbalanced: approximately 0.172% of the cases are fraud and the rest are genuine transactions. These techniques are applied to the dataset and the work is implemented in Python. The performance of the techniques is evaluated based on accuracy, F1 score and the confusion matrix. Results show that each algorithm can be used for credit card fraud detection with high accuracy. The proposed model may also be useful for the detection of other anomalies.

Keywords: Fraud detection, K-Nearest Neighbor (KNN), Decision Trees, Support Vector Machine (SVM), Logistic Regression, Random Forest, and XGBoost

Article Info
Volume 7, Issue 6
Page Number: 431-435
Publication Issue : November-December-2020

Article History
Accepted : 15 Dec 2020
Published : 30 Dec 2020

Copyright: © the author(s), publisher and licensee Technoscience Academy. This is an open-access article distributed
under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-
commercial use, distribution, and reproduction in any medium, provided the original work is properly cited
Jasmin Parmar et al Int J Sci Res Sci & Technol. November-December-2020; 7 (6) : 431-435

I. INTRODUCTION

With the growth and acceleration of e-commerce, credit cards are used extensively for online shopping, which has resulted in a high level of credit card fraud. In today's digital age, the ability to detect fraudulent credit card transactions is essential.

Fraud detection involves monitoring and analysing the behaviour of customers in order to estimate, detect or prevent undesirable behaviour. To detect credit card fraud effectively, we first have to go through the basics of how a credit card is used and what the types and usage areas of credit cards are.

Algorithms may or may not succeed in separating out fraudulent transactions. To detect fraud, they must be given records and information about fraudulent transactions. They then analyse the dataset and classify all transactions.

II. LITERATURE REVIEW

You Dai et al. [2] describe random forest algorithms applicable to fraud detection. They distinguish two kinds of random forest, a random-tree-based random forest and a CART-based random forest, describe both in detail, and report accuracies of more than 91% and 96% respectively. The paper also concludes that the second type performs far better than the first.

Suman Arora [3] stated that supervised machine learning algorithms are generally trained on 70% of the dataset and tested on the remaining 30%. Random forest, stacking classifier, XGB classifier, SVM, decision tree, naïve Bayes and KNN are compared with each other, with accuracies of ~94.60%, ~95.30%, ~94.60%, ~93.20%, ~90.90%, ~90.50% and 94.30% respectively.

Kosemani Temitayo Hafiz et al. [4] present a flow outline of the fraud detection process, in which data acquisition, data pre-processing, data analysis and the techniques or algorithms are described in detail. The algorithms are K-nearest neighbour, random tree, AdaBoost and logistic regression, with accuracies of ~96.90%, 94.30%, 57.70% and 98.20% respectively.

Fraudulent activities are causing considerable losses, which has motivated researchers to find a solution that can detect and prevent fraud. Several methods have already been proposed and tested; some of them are briefly reviewed in this section.

III. PROPOSED TECHNIQUE

The proposed techniques are applied in this paper to detect fraud in the credit card system. A comparison is made between different machine learning algorithms (K-Nearest Neighbor, Decision Trees, Logistic Regression, Support Vector Machine, Random Forest, and XGBoost) to determine which algorithm fits best and can be adopted by credit card merchants for detecting fraud. Figure 1 shows the architecture diagram representing the overall system framework.

The preparation steps used to identify the best algorithm for the given dataset are listed in Table 1.

TABLE 1
PREPARING STEPS

Algorithm Steps
Step 1   Importing the required packages into our Python environment
Step 2   Importing the data
Step 3   Processing the data to our needs and Exploratory Data Analysis
Step 4   Feature Selection and Data Split
Step 5   Building six types of classification models
Step 6   Evaluating the created classification models using the evaluation metrics

Decision Tree is a computational tool for classification and prediction. A tree consists of internal nodes, each of which represents a test on an attribute; each branch represents an outcome of that test, and each leaf node (terminal node) holds a class label. It recursively partitions a dataset using either a depth-first greedy approach or a breadth-first greedy approach, and stops when all the elements have been assigned a particular class. For the partition rule to be effective, it must separate the data into groups in which a single class predominates in each group. In other words, the best partition is the one in which the subsets do not overlap, i.e. they are clearly disjoint to the greatest possible extent.

Support Vector Machine is a supervised learning algorithm which, given a dataset, separates it into different classes using a hyperplane. The goal of SVM is to find this hyperplane. There will be several candidate hyperplanes, but we aim to find the optimal one. The points closest to the hyperplane in the different classes are called support vectors, and these support vectors are used to predict the classes of new data. To make this clearer, we feed the machine supervised data, i.e. data whose outcomes are already known. It learns the behaviour of fraudulent and genuine transactions and can then classify new transactions according to the class to which they belong.

K-Nearest Neighbor (KNN) is perhaps the most widely used algorithm for both classification and regression predictive problems. Its performance depends on three factors: the distance metric, the distance rule and the value of K. The distance metric provides the measure used to find the nearest neighbours of any incoming data point. The distance rule helps us to classify the new data point into a class by comparing its features with those of the neighbouring data points. The value of K is estimated from a plot of the validation error curve, and that value should be used for all predictions. We determine the dominant class in the neighbourhood of any new transaction and classify the transaction as belonging to that class.

Logistic regression is perhaps the best-known classification algorithm in machine learning. The logistic regression model describes the relationship between predictors that can be continuous, binary or categorical. The dependent variable can be binary: based on certain predictors, we predict whether something will happen or not. We estimate the probability of belonging to each class for a given set of predictors.

Random forest is an algorithm that can be used for both classification and regression problems. It consists of several decision trees. The algorithm gives better results when there is a larger number of decision trees in the forest, which also helps to prevent the model from overfitting. Each decision tree in the forest produces a result, and these results are combined to obtain a more accurate and stable prediction.

IV. OUTCOME AND ANALYSIS

To determine which algorithm is best suited to the problem of detecting fraud cases, several evaluation measures have been used. Commonly used metrics for assessing the results of ML algorithms are accuracy and F1 score. All of these metrics can be computed from a confusion matrix.

Since the test set comprises 20% of the dataset, it contains a total of 56962 samples, of which 101 are fraud transactions. The Decision Tree model achieved the following results (Table 2):
Accuracy score: 99.93%
F1 score: 81.05%

TABLE 2
CONFUSION MATRIX FOR DT
            Predicted
Actual      0        1
0           56849    12
1           24       77

The KNN model achieved the following results (Table 3):
Accuracy score: 99.95%
F1 score: 85.71%

TABLE 3
CONFUSION MATRIX FOR KNN
            Predicted
Actual      0        1
0           56854    7
1           21       81

The LR model achieved the following results (Table 4):
Accuracy score: 99.91%
F1 score: 73.56%

TABLE 4
CONFUSION MATRIX FOR LR
            Predicted
Actual      0        1
0           56852    9
1           37       64

The SVM model achieved the following results (Table 5):
Accuracy score: 99.93%
F1 score: 77.71%

TABLE 5
CONFUSION MATRIX FOR SVM
            Predicted
Actual      0        1
0           56855    6
1           33       68

The RF model achieved the following results (Table 6):
Accuracy score: 99.92%
F1 score: 77.27%

TABLE 6
CONFUSION MATRIX FOR RF
            Predicted
Actual      0        1
0           56854    7
1           33       68

The XGBoost model achieved the following results (Table 7):
Accuracy score: 99.94%
F1 score: 84.49%

TABLE 7
CONFUSION MATRIX FOR XGBoost
            Predicted
Actual      0        1
0           56854    7
1           22       79

According to the accuracy score, the KNN model proves to be the most accurate model and the Logistic Regression model the least accurate. However, when the results of each model are rounded, every model is about 99% accurate, which is a very good score.

The ranking of the models by F1 score is almost the same as for the previous metric. On the basis of the F1 score, the KNN model takes first place again and the Logistic Regression model remains the least accurate model.
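
The accuracy and F1 figures reported in this section can be recomputed directly from the counts in Tables 2-7. The following is a minimal Python sketch of that arithmetic, not the authors' code: the names `ConfusionMatrix`, `TABLES` and `ranking_by_f1` are illustrative assumptions, and class 1 (fraud) is taken as the positive class. The counts are copied as printed. Under that assumption, the computed F1 scores match the reported values for every model except KNN: the printed Table 3 sums to 56963 samples (102 fraud cases) and gives an F1 of about 85.26%, whereas the reported 85.71% would correspond to 20 false negatives, so one printed count in Table 3 may be a typo. The reported accuracies appear to be truncated, rather than rounded, to two decimals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; class 1 (fraud) is the positive class."""
    tn: int  # actual 0, predicted 0
    fp: int  # actual 0, predicted 1
    fn: int  # actual 1, predicted 0
    tp: int  # actual 1, predicted 1

    @property
    def total(self) -> int:
        return self.tn + self.fp + self.fn + self.tp

    def accuracy(self) -> float:
        # Share of all test samples that were classified correctly.
        return (self.tn + self.tp) / self.total

    def precision(self) -> float:
        # Share of predicted fraud cases that really are fraud.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of actual fraud cases that were detected.
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN).
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Counts copied from Tables 2-7 as printed (tn, fp, fn, tp).
TABLES = {
    "DT": ConfusionMatrix(56849, 12, 24, 77),
    "KNN": ConfusionMatrix(56854, 7, 21, 81),
    "LR": ConfusionMatrix(56852, 9, 37, 64),
    "SVM": ConfusionMatrix(56855, 6, 33, 68),
    "RF": ConfusionMatrix(56854, 7, 33, 68),
    "XGBoost": ConfusionMatrix(56854, 7, 22, 79),
}


def ranking_by_f1(tables: dict) -> list:
    """Model names ordered from highest to lowest F1 score."""
    return sorted(tables, key=lambda name: tables[name].f1(), reverse=True)


if __name__ == "__main__":
    for name in ranking_by_f1(TABLES):
        cm = TABLES[name]
        print(f"{name:8s} n={cm.total}  "
              f"accuracy={cm.accuracy():.4%}  F1={cm.f1():.2%}")
```

Because only 101 of the 56962 test samples are fraud, a model that labels every transaction as genuine would already reach about 99.82% accuracy, which is why the accuracy scores of all six models sit within a few hundredths of a percent of each other while the F1 scores separate them clearly.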

Comparing the confusion matrices of all the models, it can be seen that the K-Nearest Neighbors model has done a very good job of separating fraud transactions from non-fraud transactions, followed by the XGBoost model. We can therefore conclude that the most appropriate model for our case is the K-Nearest Neighbors model, and the model that can be disregarded is the Logistic Regression model.

V. CONCLUSION

Credit card fraud represents a serious business problem. These frauds can lead to huge losses, both business and personal. Therefore, organizations invest more and more money in developing new ideas and roadmaps that will give them an edge in identifying and minimising fraud.

The main objective of this paper is to compare different ML algorithms for the identification of fraudulent transactions. The comparison concluded that KNN gives the best results for the given dataset and the most accurate classification of whether transactions are fraudulent or not. This was established using various evaluation metrics, for example accuracy and F1 score. Selection of features and dataset balancing have proved to be critical in achieving significant results.

Future work should be directed towards studying resampling techniques that will help to reduce the skewness ratio of the datasets, and towards applying deep learning techniques.

VI. REFERENCES

[1] Naik, Heta, and Prashasti Kanikar. "Credit card fraud detection based on machine learning algorithms." Int J Comput Appl 182.44 (2019): 8-12.
[2] Dai, You, et al. "Online credit card fraud detection: A hybrid framework with big data technologies." 2016 IEEE Trustcom/BigDataSE/ISPA. IEEE, 2016.
[3] Arora, Suman, and Dharminder Kumar. "Selection of optimal credit card fraud detection models using a coefficient sum approach." 2017 International Conference on Computing, Communication and Automation (ICCCA). IEEE, 2017.
[4] Hafiz, Kosemani Temitayo, Shaun Aghili, and Pavol Zavarsky. "The use of predictive analytics technology to detect credit card fraud in Canada." 2016 11th Iberian Conference on Information Systems and Technologies (CISTI). IEEE, 2016.

Cite this article as :
Jasmin Parmar, Dr. Achyut C. Patel, Dr. Mayur Savsani, "Credit Card Fraud Detection Framework - A Machine Learning Perspective", International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN : 2395-602X, Print ISSN : 2395-6011, Volume 7 Issue 6, pp. 431-435, November-December 2020. Available at
doi : https://doi.org/10.32628/IJSRST207671
Journal URL : http://ijsrst.com/IJSRST207671
