Implementation of Credit Card Fraud Detection Using Support Vector Machine
https://jespublication.com/ PageNo:163
Vol 12, Issue 06, June /2021
ISSN NO: 0377-9254
Nowadays, digital data are easily available throughout the world because of online
availability. Information of large volume, wide variety, high frequency, and great
importance is stored in the cloud by organizations small and large [2]. This information
comes from a massive number of sources, such as social media followers, customer order
behavior, likes, and shares. White-collar crime is an ever-increasing problem with
far-reaching consequences for the finance sector, business institutions, and governments.
Fraud can be described as illegal deceit to gain financial benefit [3]. The growth in card
transactions has come with a heavy reliance on communication technology. As credit card
transactions are by far the most prevalent form of payment both offline and online, the rate
of card fraud is rising as well. Machine learning is an innovation of this century that
replaces conventional strategies and can work on huge datasets [4] that humans cannot
process directly. Machine learning strategies fall into two important categories, supervised
learning and unsupervised learning. Fraud detection can be carried out with either form, and
the choice depends on the dataset. Supervised training requires anomalies to be labeled
beforehand.
Many supervised methods [5] have been used over the last few decades to identify credit card
fraud. The major obstacle in applying ML to fraud detection is the presence of extremely
imbalanced datasets. In many available datasets, most transactions are legitimate, with only
an extremely small number of fraudulent ones. The significant challenge for researchers is to
design an accurate and efficient fraud detection framework that is low on false positives yet
identifies fraudulent activity effectively [6]. In this study, we introduce an effective
credit card fraud detection system with a feedback mechanism, based on machine learning
techniques. The feedback approach helps boost the classifier's detection rate and
performance. We also analyze the performance of different classification methods [7],
including random forest, tree classifiers, artificial neural networks, support vector
machine, naïve Bayes, logistic regression, and gradient boosting classifiers, on a highly
skewed credit card fraud dataset. This paper is divided into several sections: the
introduction, related work, machine learning techniques for credit card fraud detection, and
the obstacles. Subsequently, the implementation of the machine learning techniques as well as
the estimation and evaluation of different performance measurement
parameters are covered, followed by the findings of the research and suggested further
enhancements.
The major contributions of the paper are as follows:
The data have been preprocessed so that errors and other unwanted artifacts in the data are
effectively removed.
SVM classification was implemented on a publicly available dataset, and the results show that
the proposed SVM classifier performs better than the other approaches.
The rest of the paper is organized as follows. Section 2 reviews the literature and its
drawbacks. Section 3 describes the proposed method and its operation in detail. Section 4
analyzes the results and presents a comparative analysis. Section 5 concludes the paper with
possible future enhancements.
II. LITERATURE SURVEY
Machine learning approaches [8] play a crucial role in many areas of data processing, one of
which is the identification of card fraud. Previous research has proposed many methods for
detecting fraud, including supervised methods, unsupervised methods, and hybrid strategies.
It is therefore necessary to know the technologies involved in identifying credit card fraud
and to have a good understanding of the types of card fraud. Many strategies have been
proposed and tested; most of them are briefly reviewed below.
Detection of card fraud is based on an analysis [9] of card behavior in purchases. Many
techniques have been applied to the identification of card fraud, such as artificial neural
networks (ANN), genetic algorithms (GA), support vector machines (SVM), frequent itemset
mining (FISM), decision trees (DT), the migrating birds optimization algorithm (MBO), and
naïve Bayes (NB). A quantitative analysis of logistic regression and naïve Bayes has also
been conducted. The performance of Bayesian and neural networks has been assessed on credit
card fraud data [10].
Decision trees, machine learning, and logistic regression have also been evaluated for fraud
detection.
The article [11] evaluates several advanced machine learning methods, namely support vector
machines and random forests together with logistic regression, as part of an attempt to
detect fraud better, while neural networks and logistic regression are applied to identity
fraud detection problems.
Credit card fraud detection faces many problems because fraud behavior models [12] are
complex: fraudulent transactions tend to look like genuine ones; card transaction datasets
are seldom accessible and are extremely imbalanced (skewed); optimal features (parameters)
must be chosen for the models; and suitable measures are needed to test the efficiency of
techniques on skewed credit card fraud data. The efficiency of credit card fraud detection is
greatly affected by the sampling approach, the choice of parameters, and the detection
technique used [13].
Detecting fraudulent activity with conventional manual methods is time-consuming and
inaccurate, and the emergence of big data makes such manual techniques even less practical.
Financial institutions have therefore turned to intelligent methods. Such intelligent fraud
detection methods [14] comprise methods based on computational intelligence (CI). Statistical
fraud detection techniques are split into two categories: supervised and unsupervised
methods. In supervised fraud detection, models are estimated from samples of fraudulent and
legitimate transactions in order to classify new transactions as fraudulent or legitimate,
whereas in unsupervised fraud detection, transactions that are statistical outliers are
identified as potential cases of fraud [15].
In the data mining study [16], a hybrid model was examined in which feature selection and
heuristic classification were carried out in three stages. Routine preprocessing was
performed in the first stage. Four feature selection algorithms, such as the genetic
algorithm, information gain ratio, and relief attribute evaluation, were used in the second
and third stages. The parameters of the feature selection techniques were determined from the
precision of different classification algorithms, and the feature selection technique that
was best for a specific classifier was then used. This hybrid model produced results of good
precision.
A credit card dataset [17] is strongly imbalanced because it holds many more legitimate
transactions than fraudulent ones. This means that a classifier may attain a very high
accuracy score without identifying a single fraudulent transaction. Class rebalancing, i.e.
resampling the minority class, is one good way to deal with this type of issue. In
oversampling, minority-class training instances are duplicated until they are in reasonable
proportion to the majority class, so that the learning algorithm has a greater chance of
making correct predictions.
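The effect of imbalance on accuracy, and the oversampling remedy, can be illustrated with a minimal sketch. The toy data (990 legitimate and 10 fraudulent transactions) and the helper names `accuracy` and `random_oversample` are assumptions made for illustration; they are not taken from the cited work, and the sketch assumes a binary problem.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    extra = [rng.choice(minority) for _ in range(majority_count - len(minority))]
    return rows + extra, labels + [minority_label] * len(extra)


# Hypothetical toy data: 990 legitimate (0) and 10 fraudulent (1) transactions.
labels = [0] * 990 + [1] * 10
rows = [[float(i)] for i in range(len(labels))]

# A classifier that always answers "legitimate" never finds any fraud,
# yet it is right on 990 of the 1000 transactions.
always_legit = [0] * len(labels)
baseline_accuracy = accuracy(labels, always_legit)

# After oversampling, the two classes are the same size.
balanced_rows, balanced_labels = random_oversample(rows, labels, minority_label=1)
class_counts = Counter(balanced_labels)
print(baseline_accuracy, dict(class_counts))
```

Oversampling should be applied to the training portion only; duplicating fraud cases before splitting would let copies of the same transaction appear in both the training and the test data.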
A detailed discussion of supervised and unsupervised tools and techniques can be found in
[18]. Supervised techniques cannot always detect cases of fraud. A model based on a deep
autoencoder and a restricted Boltzmann machine (RBM) can reconstruct normal transactions in
order to discover anomalies that deviate from normal patterns. In addition, hybrid techniques
have been developed that combine a variety of methods with AdaBoost and majority voting [19].
Credit card fraud has become widespread as the world has become increasingly digital. People
staying at home want everything within reach, which increases the use of e-commerce and in
turn gives attackers and scammers more opportunities to attempt fraud. Fraudsters use many
techniques to commit fraud, and recognizing their approach is necessary to stop further
fraud. Quite a few previous studies have applied a variety of techniques to find solutions
for card fraud identification. These strategies include [20], but are not restricted to,
neural network models (NN), Bayesian networks (BN), intelligent decision engines (IDE),
optimization algorithms, meta-learning agents, artificial intelligence, image processing,
rule-based systems, logistic regression (LR) [21], SVM [22], decision trees [23], k-nearest
neighbor (kNN) [24], meta-learning strategies, adaptive learning, and so on.
Neural network architectures are mainly used as unsupervised techniques in real-time payment
processing applications. A self-organizing map neural network can solve this problem by
obtaining an optimal classification of each associated group. It covers more than 90 percent
of the fraudulent cases on the ROC curve without causing any false alarms. Ensemble learning
(also widely known as a meta-classifier) improves results by combining different learning
algorithms to enhance predictive performance [25].
III. PROPOSED SYSTEM
We propose a model that detects fraudulent credit card transactions using machine learning
techniques. The proposed model treats fraud detection as a binary classification problem. The
major challenge in building this system is the class imbalance problem.
The dataset is imported from the publicly available Kaggle platform as a .CSV (Comma
Separated Values) file. The data are prepared by removing duplicates and verifying that the
dataset contains no missing values. Label encoding and one-hot encoding handle each
categorical feature in the dataset. The data consist of attributes of different scales, and
several machine learning models may benefit from rescaling the attributes to the same scale
for all
attributes in the data. Attributes are frequently rescaled into the range between 0 and 1;
MinMaxScaler is used to rescale the data. The pre-processed dataset is then assessed with the
SVM-based machine learning algorithm. A validation dataset is set aside for subsequent
confirmation of the developed model's skill. The simplest approach to assessing the
performance of a machine learning algorithm is to use different datasets for training and
testing. Because of overfitting, we cannot train a machine learning algorithm on a dataset
and then evaluate it on predictions made from that same dataset. Fig. 1 represents the
proposed fraud detection system.
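The preprocessing and evaluation steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn and NumPy are available, uses synthetic data in place of the Kaggle file, and treats the two numeric columns (loosely, a transaction amount and a daily decline count) as hypothetical stand-ins. Categorical encoding is omitted because the synthetic features are already numeric.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): label 1 marks fraud.
rng = np.random.default_rng(0)
legit = rng.normal(loc=[100.0, 1.0], scale=[30.0, 1.0], size=(300, 2))
fraud = rng.normal(loc=[900.0, 8.0], scale=[30.0, 1.0], size=(30, 2))
X = np.vstack([legit, fraud])
y = np.array([0] * 300 + [1] * 30)

# Remove exact duplicate rows and verify that no values are missing.
data = np.unique(np.column_stack([X, y]), axis=0)
assert not np.isnan(data).any()
X, y = data[:, :-1], data[:, -1].astype(int)

# Hold out a test set; stratify so both parts keep the same fraud ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Rescale every attribute to [0, 1], then fit an SVM with default parameters
# (scikit-learn's SVC uses the RBF kernel by default).
model = make_pipeline(MinMaxScaler(), SVC())
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Placing the scaler inside the pipeline means its minimum and maximum are learned from the training rows only, so no information from the held-out rows leaks into training.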
4. EXPERIMENTAL RESULTS
4.1 Dataset:
Prior information about the dataset, such as its attributes, dimensions, and the data type of
each feature, is essential for performing the proper operations. A dataset from the publicly
accessible web platform Kaggle is used for the implementation. It is a credit card fraud
dataset that consists of several transactions and contains a mix of fraud and non-fraud
cases. CSV is the most commonly used file format for machine learning data. The dataset
contains rows and columns with features such as Merchant_id, Transaction amount, Is declined,
Total Number of declines per day, is Foreign Transaction, is HighRisk Country, and is
Fraudulent.
The dataset properties are summarized in Table 2.
Table 2: Data-set properties
Description            Value
No. of Transactions    3075
No. of Attributes      11
Types of Classes       0, 1
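Properties of the kind listed in Table 2 can be read directly from a CSV file. The sketch below uses Python's standard csv module on a three-row in-memory sample; the header names are adapted from the feature list above, while the values and the helper name `describe` are invented for illustration and do not come from the actual dataset.

```python
import csv
import io

# Hypothetical sample (assumption): invented values, header adapted from the text.
SAMPLE_CSV = "\n".join([
    "Merchant_id,Transaction amount,Is declined,Total Number of declines per day,"
    "is Foreign Transaction,is HighRisk Country,is Fraudulent",
    "1001,250.0,N,0,N,N,0",
    "1002,4800.5,Y,6,Y,Y,1",
    "1003,75.2,N,1,N,N,0",
])


def describe(csv_text, label_column):
    """Return (row count, attribute count, sorted class labels) for CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    n_attributes = len(rows[0]) if rows else 0
    classes = sorted({row[label_column] for row in rows})
    return len(rows), n_attributes, classes


n_rows, n_attributes, classes = describe(SAMPLE_CSV, "is Fraudulent")
print(n_rows, n_attributes, classes)
```

For the real file, the same function could be given the file's text contents in place of `SAMPLE_CSV`.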
The experimental results in Table 3 and Figure 3 show the percentages of the different
evaluation parameters obtained on the credit card fraud dataset by the different machine
learning techniques. The findings indicate that SVM achieves an accuracy of 95.988 percent,
whereas Random Forest achieves 93.228 percent, LR 92.89 percent, NB 91.2 percent, decision
trees 90.9 percent, and GBM 93.99 percent on the ULB machine learning credit card fraud
identification task. For any machine learning technique, higher values of precision,
accuracy, recall, and F1-score indicate a better-performing method. As we have seen, a few
algorithms surpass the others quite significantly. Thus, selecting SVM over the other
techniques is a sensible way to attain a greater degree of completeness with only a slight
decrease in quality.
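The evaluation measures named above (accuracy, precision, recall, and F1-score, together with specificity) are all derived from the four confusion-matrix counts. The following sketch shows the standard definitions on a small hypothetical set of labels and predictions; the function names and the toy data are assumptions, and the numbers are unrelated to the results reported in this paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute the usual metrics; a ratio with a zero denominator is reported as 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical labels (1 = fraud): 3 frauds caught, 1 missed, 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced fraud data, recall and precision are usually more informative than accuracy alone, because accuracy is dominated by the legitimate class.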
5. CONCLUSION
With the advent of deregulation, liberalization, globalization, and privatization, new ways
have opened for banks to enhance their revenues by diversifying their product portfolios and
offerings. This paper analyzes the performance of support vector machine kernel methods,
which are trained on transactional data and then evaluated and compared on the accuracy,
specificity, and sensitivity metrics. The model is compared with existing classifiers such as
Naive Bayes, Decision Tree, KNN, and Logistic Regression. The highly skewed data are
resampled, with the positive class down-sampled and the negative class up-sampled, to convert
the dataset into a balanced one. The results show that the SVM kernel methods perform well on
all three metrics (sensitivity, accuracy, and specificity) compared with traditional
techniques. The RBF kernel function is observed to outperform the other techniques, giving
96% accuracy and 96% sensitivity. The linear kernel function gives 90% as the highest
sensitivity in comparison with other techniques. This study highlights the performance of SVM
kernel classification on imbalanced and skewed data. As future work, multi-classifiers and
meta-learning can be considered for highly imbalanced credit card fraud detection. A publicly
available credit card dataset was used to evaluate the algorithms, and accuracy and the
confusion matrix were adopted as the metrics for evaluating algorithm efficiency. The current
system for detecting credit card fraud was built with default parameters. In the future, it
can also be designed to prevent overfitting through parameter tuning. Most machine learning
models have hyper-parameters. The
[10]. Bahnsen, A.C., Stojanovic, A., Aouada, D., and Ottersten, B., 2014, April.
Improving credit card fraud detection with calibrated probabilities. In Proceedings of
the 2014 SIAM international conference on data mining (pp. 677-685). Society for
Industrial and Applied Mathematics.
[11]. Popat, R.R. and Chaudhary, J., 2018, May. A survey on credit card fraud detection
using machine learning. In 2018 2nd International Conference on Trends in
Electronics and Informatics (ICOEI) (pp. 1120-1125). IEEE.
[12]. Patil, S., Nemade, V. and Soni, P.K., 2018. Predictive modelling for credit card
fraud detection using data analytics. Procedia computer science, 132, pp.385-395.
[13]. Malini, N. and Pushpa, M., 2017, February. Analysis on credit card fraud
identification techniques based on KNN and outlier detection. In 2017 Third
International Conference on Advances in Electrical, Electronics, Information,
Communication, and Bio-Informatics (AEEICB) (pp. 255-258). IEEE.
[14]. Zareapoor, M. and Shamsolmoali, P., 2015. Application of credit card fraud
detection: Based on bagging ensemble classifier. Procedia computer science,
48(2015), pp.679-685.
[15]. Dal Pozzolo, A., Boracchi, G., Caelen, O., Alippi, C. and Bontempi, G., 2015, July.
Credit card fraud detection and concept-drift adaptation with delayed supervised
information. In 2015 international joint conference on Neural networks (IJCNN) (pp.
1-8). IEEE.
[16]. Mahmoudi, N. and Duman, E., 2015. Detecting credit card fraud by modified Fisher
discriminant analysis. Expert Systems with Applications, 42(5), pp.2510-2516.
[17]. Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P.E., He-Guelton,
L. and Caelen, O., 2018. Sequence classification for credit-card fraud detection.
Expert Systems with Applications, 100, pp.234-245.
[18]. Dal Pozzolo, A., Boracchi, G., Caelen, O., Alippi, C. and Bontempi, G., 2017.
Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE
transactions on neural networks and learning systems, 29(8), pp.3784-3797.
[19]. Gupta, S. and Johari, R., 2011. A new framework for credit card transactions
involving mutual authentication between cardholder and merchant. In 2011 International
Conference on Communication Systems and Network Technologies (pp. 22-26). IEEE.
[20]. Gmbh, Y. and Co, K.G., 2016, March. Global online payment methods: Full year 2016.
Tech. Rep.