Fraud Prediction in Property Insurance
1. Introduction

The insurance market is a highly profitable market that moves large sums of money over the years. In Brazil alone, about 10.8 billion USD was paid in insurance policies in 2017 (Brazilian National Confederation of Insurance Companies, 2017). Frauds, in turn, can bring huge losses to the companies: in that same year, the total value of all occurred claims was around 10.0 billion USD, while the value of proven frauds totaled 221.2 million USD (Brazilian National Confederation of Insurance Companies, 2017).

Bearing in mind the economic relevance of this market and the challenge that fraud detection poses to professional analysts, data mining and machine learning techniques have been showing their predictive potential in financial applications, as seen in works like Hsu et al. (2016), notably when involving complex problems and non-linear patterns (Huang et al., 2004; Soman et al., 2009). In a study that applied several machine learning models to predict the default rate of a government-funded housing finance program, de Castro Vieira et al. (2019) reported that the practical application of the proposed method would have significantly reduced the number of conceded non-performing loans and avoided approximately 3.0 billion USD in credit losses. Specifically for fraud detection, machine learning applications include Awoyemi et al. (2017), Chen et al. (2006), Hajek and Henriques (2017) and Raghavan and El Gayar (2019).

Therefore, this article aimed to verify whether the use of various machine learning models (ranging from regularized extensions of simple models to different structures of non-linear interactions and ensemble-based classifiers) can contribute to fraud identification for property insurance policies, comparing the performance of these models with the standard logistic regression. Furthermore, in this paper, we compiled an overall profile for confirmed fraudsters and analyzed the relative importance of the input variables according to each model using eXplainable Artificial Intelligence (XAI) methods, addressing as well the interpretation of the predictions for prominent false positive and false negative observations. These results were discussed in terms of their practical applicability to effectively aid risk management professionals in building data-driven decision rules based on the most prominent “signs” for potential frauds in future policies.

Moreover, this paper used real-world data at the individual level from a major Brazilian insurance company, containing information about the income level of the clients, as well as features like the time between contract start and claim and the number of past policy claims. In this sense, the use of real data represents a significant advantage over simulated data in terms of both model evaluation and practical applicability in decision-making. In addition, since scientific papers that analyze frauds in residential and business property insurance are relatively scarce in comparison to works about fraud detection in automobile insurance or credit card transactions, this paper further contributes to the current literature on machine learning applications.
∗ Corresponding author.
E-mail addresses: [Link]@[Link] (M.K. Severino), [Link]@[Link] (Y. Peng).
Received 31 July 2020; Received in revised form 11 June 2021; Accepted 14 June 2021
Available online 22 June 2021
2666-8270/© 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license.
M.K. Severino and Y. Peng Machine Learning with Applications 5 (2021) 100074
This article is structured as follows: Section 2 provides a review of the recent literature on insurance fraud prediction, with an emphasis on studies that applied machine learning techniques; Section 3 describes the explanatory variables, the data processing procedure and the steps of the empirical experiments, as well as metrics to evaluate the forecasts; Section 4 presents the results of the forecasts and their statistical significances, alongside a general profile of the fraud cases and model-agnostic methods of feature importance evaluation for global and local interpretation using eXplainable Artificial Intelligence, discussing the findings of this paper on practical risk analysis and fraud detection; finally, Section 5 discusses the limitations of this paper and points out suggestions for possible future developments.

2. Related literature

Machine learning methods are based on an inductive analytical paradigm, drawing conclusions based on the patterns observed from data without defining assumptions like probability distributions and functional form, such as linearity. As discussed in Peng and Nagata (2020), this flexibility demands additional caution to control the models’ balance between generalization ability and complexity, since different models or even small variations on their hyperparameters can induce great impacts on the predicting performance while being applied to the same dataset.

One of the main applications of machine learning in business administration and finance is fraud detection, a topic of great relevance from the perspective of a decision-maker, given the fact that decision support systems that help risk analysts predict fraudsters have a direct impact on the economic performance of a company. In this sense, this topic has been investigated by a large number of researchers in recent years, as discussed in papers like Awoyemi et al. (2017), Ngai et al. (2011), Raghavan and El Gayar (2019) and Waghade and Karandikar (2018).

For instance, Triepels et al. (2018) proposed an automated system to detect frauds in shipping documents, which can be adulterated to bypass restrictions or to facilitate smuggling. The authors developed a model based on Bayesian networks to generate probabilistic discriminative models and predict the presence of goods on the shipments’ cargo list; the predictions are then cross-checked against the documentation to determine whether a fraud has occurred. The results showed that the proposed automated systems considerably improved the detection of miscoding and smuggling compared to random audits, which are typically used by shipping companies to check this documentation, usually in a labor-intensive and non-scalable way.

Similarly, Dou et al. (2019) analyzed download frauds in Mobile App Markets, categorizing the frauds into three main classes based on their motivation: (1) boosting an App’s front end downloads, (2) optimizing an App’s search ranking, and (3) enhancing an App’s user acquisition and retention rates. The authors applied the XGBoost model to predict frauds using different sets of features, reaching over 99% accuracy in the most general case. In addition, the authors evaluated the predictions using performance metrics that consider the overall balance between false negatives and false positives, as well as generating a ranking of the features’ importance for this predicting task. Both aspects were considered in this paper’s empirical analysis as well.

Bearing in mind that the presence of fraud implies large profit losses for the insurance sector, Sheshasaayee and Thomas (2018) illustrated the main challenges faced by risk and fraud analysts in developing fraud identification mechanisms and decision rules, discussing the advantages of using machine learning methods to perform such tasks, especially concerning the most prominent features of fraudsters. In this line, Popat and Chaudhary (2018) listed recent research about credit card fraud detection with machine learning models, discussing the strengths of this paradigm on mining patterns from high-dimensional data and assisting real-world decision-making.

Likewise, Dal Pozzolo et al. (2014) discussed the complexity involved in the development of a data-driven fraud detection algorithm, highlighting common issues like the non-stationary distribution of the data, highly imbalanced class distributions, a continuous and massive flow of new transactions, and scarcity of available microdata due to confidentiality issues. Regarding these issues, the authors evaluated the predictive performance of three machine learning models (random forest, support vector machine, and neural network) using a real-world credit card dataset, as well as the overall impact of update periodicity, application of balancing techniques, and retention of older observations in the training dataset. The results indicated that the random forest model performed consistently better than support vector machine and neural networks for all training approaches; moreover, models that were updated with new data more often performed better, which indicates that the fraud distribution can quickly change over time. Concerning the issue of unbalanced classes, the application of balancing methods improved the performance over the “static” non-balanced dataset, in which the random forest yielded the worst performance. Finally, the procedure of discarding older observations exhibited a smaller marginal improvement in comparison to maintaining the dataset balanced.

Waghade and Karandikar (2018) used machine learning models to predict frauds in the healthcare sector, pointing out the costs of the manual identification process (which requires a long effort from reviewing auditors to evaluate whether medical insurance claims are fraudulent) and discussing the relevance of automated decision support systems for different types of fraud in this business branch. Similarly, Verma et al. (2017) applied outlier detection models to identify anomalies and potential frauds in healthcare systems.

Wang and Xu (2018) applied machine learning-based text mining algorithms to analyze the descriptions of car accidents in order to predict frauds for automobile insurance claims: the tested models were support vector machine (SVM), random forest, and deep neural network, and all three models managed to reach an F1 Score greater than 75%. Roy and George (2017), on the other hand, applied random forest and naive Bayes models to detect fraud in automobile claims, finding that the former performed better than the latter. Likewise, Yao et al. (2018) proposed a model to detect financial fraud combining feature selection and machine learning classification models. Starting from high-dimensional data, Principal Component Analysis and XGBoost were used to identify the most informative variables, after which several machine learning models were applied, amongst which the random forest had the best out-of-sample performance.

Eshghi and Kargari (2019), on the other hand, stated that unsupervised methods like clustering and outlier detection techniques may not suffice for complex fraud detection tasks, and proposed a framework with Multi-Criteria Decision Analysis and intuitionistic fuzzy sets to incorporate the effect of behavioral uncertainties to model the propensity of a banking transaction to be a fraud. Similarly, Carcillo et al. (2019) argued in favor of integrating unsupervised and supervised learning techniques for credit card fraud detection, in order to better adapt to changes in customer behavior and fraudsters’ ability to invent novel fraud patterns. The authors computed outlier scores for different levels of granularity based on clustering analysis, subsequently applying them on a real-world dataset and reporting an accuracy improvement on the detection performance.

Jurgovsky et al. (2018) presented a sequential learning approach to fraud detection in credit card transactions using LSTM recurrent neural networks, comparing it with a Random Forest classifier as a static benchmark. Using a real-world dataset and analyzing independently offline and e-commerce transactions, the authors found that the frauds detected by the two learners were consistently different, which suggests the potential for the development of ensemble-based models that incorporate both approaches. In addition, the performance of both the static and the sequence learners benefited from manual feature aggregations, evidencing the importance of modeling aspects and feature engineering in fraud detection. Other recent studies on credit card fraud detection include Kim et al. (2016), which proposed a multi-class
algorithm to detect fraud intention in financial misstatements applying MetaCost (Domingos, 1999) to incorporate asymmetric misclassification costs to control for the classes’ unbalance; and Varmedja et al. (2019), which applied Logistic Regression, Random Forest, Naive Bayes, and Neural Network as machine learning classifiers, combined with SMOTE (Synthetic Minority Oversampling Technique) to balance the training data.

For the prediction of corporate bankruptcy, Chen et al. (2020) combined two ensemble methods (namely bagging and boosting) with Support Vector Machines, using a scheme that assigns labels for unlabeled training data controlling for the bag-level relative proportion between the classes; this approach was shown to be efficient in terms of both data-labeling for large datasets and prediction performance improvement through the introduction of ensemble learning strategies. Nami and Shajari (2018), on the other hand, developed a method that involves two stages of detecting fraudulent payment card transactions, applying dynamic random forest and k-nearest neighbors as machine learning models: based on the primary data, additional transaction features are derived and a greater weight is assigned to the most recent transactions, considering that the most recent behavior of credit card holders tends to have a larger impact on deciding whether a transaction is fraudulent or legitimate.

In an attempt to learn Complex Event Processing rules to extract relevant information from big-scale data streams, Bruns et al. (2019) proposed a model based on genetic algorithms, discussing as well heuristics about the choice of suitable parameters for the process. The empirical validation of this model was performed using real-world transportation data, which allowed the evaluation of the merits and weaknesses of the approach should it be applied in a real-world decision-making context. For a fraud detection application in finance, Eweoya et al. (2019), in turn, used real-world data from a financial institution and applied decision trees to predict frauds in bank loan administration and consequently diminish losses due to loan defaults.

Thus, an application using real-world data and machine learning methods is pertinent to investigate which machine learning models can better identify fraud patterns and accurately predict future felonies. Besides, given the great variety of possible frauds, with each category having its specificities and modus operandi (Gottschalk, 2010), in this paper we focused on frauds in which the consumer is the author of the fraud, delimited to policy claims over residential assets of individuals and firms, a relatively less explored segment in comparison to automobile or credit card insurance. Moreover, the data used in this paper were collected from a major insurance company, which provides additional insights to our conclusions regarding the relative importance of the database features, which can be very useful for real-world insurance policy evaluations.

As reported by the review papers of Ngai et al. (2011) and Sinayobye et al. (2018), the majority of the studies that apply machine learning to fraud detection problems focused on credit card and telecommunication frauds, while the applications for insurance frauds are mostly concentrated on healthcare and automobile applications, with few studies that tackle frauds for property insurance, especially for residential policies. In this sense, as mentioned at the end of the introduction, this paper contributes to the literature by testing the empirical strengths and weaknesses of various well-known machine learning models using real-world microdata for a relatively less explored insurance segment, potentially aiding market professionals and decision-makers on their respective model choices for similar tasks.

3. Empirical analysis

3.1. Data overview and preprocessing

We collected data from 2009 to 2018 of registered claims for residential and business insurance policies from one of Brazil’s largest insurance companies. The labels for every policy (i.e., whether the claim was a fraud) were assigned by human experts from the company’s risk analysis sector. In this sense, we also included in the class “fraud” a few cases of frauds that were detected but had not yet been proven in court; we justify this decision based on a preemptive approach for risk management, as we find it important not to leave any real frauds (false negatives) out in a preliminary screening stage, which is the stage to which this study brings its main contributions. Our understanding is that refining a smaller set of most likely frauds for a posterior human inspection is an optimal strategy for fraud detection, and we evaluated the predictions using metrics that penalize false positives and false negatives bearing this in mind as well. The database used in this paper also contains information concerning the person involved in the claim, such as age, gender, wage level, timestamp of actions taken, past fraud occurrences, etc., adding extra value to the conclusions of this study. Fields that allow personal identification were accordingly suppressed.

Since most operations are not frauds, the dataset would be unbalanced if the time periods were the same for both frauds and non-frauds. In this sense, for this paper, we opted to collect all fraud claims registered between 2009 and 2018 and all non-fraud claims from 2015 onwards to keep the dataset roughly balanced, with a similar number of observations for the two labeled classes, totaling 851 observations. We opted for this treatment assuming that the overall fraudster profile did not change structurally in the observed years; based on the authors’ practical experience in fraud analysis in the company which provided the microdata, this is a standard procedure for property fraud detection. Hence, whilst the literature has proposed many data mining methods to deal with imbalanced datasets and prediction of rare events, as presented by Haixiang et al. (2017), we did not apply oversampling techniques; instead, we balanced the dataset by taking a longer period for confirmed frauds. Moreover, while balancing is not mandatory for any of the analyzed models, it helps to better interpret the predictions’ accuracy, a metric that could yield high values due to non-informative classification in very unbalanced datasets: for instance, predicting only “non-frauds” for a dataset with 5% of frauds would lead to a 95% accuracy without generating any value in terms of fraud detection.

The description and motivation of each feature contained in our dataset are summarized below:

• Product type: There were 3 classes for product type in our dataset: “Residential” and “Residential exclusive”, for natural persons; and “Business”, exclusive for legal persons;
• Coverage type: Coverage type is the protection granted by the insurance; each different type has its peculiarities regarding the operational procedure and analysis process for the claims. We summarized the coverage types into 6 classes: “electrical damage”, “theft”, “storm”, “glass break”, “fire/lightning/explosion”, and “others”;
• Contract channel: Refers to the channel that the client used to contract his/her policy, with 3 possible classes: “physically” (at the counter), “online system” or “remote channel”;
• Automatic renewal: Indicates whether the customer has opted to include a clause to automatically renew the policy after its expiration date. This field is important, as customers with intentions to commit frauds tend not to hire an insurance policy with the aim of renewing it afterward;
• Past renewal: Indicates whether the policy is a new one or a renewal of a past one. For the company, a customer who renewed his/her policy indicates less risk of frauds, since the renewal approval depends on some procedural analyses;
• Legal person: Indicates whether the client is a natural or legal person;
• Number of payment installments of the insurance value;
• Time of approval of the insurance policy;
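The accuracy argument made in this subsection (predicting only non-frauds on a sample with 5% of frauds) can be checked with a few lines of arithmetic. The following is a minimal Python sketch on synthetic labels; the helper names are hypothetical and none of the values come from the paper's microdata.

```python
# Synthetic labels (assumption): 1 = fraud, 0 = non-fraud, with 5% of frauds.
y_true = [1] * 5 + [0] * 95
# Non-informative classifier: always predicts "non-fraud".
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Share of observations whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual frauds that were flagged as frauds."""
    flags_for_frauds = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flags_for_frauds) / len(flags_for_frauds)


print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```

The high accuracy coexists with a recall of zero, which is why a roughly balanced sample makes the accuracy figures easier to interpret.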
Table 2
Main references of applications of machine learning techniques in fraud detection.
Model | References
Logistic Regression | Caudill et al. (2005), Viaene et al. (2002), Yeh and Lien (2009), Awoyemi et al. (2017), Varmedja et al. (2019), Yao et al. (2018)
Naive Bayes | Awoyemi et al. (2017), Viaene et al. (2002), Yeh and Lien (2009), Roy and George (2017), Varmedja et al. (2019)
KNN | Awoyemi et al. (2017), Viaene et al. (2002), Yeh and Lien (2009), Nami and Shajari (2018), Raghavan and El Gayar (2019)
SVM | Chen et al. (2006), Dal Pozzolo et al. (2014), Viaene et al. (2002), Raghavan and El Gayar (2019), Wang and Xu (2018), Yao et al. (2018), Chen et al. (2020)
Neural Networks | Dal Pozzolo et al. (2014), Viaene et al. (2002), Yeh and Lien (2009), Jurgovsky et al. (2018), Wang and Xu (2018), Yao et al. (2018), Raghavan and El Gayar (2019), Varmedja et al. (2019)
Random Forest | Dal Pozzolo et al. (2014), Nami and Shajari (2018), Roy and George (2017), Jurgovsky et al. (2018), Wang and Xu (2018), Yao et al. (2018), Raghavan and El Gayar (2019), Varmedja et al. (2019)
GBM | Dou et al. (2019), Gupta et al. (2019), Majhi (2019), Dhieb et al. (2019), Taha and Malebary (2020)
Table 3
Grid-search intervals of the hyperparameters.
Model | Hyperparameter | Interval
Logistic Regression | No hyperparameters |
Penalized Logistic Regression | Elastic-net regularization weight | {0.1, 0.2, …, 0.8, 0.9}
Naive Bayes | Laplace correction factor | {0, 0.1, …, 0.9, 1}
KNN | Number of neighbors | {1, 3, …, 13, 15}
Polynomial Kernel SVM | Polynomial degree | {2, 3, 4}
Polynomial Kernel SVM | Misclassification cost | {10^-4, 10^-3, …, 10^3, 10^4}
Polynomial Kernel SVM | Tolerance band for the ε-insensitive loss function | {0, 0.05, …, 0.95, 1}
Polynomial Kernel SVM | Bias term for the Kernel function | {0, 0.1, …, 1.9, 2}
Gaussian Kernel SVM | Misclassification cost | {10^-4, 10^-3, …, 10^3, 10^4}
Gaussian Kernel SVM | Inverse bandwidth for the Kernel function | {0, 0.05, …, 1.95, 2}
Deep Neural Network | Number of hidden layers | {3, 5, 7}
Deep Neural Network | Learning rate | {0.1, 0.2, …, 0.8, 0.9}
Deep Neural Network | Input layer dropout ratio | {0, 0.1, 0.2, 0.3}
Deep Neural Network | Hidden layer dropout ratio | {0, 0.1, 0.2, 0.3, 0.4, 0.5}
Random Forest | Number of trees | {300, 400, …, 900, 1000}
Random Forest | Number of sampled features at each split | {2, 3, …, 9, 10}
GBM | Learning rate | {0.1, 0.2, …, 0.8, 0.9}
GBM | Maximum depth of each tree | {3, 4, 5, 6, 7, 8, 9}
GBM | Minimum loss reduction for splits | {0.1, 0.2, 0.3, 0.4, 0.5}
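To give a sense of the size of the search implied by Table 3, the sketch below enumerates the full GBM grid in Python. The grid values are copied from the table; the dictionary keys and the helper `expand_grid` are hypothetical names chosen for this illustration, and the snippet only builds the candidate list, without assuming which library or routine was used for training.

```python
from itertools import product

# Grid values copied from Table 3 for the GBM; key names are assumptions.
gbm_grid = {
    "learning_rate": [round(0.1 * k, 1) for k in range(1, 10)],      # 0.1 ... 0.9
    "max_depth": list(range(3, 10)),                                  # 3 ... 9
    "min_loss_reduction": [round(0.1 * k, 1) for k in range(1, 6)],   # 0.1 ... 0.5
}


def expand_grid(grid):
    """Return every combination of hyperparameter values as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


candidates = expand_grid(gbm_grid)
print(len(candidates))  # 315 = 9 * 7 * 5
print(candidates[0])    # {'learning_rate': 0.1, 'max_depth': 3, 'min_loss_reduction': 0.1}
```

Each candidate would then be scored on the validation data, and the best-scoring combination kept for the out-of-sample test.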
Table 4
Mean and standard deviation (in parentheses) of performance metrics for 1000 rounds of out-of-sample forecasts.
Model | Accuracy | Precision | Recall | F1 Score | Kappa | MCC
Logistic Regression | 80.67% (1.60%) | 80.56% (2.94%) | 78.99% (3.79%) | 79.67% (1.81%) | 61.26% (3.21%) | 61.41% (3.18%)
Penalized Logistic Regression | 81.40% (1.72%) | 81.27% (3.36%) | 79.95% (3.99%) | 80.48% (1.88%) | 61.34% (3.19%) | 62.93% (3.36%)
Naive Bayes | 71.16% (5.66%) | 73.18% (8.64%) | 73.02% (5.53%) | 72.39% (3.72%) | 47.66% (10.28%) | 49.51% (8.78%)
KNN | 75.74% (2.51%) | 77.74% (3.76%) | 69.77% (4.67%) | 73.39% (2.88%) | 51.25% (5.01%) | 51.66% (4.98%)
Polynomial Kernel SVM | 81.34% (0.75%) | 79.22% (1.15%) | 82.98% (1.06%) | 80.93% (0.84%) | 62.92% (1.48%) | 63.00% (1.48%)
Gaussian Kernel SVM | 79.56% (1.71%) | 79.07% (3.08%) | 78.41% (4.17%) | 78.53% (1.98%) | 58.81% (3.36%) | 59.00% (3.31%)
Deep Neural Network | 81.88% (1.58%) | 78.41% (3.11%) | 86.28% (3.06%) | 82.06% (1.32%) | 63.84% (3.08%) | 64.32% (2.80%)
Random Forest | 84.56% (1.43%) | 84.72% (2.60%) | 82.77% (3.72%) | 83.61% (1.65%) | 69.05% (2.97%) | 69.24% (2.88%)
GBM | 83.21% (1.61%) | 83.55% (2.97%) | 81.73% (3.96%) | 82.44% (1.82%) | 66.20% (3.08%) | 66.39% (3.02%)
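The Kappa and MCC columns of Table 4 are both functions of the four cells of a confusion matrix. The Python sketch below shows the standard formulas for Cohen's Kappa and the Matthews correlation coefficient in the binary case; the function name and the confusion-matrix counts are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from math import sqrt


def kappa_and_mcc(tp, fp, fn, tn):
    """Cohen's Kappa and MCC from the four cells of a binary confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # observed agreement (accuracy)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column total is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return kappa, mcc


# Balanced hypothetical example: 80% accuracy gives Kappa = MCC = 0.6.
print(kappa_and_mcc(tp=40, fp=10, fn=10, tn=40))
# Always predicting "non-fraud" with 5% of frauds: 95% accuracy, but both scores are 0.
print(kappa_and_mcc(tp=0, fp=0, fn=5, tn=95))
```

The second call illustrates why these chance-corrected metrics complement accuracy: a non-informative classifier scores zero on both, regardless of how high its accuracy is.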
regression, and neural network for credit card fraud detection; in their research, the authors also emphasized the importance of bearing in mind the different costs associated with false alarms (false positives) and missed frauds (false negatives).

To further evaluate the reliability of the results and to control for the possibility of the models having predicted outcomes correctly by chance, we also calculated Cohen’s Kappa coefficient (Cohen, 1960) of the models after each round of training-validation-test, as well as the Matthews correlation coefficient (MCC; Matthews, 1975), also known as “Pearson’s phi coefficient”, which is a scaled version of the test statistic for Pearson’s chi-squared test on a 2 × 2 contingency table. As pointed out in Chicco and Jurman (2020), MCC tends to be more informative than metrics like accuracy and F1 Score because it takes into account the balance ratios of the four confusion matrix categories; although our dataset was a balanced one, we calculated the MCCs for each model to further verify their empirical performance. The average values for each evaluation metric and their standard deviations are displayed in Table 4 and plotted in Figs. 1 to 6.

4. Results and discussion

4.1. Performance metrics and statistical significance

At a descriptive level, we first summarized a macro-profile of the 409 cases of fraud:

1. 60.14% of the fraudsters were male;
2. 48.16% of the frauds were premature claims;
3. 52.81% of the fraudsters were non-married;
4. 79.95% of the total coverage amount was for electrical damage or theft claims;
5. The average age of fraudsters was 41 years;
6. 72.61% of the frauds were new insurance policies;
7. Fire/lightning/explosion coverage had the highest average payment amount for detected but unproven frauds.

The results of the predictions provided by the machine learning algorithms are summarized in Table 4.

The results indicate that the standard logistic regression showed a middling overall performance in comparison to the other models,
being outperformed by the penalized logistic regression and the two ensemble-based machine learning models (random forest and gradient boosting) for all six evaluation metrics. On the other hand, naive Bayes and KNN showed the worst results, with their low recall values suggesting a high presence of false negatives, consequently hindering the F1 Score as well; from a risk manager’s perspective, this implies a large proportion of frauds that went unnoticed, which in turn means that the company would be bearing losses in favor of fraudsters. Concerning the Gaussian Kernel SVM, which is theoretically able to generalize nonlinear interactions with arbitrarily high dimensionality, its out-of-sample performance was actually worse than that of the logistic regression, which is a sign of overfitting of this model, similar to what was reported in Peng and Nagata (2020).

A visual synthesis of Table 4 is given by Figs. 1 to 6, which show the histograms of all tested models for each performance metric over the 1000 rounds of training-validation-test. In overall terms, it can be observed that the distributions for all 6 metrics of KNN (in dark green) and naive Bayes (in light green) were shifted to the left in comparison to the others, while the values for Deep Neural Networks (in magenta), GBM (in yellow) and random forest (in orange) were mostly concentrated on larger values for all metrics, with the standard logistic regression (in black) staying in the middle ground. It can also be noted that the polynomial Kernel SVM (in cyan) had the lowest variance, and naive Bayes had the greatest variance, causing a heavy tail to the left in its histogram.

As seen in Fig. 2, random forest and GBM had the best out-of-sample values for precision, indicating that those models had a small amount of false positive predictions; conversely, the model with the best performance for the recall was the neural network, as illustrated by Fig. 3, indicating that this model performed better with regard to avoiding false negatives. When jointly evaluating the two types of error using F1 Score, Cohen’s Kappa and MCC, GBM and random forest stood out as the best models for our experiments; nonetheless, since a false negative (failing to predict an actual fraud) often leads to larger financial losses than a false positive, our experiments point out that the deep neural network approach is also recommended as a prominent model to support risk analysts and decision-makers.

Moreover, in order to statistically evaluate which of the tested models had the best predicting performance across the experiments, we applied Hansen et al. (2011)’s Model Confidence Set procedure (MCS).
Starting from the set of all tested models, MCS provides, at a given significance level α, a subset of “superior models” that contains the best model with probability greater than 1 − α by recursively testing the null hypothesis of equal predictive ability for all remaining models using a block bootstrap approach. The elimination rule evaluates a loss function and recursively removes the model with the worst relative performance in comparison to the average across all other models until all remaining models are statistically equal in terms of predictive power or until only one model remains. We applied MCS for all evaluation metrics described in Subsection 3.3, defining the loss function as 1 minus the respective metric (given that all of them are naturally bounded between 0 and 1) and using the usual confidence level of 95% (i.e., α = 0.05).

As shown by the results displayed in Table 5, for all evaluation metrics, MCS identified only one model showing superior performance over all others, with the random forest standing out as the superior model for all metrics except recall, for which the deep neural network model statistically outperformed the other models. Once again, the results argue in favor of the random forest model for all metrics that balance both type I and type II errors (F1 Score, Kappa, and MCC), but suggest that the neural network model may perform better in avoiding false negatives. This finding can aid professional risk analysts interested in constructing fraud prediction systems based on machine learning models, depending on the relative cost of a false negative over a false positive.

4.2. Global interpretation: permutation-based variable importance

Besides the evaluation of the performance metrics from Table 4 and the statistical significances from Table 5, we proceed to yield a ranking of the features’ relative importance, with a model-agnostic, permutation-based approach, proposed by Fisher et al. (2019) as an extension of Breiman (2001)’s feature importance measurement for the random forest model. As Fisher et al. (2019) pointed out, besides providing an estimate for the model class reliance, this more general approach can be applied to general models rather than only tree-based or ensemble models, since permutations on inputs are performed on the overall model instead of the individual ensemble members. Intuitively, if a specific feature is important to model the target variable, the predictive performance of the learner is expected to undergo sharper
Table 5
Set of superior models for each performance metric according to Hansen et al. (2011)’s Model Confidence Set procedure at the 95% confidence level.
Model | Accuracy | Precision | Recall | F1 Score | Kappa | MCC
Logistic Regression | | | | | |
Penalized Logistic Regression | | | | | |
Naive Bayes | | | | | |
KNN | | | | | |
Polynomial Kernel SVM | | | | | |
Gaussian Kernel SVM | | | | | |
Deep Neural Network | | | ✓ | | |
Random Forest | ✓ | ✓ | | ✓ | ✓ | ✓
GBM | | | | | |
Fig. 7. Most important features for each model, given by the average loss in F1 Score after permutations.
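The permutation-based importance summarized in Fig. 7 can be illustrated with a minimal sketch. It is not the authors' code: the toy classifier, the two feature columns, and the helper names `f1_score` and `permutation_importance` are hypothetical, and the sketch shuffles features on a single fixed data set rather than repeating the procedure over 1000 training-validation-test rounds with tuned hyperparameters.

```python
import random


def f1_score(y_true, y_pred):
    """F1 Score for binary labels (1 = fraud, 0 = non-fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def permutation_importance(predict, X, y, n_rounds=30, seed=0):
    """Average drop in F1 Score after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = f1_score(y, [predict(row) for row in X])
    importance = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_rounds):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = f1_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importance.append(sum(drops) / n_rounds)
    return importance


# Hypothetical toy data: column 0 = number of previous claims, column 1 = pure noise.
data_rng = random.Random(42)
X = [[data_rng.randint(0, 5), data_rng.random()] for _ in range(200)]
y = [1 if row[0] >= 3 else 0 for row in X]


def toy_model(row):
    """Stand-in for a fitted classifier; it only looks at column 0."""
    return 1 if row[0] >= 3 else 0


scores = permutation_importance(toy_model, X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In this toy setting the noise column receives an importance of exactly zero because the stand-in model never reads it, while shuffling the informative column lowers the F1 Score. Sorting `scores` in descending order mirrors the ranking step described in the text.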
drops after that feature is permuted; for a less relevant feature, on the other hand, the performance loss is expected to be smaller. In this sense, the difference between the observed values of the loss function before and after the permutations on a feature can be regarded as a proxy for its overall importance.
Therefore, for each of the 1000 rounds of training-validation-test and for all models, after computing the out-of-sample F1 Score we performed random permutations on the values of each feature and re-estimated the respective F1 Score using the optimal hyperparameters tuned in the cross-validation step. The difference between the F1 Scores before and after the permutations gave the performance loss for each feature. Finally, after computing these F1 Score differences for all features across the 1000 rounds, we took their average value and sorted the features by descending importance. The five most relevant features for each model and their respective estimated impact on the F1 Score are displayed in Fig. 7.
As seen in Fig. 7, the number of previous claims and the variables that indicate premature claims (days between contract start/end and claim) were among the most important ones according to the permutation-based approach; “previous claims”, specifically, was ranked as one of the 5 most relevant features for all 9 tested models. This result is aligned with the average expectation of a professional risk analyst, as discussed in the list of variables displayed in Section 3. Based on our practical professional experience in fraud analysis, there are many necessary steps to analyze the information described by the client and evaluate the veracity of the claim; in this process, the variables usually considered the most relevant by analysts are the customer's history of previous claims and whether the claim is premature (claims in the first 60 days after the policy start date). By analyzing the customer's claims history, it is possible to see all the events described in previous claims, which are useful to understand not only the customer's behavior in previous claims but also the most likely patterns of fraud in the case of recidivism.
With regard to premature claims, on the other hand, this variable is commonly used by analysts to evaluate whether the policy was contracted from the start with the express intention of reporting a claim, especially for new clients. The customer's history and the starting/ending dates of the policy are considered to be “primary variables”, and are usually analyzed before the data related to the specific claim, such as the customer's description of the event, the reported damages, and the reports from the regulators, who are responsible for carrying out the necessary inspections to verify the claim. However, analysts usually place more emphasis on the time period between the start of the contract and the claim than on the end date of the contract, which is regarded more as a “check-up” variable for the term of the contract, used to verify whether the claim date is valid for the purposes of claim rejection or receipt of financial restitution, rather than as an indicator of fraud.
The fact that this variable had a high impact on the predictions may suggest that fraudsters could plan the “timing” of the fraudulent claim based on the contract duration, potentially anticipating the well-established “suspicion level” of a premature claim by making a delayed claim instead, which tends to arouse less suspicion among professional risk analysts, since the fraudster would be paying for the product for a longer period.
Other variables that had a high overall relevance across the models include the insurance premium and the insurance amount, which are also commonly used by risk analysts in fraud detection. Variables like income range, marital status, and age were ranked in intermediate positions, reaching the top 5 in importance only for KNN
Fig. 8. Average SHAP values of deep neural network for the prominent false positive observation.
and Polynomial Kernel SVM, suggesting the existence of more complex patterns that determine the propensity of fraud. The indicators for the product type exhibited high importance for the logistic regression and its penalized version while having a much more modest relevance according to the majority of the other models, including the ones with the best predictive performance according to Table 4 (deep neural network, random forest, and GBM).
The importance of the variable “number of installments” varied across the nine tested models, having a high impact on the F1 Score after permutations for one of the best-performing models (random forest) and one of the worst-performing models (Naive Bayes) at the same time. A possible explanation is that this variable is often evaluated jointly with other features, such as income range and insurance premium, suggesting that there may be interactions between those variables that culminate in relevant patterns for fraud identification. The variable “automatic renewal” was assigned low importance while, in contrast, “past renewal” was assigned fairly high importance values for some models; this may have captured the effect of the document analysis involved in the renewal process, which is expected to make the fraudster more exposed to the detection of inconsistencies or abnormal behaviors. Indicators for legal person and contract channel were other variables that were in general considered less relevant across the models.
The feature importance rankings presented in this subsection enable a better understanding of the relative strengths and weaknesses of each machine learning model, and can be used to generate decision rules applicable to real-world insurance policy evaluation, serving in practice as a screening stage to assist human analysts' subsequent evaluation, with potential gains of speed and efficiency. Since we fitted the models with data from real fraud claims from an insurance company, the proposed methods are not only empirically effective but also highly applicable to corporate or governmental decision-making, combining good out-of-sample fraud prediction performance with the practical interpretability of those algorithms.
4.3. Local interpretation: Shapley additive explanation
As a complementary analysis to exemplify the potential use of the eXplainable Artificial Intelligence framework to interpret predictions for individual observations, we performed an additional exercise using Shapley Additive Explanation values (henceforth SHAP values), a model-agnostic method introduced by Lundberg and Lee (2017) that aims at explaining individual machine learning model predictions, inspired by Shapley's (1953) work on cooperative game theory. As discussed in Lundberg and Lee (2017), SHAP unifies a wide class of additive feature attribution techniques used for machine learning model explanations, such as LIME (Ribeiro et al., 2016), which fits interpretable linear models in the neighborhood of a given prediction; and Shapley sampling values (Štrumbelj & Kononenko, 2014), which provide estimates for feature importance in linear models under the presence of multicollinearity, by approximating the effect of removing each feature from the learner as a weighted average of differences between the predictions of a model trained with and without the respective feature. In this sense, while being computationally expensive, SHAP values assign an importance value to each feature for a particular prediction, thus allowing the impact of each variable on the predicted outcome to be decomposed relative to the average prediction for the sampled observations.
In addition to the global variable importance analysis presented in the previous subsection, we performed a local analysis on the non-fraud observation classified as a fraud the most times across all models and the fraud observation classified as a non-fraud the most times across all models; we shall call those observations the “prominent false positive” and the “prominent false negative”, respectively. We calculated the SHAP values associated with those two observations for the models that had the best overall performances (deep neural network, random forest, and GBM) using 1000 training rounds, each of them with a sample of 200 randomly selected training examples and 50 variable orderings, using the implementation of Biecek (2018). The average SHAP values for the 1000 rounds are displayed in Figs. 8 to 13. The observed values for each variable were reported in their actual values (instead of centered and scaled) for better understanding.
In general terms, the average SHAP values of two variables stood out for all three models: for the false positive observation, the variable “days between contract end and claim” was the one that contributed the most to predicting the prominent observation as a fraud; on the other hand, the variable “number of previous claims” was responsible for the strongest contribution to the prediction of the false negative case as a non-fraud. These results are aligned with the estimated variable importance displayed in Fig. 7 and, based on our practical experience in fraud detection, also aligned with the intuition of human analysts, since clients that issue frequent claims in property insurance usually raise warnings for probable fraudulent behavior. Moreover, as reported at the beginning of Section 3, a significant proportion of the frauds are premature claims, which are measured by the interval between the contract start or end and the claim, variables that analysts tend to examine closely.
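The idea of averaging feature contributions over sampled training examples and random variable orderings can be sketched as follows. This is a simplified illustration in Python and not the DALEX implementation cited in the text: the function `sampled_shap`, the scoring rule `toy_fraud_score`, and the two feature names are hypothetical. Only the sample size of 200 and the 50 orderings follow the settings reported in the text.

```python
import random


def sampled_shap(predict, x, background, n_orderings=50, seed=0):
    """Approximate Shapley contributions for observation x.

    For each random feature ordering and each background row, features are
    switched one at a time from the background value to the value in x, and
    the resulting change in the prediction is credited to that feature.
    """
    rng = random.Random(seed)
    n_features = len(x)
    phi = [0.0] * n_features
    for _ in range(n_orderings):
        order = list(range(n_features))
        rng.shuffle(order)
        for ref in background:
            z = list(ref)
            prev = predict(z)
            for j in order:
                z[j] = x[j]
                cur = predict(z)
                phi[j] += cur - prev
                prev = cur
    total = n_orderings * len(background)
    return [value / total for value in phi]


# Hypothetical features, for illustration only.
FEATURES = ["previous_claims", "days_start_to_claim"]


def toy_fraud_score(row):
    """Stand-in for a fitted model's predicted fraud probability."""
    many_claims = row[0] >= 3
    premature = row[1] < 60
    return 0.1 + 0.2 * many_claims + 0.3 * premature + 0.2 * (many_claims and premature)


data_rng = random.Random(1)
background = [[data_rng.randint(0, 5), data_rng.randint(0, 365)] for _ in range(200)]
x = [4, 30]  # observation to explain: 4 previous claims, claim 30 days after start

phi = sampled_shap(toy_fraud_score, x, background)
baseline = sum(toy_fraud_score(row) for row in background) / len(background)
```

By construction, the contributions in `phi` add up to the difference between the prediction for `x` and the average prediction over the background sample. This is the decomposition relative to the average prediction that the text describes.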
Fig. 9. Average SHAP values of deep neural network for the prominent false negative observation.
Fig. 10. Average SHAP values of random forest for the prominent false positive observation.
Fig. 11. Average SHAP values of random forest for the prominent false negative observation.
Fig. 12. Average SHAP values of GBM for the prominent false positive observation.
Fig. 13. Average SHAP values of GBM for the prominent false negative observation.
For the neural network model (Figs. 8 and 9) the variable “insured amount” played a relevant role “pushing up” the predicted fraud probability for the false positive observation, while the covered event (electrical damage) and the contract channel were identified as important for reducing the predicted fraud probability, probably because the physical contract channel is the most common modality and the policies for electrical damage generally have low insurance premiums. For the prominent false negative observation, apart from the number of previous claims, the variables “income range”, “age” and “marital status”, which are usually analyzed jointly by human professionals, had important contributions to the erroneous classification of this particular policy, although none of them reached the top 5 most important features in the permutation-based analysis of the previous subsection. Bearing in mind that deep neural networks had the best overall performance for false negatives (as seen by the recall values in Fig. 3), it can be inferred that those variables still play an important role in detecting actual frauds, alongside the insured amount, which also had a strong average SHAP value towards the “non-fraud” prediction. No variables apart from the ones that indicate premature claims showed significant positive SHAP contributions.
For random forest (Figs. 10 and 11) the number of installments played an important role for both the false positive and false negative observations with strong average SHAP values, as well as age and income range. This is an interesting result because the number of installments usually does not play a decisive role for risk classification in property insurance; in the global interpretation subsection, this variable was also assigned a large importance for the random forest model. Another noteworthy result is the sign of the average SHAP value for the variable “past renewal”, which is typically expected to reduce the fraud propensity when the policy is a renewed one, since renewal involves a screening process and renewed policies have a lower rate of premature claims; however, premeditated fraudsters may also anticipate this decision rule to hide their intentions from the analysts. Given the overall good performance of the random forest model, future research and professional analysts are encouraged to further study the implications of these particular features for fraud detection tasks.
Finally, for GBM (Figs. 12 and 13) the relatively high insured amount was associated with a higher probability of fraud for the false positive case, as was the fact that the policy was a residential one, a category that usually has a smaller proportion of frauds in comparison to the other categories. The number of installments also had a positive average SHAP for the false positive case, just like in random forest, and the strongest contribution for a non-fraud prediction was the small
approval time, which usually occurs in policies for a common event with low insurance premium; indeed, the insurance premium value also had a negative average SHAP. For the prominent false negative observation, insurance amount and insurance premium had large contributions for predicting it as a non-fraud, while all variables apart from the number of days between contract start/end and claim (i.e., the variables associated with the detection of premature claims) had small contributions in indicating a higher probability of fraud.
5. Conclusion and remarks
This article evaluated machine learning-based predictive models to detect frauds in property insurance policy claims, comparing the predictive results of nine predictive models using data from a major Brazilian insurance company. The results indicated that the random forest model achieved significantly better performance than the standard logistic regression and other machine learning methods, as evidenced by the metrics of accuracy, precision, F1 Score, Cohen's Kappa, and MCC, while the deep neural network model outperformed the other models for the recall metric. Moreover, based on the documented fraud cases, we listed a macro profile of the fraudsters and ranked the relative importance of the explanatory variables according to a permutation-based approach, highlighting the features that contributed the most to the models' overall predictive power and to the prediction of prominent false positive and false negative observations.
The findings of this paper can contribute to the literature of machine learning applications to fraud detection for residential and business insurance policies, a segment with relatively fewer works that follow this paradigm. In particular, the fact that we tested the models using real-world data strengthens the relevance of the results over exercises that use simulated data, and further evidences the feasibility of converting the proposed models to operational tools for decision-making support in risk management, potentially assisting in the creation of data-driven and interpretable decision rules or being integrated into the evaluation process itself. Analogously, human analysts can benefit from this kind of product and also refine the algorithms by feeding more human-validated data into past datasets, which can be valuable to rectify mistakes made by the machine learning models.
The models proposed in this paper can also be adapted to a probabilistic approach, yielding not only whether an insurance policy is more likely to be a fraud or a non-fraud, but also the probability of that specific policy being a fraud. This probability can then be used as an input to evaluate the expected return (or loss) of a given insurance policy or to mathematically estimate the insurance premium adjusted to the fraud risk, bearing in mind operational, legal, and ethical constraints.
As future developments, we believe that a spatial analysis can be performed, adding to the current model variables such as the distance of the claimant's home to the town center or workplace, the overall income level and demographic variables of his/her neighborhood, among other features that can potentially further enhance the model's quality. Other popular machine learning algorithms such as deep belief networks and restricted Boltzmann machines can also be applied in similar experiments on fraud detection. Finally, additional improvements can also be made by increasing the number of replications performed for each model, testing on a larger volume of data using methods for imbalanced classification, such as SMOTE (Chawla et al., 2002) and other methods described in Haixiang et al. (2017), as well as performing additional tuning of the hyperparameters' values for each model.
6. Disclaimer
Disclaimer 1: The views expressed in this work are the sole responsibility of the authors and do not necessarily reflect those of their respective affiliated institutions nor those of their members.
Disclaimer 2: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
CRediT authorship contribution statement
Matheus Kempa Severino: Methodology, Software, Formal analysis, Investigation, Data curation, Writing - original draft, Visualization. Yaohao Peng: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Visualization, Supervision.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
Awoyemi, J. O., Adetunmbi, A. O., & Oluwadare, S. A. (2017). Credit card fraud detection using machine learning techniques: A comparative analysis. In 2017 international conference on computing networking and informatics (ICCNI) (pp. 1–9). IEEE.
Biecek, P. (2018). DALEX: explainers for complex predictive models in R. Journal of Machine Learning Research, 19(1), 3245–3249.
Brazilian National Confederation of Insurance Companies (2017). Dados básicos. http://[Link]/cnseg/estatisticas/mercado/dados-basicos/.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Bruns, R., Dunkel, J., & Offel, N. (2019). Learning of complex event processing rules with genetic programming. Expert Systems with Applications, 129, 186–199.
Carcillo, F., Le Borgne, Y.-A., Caelen, O., Kessaci, Y., Oblé, F., & Bontempi, G. (2019). Combining unsupervised and supervised learning in credit card fraud detection. Information Sciences.
Caudill, S. B., Ayuso, M., & Guillén, M. (2005). Fraud detection using a multinomial logit model with missing information. The Journal of Risk and Insurance, 72(4), 539–550.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.
Chen, R.-C., Chen, T.-S., & Lin, C.-C. (2006). A new binary support vector system for increasing detection rate of credit card fraud. International Journal of Pattern Recognition and Artificial Intelligence, 20(02), 227–239.
Chen, Z., Chen, W., & Shi, Y. (2020). Ensemble learning with label proportions for bankruptcy prediction. Expert Systems with Applications, 146, Article 113155.
Chicco, D., & Jurman, G. (2020). The advantages of the matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1), 1–13.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Dal Pozzolo, A., Caelen, O., Le Borgne, Y.-A., Waterschoot, S., & Bontempi, G. (2014). Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41(10), 4915–4928.
de Castro Vieira, J. R., Barboza, F., Sobreiro, V. A., & Kimura, H. (2019). Machine learning models for credit analysis improvements: Predicting low-income families’ default. Applied Soft Computing, 83, Article 105640.
Dhieb, N., Ghazzai, H., Besbes, H., & Massoud, Y. (2019). Extreme gradient boosting machine learning algorithm for safe auto insurance operations. In 2019 IEEE international conference on vehicular electronics and safety (ICVES) (pp. 1–5). IEEE.
Domingos, P. (1999). Metacost: A general method for making classifiers cost-sensitive. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 155–164).
Dou, Y., Li, W., Liu, Z., Dong, Z., Luo, J., & Philip, S. Y. (2019). Uncovering download fraud activities in mobile app markets. In 2019 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM) (pp. 671–678). IEEE.
Eshghi, A., & Kargari, M. (2019). Introducing a new method for the fusion of fraud evidence in banking transactions with regards to uncertainty. Expert Systems with Applications, 121, 382–392.
Eweoya, I., Adebiyi, A., Azeta, A., & Azeta, A. E. (2019). Fraud prediction in bank loan administration using decision tree. Journal of Physics: Conference Series, 1299(1), Article 012037.
Fisher, A., Rudin, C., & Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20(177), 1–81.
Gottschalk, P. (2010). Categories of financial crime. Journal of Financial Crime, 17(4), 441–458.
Gupta, R. Y., Mudigonda, S. S., Kandala, P. K., & Baruah, P. K. (2019). Implementation of a predictive model for fraud detection in motor insurance using gradient boosting method and validation with actuarial models. In 2019 IEEE international conference on clean energy and energy efficient electronics circuit for sustainable development (INCCES) (pp. 1–6). IEEE.
Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., & Bing, G. (2017). Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73, 220–239.
Hajek, P., & Henriques, R. (2017). Mining corporate annual reports for intelligent detection of financial statement fraud–a comparative study of machine learning methods. Knowledge-Based Systems, 128, 139–152.
Hansen, P. R., Lunde, A., & Nason, J. M. (2011). The model confidence set. Econometrica, 79(2), 453–497.
Henrique, B. M., Sobreiro, V. A., & Kimura, H. (2019). Literature review: Machine learning techniques applied to financial market prediction. Expert Systems with Applications.
Hsu, M.-W., Lessmann, S., Sung, M.-C., Ma, T., & Johnson, J. E. (2016). Bridging the divide in financial market forecasting: machine learners vs. financial economists. Expert Systems with Applications, 61, 215–234.
Huang, Z., Chen, H., Hsu, C.-J., Chen, W.-H., & Wu, S. (2004). Credit rating analysis with support vector machines and neural networks: a market comparative study. Decision Support Systems, 37(4), 543–558.
Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P.-E., He-Guelton, L., & Caelen, O. (2018). Sequence classification for credit-card fraud detection. Expert Systems with Applications, 100, 234–245.
Kim, Y. J., Baik, B., & Cho, S. (2016). Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Systems with Applications, 62, 32–43.
Kim, E., Lee, J., Shin, H., Yang, H., Cho, S., Nam, S.-k., Song, Y., Yoon, J.-a., & Kim, J.-i. (2019). Champion-challenger analysis for credit card fraud detection: Hybrid ensemble and deep learning. Expert Systems with Applications, 128, 214–224.
Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 4765–4774.
Majhi, S. K. (2019). Fuzzy clustering algorithm based on modified whale optimization algorithm for automobile insurance fraud detection. Evolutionary Intelligence, 1–12.
Matthews, B. W. (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica Et Biophysica Acta (BBA)-Protein Structure, 405(2), 442–451.
Nami, S., & Shajari, M. (2018). Cost-sensitive payment card fraud detection based on dynamic random forest and k-nearest neighbors. Expert Systems with Applications, 110, 381–392.
Naser, M., & Alavi, A. (2020). Insights into performance fitness and error metrics for machine learning. ArXiv Preprint arXiv:2006.00887.
Ngai, E., Hu, Y., Wong, Y., Chen, Y., & Sun, X. (2011). The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems, 50(3), 559–569.
Niu, F., Recht, B., Re, C., & Wright, S. J. (2011). HOGWILD! a lock-free approach to parallelizing stochastic gradient descent. In Proceedings of the 24th international conference on neural information processing systems (pp. 693–701).
Peng, Y., & Nagata, M. H. (2020). An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data. Chaos, Solitons & Fractals, Article 110055.
Popat, R. R., & Chaudhary, J. (2018). A survey on credit card fraud detection using machine learning. In 2018 2nd international conference on trends in electronics and informatics (ICOEI) (pp. 1120–1125). IEEE.
Raghavan, P., & El Gayar, N. (2019). Fraud detection using machine learning and deep learning. In 2019 international conference on computational intelligence and knowledge economy (ICCIKE) (pp. 334–339). IEEE.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135–1144).
Roy, R., & George, K. T. (2017). Detecting insurance claims fraud using machine learning techniques. In 2017 international conference on circuit, power and computing technologies (ICCPCT) (pp. 1–6). IEEE.
Shapley, L. S. (1953). A value for n-person games. Contributions to the Theory of Games, 2(28), 307–317.
Sheshasaayee, A., & Thomas, S. S. (2018). Usage of r programming in data analytics with implications on insurance fraud detection. In International conference on intelligent data communication technologies and internet of things (pp. 416–421). Springer.
Sinayobye, J. O., Kiwanuka, F., & Kyanda, S. K. (2018). A state-of-the-art review of machine learning techniques for fraud detection research. In 2018 IEEE/ACM symposium on software engineering in africa (SEiA) (pp. 11–19). IEEE.
Soman, K., Loganathan, R., & Ajay, V. (2009). Machine learning with SVM and other kernel methods. PHI Learning Pvt. Ltd.
Štrumbelj, E., & Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3), 647–665.
Taha, A. A., & Malebary, S. J. (2020). An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine. IEEE Access, 8, 25579–25587.
Triepels, R., Daniels, H., & Feelders, A. (2018). Data-driven fraud detection in international shipping. Expert Systems with Applications, 99, 193–202.
Varmedja, D., Karanovic, M., Sladojevic, S., Arsenovic, M., & Anderla, A. (2019). Credit card fraud detection-machine learning methods. In 2019 18th international symposium INFOTEH-JAHORINA (INFOTEH) (pp. 1–5). IEEE.
Verma, A., Taneja, A., & Arora, A. (2017). Fraud detection and frequent pattern matching in insurance claims using data mining techniques. In 2017 tenth international conference on contemporary computing (IC3) (pp. 1–7). IEEE.
Viaene, S., Derrig, R. A., Baesens, B., & Dedene, G. (2002). A comparison of state-of-the-art classification techniques for expert automobile insurance claim fraud detection. The Journal of Risk and Insurance, 69(3), 373–421.
Waghade, S. S., & Karandikar, A. M. (2018). A comprehensive study of healthcare fraud detection based on machine learning. International Journal of Applied Engineering Research, 13(6), 4175–4178.
Wang, Y., & Xu, W. (2018). Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decision Support Systems, 105, 87–95.
Yao, J., Zhang, J., & Wang, L. (2018). A financial statement fraud detection model based on hybrid data mining methods. In 2018 international conference on artificial intelligence and big data (ICAIBD) (pp. 57–61). IEEE.
Yeh, I.-C., & Lien, C.-h. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473–2480.
Machine learning models improve fraud detection in the insurance sector by mining patterns from high-dimensional data and assisting real-world decision-making, as highlighted by Popat and Chaudhary (2018). They address challenges such as non-stationary data distributions, imbalanced class distributions, a massive flow of new transactions, and data confidentiality issues, which make the development of fraud detection algorithms complex. Furthermore, efforts like the use of explainable AI methods to estimate feature importance address the scarcity of microdata.
The random forest model has been reported to outperform alternatives such as Support Vector Machines and neural networks for credit card fraud detection because it is better suited to handle non-stationary data distributions and class imbalance. Frequent updates with new data improve its performance as fraud patterns change over time, and balancing techniques enhance its accuracy. Compared with other models, it shows less overfitting and higher precision (that is, fewer false positives), which makes it robust across various training scenarios.
Data preprocessing includes one-hot encoding of categorical variables such as 'product type', 'coverage type', and 'contract channel', which turns each category into a numeric indicator that the models can use. Dropping one reference level per encoded variable avoids the perfect multicollinearity that would otherwise impair models such as logistic regression. Proper preprocessing also includes centering and scaling the inputs, which helps models that are sensitive to the scale of the features and improves overall performance and robustness.
Different kernels in SVM models affect their ability to capture data complexity and interactions. The polynomial kernel has low variance but may miss complex interactions, whereas the Gaussian kernel can model high-dimensional nonlinear interactions but may overfit to noise, particularly evident in its poor performance relative to logistic regression in fraud detection tasks. Optimal kernel choice balances complexity with generalization ability, crucial for accurately detecting fraud.
Model explainability via methods like SHAP values helps identify which features significantly influence predictions, making models more transparent and aiding risk analysts in understanding fraud patterns. This transparency assists in refining model development and aligning predictive insights with business context, improving model effectiveness and stakeholder trust. Explainability also supports continuous model improvement and better integration into decision-making frameworks.
Variable selection is critical because it directly influences model accuracy and interpretability. Including significant variables like 'number of installments', 'income range', and 'age' can improve predictions by focusing the model on relevant patterns and mitigating noise. Excluding low-impact variables helps prevent overfitting and improves model efficiency, while analyses such as permutation importance tests provide insight into the underlying fraud dynamics.
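A permutation importance test of the kind mentioned above can be sketched in a few lines: shuffle one input column, re-score the model, and record how much accuracy drops. Everything in this example is hypothetical, including the toy rule-based classifier, the six policies, and the helper names; it only illustrates the mechanics of the test.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows for which the model's prediction matches the label."""
    hits = sum(predict(r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one column across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + [v] + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


def predict(row):
    """Hypothetical classifier: flags policies with more than 6 installments.

    It ignores the second column (age) entirely.
    """
    return 1 if row[0] > 6 else 0


# Hypothetical policies: [number of installments, age]; 1 = fraud.
rows = [[2, 30], [10, 41], [12, 25], [3, 58], [9, 33], [1, 47]]
labels = [0, 1, 1, 0, 1, 0]

imp_installments = permutation_importance(predict, rows, labels, column=0)
imp_age = permutation_importance(predict, rows, labels, column=1)
print("installments:", imp_installments, "age:", imp_age)
```

Because the toy classifier never reads the age column, shuffling it cannot change any prediction and its importance is exactly zero; a variable with that profile in a real model would be a candidate for exclusion.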
Risk managers face a trade-off between false positives and false negatives when selecting machine learning models for fraud detection. Models like deep neural networks, even with high recall, may still let some false negatives go unnoticed, and each undetected fraud results in a financial loss. Conversely, models like GBM and random forest tend to have better precision, with fewer false positives, but may miss potentially fraudulent transactions. Managers must balance these two error types to minimize financial impact while keeping the rate of false alerts tolerable.
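The trade-off can be illustrated by scoring the same set of predictions at two decision thresholds and attaching a cost to each error type. The scores, labels, thresholds, and the 10:1 cost ratio below are hypothetical values picked for the example; in practice the costs would come from the insurer's own loss and investigation figures.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, FN, TN when flagging every score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def expected_cost(fp, fn, cost_fp, cost_fn):
    """Total cost of false alerts plus missed frauds."""
    return fp * cost_fp + fn * cost_fn


# Hypothetical fraud scores and true labels (1 = fraud).
scores = [0.95, 0.80, 0.65, 0.60, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.7, 0.3):
    tp, fp, fn, tn = confusion_counts(scores, labels, threshold)
    precision, recall = precision_recall(tp, fp, fn)
    # Assumed costs: a missed fraud is 10x as costly as a false alert.
    cost = expected_cost(fp, fn, cost_fp=1, cost_fn=10)
    print(threshold, "precision:", round(precision, 3),
          "recall:", recall, "cost:", cost)
```

On this toy data the strict threshold (0.7) gives perfect precision but misses half of the frauds, while the lenient one (0.3) catches every fraud at the price of two false alerts. Which setting is cheaper depends entirely on the assumed cost ratio.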
Updating machine learning models with new data is crucial in fraud detection, as it significantly improves predictive performance by adapting to the evolving nature of fraudulent activity. Regular updates ensure the model captures new fraud patterns, which is essential because the distribution of fraud changes rapidly over time. Dynamically updated models outperform static ones and remain relevant and accurate for real-time fraud detection.
Balancing the class distribution addresses the severe class imbalance common in fraud detection, improving model accuracy and robustness. It keeps the model from becoming biased towards the majority class and improves metrics like the F1 score by reducing false negatives, so that more frauds are detected. Training on balanced samples also prevents the performance of models like random forests from degrading, keeping them competitive.
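One simple balancing technique is random undersampling of the majority class, sketched below. This is an illustrative assumption, not necessarily the balancing method used in the works summarized here; the function name, the 5% fraud rate, and the toy rows are hypothetical.

```python
import random


def undersample_majority(rows, labels, seed=0):
    """Keep every minority-class row and an equal-sized random sample
    of the majority class, returned in shuffled order."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Hypothetical training set: 5 frauds among 100 policies.
labels = [1] * 5 + [0] * 95
rows = [[i] for i in range(100)]

balanced_rows, balanced_labels = undersample_majority(rows, labels)
print(len(balanced_labels), sum(balanced_labels))
```

Resampling of this kind is normally applied to the training data only, so that evaluation still reflects the class proportions the model will meet in production.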
Naive Bayes and K-Nearest Neighbors models have been shown to yield low recall in fraud detection, leading to a high incidence of false negatives. In other words, these models miss many fraud cases, which translates into financial losses. Their simple assumptions may fail to capture complex interactions in the data, and their performance lags behind more sophisticated techniques like ensemble methods, which limits their usefulness for fraud risk management.