Auto Insurance Knowledge Graph Construction and Its Application to Fraud Detection

Long Zhang (Zhejiang Lab, China)
Tianxing Wu∗ (Southeast University, China)
Xiuqi Chen (Zhejiang Lab, China)
Bingjie Lu (Zhejiang Lab, China)
Chongning Na (Zhejiang Lab, China)
Guilin Qi (Southeast University, China)

ABSTRACT
In recent years, feature engineering based machine learning models have made great progress in auto insurance fraud detection. However, their performance on single fraud case detection has never reached a high level, and they cannot detect gang frauds. One of the main causes is that such machine learning models neglect the associations between auto insurance cases. To resolve this problem, we propose to leverage knowledge graph techniques to discover associations between cases for fraud detection. We first construct an auto insurance knowledge graph (AIKG) with ontology building and knowledge extraction from a relational database. We then apply AIKG to both gang fraud detection and single fraud case detection. We finally conduct comprehensive experiments on fraud detection, and our methods significantly outperform state-of-the-art baselines on different evaluation metrics. Our auto insurance ontology, which is the core part of AIKG, has been published on the Web and is openly accessible.

CCS CONCEPTS
• Information systems → Data mining; • Computing methodologies → Ontology engineering.

KEYWORDS
Knowledge Graph; Ontology; Fraud Detection; Knowledge Graph Embedding

ACM Reference Format:
Long Zhang, Tianxing Wu, Xiuqi Chen, Bingjie Lu, Chongning Na, and Guilin Qi. 2021. Auto Insurance Knowledge Graph Construction and Its Application to Fraud Detection. In The 10th International Joint Conference on Knowledge Graphs (IJCKG’21), December 6–8, 2021, Virtual Event, Thailand. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3502223.3502231

∗ Corresponding Author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
IJCKG’21, December 6–8, 2021, Virtual Event, Thailand
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-9565-6/21/12. . . $15.00
https://doi.org/10.1145/3502223.3502231

1 INTRODUCTION
Insurance is a means of protection from financial loss, and relevant industries are essential components of the financial domain. In recent years, governments, societies, and companies have paid much attention to insurance fraud, which is a deliberate deception perpetrated against an insurance company or agent for the purpose of financial gain. Among all kinds of insurance, auto insurance is a particularly hard-hit area. Auto insurance fraud cases account for approximately 80% of insurance criminal cases, seriously violating the legal rights of insurance consumers, disrupting the normal auto insurance market, and affecting road traffic safety. These are key types of crime that industry regulatory agencies, public security, and judicial organs need to prevent and combat [15]. According to the International Association of Insurance Supervisors, about 20%∼30% of insurance indemnities worldwide are suspected to be fraudulent [13]. The China Insurance Regulatory Commission [30] indicated that China’s auto insurance claims expenditure totaled 72.70 billion dollars in 2020. Therefore, the leakage losses of insurance companies from auto insurance fraud could reach 14 billion dollars [15] in 2020. Auto insurance fraud infringes not only on the economic interests of insurance companies, but also on the interests of policyholders, and at the same time harms the property of others and even the whole society. Increasing the detection rate of auto insurance fraud cases can reduce the costs of insurance companies and the insurance premiums when setting new types of insurance, maintain the order of the insurance market, improve the operational efficiency of the financial system, rapidly deter financial crimes, and lower the risk of the insurance industry.

Traditional methods of auto insurance fraud detection rely on manual judgment, but detecting fraud cases by hand has a relatively high time cost. To solve this problem, rule-based expert systems have been introduced in auto insurance fraud detection. However, the number of trigger rules in expert systems is limited, and a rule is triggered only when all of its conditions are satisfied. In order to further improve the coverage of auto insurance fraud detection, neural networks [1], XGBoost [5], and other machine learning models have recently been utilized to detect auto insurance fraud [8, 16], demonstrating promising results. Feature engineering occupies a key position in machine learning, and features such as insurance value, vehicle brand, reporter phone number, garage type, accident address, etc., are usually incorporated in the above machine learning models. Such features are plain information about a single case, which means that potential associations (e.g., an existing accomplice relationship) between cases are ignored. This
causes two problems: 1) existing machine learning models for auto insurance fraud detection can only detect single cases, so gang detection cannot be solved; 2) the performance on detecting single fraud cases has much room for improvement since useful features across cases are not fully considered.

In this paper, we try to leverage knowledge graph techniques to find explicit associations between auto insurance cases for fraud detection. We first design an auto insurance ontology based on the schema of relational databases and feature analysis on existing machine learning models. The built ontology (https://github.com/zhanglongZJ/auto_insurance_ontology.git) is published on the Web. We then extract RDF triples from relational databases based on the built ontology to construct the auto insurance knowledge graph (AIKG). We finally utilize knowledge graph embedding to discover new accomplice relationships for gang detection and to improve the machine learning models for single fraud case detection.

The main contributions are summarized as follows:
• We are the first to introduce knowledge graphs into auto insurance fraud detection, which can effectively discover associations between cases to better detect fraud.
• We are the first to design an auto insurance ontology from real-world relational databases of auto insurance, and we publish it on the Web to facilitate knowledge reuse and relevant research in the field of auto insurance.
• We are the first to apply the built Auto Insurance Knowledge Graph (AIKG) to gang fraud detection and to improving the machine learning models for single fraud case detection.
• We conduct comprehensive experiments on auto insurance fraud detection. Experimental results show that leveraging AIKG can effectively discover fraud gangs and that the F1-score of single fraud case detection is significantly improved by 17.2%. According to the estimation of the China Insurance Regulatory Commission, an improvement of one percent in auto insurance fraud detection across all of China’s insurance companies can save around 139 million dollars in losses. This fully reflects the value of AIKG in real-world applications.

The rest of this paper is organized as follows. Section 2 outlines the related work. Section 3 presents the details of building AIKG. Section 4 introduces fraud detection with AIKG. Section 5 shows the experimental results, and finally Section 6 concludes and describes the future work.

2 RELATED WORK

2.1 Knowledge Graph Construction
A knowledge graph can be any collection of knowledge represented in the form of a graph, such as semantic web knowledge bases, RDF datasets, and formal ontologies [28]. The nodes in the graph are entities or literals, and the edges are relationships between entities or properties linking an entity to a literal. Currently, many well-known generic knowledge graphs, including DBpedia [12], YAGO [21], Wikidata [23], and Zhishi.me [29], have been built. They cover a wide variety of knowledge, but lack in-depth domain-specific knowledge when faced with specific applications. Thus, to meet the requirements of domain-specific applications, domain-specific knowledge graphs such as GeoNames (http://www.geonames.org/) for geography, DrugBank [26] for life science, MusicBrainz [20] for music, and KG-Buddhism [27] for religion have been constructed. However, knowledge graph techniques have not been applied in the field of auto insurance. Thus, this is the first work on constructing an auto insurance knowledge graph, including ontology building and ontology-based knowledge extraction from a relational database.

2.2 Auto Insurance Fraud Detection
Insurance has become an essential part of the management strategy of individuals, enterprises, and society. As a way to reduce insurance costs, fraud control issues are gaining momentum. The insurance fraud problem has existed since the beginning of the insurance industry. Many people would deliberately stage personal injuries to commit fraud [7]. Expert systems were used in the early stage of auto insurance fraud detection [2, 7, 17, 18]. Different mechanisms were used in expert systems; for example, occurrences of fraud were organized and fraudulent activities could be detected in a timely manner based on preventive mechanisms [7]. Belhadji et al. [2] developed an anti-fraud expert system which used a procedure to separate the most important indicators in predicting claims. Schiller [17] focused on studying the impact of fraud detection expert systems on audit procedures and policyholders’ over-compensation.

Nowadays, machine learning technologies are used in auto insurance fraud detection [3, 6, 9], and feature engineering is quite important in such machine learning based methods. Data mining techniques are usually utilized to find new features that improve machine learning anti-fraud models [3]. Dhieb et al. [6] calculated the premium percentage and used it in an XGBoost [5] model to classify auto insurance cases as normal or fraudulent. However, classic machine learning based methods only focus on single cases, i.e., they only use the details of each single case to predict whether it is fraudulent, which ignores the relevance between cases and therefore misses some important cross-case features for single fraud case detection. Besides, gang fraud detection is also neglected. Thus, we propose to construct an auto insurance knowledge graph, AIKG, to solve the above problems and better detect auto insurance frauds.

3 THE CONSTRUCTION OF AIKG
In this section, we present the whole process of constructing AIKG, which includes two steps, i.e., ontology design and knowledge extraction from a relational database.

3.1 Ontology Design
As introduced in Section 1, current state-of-the-art machine learning based methods for detecting auto insurance frauds focus on judging whether a single case has fraudulent behaviours. They process the fields in a relational table record as features for the corresponding auto insurance case, thus neglecting associations across auto insurance cases. Such associations may improve the performance of single fraud case detection. For example, given two cases, if the first one is a known fraud case and the second one shares many similarities (e.g., the same reporter and garage) with the fraud case, then the possibility of the second one being a fraud case will
be high. Besides, without considering associations between auto insurance cases, gang frauds cannot be detected. Hence, we use knowledge graph techniques to help explicitly model associations between auto insurance cases.

The first step of building our auto insurance knowledge graph (AIKG) is to design an ontology. Figure 1 illustrates the workflow of ontology building. We first take fields in the relational tables describing past auto insurance cases as features to train machine learning models, including XGBoost, SVM, and Neural Networks, to detect auto insurance frauds. We then test the performance of the machine learning models re-trained with one feature removed, in order to evaluate the feature importance [22]. If the F1-scores of all re-trained models drop significantly, then the removed feature is important for machine learning based auto insurance fraud detection. Afterwards, we take such features (i.e., fields) as important ontological properties and the tables in which the selected features (i.e., fields) are located as ontological classes. Finally, domain experts not only make a revision by adding, deleting, or renaming properties and classes, but also define the relations between classes. Here, we only select fields that are important for auto insurance fraud detection because we expect the knowledge graph to be useful for auto insurance fraud detection; useless fields are therefore not considered. Since domain experts can complement properties and classes, the quality of the ontology design can be guaranteed.

Figure 1: The workflow of building auto insurance ontology.

Figure 2 shows a part of our built auto insurance ontology consisting of classes, properties, and relations between classes. Important classes include “Case”, “Person”, “Automobile”, “Policy”, “Institute”, etc. Each class has its own properties, such as “Accident Time”, “Time Stamp”, and “Accident Address”, which are properties of “Case”. Besides, relations are defined between different classes, e.g., “Accident Vehicle” exists between the classes “Case” and “Automobile”.

3.2 Ontology-Based Knowledge Extraction From Relational Database
According to the built ontology, extracting the corresponding fields in batches from the relational database is the second step of building AIKG. During this process, data cleansing is necessary due to the data redundancy in the database storing the information on auto insurance cases. Since the database content is filled in manually by different people, many different field values share the same meaning. For example, the institutes “Hangzhou Yuantong”, “Hangzhou Yuantong Garage” and “Hangzhou Yuantong Limited Company” are the same garage with different names.

To tackle this problem, we use geographical information and string similarity to find equivalent instances, especially institutes. For example, given two institutes $i_1$ and $i_2$, we first segment their names into sets of words $W_{i_1}$ and $W_{i_2}$, and compute the string similarity using the Jaccard coefficient [10] as follows:

$$\mathrm{strSim}(W_{i_1}, W_{i_2}) = \frac{|W_{i_1} \cap W_{i_2}|}{|W_{i_1} \cup W_{i_2}|} \tag{1}$$

If $\mathrm{strSim}(W_{i_1}, W_{i_2})$ is larger than a fixed empirical threshold of 0.2, then we leverage the corresponding longitudes and latitudes (which already exist in the database) to compute the geographical distance $\mathrm{geoDis}(i_1, i_2)$ using the Haversine formula [25] as follows:

$$\mathrm{geoDis}(i_1, i_2) = 2R \cdot \arcsin\sqrt{h(i_1, i_2)} \tag{2}$$

$$h(i_1, i_2) = \sin^2\left(\frac{\varphi_{i_2} - \varphi_{i_1}}{2}\right) + \cos(\varphi_{i_1})\cos(\varphi_{i_2})\sin^2\left(\frac{\lambda_{i_2} - \lambda_{i_1}}{2}\right) \tag{3}$$

where $R$ is the earth radius, $\varphi$ is the latitude, and $\lambda$ is the longitude. In this way, we can get the physical distance between $i_1$ and $i_2$. If $\mathrm{geoDis}(i_1, i_2)$ is smaller than 500 meters, then $i_1$ and $i_2$ are judged as equivalent. With the above strategies, the examples “Hangzhou Yuantong”, “Hangzhou Yuantong Garage” and “Hangzhou Yuantong Limited Company” are identified as the same institute.

4 FRAUD DETECTION WITH AIKG
In this section, we introduce how to use AIKG in gang fraud detection and single fraud case detection.

4.1 Gang Fraud Detection
In the relational database storing the information of auto insurance cases, some cases have been labeled as fraud by domain experts. In order to further discover fraud gangs from the labeled cases, we first try to identify accomplice relations between reporters and garages. With the identified accomplice relations, we propose a fraud gang detection algorithm to generate all gangs across cases.

To identify accomplice relations between reporters and garages, we record the number of times $n_{co}(r, g)$ that a reporter $r$ and a garage $g$ co-occur in fraud cases. If $n_{co}(r, g)$ is larger than or equal to a fixed threshold, which is empirically set to 5, then we label an accomplice relation between $r$ and $g$. Domain experts finally check whether the labeling results are correct.


Figure 2: A part of the built auto insurance ontology.
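For the instance matching step of Section 3.2, the Jaccard similarity of Equation (1) and the Haversine distance of Equations (2) and (3) translate into code fairly directly. The sketch below is an illustration under stated assumptions: whitespace tokenization stands in for the unspecified word segmenter, the earth radius value and all coordinates are made up for the example, and the function names are hypothetical. The thresholds 0.2 and 500 meters are the ones given in the text.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean earth radius R, in meters
STR_SIM_THRESHOLD = 0.2       # empirical threshold from Section 3.2
GEO_DIS_THRESHOLD_M = 500.0   # distance threshold from Section 3.2


def str_sim(name_a, name_b):
    """Jaccard coefficient over word sets (Equation 1).

    Whitespace splitting is an assumption; the paper does not name its segmenter.
    """
    words_a, words_b = set(name_a.lower().split()), set(name_b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def geo_dis(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Haversine distance in meters for coordinates in degrees (Equations 2-3)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))  # clamp rounding error


def same_institute(a, b):
    """a and b are (name, latitude, longitude) tuples.

    Names must be similar enough first; only then is the distance checked.
    """
    if str_sim(a[0], b[0]) <= STR_SIM_THRESHOLD:
        return False
    return geo_dis(a[1], a[2], b[1], b[2]) < GEO_DIS_THRESHOLD_M


# Hypothetical coordinates, a few tens of meters apart.
inst_a = ("Hangzhou Yuantong", 30.2741, 120.1551)
inst_b = ("Hangzhou Yuantong Garage", 30.2743, 120.1553)
inst_far = ("Hangzhou Yuantong Limited Company", 30.3000, 120.2000)
```

Checking string similarity before distance mirrors the order in the text: the cheap name comparison filters most pairs, and the coordinates then separate similarly named institutes at different locations.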

Since the proportion of manually labeled fraud cases is quite low when considering all cases, we utilize knowledge graph embedding models [24] to find implicit accomplice relations in AIKG. Knowledge graph embedding embeds the entities and relations of a knowledge graph into continuous low-dimensional vector spaces. This technology preserves the inherent graph structure and semantic information in the entity and relation embeddings. Such vector representations help us perform relation inference easily by link prediction. More precisely, link prediction can compute the probability of an accomplice relation existing between a reporter instance and a garage instance.

With the link prediction results of a knowledge graph embedding model, we can get the probability of an accomplice relation between the reporter instance and the garage instance within each case. The top-ranking accomplice relations are the most likely to indicate gang fraud. A reporter and a garage with an accomplice relation (i.e., one of the top-ranking accomplice relations, for which we empirically take the top 10%, or a labeled accomplice relation) are viewed as a fraud gang, but we also find that such reporters and garages with accomplice relations occur in many different cases. Thus, after merging the same reporters and garages across cases, we get connected graphs of reporters and garages, which are the output fraud gangs.

The algorithm for detecting fraud gangs is shown in Figure 3. In this algorithm, we use the reporter and garage instances with accomplice relations in all auto insurance cases as the input. At the beginning, we regard each case $s_i$ as a single connected graph. Then, if two connected graphs have the same reporter, we merge them into one connected graph. In the end, we merge the connected graphs which have the same garage. The output is the fraud gang set $F$ consisting of connected graphs, each of which is a fraud gang.

4.2 Improving Single Fraud Case Detection
Existing studies focus on predicting whether a single case is fraudulent or not, but the labeled cases in the training set are treated independently, without considering features from the view of the knowledge graph structure. Since we have leveraged knowledge graph embedding to train entity and relation embeddings which incorporate the contextual information across cases, the above problem in single fraud case detection is addressed by the following strategy. In Section 4.1, we get the link prediction results on the probabilities of the accomplice relations between reporters and garages, and such probabilities are computed from the embeddings (incorporating the knowledge graph information across cases) of reporter instances and garage instances. Thus, we choose the probability of the accomplice relation existing within a case as an important feature in feature engineering based machine learning models, which are used for single fraud case detection.

The machine learning models applied include XGBoost, SVM, and Neural Networks, which are already used in Section 3.1 (i.e., Ontology Design). Here, we improve these machine learning models by adding the features (i.e., the probabilities of the accomplice relation between a reporter instance and a garage instance) generated by different knowledge graph embedding models, including TransE [4], TransR [14], TransD [11], and RotatE [19]. The probabilities of the labeled accomplice relations are directly set to one for all knowledge graph embedding models.
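Section 4 uses accomplice relations twice: to merge cases into gangs (Section 4.1) and as a per-case feature (Section 4.2). Both steps can be sketched together as follows. This is an illustrative sketch rather than the published algorithm: it swaps the two explicit merging passes for a union-find structure, which yields the same connected components provided the merges are applied transitively. The function names, the `link_prob` dictionary layout, the default of 0.0 for pairs without a prediction, and all toy values are assumptions.

```python
from collections import defaultdict

KGE_MODELS = ("TransE", "TransR", "TransD", "RotatE")


def detect_gangs(accomplice_pairs):
    """Group reporters and garages into gangs (connected components).

    `accomplice_pairs` holds (reporter_id, garage_id) pairs that carry a
    labeled or predicted accomplice relation.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for reporter, garage in accomplice_pairs:
        # Tag nodes so a reporter id can never collide with a garage id.
        root_r = find(("reporter", reporter))
        root_g = find(("garage", garage))
        if root_r != root_g:
            parent[root_g] = root_r

    gangs = defaultdict(set)
    for node in list(parent):
        gangs[find(node)].add(node)
    return {frozenset(members) for members in gangs.values()}


def accomplice_features(reporter, garage, link_prob, labeled_pairs):
    """One feature per embedding model for a single case.

    `link_prob[model][(reporter, garage)]` is assumed to be computed
    elsewhere by link prediction. Labeled relations are set to one.
    """
    pair = (reporter, garage)
    if pair in labeled_pairs:
        return [1.0] * len(KGE_MODELS)
    return [link_prob.get(model, {}).get(pair, 0.0) for model in KGE_MODELS]


# Toy usage with made-up identifiers and probabilities.
pairs = [("r1", "g1"), ("r1", "g2"), ("r2", "g2"), ("r3", "g3")]
gangs = detect_gangs(pairs)

toy_prob = {"TransE": {("r4", "g1"): 0.31}, "RotatE": {("r4", "g1"): 0.72}}
labeled = {("r1", "g1")}
features_labeled = accomplice_features("r1", "g1", toy_prob, labeled)
features_predicted = accomplice_features("r4", "g1", toy_prob, labeled)
```

In the toy data, `r1` and `r2` end up in one gang because they share garage `g2`, while `r3` and `g3` form a second gang. The feature vectors would be appended to the existing per-case features before training XGBoost, SVM, or a neural network.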


Figure 3: The fraud gang detection algorithm.

5 EXPERIMENT
In this section, we evaluate the quality of the built AIKG and test the performance of applying AIKG in gang fraud detection and single fraud case detection. The ontology in our built AIKG has been published on the Web: https://github.com/zhanglongZJ/auto_insurance_ontology.git.

5.1 Experiment Settings

5.1.1 Datasets. In this work, our dataset is from a famous Chinese insurance company. There are 374,939 auto insurance cases, dated from January 30, 2015 to January 31, 2021. This dataset was exported from a real relational database which contains thirteen tables and 309 fields in total. Due to privacy and commercial reasons, we cannot introduce more details about the dataset.

5.1.2 Tasks. In our experiment, we first built AIKG from the relational database. Then we used AIKG to catch gang fraud cases. Finally, we leveraged the probabilities of accomplice relations as new features to improve machine learning models for single fraud case detection. Therefore, we have three tasks to be evaluated in the experiment, as follows:
• Knowledge Graph Construction: With our designed ontology and proposed strategies on ontology-based knowledge extraction from a relational database, we built an auto insurance knowledge graph, i.e., AIKG. In this task, we give the statistics of AIKG and evaluate the quality of the output triples.
• Gang Fraud Detection: With AIKG, we used knowledge graph embedding models to generate accomplice relations between reporters and garages. With these accomplice relations, we discovered fraud gangs. In this task, we evaluate the correctness of the accomplice relations generated by knowledge graph embedding models and of the fraud gangs.
• Single Fraud Case Detection: After utilizing a knowledge graph embedding model to train entity and relation embeddings, we computed the probabilities of accomplice relations existing between reporters and garages. Such probabilities were taken as new features for improving existing feature engineering based machine learning models on single fraud case detection. In this task, we evaluate the performance of the machine learning models incorporating the new features for single fraud case detection.

5.1.3 Evaluation Metrics. For all three tasks, we used precision as the evaluation metric. Besides, for single fraud case detection, we used recall and F1-score as two additional evaluation metrics. F1-score is the harmonic mean of precision and recall, which balances the precision and recall of different machine learning models.

5.2 Result Analysis

5.2.1 Knowledge Graph Construction. In the final built AIKG, the schema part (i.e., the ontology) has 6 classes, 54 class properties, and 11 relations between classes. For the instance part, there are 2,077,669 instances, of which 374,939 are cases, 812,929 are persons, 59,127 are institutes, 338,245 are policies, and 492,429 are automobiles. All the instances compose 5,758,476 triples in total.

To evaluate the precision of the triples in AIKG, we invited five domain experts to manually check whether 100 randomly selected triples are correct, since the total number of triples is too large to evaluate them all. The average number of correct triples in this evaluation is 98.8, i.e., the average precision is 98.8%.

5.2.2 Gang Fraud Detection with AIKG. In this work, we found that fraud gangs used customers’ vehicles which needed to be repaired to forge accident vehicles, and they used these vehicles to defraud insurance companies. This is why we tried to mine accomplice relationships between insurance reporters and garages. We test different knowledge graph embedding models, including TransE [4], TransR [14], TransD [11], and RotatE [19], to add accomplice relations in AIKG using the top 10% link prediction results. Similarly, five domain experts were asked to label 100 randomly selected accomplice relations from the common top 10% link prediction results via majority voting. Table 1 shows the precision results, and we find that RotatE significantly outperforms the other models with a precision of 67.86%. According to the gang fraud detection algorithm, we can find fraud gangs consisting of reporter instances and garage instances. Finally, there are 124 gangs, which involve 7,608 cases in total, of which 4,468 cases were labeled as fraud cases. The fraud rate is 58.7%, which is quite high in the auto insurance domain, and this demonstrates that our method of fraud gang detection is effective in real-world scenarios.
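The expert evaluation by majority voting and the gang fraud rate of Section 5.2.2 can be illustrated with a small sketch. The vote matrix below is made-up toy data and the function names are hypothetical; only the counts 4,468 and 7,608 are taken from the text.

```python
def majority_vote(votes):
    """True when more than half of the expert votes are positive (1)."""
    return sum(votes) * 2 > len(votes)


def precision_from_votes(votes_per_item):
    """Share of sampled items that are accepted by majority vote."""
    accepted = sum(majority_vote(votes) for votes in votes_per_item)
    return accepted / len(votes_per_item)


# Hypothetical votes of five experts on four sampled accomplice relations.
toy_votes = [
    [1, 1, 1, 0, 0],  # accepted (3 of 5)
    [1, 0, 0, 0, 1],  # rejected (2 of 5)
    [1, 1, 1, 1, 1],  # accepted
    [0, 0, 0, 0, 0],  # rejected
]
toy_precision = precision_from_votes(toy_votes)

# Fraud rate among the cases covered by the detected gangs (Section 5.2.2).
gang_fraud_rate = 4468 / 7608
```

With an odd number of experts, a strict majority always exists, so no tie-breaking rule is needed.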


Table 1: The precisions of the accomplice relations generated by different knowledge graph embedding models in the top 10% link prediction results

            RotatE    TransE    TransR    TransD
Precision   67.86%    48.21%    48.22%    43.75%

Table 2: The different evaluation results of all comparison models in single fraud case detection

Model                        Precision   Recall   F1-Score
XGBoost baseline             0.520       0.206    0.295
XGBoost+RotatE               0.640       0.324    0.430
XGBoost+TransE               0.573       0.295    0.389
XGBoost+TransR               0.577       0.263    0.362
XGBoost+TransD               0.579       0.295    0.391
Ensemble XGBoost             0.692       0.354    0.468
SVM baseline                 0.465       0.116    0.186
SVM+RotatE                   0.466       0.312    0.374
SVM+TransE                   0.520       0.232    0.321
SVM+TransR                   0.479       0.177    0.258
SVM+TransD                   0.487       0.251    0.332
Ensemble SVM                 0.496       0.354    0.413
Neural Networks baseline     0.263       0.136    0.179
Neural Networks+RotatE       0.239       0.994    0.385
Neural Networks+TransE       0.275       0.328    0.299
Neural Networks+TransR       0.260       0.466    0.333
Neural Networks+TransD       0.257       0.460    0.330
Ensemble Neural Networks     0.275       0.760    0.404
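The F1-scores in Table 2 follow from the precision and recall columns through the harmonic mean. The short sketch below recomputes a few rows as a consistency check; the dictionary layout and the function name are illustrative, and the numbers are copied from Table 2.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for selected rows of Table 2.
table2_rows = {
    "XGBoost baseline": (0.520, 0.206, 0.295),
    "Ensemble XGBoost": (0.692, 0.354, 0.468),
    "SVM baseline": (0.465, 0.116, 0.186),
    "Ensemble SVM": (0.496, 0.354, 0.413),
    "Neural Networks baseline": (0.263, 0.136, 0.179),
    "Ensemble Neural Networks": (0.275, 0.760, 0.404),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in table2_rows.items()}

# Recall gain of Ensemble XGBoost over the XGBoost baseline, in absolute terms.
recall_gain = table2_rows["Ensemble XGBoost"][1] - table2_rows["XGBoost baseline"][1]
```

The recomputed values agree with the reported F1-scores to within rounding, and the recall gain of 0.148 corresponds to the 14.8% improvement quoted in Section 5.2.3.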

5.2.3 Single Fraud Case Detection with AIKG. In single fraud case detection, we test different combinations of machine learning models and knowledge graph embedding models. Our proposed strategy (denoted as Ensemble models) is to integrate the different new features on the probabilities of the accomplice relations, generated by different knowledge graph embedding models including TransE, TransR, TransD, and RotatE. We labeled five thousand cases; all tested models used 80% of the data for training and the remaining 20% for testing.

In Table 2, the Ensemble models obtain the highest F1-score within each family of machine learning models, and Ensemble XGBoost obtains the best precision and F1-score overall. This means that Ensemble models can effectively learn relational information across cases from these accomplice relation features. The best combination is Ensemble XGBoost, i.e., XGBoost with four new features generated by the link prediction results of different knowledge graph embedding models. Besides, the result of the XGBoost model combined with RotatE is better than the other combinations of one machine learning model and one knowledge graph embedding model. The TransE, TransR, and TransD models improve the performance of the basic XGBoost model (i.e., XGBoost baseline). However, their results are similar to each other.

According to the experimental results, adding new features generated by knowledge graph embedding models improves the performance of machine learning models to different degrees. This improvement could reduce the incidence of fraud cases and at the same time increase the robustness of auto insurance fraud detection in business companies. We improve the F1-score by 17.2% in single fraud case detection. Besides, the recall is improved by 14.8%, and this could save more than two billion dollars each year in catching fraud cases, since 14.8% more fraud cases can be discovered. The combination of knowledge graph embedding models and machine learning models provides new insight into the auto insurance anti-fraud business.

6 CONCLUSION AND FUTURE WORK
In this paper, we combine a domain-specific knowledge graph and machine learning models to handle the task of auto insurance fraud detection. We first design a framework for auto insurance ontology building. The built ontology has been published on the Web and is openly accessible. With our auto insurance ontology, we extract knowledge from a relational database to construct an auto insurance knowledge graph (i.e., AIKG) and resolve the problem of data redundancy. Based on AIKG, we leverage knowledge graph embedding techniques to solve gang fraud detection by generating fraud gangs with our detection algorithm. We also utilize the link prediction results to add new features, which capture the contextual knowledge across cases, to improve the machine learning models for single fraud case detection. Experimental results show the high quality of AIKG and the effectiveness of applying AIKG to gang fraud detection and single fraud case detection on a large-scale real-world dataset.

As for future work, we plan to explore community discovery algorithms for fraud gang identification. We will also study deep
learning techniques combined with knowledge graphs to avoid complex feature engineering
in single fraud case detection. Besides, we will continue to update AIKG to better
support real-time fraud detection.

ACKNOWLEDGMENTS
This work is supported in part by the National Natural Science Foundation of China
(No. 62006040), and the Project for the Doctor of Entrepreneurship and Innovation in
Jiangsu Province (No. JSSCBS20210126).
REFERENCES [30] Guoliang Yang. 2018. Analysis of the form and countermeasures of car insurance
[1] Martin Anthony and Peter L Bartlett. 2009. Neural network learning: Theoretical fraud cases. Gansu Finance 000, 007 (2018), 41–43.
foundations. cambridge university press.
[2] El Bachir Belhadji, George Dionne, and Faouzi Tarkhani. 2000. A model for the
detection of insurance fraud. The Geneva Papers on Risk and Insurance-Issues and
Practice 25, 4 (2000), 517–538.
[3] R. Bhowmik. 2011. Detecting Auto Insurance Fraud by Data Mining Techniques.
Journal of Emerging Trends in Computing & Information ences 2, 4 (2011).
[4] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Ok-
sana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-Relational
Data. In NIPS. 2787–2795.
[5] Tianqi Chen and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system.
In Proceedings of the 22nd acm sigkdd international conference on knowledge
discovery and data mining. 785–794.
[6] N. Dhieb, H. Ghazzai, H. Besbes, and Y. Massoud. 2019. Extreme Gradient Boosting
Machine Learning Algorithm For Safe Auto Insurance Operations. In 2019 IEEE
International Conference on Vehicular Electronics and Safety (ICVES).
[7] Ken Dornstein. 1996. Accidentally, on purpose: The making of a personal injury
underworld in America. St. Martin’s Press.
[8] M. A. Fauzan and H. Murfi. 2018. The Accuracy of XGBoost for Insurance
Claim Prediction. International Journal of Advances in Soft Computing and its
Applications 10, 2 (2018), 159–171.
[9] Mohamed Hanafy and Ruixing Ming. 2021. Machine learning approaches for
auto insurance big data. Risks (2021).
[10] Paul Jaccard. 1912. THE DISTRIBUTION OF THE FLORA IN THE ALPINE ZONE.
New Phytologist 11, 2 (1912), 37–50.
[11] Guoliang Ji, Kang Liu, Shizhu He, and Jun Zhao. 2016. Knowledge Graph Com-
pletion with Adaptive Sparse Transfer Matrix. In AAAI. 985–991.
[12] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas,
Pablo N Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van Kleef, Sören
Auer, et al. 2015. DBpedia–a large-scale, multilingual knowledge base extracted
from Wikipedia. Semantic Web 6, 2 (2015), 167–195.
[13] Youxiang Li and Qingwei Kong. 2018. Anti-insurance fraud theory and practice
research. China Financial and Economic Publishing House.
[14] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning
Entity and Relation Embeddings for Knowledge Graph Completion. In AAAI.
2181–2187.
[15] Joint Research Group on Auto Insurance Anti-fraud. 2021. Research on Auto
Insurance Fraud and Anti-fraud Issues and Supervision Suggestions. insurance
studies (2021), 3–10.
[16] M. Paredes. 2018. A Case Study on Reducing Auto Insurance Attrition with
Econometrics, Machine Learning, and A/B Testing. In 2018 IEEE 5th International
Conference on Data Science and Advanced Analytics (DSAA).
[17] Jörg Schiller. 2006. The impact of insurance fraud detection systems. Journal of
Risk and Insurance 73, 3 (2006), 421–438.
[18] G. L. Simons and S. J. Andriole. 1985. Expert systems and micros. NCC Publications
(1985).
[19] Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2018. RotatE: Knowl-
edge Graph Embedding by Relational Rotation in Complex Space. In International
Conference on Learning Representations.
[20] Aaron Swartz. 2002. MusicBrainz: A Semantic Web Service. IEEE Intelligent
Systems 17, 1 (2002), 76–77.
[21] Thomas Pellissier Tanon, Gerhard Weikum, and Fabian Suchanek. 2020. YAGO 4:
A Reason-able Knowledge Base. In European Semantic Web Conference. 583–596.
[22] Ivan Viola, Armin Kanitsar, and M Eduard Groller. 2005. Importance-driven
feature enhancement in volume visualization. IEEE Transactions on Visualization
and Computer Graphics 11, 4 (2005), 408–418.
[23] Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: a free collaborative
knowledgebase. Commun. ACM 57, 10 (2014), 78–85.
[24] Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017. Knowledge graph
embedding: A survey of approaches and applications. IEEE Transactions on
Knowledge and Data Engineering 29, 12 (2017), 2724–2743.
