0% found this document useful (0 votes)
1 views

A Fraud Detection Visualization System Utilizing R

The document presents a prototype fraud detection visualization system that utilizes radial drawings and heat-maps to identify fraudulent activities by employees. It combines information visualization and pattern-matching techniques, focusing on common fraud patterns suggested by internal auditors. The system processes data to assign severity values to employee-client interactions, enabling auditors to visualize suspicious activities effectively.

Uploaded by

dina maryana
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
1 views

A Fraud Detection Visualization System Utilizing R

The document presents a prototype fraud detection visualization system that utilizes radial drawings and heat-maps to identify fraudulent activities by employees. It combines information visualization and pattern-matching techniques, focusing on common fraud patterns suggested by internal auditors. The system processes data to assign severity values to employee-client interactions, enabling auditors to visualize suspicious activities effectively.

Uploaded by

dina maryana
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 23

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/259010182

A Fraud Detection Visualization System Utilizing Radial Drawings and Heat-


maps

Article · November 2013


Source: arXiv

CITATIONS READS
18 1,118

3 authors, including:

Evmorfia N. Argyriou Antonios Symvonis


YWorks GmbH National Technical University of Athens
14 PUBLICATIONS 256 CITATIONS 157 PUBLICATIONS 1,633 CITATIONS

SEE PROFILE SEE PROFILE

All content following this page was uploaded by Antonios Symvonis on 27 October 2014.

The user has requested enhancement of the downloaded file.


A Fraud Detection Visualization System Utilizing
Radial Drawings and Heat-maps

Evmorfia N. Argyriou1, A. Symvonis1 and Vassilis Vassiliou2


1
arXiv:1311.7259v1 [cs.OH] 28 Nov 2013

School of Applied Mathematical & Physical Sciences,


National Technical University of Athens, Greece.
{fargyriou,symvonis}@math.ntua.gr
2
Vodafone Greece.
[email protected]

Abstract. We present a prototype system developed in cooperation


with a business organization that combines information visualization and
pattern-matching techniques to detect fraudulent activity by employees.
The system is built upon common fraud patterns searched while trying
to detect occupational fraud suggested by internal auditors of a busi-
ness company. The main visualization of the system consists of a multi-
layer radial drawing that represents the activity of the employees and
clients. Each layer represents a different examined pattern whereas heat-
maps indicating suspicious activity are incorporated in the visualization.
The data are first preprocessed based on a decision tree generated by
the examined patterns and each employee is assigned a value indicat-
ing whether or not there exist indications of fraud. The visualization is
presented as an animation and the employees are visualized one by one
according to their severity values together with their related clients.

1 Introduction

Internal fraud detection in business organizations gains more and more


attention as fraudulent activity appears in ascendant trend during the
last years. Fraud is defined as the intentional misuse or abuse of the
assets of a company and may be committed by employees, clients or
other entities [3]. Studies on business fraud schemes show that most of the
reported fraud cases have been committed by trusted associates and this
is referred to as “occupational or employee fraud”. Occupational fraud can
be classified into three main categories: (i) False or fraudulent financial
statements, (ii) assets misappropriation and, (iii) corruption [3]. The
falsification of financial statements, produces great loss to a company and
is mostly committed by employees in senior management or executives.
Assets misappropriation is committed by lower-level personnel and due
to the fact that produces insignificant losses at an individual level it
cannot be easily traced by the auditors. Such schemes may continue for
years until fraud is confirmed, producing a huge cost both to the global
economy and to the company. As a result of fraud, business reputation,
company value, and public and client trust are negatively affected.
Even though advanced information technology has been incorporated
into organizations to reduce the risk of internal fraud, monitoring di-
verse systems that produce textual logs in non-uniform formats is a
time-consuming task. Information visualization can be promising since it
facilitates the quick identification of fraudulent activity. In this paper, we
present a system developed in cooperation with a business organization
that exploits the advantages of information visualization and pattern
recognition to detect suspicious patterns concerning fraudulent financial
statements in systems in which a pair of entities (employee and client)
are involved. Towards this direction, the system produces a multi-layer
radial drawing (see Figure 1) representing the activity of employees and
clients along with other significant information that enable the identifi-
cation of possible fraud patterns.

Fig. 1: A snapshot of the interface of the system.

Since occupational fraud schemes are well-hidden in the huge amount of


data, we were seeking for an approach that would present to the auditor
all the recorded events according to their severity. On the other hand,
visualizing large data-sets simultaneously is confusing and inefficient.
For this reason, the system measures the similarity of the activity of
the employees based on fraud detection patterns (suggested by auditors
based on their experience and the framework of the company on internal
fraud risk reduction) and appropriate heat-maps are generated and in-
corporated in the system. The produced visualization is presented as an
animation. The system supports supplementary functionalities such as a
database log viewer, export log mechanisms, storing and post-processing
of data, plots and charts (see Figure 2).

2
2 Detection Procedure
Fraud detection has been studied enough in the literature. To the best of
our knowledge, there exist only few works oriented exclusively on occu-
pational fraud detection. Luell [18] utilizes data-mining and visualization
techniques to detect client advisor fraud in a financial institution. Eberle
and Holder [12] detect structural anomalies in transactions and processes
propagated by employees using a graph representation. SynerScope [23]
is an industrial visualization tool for analyzing “Big Data” capable of de-
tecting financial fraud using a visualization scheme similar to ours rep-
resenting the billing links and relations between the company and other
entities. The main difference is that our system is oriented exclusively
on occupational fraud detection based on patterns suggested by auditors
and, thus, it is equipped with a detection mechanism that preprocesses
the data and the visualization conforms to these patterns. We also uti-
lize animations for the detection of fraud schemes to avoid cluttering
the visualization. A visualization system based on concentric circles was
presented in [2] aiming at identifying periodic events using an algorithm
for periodicity detection. Our system extends the one presented in [1]
that detects periodic patterns that may conceal occupational fraud in
several ways: (i) The visualization of our system provides a complete
view of all the examined patterns and the results of the examination on
each pattern (in [1], the detection procedure was a “black-box” determin-
ing the order of the presentation of the clients in a video representing
their activity in which suspicious clients appeared first; partial results of
the detection procedure were illustrated in additional plots and charts,
which hindered the investigation), (ii) the detection mechanism is based
on a decision tree even though we have incorporated most of the pat-
terns presented in [1], (iii) for periodicity detection, we apply a variation
of the Longest Common Subsequence algorithm [24] that tackles noisy
data, (iv) a parallel coordinates plot has been added to detect unusual
employee behavior (unauthorized access to computers, business systems,
etc), (v) the system provides a database-viewer to facilitate the investi-
gation procedure.
Many of the existing publications that deal with fraud detection in gen-
eral, use of data-mining techniques [4], [17], [19]. The Financial Crimes
Enforcement Network AI System [14], [21] is a system that correlates
and evaluates all reported transactions for indications of money laun-
dering using pattern-matching techniques. The NASD Regulation De-
tection System [16], [20] identifies patterns of violative activity in the
Nasdaq stock market by combining pattern matching and data mining
techniques and provides visualizations of the results. The Link Analy-
sis Workbench [25] searches for criminal activity and terrorism in noisy
and incomplete data also utilizing pattern matching techniques. Visual-
ization techniques have been also proposed for financial fraud detection.
3D-treemaps have been used to monitor the real-time stock market per-
formance and to identify a stock that may represent an unusual trading
pattern [15]. “WireVis” [7] is a system that provides interactive visual-
izations of financial wire transactions and aims to detect financial fraud.
A system that correlates data and discovers complex networks of po-

3
Fig. 2: The startup screen of the system. Custom queries can be performed and results
are presented in the log viewer. The data have been anonymized for security
and data privacy reasons.

tentially illegal financial activities based on visualization techniques was


presented in [13]. “VISFAN” [10] has been developed for the visual anal-
ysis of financial activity networks that tries to discover financial crimes,
like money laundering and frauds. “VIS4AUI” [9] is a system that tries
to detect money laundering and financial crimes by collecting financial
information related to ongoing bank relationships and high value trans-
actions. Radial drawings are widely used for the visualization of large
data-sets, especially in bioinformatics [8], [11] and social networks [6],
where a large portion of information has to be visualized simultaneously.
This paper is structured as follows: In Section 3, we describe the detec-
tion procedure. In Section 4, we present the features of the visualization
of the system.Section 5 presents a case-study based on real data. We
conclude in Section 6 with open problems and future work.

3 Overview of the Detection Procedure


As input, the system takes log files from diverse control systems that are
appropriately parsed and stored in a database using a uniform format.
Records generated by systems involving an employee and a client consist
(among other secondary fields) of a time-stamp, an employee ID, a client
ID and an action. An event, say e, is defined as a 4-tuple (t, u, c, a),
where: (i) t corresponds to the time-stamp of the occurrence of e, (ii) u
corresponds to an employee, (iii) c represents a client, and (iv) a is the
action taken by the employee. For an event e = (t, u, c, a), we say that

4
client c is related to e and is also related to employee u. For a pair
of employee-client (u, c), an event-series Tu,c = {e1(u,c) , e2(u,c) , . . .} is a
sequence of events ei(u,c) = (ti , u, c, ai ) related to client c and employee
u.

Employee

0 1 1 1
w1 Same pair of Unauthorized Unauthorized
No pattern Employee-Client Action System

1 0 Periodicity ?
w2
Yes No

0 1 0 1 Access Within
w3 Working Hours ?
Yes No Yes No

0 1 0 1 0 1 0 1 Is Employee a
w4 Frequent User
Yes No Yes No Yes No Yes No
of the System?

0 1 0 1 0 1 0 10 1 0 1 0 10 1
w5 Common
Yes No Yes No Yes No Yes No Yes No Yes No Yes No Yes No
Actions ?

Fig. 3: The decision tree based on which the event-series of the employees are assigned
a severity value.

The visualization may be generated either based on the whole data-set


of the database or on queries performed by the auditor in the startup
screen of the system (see Figure 2). As mentioned above, data have to
be preprocessed before producing the visualization such that employees
with strong indication of fraud are distinguished. For this reason, for a
given employee u, the event-series with each client related to u will be
evaluated based on possible fraud patterns and a value indicating the
severity of the related events (within range [0, 1]) will be assigned first
to the event-series and then, to the employee. If several fraud patterns
are identified, employee u is assigned the maximum severity value of the
already calculated event-series related to u. The evaluation is performed
based on a decision tree generated by the following patterns suggested by
the auditors (see Figure 3): (i) There exist more than X events related
to employee u and client c within a time interval of Y days/months
where X, Y are configurable by the auditor (refer to the green rectangle
node of the first layer below the root of the decision tree in Figure 3),
(ii) the employee has performed unauthorized actions (based on a list
of actions provided by the auditors) and, (iii) the employee operated in

5
systems that she is not authorized to use. In the case where patterns (ii)
or (iii) occur, the event-series of the employee is assigned the maximum
value (i.e., value 1) such that the employee is definitely distinguished
in the visualization. However, in the case where pattern (i) occurs, the
investigation has to proceed further. The patterns that are taken into
consideration in this case include the following:
– Event-series periodicity: A common pattern while examining such
fraud schemes is the occurrence of the events in regular time basis.
For instance, an employee modifies intentionally the account of a
client every month within the billing cycle of the account and more
precisely, before its billing date. Assuming that event-series Tu,c re-
lated to employee u and client c is ordered according to the time-
stamps of the events, the system aims to detect similarities between
pattern time-series based on a variation of Longest Common Subse-
quence (LCSS) algorithm for time-series [24] which is robust under
noisy conditions. The pattern time-series include the ideal time-series
if the events between the entities appear in time intervals that equal
exactly to 1, 7, 15, 30 days and other time-series identified in the past
as fraud patterns. In the case where similarity with any of the above
time-series is detected, we consider the event-series of the employee
to be periodical.
– Events occurring outside working hours: Fraudulent activities
usually occur outside working hours, on weekends, on holidays or at
the end of the employee’s shift. For this reason, if such events occur
they have to be taken under consideration.
– Employee frequency in recorded systems: Each employee ac-
cording to her responsibilities operates in specific business systems.
If this is not the case, then the employee has to justify the recorded
event. Also, in several systems such as fraud management systems
(FMS), it is expected that an employee monitors the activity of a
suspicious client. Hence, events stemming from these systems have
to be given smaller weight.
– Actions taken by the employee: Similarly to the previous case,
there exist some actions that an employee is unlikely, but not unau-
thorized to perform since they do not conform to her responsibilities.
The idea behind the decision tree was to correspond to each layer one
fraud pattern and create a path according to the result of the examination
on each pattern. We consider the importance of patterns based on their
corresponding layer in the decision tree, such that the higher ones (closer
to the root) are more important. Let x = [x1 , x2 , x3 , x4 , x5 ] be the pattern
vector examined (e.g., x = [0, 0, 0, 0, 0] corresponds to non-fraudulent
activity) and let y = [y1 , . . . , y5 ] be the vector resulting from the traversal
of the decision tree from its root to its leaves according to the evaluation
of the events of pair (u, c) on each factor. If the examination of the events
leads to “Unauthorized action” or “Unauthorized system” the event-series
is directly assigned value 1. Else, each tree layer i is assigned a weight,
say wi , i = 1, . . . , 5, based on the formula presented in [22] such that
dissimilarities between the two vectors that occur at higher levels of the
tree will be more important. For this reason, the distance between vectors
x, y, say d(x, y), representing the dissimilarity by the pattern vector,

6
is calculated by applying qthe normalized Weighted qPEuclidean Distance
P5 2 5
metric formula d(x, y) = i=1 (xi − yi ) ∗ wi / i=1 wi . This value
(or the maximum of the already calculated values, if more than one fraud
patterns exist) corresponds to the severity value of the event-series of
employee u, which is the value finally assigned to the employee. Based on
these values the system generates a heat-map representing all employees
by rectangular nodes and gradient colors from blue to red (refer to the
upper-left heat-map of Figure 1), such that nodes with color close to
color-red represent employees with strong indications of fraud, whereas
blue colored nodes employees with no suspicion of fraud. Similarly to the
severity calculation of the event-series of employees, the system assigns
also a severity value to clients based on the above patterns. The only
difference is that for a given client c, the severity value is calculated
on all the events related to c (not only the ones that concern a specific
employee). In this manner, a client involved in suspicious activity with
two or more employees will be distinguished.

4 Description of the System

The visualization window3 consists of two heat-maps representing the


severity of the activity of employees and clients (refer to the upper-left
and the bottom-left panels of Figure 1). Although a great deal of research
has focused on the deficiencies of rainbow color maps [5], they are still
widely used in the visualizations, since their effectiveness depends on
the nature of the data. In our data-sets, it was necessary to have three
colors (red, green, blue) representing clearly the severity value of the
examined entities (high, medium, low, resp), and thus, we have chosen
this type of color map. We have tried different types of color maps, but
the result was more misleading during the investigation. The particular
selection of colors was not confusing to the auditors since fraud cases
appear rarely in a company which implies that few red-colored rectangles
appear eventually in the heat-maps.
At startup only the heat-map representing the activity of the employees
is generated. The auditor selects a threshold value that determines the
employees that will be presented based on their severity values. The
visualization is animated and each time an employee together with her
related clients is illustrated. Before an employee is presented, the heat-
map that corresponds to the activity of related clients is generated.
The main visualization of the system is a multi-layer radial drawing
where each layer L1 , . . . , L5 (see Figure 4) represents a different aspect
of the audit data (systems, actions, within working hours or not, periodic-
ity) and each circular sector corresponds to an entity (employee, client or
cluster). Then, a graph is generated; its nodes correspond to each of the
above entities, whereas its edges correspond to the connections among
them. The innermost layer of the visualization (layer L1 of Figure 4)
accommodates the nodes of the graph (drawn as portions of a ring).
3
The reader is suggested to print the paper in color.

7
Fig. 4: Description of the main-visualization of the system. “Fat” edges (unless referring
to cluster nodes) and in particular, the red-colored ones may be indications of
fraud that have to be further examined.

Nodes representing employees are drawn to the left-part of the visualiza-


tion where the ones representing their related clients to the right part. To
avoid cluttering the visualization with nodes representing clients with no
indication of fraud, the auditor can specify thresholds that split clients in
one or two clusters (low-severity cluster and/or medium-severity cluster)
according to their severity values. These nodes are accommodated on top
and to the bottom part of the visualization. The color of the nodes (apart
from the ones representing clusters and the gray-colored ones that will
be explained later) follows the color of the corresponding entities in the
heat-maps. The light-blue (green, resp) colored cluster-node corresponds
to the low (medium, resp) severity cluster (refer to reference points 1
and 2 of Figure 4, resp.). Regarding the edges of the drawing, the system
supports either circular arc edges or straight-line edges. The thickness of
an edge is proportional to the number of connections between the em-
ployee and the client while its color is determined by the color of the
client. In a fraud context, “fat” edges (unless referring to cluster nodes)
and in particular, the red-colored ones will be indications of fraud that
have to be further examined.
Subsequent layers (i.e., layers L2 − L5 ) represent the patterns described
in Section 3 and are split into two regions A and B (refer to Figure 4).
Region A represents a heat-map indicating the result of the examination
of the entity in the specific pattern. Red color indicates identification of
suspicious pattern. Region B illustrates information about each corre-
sponding examined pattern. In layer L2 , the different business systems
related to the each of the entities are represented. Each such system is

8
characterized by a specific color and occupies space proportional to the
corresponding aggregate percentage of use by the entity. For each em-
ployee, this percentage is calculated based on the aggregated percentage
of use on all clients that are currently drawn in the visualization, whereas
for each client based on the percentage of use by the employee currently
visualized (unless more than one employees related to a client are drawn
simultaneously in the visualization). Similarly, for cluster nodes the ag-
gregated percentage of use for all clients that belong to the cluster is
calculated. Systems for which the employee is an unauthorized or not a
frequent user are marked by an X (refer to reference point 3 of Figure 4).
Layer L3 corresponds to the actions reported for each entity, and are
drawn in a similar manner as the ones in layer L2 . Again unauthorized
or suspicious actions are marked with an X. Layer L4 represents the
percentage of events that occur within or outside working hours. The
light-blue colored parts represent events occurring within working hours,
whereas the light-red colored parts indicate the existence of events occur-
ring outside working hours (e.g., see reference points 4 and 5 of Figure 4,
resp). For each client node there exists an additional layer (refer to L5
of Figure 4, e.g., see reference point 6) that indicates whether or not the
event-series of the client is periodical. The event-series is compared with
the pattern time-series currently stored in the system and a heat-map is
generated indicating the degree of similarity with each pattern. Again,
light-red colors indicate suspicious cases.
Regarding the investigation procedure, as already mentioned, the audi-
tor specifies a threshold and the employees with assigned severity value
above the threshold are presented in the visualization one by one, to-
gether with their related clients. The auditor is able to start, pause or
stop the video and process the visualization. In the case where a client
node is selected, additional employees related to the client can be added
to the visualization which facilitates the possible identification of two
or more employees that may cooperate in committing fraud (the case
where an employee node is selected is treated similarly). In Figure 4,
gray colored nodes (see reference point 7) represent nodes added during
post-processing when a client node is selected (refer to the node pointed
by the green arrow of Figure 4). In the case where more than one node
representing employees exist simultaneously in the visualization, the au-
ditor is able to select one of them and add the related clients to the
visualization. Gray-color is utilized for non-selected employee nodes to-
gether with their edges and related clients (if they are not related also
to the selected employee) to avoid distracting the auditor.
Since it is possible to switch between employees during the investigation,
if the post-processing of a case is completed, the animation resumes from
the last visualized employee before pausing the animation. In the case
where a cluster node is selected, the corresponding rectangles in the heat-
map representing clients are marked with an X such that they can be
added (if desired) to the visualization. However, the system permits a
specific number of additions of employee or client nodes in order to avoid
cluttering the visualization area. If this number is exceeded, the system
optionally is able to produce a visualization where only the inner-most
layer of the radial drawing (i.e., the one corresponding to the clients and

9
employees) is drawn along with the relations between them. Again, the
width of the edges is proportional to the number of events that relate
two entities. In the case where further investigation is needed the auditor
selects the desired node (which becomes larger) and the other layers of
the radial drawing (i.e., L2 − L5 ) that correspond to the particular node
appear in the visualization. In this manner, the system is able to visualize
simultaneously a larger set of entities and reveal the relations between
them. However, this may slow down the investigation process and for this
reason, we adopted both the animation approach and the simultaneous
visualization of all layers for each node.
The ordering of the nodes representing the employees is performed ac-
cording to their severity value (the more suspicious nodes are presented
first in the animation). The clients appear in arbitrary order since no
crossings between edges connecting a particular employee with her re-
lated clients can exist. The only crossings that may occur are caused
by gray-colored employees related to clients already visualized and since
these edges are also gray-colored, they do not confuse the auditor (see
Figure 4). The gray-colored employees that are added in the visualiza-
tion are placed either on the top or to the bottom of the already placed
employee-nodes to retain the relative positions of the already placed
nodes. We also have chosen not to apply crossing minimization heuris-
tics since, the addition of new nodes to the visualization may imply a
rearrangement of the positions of the already placed nodes, disturbing
the underlying mental map.

Fig. 5: A time-line plot representing the event-series for the selected pair of entities.

Figure 5 accommodates a time-line plot representing the event-series


for the selected pair of entities (refer to the blue series) where the x-
axis corresponds to the date of the occurrence of each event and the y-
axis to the number of events occurred during the specific date. The red
drawn columns represent the billing dates of the account of the specific
client. This plot facilitates the identification of possible periodic activity
especially close to the billing date of the account of the client. In employee
fraud schemes, it is also possible that the event-series related to a pair
of entities is periodical only based on a specific action. For this reason,
we have incorporated in the system a second plot (refer to Panel #5 of
Figure 6) that presents all reported actions by distinct series. In this plot,
the y−axis corresponds to the day of the month that event x related to
a specific action occurred (e.g., the first event related to an action that
occurred on the 15th day of a month will be drawn on point (1, 15). For
this case, we ignore multiple occurrences of events related to the same
action occurred the same day of the month. In the case where periodicity

10
occurs for a specific action, the corresponding series will have part (or
the whole series) almost parallel to x−axis.

Fig. 6: A periodicity-plot that presents all reported actions by distinct series for the
selected pair of entities.

The system supports mechanisms to detect patterns of unusual employee


behavior such as unauthorized access to computers, business systems and
accounts of employees or clients by producing a parallel coordinates plot
(see Figure 7). Each record consists of an employee, a time-stamp, an IP
indicating the address of the employee’s computer, a computer name and
an action (i.e., login, login failure, etc). The size of the nodes and the
edges is proportional to the number of their occurrences in the database.
The nodes on each layer are ordered by their number of occurrences
in the data-set. The patterns used in this scenario include the follow-
ing: (i) More than X failed login attempts within a time interval of Y
days/months, where X, Y are configurable by the auditor, and (ii) Login
attempts or failed login attempts occurring from different IPs and/or
computer names. The visualization of Figure 4 can also be adapted to
this scenario -either with or without the preprocessing step (since the
number of employees and actions is manageable in a radial drawing)- by
substituting the nodes representing the clients by nodes that represent
actions.

5 Case Study
In this section, we present the results of the evaluation of the system on
real data-sets stemming from two control systems of a telecommunication

11
Fig. 7: A parallel coordinates plot to monitor failed login attempts in a specific system.

company. All data provided to us were anonymous for security and data
privacy reasons. The data-set consists of approximately 180.637 entries
lying within a time interval of six months. The data-set consists of 710
distinct employees and 83.030 distinct clients. In the data-set, 66.2% of
clients had only one occurrence, 31.6% between 2 and 5, 1.4% between
5 and 10 while the remaining ones (i.e., 0.8%) had more than 10 occur-
rences. The auditors have also included a set of entries corresponding
to a “fictional” fraud case scenario where an employee modifies the ac-
count of a client. However, we were not communicated any information
regarding the billing date of the accounts of the clients.
Since one of the data-sets stems from a fraud management system, it is
expected that reoccurring activity between the same pair of employee and
client will occur (a “suspicious” client reported by a fraud management
system is expected to be supervised by an employee). For this reason,
we concentrated our study in identifying pairs of employees and clients
that appear to have more that 10 related events. The system identified
41 employees (5.8% of the total number of employees) that were related
with the same client with more than 10 events (refer to the orange and
red colored rectangles of the heat-map of Figure 8).
For each of the above employees we had to calculate the similarity of
the event-series with pattern time-series (see Section 3) in order to de-
tect periodicity. In particular, the auditors were interested for periodic
events that occur monthly (i.e., periodicity of 28 up to 30 days). Thus,
we could consider only the pattern time-series (refer to Section 3) that
corresponds to monthly activity. However, we decided to calculate the
severity values based on all pattern time-series in order to distinguish
any reoccurring activity and decide afterwards through the visualization

12
if suspicious activity really exists. Even though, this approach creates
more false positives that have to be investigated, it ensures that other
possible periodic events will not omitted. Among these 41 employees
that were related with the same client with more than 10 events, 17 of
them (2.4% of the total number of employees) appear to have periodic
activity and events occurring outside working hours (refer to the red-
colored rectangles of Figure 8), while the remaining ones (24 employees;
3.4% of the total number of employees) had events occurring outside
working hours (refer to the orange-colored rectangles of Figure 8). Also,
no employee had performed unauthorized access to systems or had used
unauthorized actions, whereas only one employee had used non common
actions. The results were communicated to the auditors who investigated
whether there exist real indications of fraud.

Fig. 8: A heat-map indicating the severity values for the employees participated in the
case study.

In the following, we present some of the first frames of the animation in


order to describe the investigation procedure. Figure 9 indicates a highly-
ranked employee (i.e., Employee-29). Employee-29 was related to four
clients, i.e., Client-1, Client-2, Client-3 and Client-4, more than 10 times
using one system for which the employee was a frequent user and common
actions (see reference points 1 and 2 of Figure 9; the parts representing
the systems and the actions are not marked by an X, which implies that
the user was frequent for the specific systems or had performed common
actions, and the corresponding heat-maps are blue-colored; see reference
points 3 and 4 of Figure 9). All other clients related to Employee-29,
where placed in the low-severity cluster, whereas no client was placed
in the medium-severity cluster (see reference points 5 and 6 of Figure 9,
resp.). Most of the events related to Employee-26 (see reference point 7

13
of Figure 9). The layer of the visualization which accommodates the ac-
tions that Employee-26 has performed, is split in two parts representing
“Action 6” and “Other”. In the “Other” action, we have clustered actions
whose percentage of use was too small (< 0.1%) to be visualized. The
problem was caused by the large number of possible actions in this par-
ticular system which does not permit the simultaneous visualization of
all actions. However, if one of the clustered actions is not common for a
particular employee, the red-colored heat-map in the corresponding layer
and the X marking in the part representing the “Other” action would
reveal the problem to the auditor.

Fig. 9: A frame of the animation indicating the activity of highly-ranked Employee-26.

We will now proceed to describe the investigation procedure for each of


the four clients. Client-1 and Client-2 have the majority of the events
that relate them with Employee-29 occurring outside working hours (re-
fer to the corresponding layer of Client-1 and Client-2). However, they
appear to have a small similarity (less than 50%) with only one fraud
pattern time-series. Note that, the pattern time-series that corresponds
to periodicity of one month corresponds to the first heat-map of the each
periodicity layer (see e.g., reference point 8 of Figure 9). Since the audi-
tors were interested in events that occur monthly, these clients were not
considered suspicious. Regarding Client-3 and Client-4, they have also
the majority of the events that relate them with Employee-29 occurring

14
outside working hours but, they appear to have strong periodic activity.
In particular, there exist six heat-maps indicating strong similarities with
pattern time-series including the one that represents monthly periodic
activity. For this reason, these clients have to be further investigated.

Fig. 10: All employees related to Client-3 are added in the visualization. The gray-
colored node corresponds to the new employee added (Employee-5) who sur-
prisingly is related to almost all high-ranked clients of Employee-29.

Client-4 was related to no other employee (when we selected the node


that corresponds to Client-4, no other employee was added to the visu-
alization). However, when we selected Client-3, a new employee, referred
to as Employee-5 (see the gray-colored node of the visualization; refer-
ence point 1 of Figure 10), was added to the visualization. The interest-
ing thing was that Employee-5 was related with the same high-ranked
clients as Employee-29 (except for Client-4; see reference points 2, 3 and
4 of Figure 10). There exist two possible explanations for this scenario.
Either two employees are both responsible for monitoring the activity of
the clients or these employees are accomplice to fraud. Thus, the inves-
tigation has to proceed further. The time-line plot of Figure 11a repre-
sents the time-stamps of the events relating Employee-29 and Client-3.
Obviously, there exists a continuous activity between the two entities.
However, this activity according to the auditors resembles more to mon-
itoring activity rather than to a fraud pattern. This assumption is also

15
reinforced by Figure 11b which represents all reported actions for the
selected pair of entities. In particular, only one action is performed by
Employee-29 towards Client-3 and according to the auditors this action
is part of a monitoring procedure. Studying, in a similar manner, the
time-line plot and the periodicity plot for Client-4, the auditors claimed
that there existed no indications of fraud (the recorded actions were also
part of a monitoring procedure).

(a)

(b)

Fig. 11: (i) The time-line plot for Employee-40 and Client-6 indicating an obvious
monthly activity. (ii) A plot illustrating the periodicity of performed actions
for Employee-40 and Client-6.

Figure 12 depicts another highly-ranked employee (i.e., Employee-26).


Employee-26 was involved only with Client-5, more than 10 times using
one system for which the employee was a frequent user and common
actions (see reference points 1, 2 3 and 4 of Figure 12). Again, all other
clients related to Employee-26, where placed in the low-severity cluster,
whereas no client was placed in the medium-severity cluster (see reference
points 5 and 6 of Figure 12, resp.). Also, Client-5 was related only to
Employee-26 (when we selected the node that corresponds to Client-
5, no other employee was added to the visualization). However, almost
all events that relate the two entities appear outside the working hours
(see reference point 7 of Figure 12). The visualization of Figure 12 also
indicates a strong periodic activity (refer to the heat-maps in the last
layer of the part of the visualization that corresponds to Client-5; see

16
reference point 8). In the next step, we examined the time-line plot for the
two entities (refer to Figure 13). One can distinguish a monthly periodic
activity between June and September (i.e., the actual dates are 22/6,
21/7, 25/9), even with small gaps (no entries in August) and some noisy
data (i.e., 7/7). The auditors that examined the case determined that
there were no indications of fraud in this particular case mostly because
of the actions performed by the employee which were again common
monitoring actions. Client-5 was reported by a fraud management system
as a suspicious client and thus, Employee-26 was monitored her in regular
time basis.

Fig. 12: A frame of the animation indicating the activity of a Employee-26.

Figure 14 illustrates the fictional fraud case which was added by the
auditors. In this scenario, Employee-40 is related to Client-6 using two
business systems for which she is a frequent user and common actions
(see reference points 1, 2, 3 and 4 of Figure 14). All the events occur
within working hours (see reference point 5 of Figure 14). However, the
event-series of Employee-40 appears to have strong similarity with six
fraud pattern time-series, including the one that corresponds to period-
icity of one month, which is represented by the first heat-map of the
each periodicity layer (see reference point 6 of Figure 14).Also, Client-6

17
Fig. 13: The corresponding time-line plot for Employee-26 and Client-5.

was not related to any other employee (when selected, no other employee
was added to the visualization). All other clients related to Employee-40,
where placed in the low-severity cluster, whereas no client was placed in
the medium-severity cluster (see reference points 7 and 8 of Figure 14,
resp.). Even though, this visualization resembles a lot to the one of
Employee-26 (see Figure 12), the time-line plot (see Figure 15) and the
periodicity plot (see Figure 16) explain why this case is considered as
fraud. The first suspicions according to the auditors, are raised by the
fact that there exists activity between the two entities stemming from
two business systems (they take also into consideration the type of the
systems; an information that was not communicated to us in full detail).
The auditors explained to us that the time-line plot (refer to Figure 15)
matches to a fraud case scenario according to which there exists some
activity at the beginning (between April - May) with no specific period-
icity and then, appears periodic activity (from May to September). In
the first time interval, the fraudster is trying to plan and organize her
fraud by performing a number of actions. Once the fraud is organized,
only periodic actions are required. Another suspicious fact in this case
is that the events from May to September occur close to the same dates
of each month (from 10th to 15th). Of course, there exist some “noisy”
data that have to be excluded in order to understand the fraud pattern.
These may have been caused either on purpose to cover up the fraud or
were part of the duties of the employee.
The above assumption is reinforced by the plot of Figure 16 which reveals
the periodic occurrence of each performed action. For instance, “Action
101” (see reference point 1) appears to have periodicity around the 15th
day of the month from its second occurrence and later. Also, “Action 107”
(see reference point 2) appears to have periodicity around the 11th-13th
day of the month from its third occurrence and later. In particular, the
vast majority of events are recorded between the 10th and the 15th day
of the month. We could be more convinced that this case consists fraud
if we knew exactly the billing cycle of the account of the client.
In a similar manner, the frames of the animation illustrating the other
highly-ranked employees were investigated. In particular, we were given
more attention to the 17 frames of the animation containing periodic
events. Since the data-sets provided for the case analysis were sensitive,
we were not communicated many details about the final results of the
investigation. Fortunately for the company, the only real evidence of
fraud existed in the fictional data added by the auditors. However, the
auditors had not identified all these cases while examining the data-sets
manually and they had to make an additional investigation for them.

18
Fig. 14: A frame of the animation illustrating the activity of Employee-40. This case
corresponds to a fictional fraud case scenario.

Fig. 15: The time-line plot for Employee-40 and Client-6 indicating an obvious
monthly activity.

6 CONCLUSIONS AND FUTURE WORK

We presented an integrated fraud management visualization system that


aims to identify patterns that may conceal occupational fraud through
a combination of pattern recognition and visualization. Our work opens
several aspects for future work such as incorporation of more fraud pat-
terns, use of more statistical methods and, extension of the system in
order to identify more complicated fraud schemes (client fraud, telecom-
munication fraud, etc.) in a wider variety of business systems.

19
Fig. 16: A plot indicating the periodic pattern for the performed actions of Employee-
40 and Client-6.

ACKNOWLEDGEMENTS
The work of Evmorfia N. Argyriou has been co-financed by the Euro-
pean Union (European Social Fund - ESF) and Greek national funds
through the Operational Program "Education and Lifelong Learning" of
the National Strategic Reference Framework (NSRF) - Research Fund-
ing Program: Heracleitus II. Investing in knowledge society through the
European Social Fund.

20
References
1. E. N. Argyriou, A. Sotiraki, and A. Symvonis. Occupational fraud
detection through visualization. In ISI, pages 4–7, 2013.
2. E. N. Argyriou and A. Symvonis. Detecting periodicity in serial data
through visualization. In ISVC, volume 7432, pages 295–304, 2012.
3. Association of Certified Fraud Examiners. Report to the Nation on
Occupational Fraud and Abuse, 2012.
4. R. J. Bolton and D. J. Hand. Statistical fraud detection: A review.
Statistical Science, 17:2002, 2002.
5. D. Borland and R. M. Taylor II. Rainbow color map (still) considered
harmful. IEEE Comput. Graph. Appl., 27(2):14–17, 2007.
6. U. Brandes, P. Kenis, and D. Wagner. Communicating centrality in
policy network drawings. IEEE Transactions on Visualization and
Computer Graphics, 9(2):241–253, 2003.
7. R. Chang, A. Lee, M. Ghoniem, R. Kosara, and W. Ribarsky. Scal-
able and interactive visual analysis of financial wire transactions for
fraud detection. Information Visualization, 7(1):63–76, 2008.
8. F. Chevenet, C. Brun, A.-L. Banuls, B. Jacq, and R. Christen. Tree-
dyn: towards dynamic graphics and annotations for analyses of trees.
BMC Bioinformatics, 7(1):1–9, 2007.
9. W. Didimo, G. Liotta, and F. Montecchiani. Vis4aui: Visual analysis
of banking activity networks. In GRAPP/IVAPP, pages 799–802,
2012.
10. W. Didimo, G. Liotta, F. Montecchiani, and P. Palladino. An ad-
vanced network visualization system for financial crime detection. In
PacificVis, pages 203–210, 2011.
11. S. N. Dorogovtsev and J. F. F. Mendes. Evolution of Networks:
From Biological Nets to the Internet and WWW (Physics). Oxford
University Press, Inc., New York, NY, USA, 2003.
12. W. Eberle and L. B. Holder. Mining for insider threats in business
transactions and processes. In CIDM, pages 163–170, 2009.
13. E. D. Giacomo, W. Didimo, G. Liotta, and P. Palladino. Visual
analysis of financial crimes. In AVI, pages 393–394, 2010.
14. H. G. Goldberg and T. E. Senator. Restructuring databases for
knowledge discovery by consolidation and link formation. In KDD,
pages 136–141, 1995.
15. M. L. Huang, J. Liang, and Q. V. Nguyen. A visualization approach
for frauds detection in financial market. IV ’09, pages 197–202, 2009.
16. J. D. Kirkland, T. E. Senator, J. J. Hayden, T. Dybala, H. G. Gold-
berg, and P. Shyr. The nasd regulation advanced detection system
(ads). In AAAI ’98/IAAI ’98, pages 1055–1062, 1998.
17. Y. Kou, C.-T. Lu, S. Sirwongwattana, and Y.-P. Huang. Survey
of fraud detection techniques. In Networking, Sensing and Control,
2004 IEEE Int. Conf., volume 2, pages 749–754, 2004.
18. J. Luell. Employee fraud detection under real world conditions. PhD
thesis, 2010.
19. C. Phua, V. C. S. Lee, K. Smith-Miles, and R. W. Gayler. A compre-
hensive survey of data mining-based fraud detection research. CoRR,
abs/1009.6119, 2010.

21
20. T. E. Senator, H. G. Goldberg, P. Shyr, S. Bennett, S. Donoho, and
C. Lovell. chapter The NASD regulation advanced detection system:
integrating data mining and visualization for break detection in the
NASDAQ stock market, pages 363–371. 2002.
21. T. E. Senator, H. G. Goldberg, J. Wooton, M. A. Cottini, A. F. U.
Khan, C. D. Klinger, W. M. Llamas, M. P. Marrone, and R. W. H.
Wong. The financial crimes enforcement network ai system (fais)
identifying potential money laundering from reports of large cash
transactions. AI Magazine, 16(4):21–39, 1995.
22. W. G. Stillwell, D. A. Seaver, and W. Edwards. A comparison of
weight approximation techniques in multiattribute utility decision
making. Organ. Behavior and Human Performance, 28(1):62 – 77,
1981.
23. SynerScope. 2011. http://www.synerscope.com/.
24. M. Vlachos, M. Hadjieleftheriou, D. Gunopulos, and E. Keogh. In-
dexing multi-dimensional time-series with support for multiple dis-
tance measures. In ACM SIGKDD int. conf. on Knowledge discovery
and data mining, KDD ’03, pages 216–225, 2003.
25. M. Wolverton, P. Berry, I. Harrison, J. Lowrance, D. Morley, A. Ro-
driguez, E. Ruspini, and J. Thomere. Law: A workbench for approx-
imate pattern matching in relational data. In IAAI, pages 143–150,
2003.

22

View publication stats

You might also like