Statistical Fraud Detection - A Review
R. J. BOLTON AND D. J. HAND
The development of new fraud detection methods is made more difficult by the fact that the exchange of ideas in fraud detection is severely limited. It does not make sense to describe fraud detection techniques in great detail in the public domain, as this gives criminals the information that they require to evade detection. Data sets are not made available and results are often censored, making them difficult to assess (e.g., Leonard, 1993).

Many fraud detection problems involve huge data sets that are constantly evolving. For example, the credit card company Barclaycard carries approximately 350 million transactions a year in the United Kingdom alone (Hand, Blunt, Kelly and Adams, 2000). The Royal Bank of Scotland, which has the largest credit card merchant acquiring business in Europe, carries over a billion transactions a year, and AT&T carries around 275 million calls each weekday (Cortes and Pregibon, 1998). Processing these data sets in a search for fraudulent transactions or calls requires more than mere novelty of statistical model; it also needs fast and efficient algorithms: data mining techniques are relevant. These numbers also indicate the potential value of fraud detection: if 0.1% of 100 million transactions are fraudulent, each losing the company just £10, then overall the company loses £1 million.

Statistical tools for fraud detection are many and varied, since data from different applications can be diverse in both size and type, but there are common themes. Such tools are essentially based on comparing the observed data with expected values, but expected values can be derived in various ways, depending on the context. They may be single numerical summaries of some aspect of behavior and they are often simple graphical summaries in which an anomaly is readily apparent, but they are also often more complex (multivariate) behavior profiles. Such behavior profiles may be based on past behavior of the system being studied (e.g., the way a bank account has been previously used) or be extrapolated from other similar systems. Things are often further complicated by the fact that, in some domains (e.g., trading on the stock market), a given actor may behave in a fraudulent manner some of the time and not at other times.

Statistical fraud detection methods may be supervised or unsupervised. In supervised methods, samples of both fraudulent and nonfraudulent records are used to construct models which allow one to assign new observations into one of the two classes. Of course, this requires one to be confident about the true classes of the original data used to build the models. It also requires that one has examples of both classes. Furthermore, it can only be used to detect frauds of a type which have previously occurred.

In contrast, unsupervised methods simply seek those accounts, customers and so forth which are most dissimilar from the norm. These can then be examined more closely. Outliers are a basic form of nonstandard observation. Tools used for checking data quality can be used, but the detection of accidental errors is a rather different problem from the detection of deliberately falsified data or data which accurately describe a fraudulent pattern.

This leads us to note the fundamental point that we can seldom be certain, by statistical analysis alone, that a fraud has been perpetrated. Rather, the analysis should be regarded as alerting us to the fact that an observation is anomalous, or more likely to be fraudulent than others, so that it can then be investigated in more detail. One can think of the objective of the statistical analysis as being to return a suspicion score (where we will regard a higher score as more suspicious than a lower one). The higher the score is, the more unusual the observation is, or the more it resembles previously fraudulent values. The fact that there are many different ways in which fraud can be perpetrated and many different scenarios in which it can occur means that there are many different ways to compute suspicion scores.

Suspicion scores can be computed for each record in the database (for each customer with a bank account or credit card, for each owner of a mobile phone, for each desktop computer and so on), and these can be updated as time progresses. These scores can then be rank ordered and investigative attention can be focussed on those with the highest scores or on those which exhibit a sudden increase. Here issues of cost enter: given that it is too expensive to undertake a detailed investigation of all records, one concentrates investigation on those thought most likely to be fraudulent.

One of the difficulties with fraud detection is that typically there are many legitimate records for each fraudulent one. A detection method which correctly identifies 99% of the legitimate records as legitimate and 99% of the fraudulent records as fraudulent might be regarded as a highly effective system. However, if only 1 in 1000 records is fraudulent, then, on average, in every 100 that the system flags as fraudulent, only about 9 will in fact be so. In particular, this means that to identify those 9 requires detailed examination of all 100—at possibly considerable cost.
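The arithmetic above follows directly from Bayes' theorem. The short sketch below (illustrative only; the function name is ours, not from the paper) computes the proportion of flagged records that are actually fraudulent:

```python
# Illustration: why a "99% accurate" detector flags mostly legitimate
# records when fraud is rare (a base-rate calculation, not code from the paper).
def precision_of_flags(sensitivity, specificity, fraud_rate):
    """P(actually fraudulent | flagged), by Bayes' theorem."""
    true_alarms = sensitivity * fraud_rate          # frauds correctly flagged
    false_alarms = (1 - specificity) * (1 - fraud_rate)  # legitimate records flagged
    return true_alarms / (true_alarms + false_alarms)

# 99% of frauds caught, 99% of legitimate records passed, 1 in 1000 fraudulent:
p = precision_of_flags(sensitivity=0.99, specificity=0.99, fraud_rate=0.001)
print(round(100 * p))  # about 9 of every 100 flagged records are fraudulent
```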
This leads us to a more general point: fraud can be reduced to as low a level as one likes, but only by virtue of a corresponding level of effort and cost. In practice, some compromise has to be reached, often a commercial compromise, between the cost of detecting a fraud and the savings to be made by detecting it. Sometimes the issues are complicated by, for example, the adverse publicity accompanying fraud detection. At a business level, revealing that a bank is a significant target for fraud, even if much has been detected, does little to inspire confidence, and at a personal level, taking action which implies to an innocent customer that they may be suspected of fraud is obviously detrimental to good customer relations.

The body of this paper is structured according to different areas of fraud detection. Clearly we cannot hope to cover all areas in which statistical methods can be applied. Instead, we have selected a few areas where such methods are used and where there is a body of expertise and of literature describing them. However, before looking at the details of different application areas, Section 2 provides a brief overview of some tools for fraud detection.

2. FRAUD DETECTION TOOLS

As we mentioned above, fraud detection can be supervised or unsupervised. Supervised methods use a database of known fraudulent/legitimate cases from which to construct a model which yields a suspicion score for new cases. Traditional statistical classification methods (Hand, 1981; McLachlan, 1992), such as linear discriminant analysis and logistic discrimination, have proved to be effective tools for many applications, but more powerful tools (Ripley, 1996; Hand, 1997; Webb, 1999), especially neural networks, have also been extensively applied. Rule-based methods are supervised learning algorithms that produce classifiers using rules of the form If {certain conditions}, Then {a consequent}. Examples of such algorithms include BAYES (Clark and Niblett, 1989), FOIL (Quinlan, 1990) and RIPPER (Cohen, 1995). Tree-based algorithms such as CART (Breiman, Friedman, Olshen and Stone, 1984) and C4.5 (Quinlan, 1993) produce classifiers of a similar form. Combinations of some or all of these algorithms can be created using meta-learning algorithms to improve prediction in fraud detection (e.g., Chan, Fan, Prodromidis and Stolfo, 1999).

Major considerations when building a supervised tool for fraud detection include those of uneven class sizes and different costs of different types of misclassification. We must also take into consideration the costs of investigating observations and the benefits of identifying fraud. Moreover, class membership is often uncertain. For example, credit transactions may be labeled incorrectly: a fraudulent transaction may remain unobserved and thus be labeled legitimate (and the extent of this may remain unknown) or a legitimate transaction may be misreported as fraudulent. Some work has addressed misclassification of training samples (e.g., Lachenbruch, 1966, 1974; Chhikara and McKeon, 1984), but not in the context of fraud detection as far as we are aware. Issues such as these were discussed by Chan and Stolfo (1998) and Provost and Fawcett (2001).

Link analysis relates known fraudsters to other individuals using record linkage and social network methods (Wasserman and Faust, 1994). For example, in telecommunications networks, security investigators have found that fraudsters seldom work in isolation from each other. Also, after an account has been disconnected for fraud, the fraudster will often call the same numbers from another account (Cortes, Pregibon and Volinsky, 2001). Telephone calls from an account can thus be linked to fraudulent accounts to indicate intrusion. A similar approach has been taken in money laundering (Goldberg and Senator, 1995, 1998; Senator et al., 1995).

Unsupervised methods are used when there are no prior sets of legitimate and fraudulent observations. Techniques employed here are usually a combination of profiling and outlier detection methods. We model a baseline distribution that represents normal behavior and then attempt to detect observations that show the greatest departure from this norm. There are similarities to author identification in text analysis. Digit analysis using Benford's law is an example of such a method. Benford's law (Hill, 1995) says that the distribution of the first significant digits of numbers drawn from a wide variety of random distributions will have (asymptotically) a certain form. Until recently, this law was regarded as merely a mathematical curiosity with no apparent useful application. However, Nigrini and Mittermaier (1997) and Nigrini (1999) showed that Benford's law can be used to detect fraud in accounting data. The premise behind fraud detection using tools such as Benford's law is that fabricating data which conform to it is difficult.

Fraudsters adapt to new prevention and detection measures, so fraud detection needs to be adaptive and evolve over time. However, legitimate account users may gradually change their behavior over a longer period of time and it is important to avoid spurious alarms.
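The digit analysis described above can be made concrete with a short sketch (function names are ours, not from the paper): it compares the observed first-significant-digit frequencies of a set of amounts with the Benford proportions P(d) = log10(1 + 1/d).

```python
import math
from collections import Counter

def benford_expected():
    """Benford proportions for first significant digits 1..9 (Hill, 1995)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """First significant digit of a nonzero amount."""
    s = str(abs(x)).lstrip("0.")  # drop leading zeros and the decimal point
    return int(s[0])

def digit_deviation(amounts):
    """Chi-square-style statistic comparing observed first-digit counts with
    Benford expectations; large values suggest the amounts depart from
    Benford's law and may merit closer inspection."""
    n = len(amounts)
    observed = Counter(first_digit(a) for a in amounts)
    expected = benford_expected()
    return sum((observed.get(d, 0) - n * expected[d]) ** 2 / (n * expected[d])
               for d in range(1, 10))
```

A large deviation does not establish fraud; as emphasized above, it merely flags a set of figures as anomalous and worth investigating.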
Models can be updated at fixed time points or continuously over time; see, for example, Burge and Shawe-Taylor (1997), Fawcett and Provost (1997a), Cortes, Pregibon and Volinsky (2001) and Senator (2000).

Although the basic statistical models for fraud detection can be categorized as supervised or unsupervised, the application areas of fraud detection cannot be described so conveniently. Their diversity is reflected in their particular operational characteristics and the variety and quantity of data available, both features that drive the choice of a suitable fraud detection tool.

3. CREDIT CARD FRAUD

The extent of credit card fraud is difficult to quantify, partly because companies are often loath to release fraud figures in case they frighten the spending public and partly because the figures change (probably grow) over time. Various estimates have been given. For example, Leonard (1993) suggested the cost of Visa/Mastercard fraud in Canada in 1989, 1990 and 1991 was $19, 29 and 46 million (Canadian), respectively. Ghosh and Reilly (1994) suggested a figure of $850 million (U.S.) per year for all types of credit card fraud in the United States, and Aleskerov, Freisleben and Rao (1997) cited estimates of $700 million in the United States each year for Visa/Mastercard and $10 billion worldwide in 1996. Microsoft's Expedia set aside $6 million for credit card fraud in 1999 (Patient, 2000). Total losses through credit card fraud in the United Kingdom have been growing rapidly over the last 4 years [1997, £122 million; 1998, £135 million; 1999, £188 million; 2000, £293 million. Source: Association for Payment Clearing Services, London (APACS)] and recently APACS reported £373.7 million losses in the 12 months ending August 2001. Jenkins (2000) says "for every £100 you spend on a card in the UK, 13p is lost to fraudsters." Matters are complicated by issues of exactly what one includes in the fraud figures. For example, bankruptcy fraud arises when the cardholder makes purchases for which he/she has no intention of paying and then files for personal bankruptcy, leaving the bank to cover the losses. Since these are generally regarded as charge-off losses, they often are not included in fraud figures. However, they can be substantial: Ghosh and Reilly (1994) cited one estimate of $2.65 billion for bankruptcy fraud in 1992.

It is in a company and card issuer's interests to prevent fraud or, failing this, to detect fraud as soon as possible. Otherwise consumer trust in both the card and the company decreases and revenue is lost, in addition to the direct losses made through fraudulent sales. Because of the potential for loss of sales due to loss of confidence, in general, the merchants assume responsibility for fraud losses, even when the vendor has obtained authorization from the card issuer.

Credit card fraud may be perpetrated in various ways (a description of the credit card industry and how it functions is given in Blunt and Hand, 2000), including simple theft, application fraud and counterfeit cards. In all of these, the fraudster uses a physical card, but physical possession is not essential to perpetrate credit card fraud: one of the major fraud areas is "cardholder-not-present" fraud, where only the card's details are given (e.g., over the phone).

Use of a stolen card is perhaps the most straightforward type of credit card fraud. In this case, the fraudster typically spends as much as possible in as short a space of time as possible, before the theft is detected and the card is stopped; hence, detecting the theft early can prevent large losses.

Application fraud arises when individuals obtain new credit cards from issuing companies using false personal information. Traditional credit scorecards (Hand and Henley, 1997) are used to detect customers who are likely to default, and the reasons for this may include fraud. Such scorecards are based on the details given on the application forms and perhaps also on other details such as bureau information. Statistical models which monitor behavior over time can be used to detect cards which have been obtained from a fraudulent application (e.g., a first time card holder who runs out and rapidly makes many purchases should arouse suspicion). With application fraud, however, urgency is not as important to the fraudster and it might not be until accounts are sent out or repayment dates begin to pass that fraud is suspected.

Cardholder-not-present fraud occurs when the transaction is made remotely, so that only the card's details are needed, and a manual signature and card imprint are not required at the time of purchase. Such transactions include telephone sales and on-line transactions, and this type of fraud accounts for a high proportion of losses. To undertake such fraud it is necessary to obtain the details of the card without the cardholder's knowledge. This is done in various ways, including "skimming," where employees illegally copy the magnetic strip on a credit card by swiping it through a small handheld card reader, "shoulder surfers," who enter card details into a mobile phone while standing behind a purchaser in a queue, and people posing as credit card company employees taking details of credit card transactions from companies over the phone.
Counterfeit cards, currently the largest source of credit card fraud in the United Kingdom (source: APACS), can also be created using this information. Transactions made by fraudsters using counterfeit cards and making cardholder-not-present purchases can be detected through methods which seek changes in transaction patterns, as well as checking for particular patterns which are known to be indicative of counterfeiting.

Credit card databases contain information on each transaction. This information includes such things as merchant code, account number, type of credit card, type of purchase, client name, size of transaction and date of transaction. Some of these data are numerical (e.g., transaction size) and others are nominal categorical (e.g., merchant code, which can have hundreds of thousands of categories) or symbolic. The mixed data types have led to the application of a wide variety of statistical, machine learning and data mining tools.

Suspicion scores to detect whether an account has been compromised can be based on models of individual customers' previous usage patterns, standard expected usage patterns, particular patterns which are known to be often associated with fraud, and on supervised models. A simple example of the patterns exhibited by individual customers is given in Figure 16 of Hand and Blunt (2001), which shows how the slopes of cumulative credit card spending over time are remarkably linear. Sudden jumps in these curves or sudden changes of slope (transaction or expenditure rate suddenly exceeding some threshold) merit investigation. Likewise, some customers practice "jam jarring"—restricting particular cards to particular types of purchases (e.g., using a given card for petrol purchases only and a different one for supermarket purchases), so that usage of a card to make an unusual type of purchase can trigger an alarm for such customers. At a more general level, suspicion scores can also be based on expected overall usage profiles. For example, first time credit card users are typically initially fairly tentative in their usage, whereas those transferring loans from another card are generally not so reticent. Finally, examples of overall transaction patterns known to be intrinsically suspicious are the sudden purchase of many small electrical items or jewelry (goods which permit easy black market resale) and the immediate use of a new card in a wide range of different locations.

We commented above that, for obvious reasons, there is a dearth of published literature on fraud detection. Much of that which has been published appears in the methodological data analytic literature, where the aim is to illustrate new data analytic tools by applying them to the detection of fraud, rather than to describe methods of fraud detection per se. Furthermore, since anomaly detection methods are very context dependent, much of the published literature in the area concentrates on supervised classification methods. In particular, rule-based systems and neural networks have attracted interest. Researchers who have used neural networks for credit card fraud detection include Ghosh and Reilly (1994), Aleskerov et al. (1997), Dorronsoro, Ginel, Sanchez and Cruz (1997) and Brause, Langsdorf and Hepp (1999), mainly in the context of supervised classification. HNC Software has developed Falcon, a software package that relies heavily on neural network technology to detect credit card fraud.

Supervised methods, using samples from the fraudulent/nonfraudulent classes as the basis to construct classification rules to detect future cases of fraud, suffer from the problem of unbalanced class sizes mentioned above: the legitimate transactions generally far outnumber the fraudulent ones. Brause, Langsdorf and Hepp (1999) said that, in their database of credit card transactions, "the probability of fraud is very low (0.2%) and has been lowered in a preprocessing step by a conventional fraud detecting system down to 0.1%." Hassibi (2000) remarked that "out of some 12 billion transactions made annually, approximately 10 million—or one out of every 1200 transactions—turn out to be fraudulent. Also, 0.04% (4 out of every 10,000) of all monthly active accounts are fraudulent." It follows from this sort of figure that simple misclassification rate cannot be used as a performance measure: with a bad rate of 0.1%, simply classifying every transaction as legitimate will yield an error rate of only 0.001. Instead, one must either minimize an appropriate cost-weighted loss or fix some parameter (such as the number of cases one can afford to investigate in detail) and then try to maximize the number of fraudulent cases detected subject to the constraints.

Stolfo et al. (1997a, b) outlined a meta-classifier system for detecting credit card fraud that is based on the idea of using different local fraud detection tools within each different corporate environment and merging the results to yield a more accurate global tool. This work was elaborated in Chan and Stolfo (1998), Chan, Fan, Prodromidis and Stolfo (1999) and Stolfo et al. (1999), who described a more realistic cost model to accompany the different classification outcomes. Wheeler and Aitken (2000) also explored the combination of multiple classification rules.
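The two alternatives to raw misclassification rate mentioned above can be sketched as follows (an illustrative sketch; the function names and cost values are hypothetical, not taken from any of the cited systems):

```python
# Sketch: evaluating a fraud detector under a fixed investigation budget,
# and under an asymmetric cost-weighted loss, rather than by error rate.
def frauds_caught_at_budget(scores, is_fraud, budget):
    """Rank records by suspicion score and count how many of the
    `budget` most suspicious records are actually fraudulent."""
    ranked = sorted(zip(scores, is_fraud), key=lambda t: -t[0])
    return sum(fraud for _, fraud in ranked[:budget])

def cost_weighted_loss(scores, is_fraud, threshold,
                       cost_missed_fraud=100.0, cost_false_alarm=1.0):
    """Asymmetric loss: a missed fraud is far costlier than a needless check
    (the costs here are arbitrary placeholders)."""
    loss = 0.0
    for s, fraud in zip(scores, is_fraud):
        flagged = s >= threshold
        if fraud and not flagged:
            loss += cost_missed_fraud
        elif flagged and not fraud:
            loss += cost_false_alarm
    return loss
```

With a fraud rate of 0.1%, the trivial "flag nothing" rule achieves an error rate of 0.001 but catches zero frauds at any budget, which is why measures like these are preferred.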
4. MONEY LAUNDERING

Money laundering is the process of obscuring the source, ownership or use of funds, usually cash, that are the profits of illicit activity. The size of the problem is indicated in a 1995 U.S. Office of Technology Assessment (OTA) report (U.S. Congress, 1995): "Federal agencies estimate that as much as $300 billion is laundered annually, worldwide. From $40 billion to $80 billion of this may be drug profits made in the United States." Prevention is attempted by means of legal constraints and requirements—the burden of which is gradually increasing—and there has been much debate recently about the use of encryption. However, no prevention strategy is foolproof and detection is essential. In particular, the September 11th terrorist attacks on New York City and the Pentagon have focused attention on the detection of money laundering in an attempt to starve terrorist networks of funds.

Wire transfers provide a natural domain for laundering: according to the OTA report, each day in 1995 about half a million wire transfers, valued at more than $2 trillion (U.S.), were carried out using the Fedwire and CHIPS systems, along with almost a quarter of a million transfers using the SWIFT system. It is estimated that around 0.05–0.1% of these transactions involved laundering. Sophisticated statistical and other on-line data analytic procedures are needed to detect such laundering activity. Since it is now becoming a legal requirement to show that all reasonable means have been used to detect fraud, we may expect to see even greater application of such tools.

Wire transfers contain items such as date of transfer, identity of sender, routing number of originating bank, identity of recipient, routing number of recipient bank and amount transferred. Sometimes those fields not needed for transfer are left blank, free text fields may be completed in different ways and, worse still but inevitably, sometimes the data have errors. Automatic error detection (and correction) software has been developed, based on semantic and syntactic constraints on possible content, but, of course, this can never be a complete solution. Matters are also complicated by the fact that banks do not share their data. Of course, banks are not the only bodies that transfer money electronically, and other businesses have been established precisely for this purpose [the OTA report (U.S. Congress, 1995) estimates the number of such businesses as 200,000].

The detection of money laundering presents difficulties not encountered in areas such as, for example, the credit card industry. Whereas credit card fraud comes to light fairly early on, in money laundering it may be years before individual transfers or accounts are definitively and legally identified as part of a laundering process. While, in principle (assuming records have been kept), one could go back and trace the relevant transactions, in practice not all of them would be identified, so detracting from their use in supervised detection methods. Furthermore, there is typically less extensive information available for the account holders in investment banks than there is in retail banking operations. Developing more detailed customer record systems might be a good way forward.

As with other areas of fraud, money laundering detection works hand in hand with prevention. In 1970, for example, in the United States the Bank Secrecy Act required that banks report all currency transactions of over $10,000 to the authorities. However, also as in other areas of fraud, the perpetrators adapt their modus operandi to match the changing tactics of the authorities. So, following the requirement of banks to report currency transactions of over $10,000, the obvious strategy was developed to divide larger sums into multiple amounts of less than $10,000 and deposit them in different banks (a practice termed smurfing or structuring). In the United States, this is now illegal, but the way the money launderers adapt to the prevailing detection methods can lead one to the pessimistic perspective that only the incompetent money launderers are detected. This, clearly, also limits the value of supervised detection methods: the patterns detected will be those patterns which were characteristic of fraud in the past, but which may no longer be so. Other strategies used by money launderers which limit the value of supervised methods include switching between wire and physical cash movements, the creation of shell businesses, false invoicing and, of course, the fact that a single transfer, in itself, is unlikely to appear to be a laundering transaction. Furthermore, because of the large sums involved, money launderers are highly professional and often have contacts in the banks who can feed back details of the detection strategies being applied.

The number of currency transactions over $10,000 in value increased dramatically after the mid-1980s, to the extent that the number of reports filed is huge (over 10 million in 1994, with total worth of around $500 billion), and this in itself can cause difficulties. In an attempt to cope with this, the Financial Crimes Enforcement Network (FinCEN) of the U.S. Department of the Treasury processes all such reports using the FinCEN artificial intelligence system (FAIS) described below.
More generally, banks are also required to report any suspicious transactions, and about 0.5% of currency transaction reports are so flagged.

Money laundering involves three steps:

1. Placement: the introduction of the cash into the banking system or legitimate business (e.g., transferring the banknotes obtained from retail drugs transactions into a cashier's cheque). One way to do this is to pay vastly inflated amounts for goods imported across international frontiers. Pak and Zdanowicz (1994) described statistical analysis of trade databases to detect anomalies in government trade data, such as charging $1694 a gram for imports of the drug erythromycin compared with $0.08 a gram for exports.
2. Layering: carrying out multiple transactions through multiple accounts with different owners at different financial institutions in the legitimate financial system.
3. Integration: merging the funds with money obtained from legitimate activities.

Detection strategies can be targeted at various levels. In general (and in common with some other areas in which fraud is perpetrated), it is very difficult or impossible to characterize an individual transaction as fraudulent. Rather, transaction patterns must be identified as fraudulent or suspicious. A single deposit of just under $10,000 is not suspicious, but multiple such deposits are; a large sum being deposited is not suspicious, but a large sum being deposited and instantly withdrawn is. In fact, one can distinguish several levels of (potential) analysis: the individual transaction level, the account level, the business level (and, indeed, individuals may have multiple accounts) and the "ring" of businesses level. Analyses can be targeted at particular levels, but more complex approaches can examine several levels simultaneously. (There is an analogy here with speech recognition systems: simple systems focused at the individual phoneme and word levels are not as effective as those which try to recognize these elements in a higher level context of the way words are put together when used.) In general, link analysis, which identifies groups of participants involved in transactions, plays a key role in most money laundering detection strategies. Senator et al. (1995) said "Money laundering typically involves a multitude of transactions, perhaps by distinct individuals, into multiple accounts with different owners at different banks and other financial institutions. Detection of large-scale money laundering schemes requires the ability to reconstruct these patterns of transactions by linking potentially related transactions and then to distinguish the legitimate sets of transactions from the illegitimate ones. This technique of finding relationships between elements of information, called link analysis, is the primary analytic technique used in law enforcement intelligence (Andrews and Peterson, 1990)." An obvious and simplistic illustration is the fact that a transaction with a known criminal may rouse suspicion. More subtle methods are based on recognition of the sort of businesses with which money laundering operations transact. Of course, these are all supervised methods and are subject to the weakness that those responsible may evolve their strategies. Similar tools are used to detect telecom fraud, as outlined in the following section.

Rule-based systems have been developed, often with the rules based on experience ("flag transactions from countries X and Y"; "flag accounts showing a large deposit followed immediately by a similar sized withdrawal"). Structuring can be detected by computing the cumulative sum of amounts entering an account over a short window, such as a day. Other methods have been developed based on straightforward descriptive statistics, such as rate of transactions and proportion of transactions which are suspicious. The use of the Benford distribution is an extension of this idea. Although one may not usually be interested in detecting changes in an account's behavior, methods such as peer group analysis (Bolton and Hand, 2001) and break detection (Goldberg and Senator, 1997) can be applied to detect money laundering.

One of the most elaborate money laundering detection systems is the U.S. Financial Crimes Enforcement Network AI system (FAIS) described in Senator et al. (1995) and Goldberg and Senator (1998). This system allows users to follow trails of linked transactions. It is built around a "blackboard" architecture, in which program modules can read and write to a central database that contains details of transactions, subjects and accounts. A key component of the system is its suspicion score. This is a rule-based system based on an earlier system developed by the U.S. Customs Service in the mid-1980s. The system computes suspicion scores for various different types of transaction and activity. Simple Bayesian updating is used to combine evidence that suggests that a transaction or activity is illicit to yield an overall suspicion score. Senator et al. (1995) included a brief but interesting discussion of an investigation of whether case-based reasoning (cf. nearest neighbor methods) and classification tree techniques could usefully be added to the system.
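The structuring check mentioned above, cumulating the amounts entering an account over a short window such as a day, can be sketched as follows (illustrative only; the record layout and the flagging rule are our own, not from any of the systems described):

```python
from collections import defaultdict

# Sketch: flag accounts whose deposits within one day sum past the $10,000
# reporting threshold even though every individual deposit stays below it
# (the structuring, or "smurfing", pattern described above).
def flag_structuring(deposits, threshold=10_000):
    """deposits: iterable of (account, day, amount) records."""
    daily_total = defaultdict(float)
    max_single = defaultdict(float)
    for account, day, amount in deposits:
        daily_total[(account, day)] += amount
        max_single[(account, day)] = max(max_single[(account, day)], amount)
    return sorted(key for key, total in daily_total.items()
                  if total >= threshold and max_single[key] < threshold)

deposits = [("A", 1, 9500), ("A", 1, 9800), ("B", 1, 12000), ("C", 1, 400)]
print(flag_structuring(deposits))  # [('A', 1)] — B's single deposit is already reportable
```

As the text notes, a flag of this kind is only suspicion, not proof: a single just-under-threshold deposit is unremarkable, and it is the pattern of multiple such deposits that merits investigation.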
neighbor methods) and classification tree techniques could usefully be added to the system.

The American National Association of Securities Dealers, Inc., uses an advanced detection system (ADS; Kirkland et al., 1998; Senator, 2000) to flag “patterns or practices of regulatory concern.” ADS uses a rule pattern matcher and a time-sequence pattern matcher, and (like FAIS) places great emphasis on visualization tools. Also as with FAIS, data mining techniques are used to identify new patterns of potential interest.

A different approach to detecting similar fraudulent behavior is taken by SearchSpace Ltd. (www.searchspace.com), which has developed a system for the London Stock Exchange called MonITARS (monitoring insider trading and regulatory surveillance) that combines genetic algorithms, fuzzy logic and neural network technology to detect insider dealing and market manipulation. Chartier and Spillane (2000) also described an application of neural networks to detect money laundering.

5. TELECOMMUNICATIONS FRAUD

The telecommunications industry has expanded dramatically in the last few years with the development of affordable mobile phone technology. With the increasing number of mobile phone users, global mobile phone fraud is also set to rise. Various estimates have been presented for the cost of this fraud. For example, Cox, Eick, Wills and Brachman (1997) gave a figure of $1 billion a year. Telecom and Network Security Review [4(5) April 1997] gave a figure of between 4 and 6% of U.S. telecom revenue lost due to fraud. Cahill, Lambert, Pinheiro and Sun (2002) suggested that international figures are worse, with “several new service providers reporting losses over 20%.” Moreau et al. (1996) gave a value of “several million ECUs per year.” Presumably this refers to within the European Union and, given the size of the other estimates, we wonder if this should be billions. According to a recent report (Neural Technologies, 2000), “the industry already reports a loss of £13 billion each year due to fraud.” Mobile Europe (2000) gave a figure of $13 billion (U.S.). The latter article also claimed that it is estimated that fraudsters can steal up to 5% of some operators’ revenues, and that some expect telecom fraud as a whole to reach $28 billion per year within 3 years.

Despite the variety in these figures, it is clear that they are all very large. Apart from the fact that they are simply estimates, and hence subject to expected inaccuracies and variability based on the information used to derive them, there are other reasons for the differences. One is the distinction between hard and soft currency. Hard currency is real money, paid by someone other than the perpetrator for the service the perpetrator has stolen. Hynninen (2000) gave the example of the sum one mobile phone operator will pay another for the use of their network. Soft currency is the value of the service the perpetrator has stolen. At least part of this is only a loss if one assumes that the thief would have used the same service even if he or she had had to pay for it. Another reason for the differences derives from the fact that such estimates may be used for different purposes. Hynninen (2000) gave the examples of operators giving estimates on the high side, hoping for more stringent antifraud legislation, and operators giving estimates on the low side to encourage customer confidence.

We need to distinguish between fraud aimed at the service provider and fraud enabled by the service provider. An example of the former is the resale of stolen call time and an example of the latter is interfering with telephone banking instructions. (It is the possibility of the latter sort of fraud which makes the public wary of using their credit cards over the Internet.) We can also distinguish between revenue fraud and nonrevenue fraud. The aim of the former is to make money for the perpetrator, while the aim of the latter is simply to obtain a service free of charge (or, as with computer hackers, e.g., the simple challenge represented by the system).

There are many different types of telecom fraud (see, e.g., Shawe-Taylor et al., 2000) and these can occur at various levels. The two most prevalent types are subscription fraud and superimposed or “surfing” fraud. Subscription fraud occurs when the fraudster obtains a subscription to a service, often with false identity details, with no intention of paying. This is thus at the level of a phone number—all transactions from this number will be fraudulent. Superimposed fraud is the use of a service without having the necessary authority and is usually detected by the appearance of phantom calls on a bill. There are several ways to carry out superimposed fraud, including mobile phone cloning and obtaining calling card authorization details. Superimposed fraud will generally occur at the level of individual calls—the fraudulent calls will be mixed in with the legitimate ones. Subscription fraud will generally be detected at some point through the billing process—although the aim is to detect it well before that, since large costs can quickly be run up. Superimposed fraud can remain undetected for a long time. The distinction
between these two types of fraud follows a similar distinction in credit card fraud.

Other types of telecom fraud include “ghosting” (technology that tricks the network so as to obtain free calls) and insider fraud, where telecom company employees sell information to criminals that can be exploited for fraudulent gain. This, of course, is a universal cause of fraud, whatever the domain. “Tumbling” is a type of superimposed fraud in which rolling fake serial numbers are used on cloned handsets, so that successive calls are attributed to different legitimate phones. The chance of detection by spotting unusual patterns is small and the illicit phone will operate until all of the assumed identities have been spotted. The term “spoofing” is sometimes used to describe users pretending to be someone else.

Telecommunications networks generate vast quantities of data, sometimes on the order of several gigabytes per day, so that data mining techniques are of particular importance. The 1998 database of AT&T, for example, contained 350 million profiles and processed 275 million call records per day (Cortes and Pregibon, 1998).

As with other fraud domains, apart from some domain specific tools, methods for detection hinge around outlier detection and supervised classification, either using rule-based methods or based on comparing statistically derived suspicion scores with some threshold. At a low level, simple rule-based detection systems use rules such as the apparent use of the same phone in two very distant geographical locations in quick succession, calls which appear to overlap in time, and very high value and very long calls. At a higher level, statistical summaries of call distributions (often called profiles or signatures at the user level) are compared with thresholds determined either by experts or by application of supervised learning methods to known fraud/nonfraud cases. Murad and Pinkas (1999) and Rosset et al. (1999) distinguished between profiling at the levels of individual calls, daily call patterns and overall call patterns, and described what are effectively outlier detection methods for detecting anomalous behavior. A particularly interesting description of profiling methods was given by Cortes and Pregibon (1998). Cortes, Fisher, Pregibon and Rogers (2000) described the Hancock language for writing programs for processing profiles, basing the signatures on such quantities as average call duration, longest call duration, number of calls to particular regions in the last day and so on. Profiling and classification techniques also were described by Fawcett and Provost (1997a, b, 1999) and Moreau, Verrelst and Vandewalle (1997). Some work (see, e.g., Fawcett and Provost, 1997a) has focused on detecting changes in behavior.

A general complication is that signatures and thresholds may need to depend on time of day, type of account and so on, and that they will probably need to be updated over time. Cahill et al. (2002) suggested excluding the very suspicious scores in this updating process, although more work is needed in this area.

Once again, neural networks have been widely used. The main fraud detection software of the Fraud Solutions Unit of Nortel Networks (Nortel, 2000) uses a combination of profiling and neural networks. Likewise, ASPeCT (Moreau et al., 1996; Shawe-Taylor et al., 2000), a project of the European Commission, Vodaphone, other European telecom companies and academics, developed a combined rule-based profiling and neural network approach. Taniguchi, Haft, Hollmén and Tresp (1998) described neural networks, mixture models and Bayesian networks in telecom fraud detection based on call records stored for billing.

Link analysis, with links updated over time, establishes the “communities of interest” (Cortes, Pregibon and Volinsky, 2001) that can indicate networks of fraudsters. These methods are based on the observation that fraudsters seldom change their calling habits, but are often closely linked to other fraudsters. Using similar patterns of transactions to infer the presence of a particular fraudster is in the spirit of phenomenal data mining (McCarthy, 2000).

Visualization methods (Cox et al., 1997), developed for mining very large data sets, have also been developed for use in telecom fraud detection. Here human pattern recognition skills interact with graphical computer display of quantities of calls between different subscribers in various geographical locations. A possible future scenario would be to code into software the patterns which humans detect.

The telecom market will become even more complicated over time—with more opportunity for fraud. At present the extent of fraud is measured by considering factors such as call lengths and tariffs. The third generation of mobile phone technology will also need to take into account such things as the content of the calls (because of the packet switching technology used, equally long data transmissions may contain very different numbers of data packets) and the priority of the call.
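The low-level rules mentioned above (the same phone apparently used in two very distant locations in quick succession, calls overlapping in time, very high-value calls) are simple predicates over consecutive call records. The following is an illustrative toy sketch, not any operator's actual system; the `Call` record, the function names and the thresholds (airliner speed, a value cutoff) are assumptions invented for the example:

```python
import math
from dataclasses import dataclass

@dataclass
class Call:
    start: float      # start time, in hours
    duration: float   # call length, in hours
    value: float      # billed value of the call
    location: tuple   # (latitude, longitude) of the originating cell

def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def rule_flags(calls, max_speed_kmh=900.0, max_value=50.0):
    """Flag calls violating simple low-level rules: calls that overlap
    in time, apparent travel faster than an airliner between consecutive
    calls, and very high-value calls."""
    flags = []
    calls = sorted(calls, key=lambda c: c.start)
    for prev, cur in zip(calls, calls[1:]):
        gap = cur.start - (prev.start + prev.duration)  # hours between calls
        if gap < 0:
            flags.append(("overlap", cur))
        elif gap > 0 and distance_km(prev.location, cur.location) / gap > max_speed_kmh:
            flags.append(("impossible_travel", cur))
    flags.extend(("high_value", c) for c in calls if c.value > max_value)
    return flags
```

In a real system such checks would run against streaming call records and feed a case-management queue rather than return a list; the point is only that the low-level layer consists of cheap deterministic tests, on top of which the statistical profiling described above is built.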
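The simple Bayesian updating used by systems such as FAIS to combine pieces of evidence into an overall suspicion score can be sketched in a few lines. The prior probability and the likelihood ratios below are invented for illustration and are not FinCEN's values; the function name is also an assumption:

```python
def update_suspicion(prior, likelihood_ratios):
    """Naive-Bayes combination of independent pieces of evidence.

    `prior` is the prior probability that the activity is illicit;
    each entry of `likelihood_ratios` is
    P(evidence | illicit) / P(evidence | licit).
    The returned posterior probability serves as a suspicion score.
    """
    odds = prior / (1.0 - prior)   # convert prior probability to odds
    for lr in likelihood_ratios:
        odds *= lr                 # multiply in each piece of evidence
    return odds / (1.0 + odds)     # convert odds back to a probability

# Hypothetical evidence: a large cash deposit (LR 10) followed by an
# immediate wire transfer abroad (LR 5), starting from a 1% prior.
score = update_suspicion(0.01, [10.0, 5.0])
```

Evidence is rarely independent in practice, which is one reason such scores are embedded in a larger rule-based and link-analysis framework rather than used on their own.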
and He, Graco and Yao (1999) described the use of neural networks, genetic algorithms and nearest neighbor methods to classify the practice profiles of general practitioners in Australia into classes from normal to abnormal.

Medical fraud is often linked to insurance fraud: Terry Allen, a statistician with the Utah Bureau of Medicaid Fraud, estimated that up to 10% of the $800 million annual claims may be stolen (Allen, 2000). Major and Riedinger (1992) created a knowledge/statistical-based system to detect healthcare fraud by comparing observations with those with which they should be most similar (e.g., having similar geodemographics). Brockett, Xia and Derrig (1998) used neural networks to classify fraudulent and nonfraudulent claims for automobile bodily injury in healthcare insurance claims. Glasgow (1997) gave a short discussion of risk and fraud in the insurance industry. A glossary of several of the different types of medical fraud is available at http://www.motherjones.com/mother_jones/MA95/davis2.html.

Of course, medicine is not the only scientific area where data have sometimes been fabricated, falsified or carefully selected to support a pet theory. Problems of fraud in science are attracting increased attention, but they have always been with us: errant scientists have been known to massage figures from experiments to push through development of a product or reach a magical significance level for a publication. Dmitriy Yuryev described such a case on his webpages at http://www.orc.ru/~yur77/statfr.htm. Moreover, there are many classical cases in which the data have been suspected of being massaged (including the work of Galileo, Newton, Babbage, Kepler, Mendel, Millikan and Burt). Press and Tanur (2001) presented a fascinating discussion of the role of subjectivity in the scientific process, illustrating with many examples. The borderline between subconscious selection of data and out-and-out distortion is a fine one.

8. CONCLUSIONS

The areas we have outlined are perhaps those in which statistical and other data analytic tools have made the most impact on fraud detection. This is typically because there are large quantities of information, and this information is numerical or can easily be converted into the numerical in the form of counts and proportions. However, other areas, not mentioned above, have also used statistical tools for fraud detection. Irregularities in financial statements can be used to detect accounting and management fraud in contexts broader than those of money laundering. Digit analysis tools have found favor in accountancy (e.g., Nigrini and Mittermaier, 1997; Nigrini, 1999). Statistical sampling methods are important in financial audit, and screening tools are applied to decide which tax returns merit detailed investigation. We mentioned insurance fraud in the context of medicine, but it clearly occurs more widely. Artís, Ayuso and Guillén (1999) described an approach to modelling fraud behavior in car insurance, and Fanning, Cogger and Srivastava (1995) and Green and Choi (1997) examined neural network classification methods for detecting management fraud. Statistical tools for fraud detection have also been applied to sporting events. For example, Robinson and Tawn (1995), Smith (1997) and Barao and Tawn (1999) examined the results of running events to see if some exceptional times were out of line with what might be expected.

Plagiarism is also a type of fraud. We briefly referred to the use of statistical tools for author verification and such methods can be applied here. However, statistical tools can also be applied more widely. For example, with the evolution of the Internet it is extremely easy for students to plagiarize articles and pass them off as their own in school or university coursework. The website http://www.plagiarism.org describes a system that can take a manuscript and compare it against their “substantial database” of articles from the Web. A statistical measure of the originality of the manuscript is returned.

As we commented in the Introduction, fraud detection is a post hoc strategy, being applied after fraud prevention has failed. Statistical tools are also applied in some fraud prevention methods. For example, so-called biometric methods of fraud detection are slowly becoming more widespread. These include computerized fingerprint and retinal identification, and also face recognition (although this has received most publicity in the context of recognizing football hooligans).

In many of the applications we have discussed, speed of processing is of the essence. This is particularly the case in transaction processing, especially with telecom and intrusion data, where vast numbers of records are processed every day, but also applies in credit card, banking and retail sectors.

A key issue in all of this work is how effective the statistical tools are in detecting fraud and a fundamental problem is that one typically does not know how many fraudulent cases slip through the net. In applications such as banking fraud and telecom fraud, where
speed of detection matters, measures such as average time to detection after fraud starts (in minutes, numbers of transactions, etc.) should also be reported. Measures of this aspect interact with measures of final detection rate: in many situations an account, telephone and so forth, will have to be used for several fraudulent transactions before it is detected as fraudulent, so that several false negative classifications will necessarily be made.

An appropriate overall strategy is to use a graded system of investigation. Accounts with very high suspicion scores merit immediate and intensive (and expensive) investigation, while those with large but less dramatic scores merit closer (but not expensive) observation. Once again, it is a matter of choosing a suitable compromise.

Finally, it is worth repeating the conclusions reached by Schonlau et al. (2001), in the context of statistical tools for computer intrusion detection: “statistical methods can detect intrusions, even in difficult circumstances,” but also “many challenges and opportunities for statistics and statisticians remain.” We believe this positive conclusion holds more generally. Fraud detection is an important area, one in many ways ideal for the application of statistical and data analytic tools, and one where statisticians can make a very substantial and important contribution.

ACKNOWLEDGMENT

The work of Richard Bolton was supported by a ROPA award from the Engineering and Physical Sciences Research Council of the United Kingdom.

REFERENCES

Aleskerov, E., Freisleben, B. and Rao, B. (1997). CARDWATCH: A neural network based database mining system for credit card fraud detection. In Computational Intelligence for Financial Engineering. Proceedings of the IEEE/IAFE 220–226. IEEE, Piscataway, NJ.
Allen, T. (2000). A day in the life of a Medicaid fraud statistician. Stats 29 20–22.
Anderson, D., Frivold, T. and Valdes, A. (1995). Next-generation intrusion detection expert system (NIDES): A summary. Technical Report SRI-CSL-95-07, Computer Science Laboratory, SRI International, Menlo Park, CA.
Andrews, P. P. and Peterson, M. B., eds. (1990). Criminal Intelligence Analysis. Palmer Enterprises, Loomis, CA.
Artís, M., Ayuso, M. and Guillén, M. (1999). Modelling different types of automobile insurance fraud behaviour in the Spanish market. Insurance Mathematics and Economics 24 67–81.
Barao, M. I. and Tawn, J. A. (1999). Extremal analysis of short series with outliers: Sea-levels and athletics records. Appl. Statist. 48 469–487.
Blunt, G. and Hand, D. J. (2000). The UK credit card market. Technical report, Dept. Mathematics, Imperial College, London.
Bolton, R. J. and Hand, D. J. (2001). Unsupervised profiling methods for fraud detection. In Conference on Credit Scoring and Credit Control 7, Edinburgh, UK, 5–7 Sept.
Brause, R., Langsdorf, T. and Hepp, M. (1999). Neural data mining for credit card fraud detection. In Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence 103–106. IEEE Computer Society Press, Silver Spring, MD.
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
Brockett, P. L., Xia, X. and Derrig, R. A. (1998). Using Kohonen’s self-organising feature map to uncover automobile bodily injury claims fraud. The Journal of Risk and Insurance 65 245–274.
Burge, P. and Shawe-Taylor, J. (1997). Detecting cellular fraud using adaptive prototypes. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 9–13. AAAI Press, Menlo Park, CA.
Buyse, M., George, S. L., Evans, S., Geller, N. L., Ranstam, J., Scherrer, B., Lesaffre, E., Murray, G., Edler, L., Hutton, J., Colton, T., Lachenbruch, P. and Verma, B. L. (1999). The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Statistics in Medicine 18 3435–3451.
Cahill, M. H., Lambert, D., Pinheiro, J. C. and Sun, D. X. (2002). Detecting fraud in the real world. In Handbook of Massive Datasets (J. Abello, P. M. Pardalos and M. G. C. Resende, eds.). Kluwer, Dordrecht.
Chan, P. K., Fan, W., Prodromidis, A. L. and Stolfo, S. J. (1999). Distributed data mining in credit card fraud detection. IEEE Intelligent Systems 14(6) 67–74.
Chan, P. and Stolfo, S. (1998). Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining 164–168. AAAI Press, Menlo Park, CA.
Chartier, B. and Spillane, T. (2000). Money laundering detection with a neural network. In Business Applications of Neural Networks (P. J. G. Lisboa, A. Vellido and B. Edisbury, eds.) 159–172. World Scientific, Singapore.
Chhikara, R. S. and McKeon, J. (1984). Linear discriminant analysis with misallocation in training samples. J. Amer. Statist. Assoc. 79 899–906.
Clark, P. and Niblett, T. (1989). The CN2 induction algorithm. Machine Learning 3 261–285.
Cohen, W. (1995). Fast effective rule induction. In Proceedings of the 12th International Conference on Machine Learning 115–123. Morgan Kaufmann, Palo Alto, CA.
Cortes, C., Fisher, K., Pregibon, D. and Rogers, A. (2000). Hancock: A language for extracting signatures from data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 9–17. ACM Press, New York.
Cortes, C. and Pregibon, D. (1998). Giga-mining. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining 174–178. AAAI Press, Menlo Park, CA.
Cortes, C., Pregibon, D. and Volinsky, C. (2001). Communities of interest. Lecture Notes in Comput. Sci. 2189 105–114.
Cox, K. C., Eick, S. G. and Wills, G. J. (1997). Visual data mining: Recognizing telephone calling fraud. Data Mining and Knowledge Discovery 1 225–231.
CSIDS (1999). Cisco secure intrusion detection system technical overview. Available at http://www.wheelgroup.com/warp/public/cc/cisco/mkt/security/nranger/tech/ntran_tc.htm.
Denning, D. E. (1997). Cyberspace attacks and countermeasures. In Internet Besieged (D. E. Denning and P. J. Denning, eds.) 29–55. ACM Press, New York.
Dorronsoro, J. R., Ginel, F., Sanchez, C. and Cruz, C. S. (1997). Neural fraud detection in credit card operations. IEEE Transactions on Neural Networks 8 827–834.
Fanning, K., Cogger, K. O. and Srivastava, R. (1995). Detection of management fraud: A neural network approach. International Journal of Intelligent Systems in Accounting, Finance and Management 4 113–126.
Fawcett, T. and Provost, F. (1997a). Adaptive fraud detection. Data Mining and Knowledge Discovery 1 291–316.
Fawcett, T. and Provost, F. (1997b). Combining data mining and machine learning for effective fraud detection. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 14–19. AAAI Press, Menlo Park, CA.
Fawcett, T. and Provost, F. (1999). Activity monitoring: Noticing interesting changes in behavior. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 53–62. ACM Press, New York.
Forrest, S., Hofmeyr, S., Somayaji, A. and Longstaff, T. (1996). A sense of self for UNIX processes. In Proceedings of the 1996 IEEE Symposium on Security and Privacy 120–128. IEEE Computer Society Press, Silver Spring, MD.
Ghosh, S. and Reilly, D. L. (1994). Credit card fraud detection with a neural network. In Proceedings of the 27th Hawaii International Conference on System Sciences (J. F. Nunamaker and R. H. Sprague, eds.) 3 621–630. IEEE Computer Society Press, Los Alamitos, CA.
Glasgow, B. (1997). Risk and fraud in the insurance industry. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 20–21. AAAI Press, Menlo Park, CA.
Goldberg, H. and Senator, T. E. (1995). Restructuring databases for knowledge discovery by consolidation and link formation. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining 136–141. AAAI Press, Menlo Park, CA.
Goldberg, H. and Senator, T. E. (1997). Break detection systems. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 22–28. AAAI Press, Menlo Park, CA.
Goldberg, H. and Senator, T. E. (1998). The FinCEN AI system: Finding financial crimes in a large database of cash transactions. In Agent Technology: Foundations, Applications, and Markets (N. Jennings and M. Wooldridge, eds.) 283–302. Springer, Berlin.
Green, B. P. and Choi, J. H. (1997). Assessing the risk of management fraud through neural network technology. Auditing 16 14–28.
Hand, D. J. (1981). Discrimination and Classification. Wiley, Chichester.
Hand, D. J. (1997). Construction and Assessment of Classification Rules. Wiley, Chichester.
Hand, D. J. and Blunt, G. (2001). Prospecting for gems in credit card data. IMA Journal of Management Mathematics 12 173–200.
Hand, D. J., Blunt, G., Kelly, M. G. and Adams, N. M. (2000). Data mining for fun and profit (with discussion). Statist. Sci. 15 111–131.
Hand, D. J. and Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: A review. J. Roy. Statist. Soc. Ser. A 160 523–541.
Hassibi, K. (2000). Detecting payment card fraud with neural networks. In Business Applications of Neural Networks (P. J. G. Lisboa, A. Vellido and B. Edisbury, eds.). World Scientific, Singapore.
He, H., Graco, W. and Yao, X. (1999). Application of genetic algorithm and k-nearest neighbour method in medical fraud detection. Lecture Notes in Comput. Sci. 1585 74–81. Springer, Berlin.
He, H. X., Wang, J. C., Graco, W. and Hawkins, S. (1997). Application of neural networks to detection of medical fraud. Expert Systems with Applications 13 329–336.
Hill, T. P. (1995). A statistical derivation of the significant-digit law. Statist. Sci. 10 354–363.
Hynninen, J. (2000). Experiences in mobile phone fraud. Seminar on Network Security. Report Tik-110.501, Helsinki Univ. Technology.
Jenkins, P. (2000). Getting smart with fraudsters. Financial Times, September 23.
Jensen, D. (1997). Prospective assessment of AI technologies for fraud detection: A case study. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 34–38. AAAI Press, Menlo Park, CA.
Ju, W.-H. and Vardi, Y. (2001). A hybrid high-order Markov chain model for computer intrusion detection. J. Comput. Graph. Statist. 10 277–295.
Kirkland, J. D., Senator, T. E., Hayden, J. J., Dybala, T., Goldberg, H. G. and Shyr, P. (1998). The NASD regulation advanced detection system (ADS). In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98) and of the 10th Conference on Innovative Applications of Artificial Intelligence (IAAI-98) 1055–1062. AAAI Press, Menlo Park, CA.
Kosoresow, A. P. and Hofmeyr, S. A. (1997). Intrusion detection via system call traces. IEEE Software 14 35–42.
Kumar, S. and Spafford, E. (1994). A pattern matching model for misuse intrusion detection. In Proceedings of the 17th National Computer Security Conference 11–21.
Lachenbruch, P. A. (1966). Discriminant analysis when the initial samples are misclassified. Technometrics 8 657–662.
Lachenbruch, P. A. (1974). Discriminant analysis when the initial samples are misclassified. II: Non-random misclassification models. Technometrics 16 419–424.
Lane, T. and Brodley, C. E. (1998). Temporal sequence learning and data reduction for anomaly detection. In Proceedings of the 5th ACM Conference on Computer and Communications Security (CCS-98) 150–158. ACM Press, New York.
Lee, W. and Stolfo, S. (1998). Data mining approaches for intrusion detection. In Proceedings of the 7th USENIX Security Symposium, San Antonio, TX 79–93. USENIX Association, Berkeley, CA.
Leonard, K. J. (1993). Detecting credit card fraud using expert systems. Computers and Industrial Engineering 25 103–106.
Lippmann, R., Fried, D., Graf, I., Haines, J., Kendall, K., McClung, D., Weber, D., Webster, S., Wyschogrod, D., Cunningham, R. and Zissman, M. (2000). Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion-detection evaluation. Unpublished manuscript, MIT Lincoln Laboratory.
Major, J. A. and Riedinger, D. R. (1992). EFD: A hybrid knowledge/statistical-based system for the detection of fraud. International Journal of Intelligent Systems 7 687–703.
Marchette, D. J. (2001). Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint. Springer, New York.
McCarthy, J. (2000). Phenomenal data mining. Comm. ACM 43 75–79.
McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York.
Mobile Europe (2000). New IP world, new dangers. Mobile Europe, March.
Moreau, Y., Preneel, B., Burge, P., Shawe-Taylor, J., Stoermann, C. and Cooke, C. (1996). Novel techniques for fraud detection in mobile communications. In ACTS Mobile Summit, Grenada.
Moreau, Y., Verrelst, H. and Vandewalle, J. (1997). Detection of mobile phone fraud using supervised neural networks: A first prototype. In Proceedings of 7th International Conference on Artificial Neural Networks (ICANN’97) 1065–1070. Springer, Berlin.
Murad, U. and Pinkas, G. (1999). Unsupervised profiling for identifying superimposed fraud. Principles of Data Mining and Knowledge Discovery. Lecture Notes in Artificial Intelligence 1704 251–261. Springer, Berlin.
Neural Technologies (2000). Reducing telecoms fraud and churn. Report, Neural Technologies, Ltd., Petersfield, U.K.
Nigrini, M. J. (1999). I’ve got your number. Journal of Accountancy May 79–83.
Nigrini, M. J. and Mittermaier, L. J. (1997). The use of Benford’s law as an aid in analytical procedures. Auditing: A Journal of Practice and Theory 16 52–67.
Nortel (2000). Nortel networks fraud solutions. Fraud Primer, Issue 2.0. Nortel Networks Corporation.
Pak, S. J. and Zdanowicz, J. S. (1994). A statistical analysis of the U.S. Merchandise Trade Database and its uses in transfer pricing compliance and enforcement. Tax Management, May 11.
Patient, S. (2000). Reducing online credit card fraud. Web Developer’s Journal. Available at http://www.webdevelopersjournal.com/articles/card_fraud.html.
Press, S. J. and Tanur, J. M. (2001). The Subjectivity of Scientists and the Bayesian Approach. Wiley, New York.
Provost, F. and Fawcett, T. (2001). Robust classification for imprecise environments. Machine Learning 42 203–210.
Qu, D., Vetter, B. M., Wang, F., Narayan, R., Wu, S. F., Hou, Y. F., Gong, F. and Sargor, C. (1998). Statistical anomaly detection for link-state routing protocols. In Proceedings of the Sixth International Conference on Network Protocols 62–70. IEEE Computer Society Press, Los Alamitos, CA.
Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning 5 239–266.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge Univ. Press.
Robinson, M. E. and Tawn, J. A. (1995). Statistics for exceptional athletics records. Appl. Statist. 44 499–511.
Rosset, S., Murad, U., Neumann, E., Idan, Y. and Pinkas, G. (1999). Discovery of fraud rules for telecommunications—challenges and solutions. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 409–413. ACM Press, New York.
Ryan, J., Lin, M. and Miikkulainen, R. (1997). Intrusion detection with neural networks. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 72–79. AAAI Press, Menlo Park, CA.
Schonlau, M., DuMouchel, W., Ju, W.-H., Karr, A. F., Theus, M. and Vardi, Y. (2001). Computer intrusion: Detecting masquerades. Statist. Sci. 16 58–74.
Senator, T. E. (2000). Ongoing management and application of discovered knowledge in a large regulatory organization: A case study of the use and impact of NASD regulation’s advanced detection system (ADS). In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 44–53. ACM Press, New York.
Senator, T. E., Goldberg, H. G., Wooton, J., Cottini, M. A., Umar Khan, A. F., Klinger, C. D., Llamas, W. M., Marrone, M. P. and Wong, R. W. H. (1995). The financial crimes enforcement network AI system (FAIS)—Identifying potential money laundering from reports of large cash transactions. AI Magazine 16 21–39.
Shawe-Taylor, J., Howker, K., Gosset, P., Hyland, M., Verrelst, H., Moreau, Y., Stoermann, C. and Burge, P. (2000). Novel techniques for profiling and fraud detection in mobile telecommunications. In Business Applications of Neural Networks (P. J. G. Lisboa, A. Vellido and B. Edisbury, eds.) 113–139. World Scientific, Singapore.
Shieh, S.-P. W. and Gligor, V. D. (1991). A pattern-oriented intrusion-detection model and its applications. In Proceedings of the 1991 IEEE Computer Society Symposium on Research in Security and Privacy 327–342. IEEE Computer Society Press, Silver Spring, MD.
Shieh, S.-P. W. and Gligor, V. D. (1997). On a pattern-oriented model for intrusion detection. IEEE Transactions on Knowledge and Data Engineering 9 661–667.
STATISTICAL FRAUD DETECTION 249
S MITH , R. L. (1997). Comment on “Statistics for exceptional TANIGUCHI , M., H AFT, M., H OLLMÉN , J. and T RESP, V.
athletics records,” by M. E. Robinson and J. A. Tawn. Appl. (1998). Fraud detection in communication networks using
Statist. 46 123–128. neural and probabilistic methods. In Proceedings of the 1998
S TOLFO , S. J., FAN , D. W., L EE , W., P RODROMIDIS , A. L. and IEEE International Conference on Acoustics, Speech and
C HAN , P. K. (1997a). Credit card fraud detection using meta- Signal Processing (ICASSP’98) 2 1241–1244. IEEE Computer
learning: Issues and initial results. In AAAI Workshop on AI Society Press, Silver Spring, MD.
Approaches to Fraud Detection and Risk Management 83–90. U.S. C ONGRESS (1995). Information technologies for the control
AAAI Press, Menlo Park, CA. of money laundering. Office of Technology Assessment, Re-
S TOLFO , S., FAN , W., L EE , W., P RODROMIDIS , A. L. and port OTA-ITC-630, U.S. Government Printing Office, Wash-
C HAN , P. (1999). Cost-based modeling for fraud and intrusion ington, DC.
detection: Results from the JAM Project. In Proceedings of the WASSERMAN , S. and FAUST, K. (1994). Social Network Analysis:
DARPA Information Survivability Conference and Exposition Methods and Applications. Cambridge Univ. Press.
2 130–144. IEEE Computer Press, New York. W EBB , A. R. (1999). Statistical Pattern Recognition. Arnold,
S TOLFO , S. J., P RODROMIDIS , A. L., T SELEPIS , S., L EE , W., London.
FAN , D. W. and C HAN , P. K. (1997b). JAM: Java agents for W HEELER , R. and A ITKEN , S. (2000). Multiple algorithms
meta-learning over distributed databases. In AAAI Workshop for fraud detection. Knowledge-Based Systems 13(2/3)
on AI Approaches to Fraud Detection and Risk Management 93–99.
91–98. AAAI Press, Menlo Park, CA.
Comment
Foster Provost
Foster Provost is Associate Professor, Leonard N. Stern School of Business, New York University, New York, New York 10012 (e-mail: [email protected]).

The state of research on fraud detection recalls John Godfrey Saxe's 19th-century poem "The Blind Men and the Elephant" (Felleman, 1936, page 521). Based on a Hindu fable, each blind man experiences only a part of the elephant, which shapes his opinion of the nature of the elephant: the leg makes it seem like a tree, the tail a rope, the trunk a snake and so on. In fact, ". . . though each was partly in the right . . . all were in the wrong." Saxe's poem was a criticism of theological debates, and I do not intend such a harsh criticism of research on fraud detection. However, because the problem is so complex, each research project takes a particular angle of attack, which often obscures the view of other parts of the problem. So, some researchers see the problem as one of classification, others of temporal pattern discovery; to some it is a problem perfect for a hidden Markov model and so on.

So why is fraud detection not simply classification or a member of some other already well-understood problem class? Bolton and Hand outline several characteristics of fraud detection problems that differentiate them [as did Tom Fawcett and I in our review of the problems and techniques of fraud detection (Fawcett and Provost, 2002)]. Consider fraud detection as a classification problem. Fraud detection certainly must be "cost-sensitive"—rather than minimizing error rate, some other loss function must be minimized. In addition, usually the marginal class distribution is skewed strongly toward one class (legitimate behavior). Therefore, modeling for fraud detection at least is a difficult problem of estimating class membership probability, rather than simple classification. However, this still is an unsatisfying attempt to transform the true problem into one for which we have existing tools (practical and conceptual). The objective function for fraud detection systems actually is much more complicated. For example, the value of detection is a function of time. Immediate detection is much more valuable than delayed detection. Unfortunately, evidence builds up over time, so detection is easier the longer it is delayed. In cases of self-revealing fraud, eventually, detection is trivial (e.g., a defrauded customer calls to complain about fraudulent transactions on his or her bill).

In most research on modeling for fraud detection, a subproblem is extracted (e.g., classifying transactions or accounts as being fraudulent) and techniques are compared for solving this subproblem—without moving on to compare the techniques for the greater problem of detecting fraud. Each particular subproblem naturally will abstract away those parts that are
problematic for the technique at hand (e.g., temporal aspects are ignored for research on applying standard classification approaches). However, fraud detection can benefit from classification, regression, time-series analysis, temporal pattern discovery, techniques for combining evidence and others. For example, temporal sequences of particular actions can provide strong clues to the existence of fraud. A common example of such a temporal sequence is a triggering event followed in a day or two by an acceleration of usage. In credit card fraud, bandits purchase small amounts of gasoline (at a safe, automatic pump) to verify that a card is active before selling it. In wireless telephone fraud, bandits call a movie theater information number for verification. In a standard classification framework, temporal patterns must be engineered carefully into the representation. On the other hand, in a framework designed to focus on the discovery of temporal sequences, many facets of advanced classification may be ignored; for example, classifier learners can take advantage (automatically) of mixed-type variables, including numeric, categorical, set-valued and text, and hierarchical background knowledge (Aronis and Provost, 1997) such as geographic hierarchies.

This is just one example of a pair of different views of the problem, each with its advantages and disadvantages. Another is, as Bolton and Hand point out, the supervised/unsupervised duality to modeling for fraud detection: some fraudulent activity can be detected by applying knowledge generalized from past, labeled cases; other activity is better detected by noticing behavior that differs significantly from the norm.

Fraud detection and intervention can have two modes: automatic and mixed initiative (human/computer). Automatic intervention only occurs when there is very strong evidence that fraud exists; otherwise, false alarms would be disastrous. Remember that fraud detection systems consider millions (sometimes tens or hundreds of millions) of accounts. On a customer base of only 1 million accounts, a daily false-alarm rate of even 1% would yield 10,000 false alarms a day; the cost of dealing with these (e.g., if accounts were incorrectly shut down) could be enormous.

Mixed-initiative detection and intervention deals with cases that do not have enough evidence for automatic intervention (or with applications for which automatic intervention does not make sense). Fraud detection systems create "cases" comprising the evidence collected so far that indicates fraud. Fraud analysts process these cases, often going to auxiliary sources of data to augment their analyses. At any time, a case list can be sorted by some score: a probability of fraud, computed from all the evidence collected so far, an expected loss or simply an ad hoc score. The unit of analysis for the production of the score is complicated: it is composed of a series of transactions, which comprises the potentially fraudulent activity and possibly legitimate activity as well. The unit of analysis also could include other information, such as that taken from account applications, background databases, behavior profiles (which may have been compiled from previous transaction activity) and possibly account annotations made by prior analysts (e.g., "this customer often triggers rule X").

A part of the fraud detection elephant that has not received much attention is the peculiar nonstationary nature of the problem. Not only does the phenomenon being modeled change over time—sometimes dramatically—it changes in direct response to the modeling of it. As soon as a model is put into place, it begins to lose effectiveness. For example, after realizing that the appearance of a large volume of transactions on a brand new account is used as an indicator of application/subscription fraud, criminals begin to lie low and even pay initial bills before ramping up spending. After realizing that "calling dens" in certain locations had led to models that detect wireless fraud based on those locations, criminals constructed roving calling dens (where fraudulent wireless service was provided in the back of a van that drove around the city). This adaptation is problematic for the typical information systems development life cycle (analysis → design → programming → deployment → maintenance). At the very least it is necessary for models to be able to be changed quickly and frequently. A more satisfying (but perhaps not yet practicable) solution would be to have a learning system, which can modify its own models in the ongoing arms race.

A practical view of the fraud detection elephant shows other issues that make fraud detection problems difficult. They must be kept in mind if one intends results actually to apply to real fraud detection. Systems for fraud detection, in many applications, face tremendous computational demands. Transactions arrive in real time; often only milliseconds (or less) can
be allocated to process each. In this short time, the system must record the transaction in its database, access relevant account-specific data, process the transaction and historical data through the fraud detection model and create a case, update a case or issue an alarm if warranted (and if not, possibly update a customer's profile). Fraud models must be very efficient to apply. Furthermore, the models must be very space efficient. Storing a neural network or a decision tree for each customer is not feasible for millions of customers; it may be possible only to store for each customer a few parameters to a general model. Thus, both time and space constraints argue for simple fraud detection models.

A user perspective of fraud detection (as a mixed-initiative process) argues for the use of models that are comprehensible to the analysts. For example, for many analysts, rule-based models are easier to interpret than are neural network models. The set of rules that apply to a particular case may guide the subsequent (human) investigation. On the other hand, the most commercially successful vendor of fraud detection systems (to my knowledge) uses neural networks extensively for detecting fraud. Of course, commercial success is a dubious measure of technical quality; however, one can get an interesting view into real world fraud detection systems by studying HNC Software's patent (Gopinathan et al., 1998). (As of this writing, a patent search on keywords "fraud detection" yields 80 patents.) In particular, their extensive list of variables, created to summarize past activity so that a neural network can be applied, illustrates the problem engineering necessary to transform the fraud detection problem into one that is amenable to standard modeling techniques.

It would be useful to have a precise definition of a class (or of several classes) of fraud detection problems, which takes into account the variety of characteristics that make statistical fraud detection difficult. If such a characterization exists already in statistics, the machine learning and data mining communities would benefit from its introduction. Not knowing of one, Tom Fawcett and I attempted to define one class of "activity monitoring" problems and illustrate several instances (Fawcett and Provost, 1999). Earlier we defined "superimposition fraud" (Fawcett and Provost, 1997a) to try to unify similar forms of wireless telephone fraud, calling card fraud, credit card fraud, certain computer intrusions and so on, where fraudulent usage is superimposed upon legitimate usage and for which similar solution methods may apply. However, neither of these captures all of the important characteristics.

The characterization of such a class of problems is important for several reasons. First of all, different fraud detection problems are considerably similar—it is important to understand how well success of different techniques generalizes. Is the similarity superficial? Are there deeper characteristics of the problem or data that must be considered? [This seems to be the case, e.g., with classification problems (Perlich, Provost and Simonoff, 2001).] Also, to succeed at detecting fraud, different sorts of modeling techniques must be composed; for example, temporal patterns may become features for a system for estimating class membership probabilities, and estimators of class membership probability could be used in temporal evidence gathering. Furthermore, systems using different solution methods should be on equal footing for comparison. Seeming success on any subproblem does not necessarily imply success on the greater problem. Finally, it would be beneficial to focus researchers from many disciplines, with many complementary techniques, on a common, very important set of problems. The juxtaposition of knowledge and ideas from multiple disciplines will benefit them all and will be facilitated by the precise formulation of a problem of common interest.

Of course I am not arguing that research must address all of these criteria simultaneously (immediately), and I am not being strongly critical of prior work on fraud detection: we all must abstract away parts of such a complicated problem to make progress on others. Nevertheless, it is important that researchers take as an ultimate goal the solution to the full problem. We all should consider carefully whether partial solutions will or will not be extensible. Fraud detection is a real, important problem with many real, interesting subproblems. Bolton and Hand's review of the state of the art shows that there is a lot of room for useful research. However, the research community should make sure that work is progressing toward the solution to the larger problem, whether by the development of techniques that solve larger portions or by facilitating the composition of techniques in a principled manner.

ACKNOWLEDGMENT

Tom Fawcett and I worked very closely on problems of fraud detection, and my views have been influenced considerably by our discussions and collaborative work.
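The false-alarm arithmetic and the case-list ranking described in the comment can be sketched in a few lines of Python. This is a hypothetical illustration only; the account labels, scores and the `daily_false_alarms` helper are invented for the example and do not come from any real system.

```python
# Hypothetical sketch of two points from the comment: the false-alarm
# arithmetic for automatic intervention, and sorting a mixed-initiative
# case list by a suspicion score. All numbers and names are invented.

def daily_false_alarms(n_accounts, false_alarm_rate):
    """Expected number of false alarms per day if every account is scored daily."""
    return n_accounts * false_alarm_rate

# 1 million accounts at a 1% daily false-alarm rate -> 10,000 false alarms a day.
print(daily_false_alarms(1_000_000, 0.01))

# A case list can be sorted by some score: a probability of fraud,
# an expected loss or simply an ad hoc score.
cases = [
    {"account": "A", "score": 0.20},  # score here: estimated probability of fraud
    {"account": "B", "score": 0.85},
    {"account": "C", "score": 0.55},
]
worklist = sorted(cases, key=lambda c: c["score"], reverse=True)
for case in worklist:
    print(case["account"], case["score"])
```

Analysts would work the list from the top, so the highest-scoring case (here "B") is investigated first.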
Comment
Leo Breiman
Leo Breiman is Professor, Department of Statistics, University of California, Berkeley, California 94720-3860 (e-mail: [email protected]).

This is an enjoyable and illuminating article. It deals with an area that few statisticians are aware of, but that is of critical importance economically and in terms of security. I am appreciative to the authors for the education in fraud detection this article gave me and to Statistical Science for publishing it. There are some interesting aspects that make this class of problems unique and that I comment on, running the risk of repeating points made in the article.

The analysis has to deal with a large number of problems simultaneously. For instance, in credit card fraud, the records of millions of customers have to be analyzed one by one to set up individual alarm settings. It is not a single unsupervised or supervised problem—a multitude of such problems have to be simultaneously addressed and "solved" for diverse data records. Yet the algorithm selected, modulo a few tunable parameters, has to be "one size fits all." Otherwise the on-line computations are not feasible. The alarm bell settings have to be constantly updated. For instance, as customers age and change their economic level and life styles, usage characteristics change. There are also serious database issues—how to structure the large databases so that the incoming streams of data are accessible for the kind of analysis necessary. Collaboration with database experts is essential.

Most of all, these problems require an uninhibited sense of exploration and can be enjoyable adventures in living with data. The goal is predictive accuracy and the tools are algorithmic models (see Breiman, 2001). The class of problems is novel, even in machine learning. No one tool (neural nets, etc.) is instantly applicable to all of these problems. The algorithms have to be designed to fit the data. This means that an essential part of the venture is immersion in and exploration of the data. My experience is that good predictive algorithms do not appear by a selection, unguided by the data, from what algorithms are available. Furthermore, the process is one of successive informed revision. If an algorithm, for instance, has too high a false alarm rate, then one has to go back to the data and try to understand why the false alarm rate is high. Understanding will help to lower the false alarm rate.

The process is an alternation between algorithm and data. Personally, if a user reports that an algorithm I have devised gives anomalous results on his data set, the first thing I do is to request that he ship me the data. By running the data myself and trying to understand what it is about the data that causes the poor performance, I can learn a lot about the deficiencies of the algorithm and, possibly, improve it. Granted that with a changing database running to gigabytes and terabytes, it may be difficult to look at and understand the data. However, this should not deter analysts—in fact, looking for good ways to display and understand the data is an essential foundation for the construction of good algorithms.

There are other difficult boundary conditions in the instances of fraud detection I have looked at. If one tries to design algorithms that use multidimensional information, the problem is that the algorithm may become too wrapped up in the individual data and the false alarm rate rises. However, simple and robust algorithms may not utilize enough information to give a satisfactory detection rate.

The choice between supervised and unsupervised learning may be difficult and interesting. Assume that in the database, examples are available of verified fraud and uncontaminated data. As the authors mention, the cases of verified fraud in the data are a tiny fraction of all of the data.

In detecting credit card fraud, for instance, there are two ways to go. The first is to consider one user (G.B.S.) and let his weekly purchases be instances of class 1. Take all records of a week of fraudulent use and assign them to class 2. Then run a classification algorithm on the two-class data, constructing a method that discriminates between the two classes. Weight the probabilities of class 1 and class 2 assignment so as to keep the false alarm rate down to a preassigned level. Then run the discrimination method on all future weeks of G.B.S.'s purchases.

This, in machine learning, is called supervised learning. It relies on having two labeled classes of instances to discriminate between. Unsupervised learning occurs where there are no class labels or responses attached
to the data vectors. Applied to fraud detection, it takes all weekly purchases by G.B.S. in the recent past and summarizes them in a few descriptive statistics. For instance, one could be total average weekly purchases and their standard deviation. If, in the current week, the total purchases exceed the average by many standard deviations, then an alarm bell goes off—that is, a high suspicion score is recorded.

My impression is that, where applicable, supervised learning will give lower false alarm rates. Think of the uncontaminated weekly data for G.B.S. as forming a fuzzy ball in high dimensions. Unsupervised learning puts a boundary around this ball and assigns a high suspicion score to anything outside of the boundary. Supervised learning creates a second fuzzy ball consisting of fraudulent weekly data and assigns a high suspicion score only if the probability of being in class 2 (fraud) is sufficiently higher than being in class 1. Data that are outside of the unsupervised boundary may not be in the direction of class 2. However, the supervised approach makes the assumption that future fraudulent data will have the same characteristics as past fraudulent data and further assumes that fraudulent use of the G.B.S. account will result in characteristics similar to those in the fraudulent use of other accounts.

Fraud detection has some echoes in other areas. For instance, in the 1970s, Los Angeles had metal detectors buried every 1/4 mile in every lane in a 17-mile triangular section of heavily traveled freeways. Each detector produced a signal as a car passed over it, resulting in estimates of traffic density and average speed. One goal was to use the data from these detectors, channeled into a central computer, to give early warning of accidents that were blocking the traffic flow. However, at the most critical times, when these freeways were operating at near-capacity traffic, stoppages in traffic flow could develop spontaneously. Some sections of freeway were more likely to develop stoppages, for example, a slight upgrade or a curve. A false alarm could generate a dispatch of a tow truck, patrol car or helicopter. My mission, as a consultant, was to develop an algorithm, specific to each section of freeway, to detect accident blockages with high accuracy and a low false alarm rate.

In astronomy, an important problem is to develop algorithms that can be applied to the finely detailed pictures of millions of stellar objects and locate those that "are unlike anything we're familiar with to date." Here "unlike" does not mean bigger or smaller, but having different physical characteristics than anything seen to date. I have thought about this problem from time to time, but see no satisfactory solution.

In a number of fields a common problem, in both supervised and unsupervised learning, is that the number of data vectors is large, but the number of class 2 cases (i.e., fraudulent data vectors) is an extremely small fraction of the total. Using human judgment to go over a large database and recognize all class 2 data is not feasible. For example, in astronomy, an interesting class of objects are butterfly stars—stars that have a visual picture that resembles a butterfly. A project at the Lawrence Livermore National Laboratory hoped to identify all butterfly stars in a gigabyte database resulting from a sky survey. Working on a small fraction of the data, a team of astronomers identified about 300 butterfly stars.

The goal of the machine learning group working on this project was to identify almost all of the butterfly stars in the survey while requiring minimal further identification work by the astronomers. This required the construction of an optimal incremental strategy: use the first 300 identifications to find further objects with a high probability of being butterflies, ask the astronomers to say "yes" or "no" on these and then repeat using the larger sample.

The challenges in fraud detection are both formidable and intriguing. Many of the problems are nowhere near solution in terms of satisfactory false alarm and detection rates. It is an open field for the exercise of ingenuity, algorithm creation and data snooping. It is also a field worth billions.

The authors titled their paper "Statistical Fraud Detection," implying that this area is within the realm of statistics—would that it were—but the number of statisticians involved is small. The authors write that they are covering a few areas "in which statistical methods can be applied." The list of statistical methods that I extracted from the article is:

Neural nets
Rule-based methods
Tree-based algorithms
Genetic algorithms
Fuzzy logic
Mixture models
Bayesian networks
Meta-learning

These were developed in machine learning, not statistics (with the exception of mixture models), and lead to algorithmic modeling. Because of the emphasis on stochastic data modeling in statistics, very few
statisticians are familiar with algorithm modeling, which is sometimes referred to (with a touch of prudishness) as "ad hoc."

We are ceding some of the most interesting of current statistical problems to computer scientists and engineers allied to the machine learning area. Detection of fraud is an example. Young statisticians need to learn about algorithmic modeling and how it applies to a large variety of statistical problems. The Berkeley Statistics Department made a move in this direction a few years ago by making a joint appointment with the Computer Science Department of an excellent scientist in the machine learning area. We will be doing more.
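Breiman's contrast between the unsupervised alarm (flagging a week that falls far outside the "fuzzy ball" of past behavior) and the supervised approach (weighting class assignments so the false alarm rate stays at a preassigned level) can be sketched roughly as follows. The weekly totals, the threshold `k` and the quantile-based `threshold_for_false_alarm_rate` helper are invented for illustration; no real data or system is implied.

```python
import statistics

# Hypothetical weekly purchase totals for one account (invented numbers).
history = [210.0, 195.5, 230.0, 188.0, 205.0, 221.0, 199.0, 215.0]

def suspicion_score(week_total, past):
    """Unsupervised flavor: standard deviations above the past average."""
    return (week_total - statistics.mean(past)) / statistics.stdev(past)

def unsupervised_alarm(week_total, past, k=3.0):
    """Alarm if the current week lies outside the 'fuzzy ball' of past behavior."""
    return suspicion_score(week_total, past) > k

def threshold_for_false_alarm_rate(legit_scores, target_rate):
    """Supervised flavor, in miniature: choose the alarm threshold as a high
    quantile of scores observed on legitimate weeks, so that roughly
    target_rate of legitimate weeks would trigger a (false) alarm."""
    s = sorted(legit_scores)
    idx = min(len(s) - 1, int((1.0 - target_rate) * len(s)))
    return s[idx]

print(unsupervised_alarm(208.0, history))  # an ordinary week
print(unsupervised_alarm(950.0, history))  # far outside the ball
```

The supervised version would score weeks with a discriminant trained on both classes rather than with a distance from the ball's center, but the thresholding idea, fixing the cutoff to hold false alarms at a preassigned level, is the same.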
Rejoinder
Richard J. Bolton and David J. Hand
We would like to thank the discussants for their valuable contributions. They have reinforced some of our points and also drawn attention to points which we glossed over or failed to make. Their contributions have significantly enhanced the value of the paper.

We emphasized that many and varied tools would be required to attack the fraud detection problem and this has been echoed by the discussants, who make the additional important point that, whatever subproblems are identified, the tools that are adapted or developed to attack them should do so in combination and to the benefit of the fraud detection process as a whole. The message is that fraud detection is greater than the sum of its parts and that it can be easy to lose sight of this when dissecting the problem. In a similar vein, Provost also rightly draws attention to the fact that there are additional subtleties in applying even standard tools to fraud detection that may not at first be apparent. One example is his observation that the value of detection is greater the sooner it is made, but that detection becomes easier the more time that has passed. In fact, Hand (1996, 1997) suggested that many, if not most, classification problems have such concealed subtleties, and that researchers in statistics and machine learning have typically extracted only the basic form of the problem. So, as tools for classification bump against the ceiling of the greatest classification accuracy that can be achieved in practice, it becomes more and more important to take note of these other aspects of the problems.

Both discussants comment on the importance of the temporal aspect of fraud. We agree that the incorporation of temporal information into the (commonly) static classification structure is essential in most cases of fraud detection and that further research on tools for tackling this would be of great benefit. Populations evolve as people enter and exit them, but the behavior of individuals who remain in a population can also change. Breiman describes some interesting examples from outside the fraud detection domain which illustrate that there are other applications where statistical research may offer solutions similar to those required for fraud detection. One such domain, which is affected by changing populations, is credit scoring (Kelly, Hand and Adams, 1999). Still on a temporal theme, the adaptability of fraud detection tools to the changing behavior of fraudsters must be addressed so as to ensure the continued effectiveness of a fraud detection system: as new detection strategies are introduced, so fraudsters will change their behavior accordingly. Models of behavior can help with this, although the indicators of fraud that are independent of a particular account may require a different strategy.

We take Breiman's point that many of the methods we described were developed outside the narrow statistical community. However, we had not intended the word "statistical" to refer merely to the stochastic data model-based statistics of his recent article (Breiman, 2001). Rather, we had intended it in the sense of Chambers' "greater statistics" (Chambers, 1993), "everything related to learning from data." Of course, the point that Breiman makes, that the tools we have described have not been developed by conventional statisticians, is something of an indictment of statisticians (Hand, 1998).

We endorse Provost's conclusion about the importance of looking at the full problem. It is all too easy to abstract a component problem and then overrefine the solution to this, way beyond a level which can be useful or relevant in the context of the overall problem. Conversely, it is all too easy to be misled into a focus on a peripheral or irrelevant aspect of the subproblem. Academic researchers have often been criticized for this in other contexts. Of course, the fact is that many of the
subproblems require specialist expertise and specialists in a narrow area may find it difficult to see the broader picture. Moreover, naturally, such specialists will want to apply their specialist tool: to those who have a hammer, everything looks like a nail.

The discussion contributions have emphasized the fact that fraud detection is an important and challenging area for statisticians; indeed, for data analysts in general. Challenging aspects include the large data sets; the fact that one class is typically very small; that the data are dynamic and that speedy decisions may be very important; that the nature of the frauds changes over time, often in response to the very detection strategies that may be put in place; that there may be no training instances; and that detecting fraud involves multiple interconnected approaches. All of these and other aspects mean that collaboration with data experts, who can provide human insight into the underlying processes, is essential.

ADDITIONAL REFERENCES

Aronis, J. and Provost, F. (1997). Increasing the efficiency of data mining algorithms with breadth-first marker propagation. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining 119–122. AAAI Press, Menlo Park, CA.
Breiman, L. (2001). Statistical modeling: The two cultures (with discussion). Statist. Sci. 16 199–231.
Chambers, J. M. (1993). Greater or lesser statistics: A choice for future research. Statist. Comput. 3 182–184.
Fawcett, T. and Provost, F. (2002). Fraud detection. In Handbook of Knowledge Discovery and Data Mining (W. Kloesgen and J. Zytkow, eds.). Oxford Univ. Press.
Felleman, H., ed. (1936). The Best Loved Poems of the American People. Doubleday, New York.
Gopinathan, K. M., Biafore, L. S., Ferguson, W. M., Lazarus, M. A., Pathria, A. K. and Jost, A. (1998). Fraud detection using predictive modeling. U.S. Patent 5819226, October 6.
Hand, D. J. (1996). Classification and computers: Shifting the focus. In COMPSTAT-96: Proceedings in Computational Statistics (A. Prat, ed.) 77–88. Physica, Heidelberg.
Hand, D. J. (1998). Breaking misconceptions—statistics and its relationship to mathematics (with discussion). The Statistician 47 245–250, 284–286.
Kelly, M. G., Hand, D. J. and Adams, N. M. (1999). The impact of changing populations on classifier performance. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (S. Chaudhuri and D. Madigan, eds.) 367–371. ACM Press, New York.
Perlich, C., Provost, F. and Simonoff, J. S. (2001). Tree induction vs. logistic regression: A learning-curve analysis. Journal of Machine Learning Research. To appear.