Data Mining for Anomaly Detection (ECML/PKDD 2008 Tutorial)
Aleksandar Lazarevic
United Technologies Research Center
• Introduction
• Aspects of Anomaly Detection Problem
• Applications
• Different Types of Anomaly Detection
Techniques
• Case Study
• Discussion and Conclusions
Introduction
We are drowning in the deluge of data that are being collected world-wide, while starving for knowledge at the same time*
Anomalous events occur relatively infrequently. However, when they do occur, their consequences can be quite dramatic, and quite often in a negative sense.
“Mining needle in a haystack. So much hay and so little time”
* - J. Naisbitt, Megatrends: Ten New Directions Transforming Our Lives. New York: Warner Books, 1982.
What are Anomalies?
• Anomaly is a pattern in the data that does
not conform to the expected behavior
• Also referred to as outliers, exceptions,
peculiarities, surprise, etc.
• Anomalies translate to significant (often
critical) real life entities
– Cyber intrusions
– Credit card fraud
– Faults in mechanical systems
Real World Anomalies
• Cyber Intrusions
– A web server involved in ftp
traffic
Simple Examples
[Figure: two-dimensional data (axes X, Y)]
• N1 and N2 are regions of normal behavior
• Points o1 and o2 are anomalies
• Points in region O3 are anomalies
Related problems
• Rare Class Mining
• Chance discovery
• Novelty Detection
• Exception Mining
• Noise Removal
• Black Swan*
* N. Taleb, The Black Swan: The Impact of the Highly Improbable, 2007
Key Challenges
• Defining a representative normal region is
challenging
• The boundary between normal and outlying
behavior is often not precise
• Availability of labeled data for training/validation
• The exact notion of an outlier is different for
different application domains
• Malicious adversaries
• Data might contain noise
• Normal behavior keeps evolving
Aspects of Anomaly Detection Problem
Input Data
– Univariate
– Multivariate
– Sequential (e.g., a DNA sequence such as CCAACCGAGTCCGACCAGGTGCC…)
• Temporal
– Spatial
– Spatio-temporal
– Graph
Data Labels
• Supervised Anomaly Detection
– Labels available for both normal data and
anomalies
– Similar to rare class mining
• Semi-supervised Anomaly Detection
– Labels available only for normal data
• Unsupervised Anomaly Detection
– No labels assumed
– Based on the assumption that anomalies are
very rare compared to normal data
Type of Anomaly
• Point Anomalies
• Contextual Anomalies
• Collective Anomalies
Point Anomalies
[Figure: points o1 and o2 and the points in region O3 are point anomalies; N1 and N2 are regions of normal behavior]
Contextual Anomalies
• An individual data instance is anomalous within a context
• Requires a notion of context
• Also referred to as conditional anomalies*
[Figure: time series in which the same value is normal in one context and anomalous in another]
* Xiuyao Song, Mingxi Wu, Christopher Jermaine, Sanjay Ranka, Conditional Anomaly Detection, IEEE Transactions on Knowledge and Data Engineering, 2006.
Collective Anomalies
• A collection of related data instances is anomalous
• Requires a relationship among data instances
– Sequential Data
– Spatial Data
– Graph Data
• The individual instances within a collective anomaly are not
anomalous by themselves
[Figure: time series with an anomalous subsequence highlighted]
Output of Anomaly Detection
• Label
– Each test instance is given a normal or anomaly
label
– This is especially true of classification-based
approaches
• Score
– Each test instance is assigned an anomaly score
• Allows the output to be ranked
• Requires an additional threshold parameter
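The two output types can be illustrated with a few hypothetical scores: scores can be ranked directly, while producing labels needs the extra threshold parameter.

```python
# Hypothetical anomaly scores for five test instances.
scores = {"a": 0.10, "b": 0.90, "c": 0.40, "d": 0.95, "e": 0.20}

# Score output: rank instances from most to least anomalous.
ranked = sorted(scores, key=scores.get, reverse=True)

# Label output: requires an additional threshold parameter.
threshold = 0.5
labels = {k: "anomaly" if s > threshold else "normal" for k, s in scores.items()}
```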
Evaluation of Anomaly Detection – F-value
• Accuracy is not a sufficient metric for evaluation
– Example: a network traffic data set with 99.9% normal data and 0.1% intrusions
– A trivial classifier that labels everything with the normal class achieves 99.9% accuracy!

Confusion matrix:
            Predicted NC | Predicted C
Actual NC   TN           | FP
Actual C    FN           | TP

• Focus on both recall and precision
– Recall (R) = TP/(TP + FN)
– Precision (P) = TP/(TP + FP)
• F-measure = 2·R·P/(R + P); more generally, F_β = (1 + β²)·R·P/(β²·P + R)
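The confusion-matrix quantities above, computed on hypothetical counts that mirror the skewed-class example:

```python
# Hypothetical confusion-matrix counts for a skewed data set.
TP, FN, FP, TN = 8, 2, 4, 986

recall = TP / (TP + FN)        # detection rate
precision = TP / (TP + FP)

def f_measure(r, p, beta=1.0):
    """General F_beta = (1 + beta^2) * r * p / (beta^2 * p + r)."""
    return (1 + beta**2) * r * p / (beta**2 * p + r)

f1 = f_measure(recall, precision)            # equals 2*R*P/(R+P) for beta = 1
accuracy = (TP + TN) / (TP + TN + FP + FN)   # misleadingly high on skewed data
```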
Evaluation of Outlier Detection – ROC & AUC
Confusion matrix:
            Predicted NC | Predicted C
Actual NC   TN           | FP
Actual C    FN           | TP

• Standard measures for evaluating anomaly detection problems:
– Recall (detection rate) – ratio between the number of correctly detected anomalies and the total number of anomalies
– False alarm (false positive) rate – ratio between the number of normal records misclassified as anomalies and the total number of normal records, i.e. FP/(TN + FP)
[Figure: ROC curves (detection rate vs. false alarm rate) for different outlier detection techniques]
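The detection-rate / false-alarm-rate pairs that trace a ROC curve, and the area under it (AUC), can be computed directly from ranked anomaly scores. A minimal sketch on hypothetical scores and labels (ties in scores would need extra care):

```python
# Hypothetical anomaly scores and ground-truth labels (1 = anomaly).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0,   1,   0]

P = sum(labels)            # number of anomalies
N = len(labels) - P        # number of normal records

# Sweep the threshold from high to low, collecting ROC points.
points = [(0.0, 0.0)]
tp = fp = 0
for s, y in sorted(zip(scores, labels), reverse=True):
    if y:
        tp += 1
    else:
        fp += 1
    points.append((fp / N, tp / P))   # (false alarm rate, detection rate)

# Area under the ROC curve by the trapezoidal rule.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
```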
Anomaly Detection in Images
• Detecting anomalous points or regions within an image
• Used in
– mammography image analysis
– video surveillance
– satellite image analysis
• Key Challenges
– Detecting collective anomalies
– Data sets are very large
Taxonomy*
Anomaly Detection Point Anomaly Detection
* Anomaly Detection – A Survey, Varun Chandola, Arindam Banerjee, and Vipin Kumar, To Appear in ACM
Computing Surveys 2008.
Classification Based Techniques
• Main idea: build a classification model for normal (and
anomalous (rare)) events based on labeled training data, and
use it to classify each new unseen event
• Classification models must be able to handle skewed
(imbalanced) class distributions
• Categories:
– Supervised classification techniques
• Require knowledge of both normal and anomaly class
• Build classifier to distinguish between normal and known anomalies
– Semi-supervised classification techniques
• Require knowledge of normal class only!
• Use modified classification model to learn the normal behavior and then
detect any deviations from normal behavior as anomalous
Classification Based Techniques
• Advantages:
– Supervised classification techniques
• Models that can be easily understood
• High accuracy in detecting many kinds of known anomalies
– Semi-supervised classification techniques
• Models that can be easily understood
• Normal behavior can be accurately learned
• Drawbacks:
– Supervised classification techniques
• Require labels from both normal and anomaly class
• Cannot detect unknown and emerging anomalies
– Semi-supervised classification techniques
• Require labels from normal class
• Possible high false alarm rate - previously unseen (yet legitimate) data
records may be recognized as anomalies
Supervised Classification Techniques
• Manipulating data records (oversampling /
undersampling / generating artificial examples)
• Rule based techniques
• Model based techniques
– Neural network based approaches
– Support Vector machines (SVM) based approaches
– Bayesian networks based approaches
• Cost-sensitive classification techniques
• Ensemble based algorithms (SMOTEBoost, RareBoost)
Manipulating Data Records
•Over-sampling the rare class [Ling98]
– Make duplicates of the rare events until the data set contains as many examples as the majority class => balance the classes
– Does not add information, but increases the misclassification cost of the rare events
•Down-sizing (undersampling) the majority class [Kubat97]
– Sample the data records from the majority class (randomly; near-miss examples; examples far from minority class examples, i.e. far from decision boundaries)
– Introduce the sampled data records into the data set instead of the original data records from the majority class
– Usually results in a general loss of information and overly general rules
•Generating artificial anomalies
– SMOTE (Synthetic Minority Over-sampling TEchnique) [Chawla02] - new rare
class examples are generated inside the regions of existing rare class examples
– Artificial anomalies are generated around the edges of the sparsely populated
data regions [Fan01]
– Classify synthetic outliers vs. real normal data using active learning [Abe06]
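SMOTE's interpolation step can be sketched as follows. This is illustrative: the original SMOTE generates a fixed number of synthetic points per minority example, whereas this sketch samples examples at random.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """SMOTE-style sketch: synthesize rare-class points on line segments
    joining a minority example to one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x (squared Euclidean distance)
        nbrs = sorted((p for p in minority if p != x),
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nb = rng.choice(nbrs)
        gap = rng.random()          # random position along the segment
        out.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return out

rare = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]   # hypothetical rare-class points
synthetic = smote(rare, n_new=4)
```

New points lie inside the region spanned by the existing rare-class examples, as the slide describes.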
Rule Based Techniques
•Creating new rule based algorithms (PN-rule, CREDOS)
•Adapting existing rule based techniques
–Robust C4.5 algorithm [John95]
–Adapting multi-class classification methods to single-class classification
problem
•Association rules
–Rules with support higher than a pre-specified threshold may characterize normal behavior [Barbara01, Otey03]
–Anomalous data record occurs in fewer frequent itemsets compared to
normal data record [He04]
–Frequent episodes for describing temporal normal behavior [Lee00,Qin04]
•Case specific feature/rule weighting
–Case specific feature weighting [Cardie97] - decision tree learning where, for each rare class test example, the global weight vector is replaced with a dynamically generated weight vector that depends on the path taken by that example
–Case specific rule weighting [Grzymala00] - LERS (Learning from
Examples based on Rough Sets) algorithm increases the rule strength for
all rules describing the rare class
New Rule-based Algorithms: PN-rule Learning*
• P-phase:
• cover most of the positive examples with high support
• seek good recall
• N-phase:
• remove FP from examples covered in P-phase
• N-rules give high accuracy and significant support
[Figure: existing techniques can possibly learn erroneous small signatures for the absence of C; PNrule can learn strong signatures for the presence of NC in the N-phase]
* M. Joshi, et al., Mining Needles in a Haystack: Classifying Rare Classes via Two-Phase Rule Induction, ACM SIGMOD 2001
New Rule-based Algorithms: CREDOS*
• Ripple Down Rules (RDRs) can be represented as a decision tree
where each node has a predictive rule associated with it
• RDRs specialize a generic form of multi-phase
PNrule model
• Two phases: growth and pruning
• Growth phase:
– Use RDRs to overfit the training data
– Generate a binary tree where each node is characterized
by the rule Rh, a default class and links to two child subtrees
– Grow the RDR structure in a recursive manner
• Prune the structure to improve generalization
– Different mechanism from decision trees
* M. Joshi, et al., CREDOS: Classification Using Ripple Down Structure (A Case for Rare Classes),
SIAM International Conference on Data Mining, (SDM'04), 2004.
Using Neural Networks
• Multi-layer Perceptrons
– Measuring the activation of output nodes [Augusteijn02]
– Extending the learning beyond decision boundaries
• Equivalent error bars as a measure of confidence for classification [Sykacek97]
• Creating hyper-planes for separating between various classes, but also to have
flexible boundaries where points far from them are outliers [Vasconcelos95]
• Auto-associative neural networks
– Replicator NNs [Hawkins02]
– Hopfield networks [Jagota91, Crook01]
• Adaptive Resonance Theory based [Dasgupta00, Caudell93]
• Radial Basis Functions based
– Adding reverse connections from output to central layer allows each neuron to
have associated normal distribution, and any new instance that does not fit any of
these distributions is an anomaly [Albrecht00, Li02]
• Oscillatory networks
– Relaxation time of oscillatory NNs is used as a criterion for novelty detection when
a new instance is presented [Ho98, Borisyuk00]
Using Support Vector Machines
• SVM Classifiers [Steinwart05,Mukkamala02]
• Main idea [Steinwart05] :
– Normal data records belong to high density data regions
– Anomalies belong to low density data regions
– Use unsupervised approach to learn high density and low
density data regions
– Use SVM to classify data density level
• Main idea: [Mukkamala02]
– Data records are labeled (normal network behavior vs.
intrusive)
– Use standard SVM for classification
* A. Lazarevic, et al., A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection, SIAM 2003
Semi-supervised Classification Techniques
• Recent approaches:
– Neural network based approaches
– Support Vector machines (SVM) based approaches
– Markov model based approaches
– Rule-based approaches
Using Replicator Neural Networks*
• Use a replicator 4-layer feed-forward neural network (RNN)
with the same number of input and output nodes
• Input variables are the output variables so that RNN forms a
compressed model of the data during training
• A measure of outlyingness is the reconstruction error of
individual data points.
[Figure: replicator neural network — the input variables are also the target variables]
* S. Hawkins, et al. Outlier detection using replicator neural networks, DaWaK02 2002.
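The reconstruction-error idea can be sketched with a tiny linear bottleneck network trained on normal data only. This is an illustration of the principle, not the 4-layer replicator network of Hawkins et al.; the data and network sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(-1, 1, 50)
normal = np.c_[t, t] + 0.01 * rng.standard_normal((50, 2))  # points near y = x
X = np.vstack([normal, [[1.0, -1.0]]])                      # last row: planted anomaly

# Linear 2 -> 1 -> 2 "replicator": the bottleneck forces a compressed model.
W1 = np.full((2, 1), 0.5)   # encoder weights (deterministic init)
W2 = np.full((1, 2), 0.5)   # decoder weights
lr = 0.05
for _ in range(500):
    H = normal @ W1                        # train on normal data only
    E = H @ W2 - normal                    # reconstruction residual
    G2 = H.T @ E / len(normal)             # gradient w.r.t. W2
    G1 = normal.T @ (E @ W2.T) / len(normal)
    W1 -= lr * G1
    W2 -= lr * G2

# Outlyingness = per-point reconstruction error; the anomaly stands out.
errors = ((X @ W1 @ W2 - X) ** 2).sum(axis=1)
```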
Using Support Vector Machines
• Convert into a one-class classification problem
– Separate the entire set of training data from the origin, i.e. find a small region where most of the data lies and label data points in this region as one class [Ratsch02, Tax01, Eskin02, Lazarevic03]
• Parameters
– Expected number of outliers
– Variance of rbf kernel (As the variance of the rbf
kernel gets smaller, the number of support vectors
is larger and the separating surface gets more complex)
– Not suitable for datasets that have modes with varying density
[Figure: clusters of differing density — a NN approach may consider p3 an outlier, but the LOF approach does not]
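The one-class formulation above is available off the shelf; a minimal sketch using scikit-learn's `OneClassSVM` (scikit-learn assumed available; data and parameter values are hypothetical):

```python
from sklearn.svm import OneClassSVM

# Hypothetical 2-D training data: a tight normal cluster around the origin.
train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1],
         [0.05, 0.05], [-0.05, 0.05], [0.05, -0.05], [-0.05, -0.05], [0.1, 0.1]]

# nu ~ expected fraction of outliers; gamma controls the rbf kernel width
# (a narrower kernel => more support vectors, more complex surface).
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(train)

preds = clf.predict([[0.0, 0.0], [5.0, 5.0]])   # +1 = normal, -1 = anomaly
```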
Local Outlier Factor (LOF)*
• For each data point q, compute the distance to the k-th nearest neighbor (k-distance)
• Compute the reachability distance (reach-dist) for each data example q with respect to data example p as:
reach-dist(q, p) = max{k-distance(p), d(q, p)}
• Compute the local reachability density (lrd) of data example q as the inverse of the average reachability distance over the MinPts nearest neighbors of q:
lrd(q) = MinPts / Σ_p reach-dist_MinPts(q, p)
• Compute LOF(q) as the ratio of the average local reachability density of q's nearest neighbors to the local reachability density of q:
LOF(q) = (1/MinPts) · Σ_p lrd(p) / lrd(q)
* - Breunig, et al, LOF: Identifying Density-Based Local Outliers, KDD 2000.
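The four steps above translate directly into a brute-force implementation (illustration only, O(n²); the data set is hypothetical):

```python
import math

def lof_scores(points, k):
    """LOF per the formulas above: k-distance, reach-dist, lrd, LOF."""
    d = math.dist
    n = len(points)
    knn, kdist = {}, {}
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: d(points[i], points[j]))[:k]
        knn[i] = nbrs
        kdist[i] = d(points[i], points[nbrs[-1]])          # k-distance
    reach = lambda q, p: max(kdist[p], d(points[q], points[p]))  # reach-dist
    lrd = {i: k / sum(reach(i, p) for p in knn[i]) for i in range(n)}
    return [sum(lrd[p] for p in knn[i]) / (k * lrd[i]) for i in range(n)]

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (6, 6)]  # last point is isolated
scores = lof_scores(pts, k=3)
```

Points inside the cluster get LOF close to 1, while the isolated point gets a LOF well above 1.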
Connectivity Outlier Factor (COF)*
• Outliers are points p where the average chaining distance ac-dist_kNN(p)(p) is larger than the average chaining distance (ac-dist) of their k-nearest neighborhood kNN(p)
[Figure: the set-based nearest (SBN) path p1, p2, …, pr, built by repeatedly linking the set {p1, …, pi} to its nearest neighbor in G\{p1, …, pi}]
• The distances dist(ei) between the two sets {p1, …, pi} and G\{p1, …, pi} for each i are called cost descriptions
• The edges ei for each i are called the SBN trail
• The SBN trail may not be a connected graph!

Average Chaining Distance (ac-dist)
• We average the cost descriptions
• We would like to give more weight to points closer to the point p1
• This leads to the following formula:
ac-dist_G(p1) = Σ_{i=1}^{r} [2(r − i) / (r(r − 1))] · dist(ei)
* E. Eskin et al., A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in
Unlabeled Data, 2002.
Cluster based Local Outlier Factor*-CBLOF
• Use squeezer clustering algorithm
to perform clustering.
• Determine CBLOF for each data
instance
– if the data record lies in a small cluster,
CBLOF = (size of cluster) X (distance
between the data instance and the
closest larger cluster).
– if the object belongs to a large cluster,
CBLOF = (size of cluster) X (distance
between the data instance and the
cluster it belongs to).
* He, Z., Xu, X., and Deng, S., Discovering cluster-based local outliers, Pattern Recognition Letters, 24(9-10), 1651-1660, 2003.
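A sketch of the CBLOF scoring rule above, under simplifying assumptions: the clustering is given (the paper obtains it with the squeezer algorithm), cluster membership is taken as nearest centroid, and "large" means at least `large_min` members (the paper uses an α/β criterion on cluster sizes).

```python
import math

def cblof(point, clusters, large_min=5):
    """CBLOF sketch: clusters are plain lists of points."""
    centroid = lambda c: tuple(sum(v) / len(c) for v in zip(*c))
    dist = lambda p, c: math.dist(p, centroid(c))
    own = min(clusters, key=lambda c: dist(point, c))     # assumed membership
    if len(own) >= large_min:
        # large cluster: size x distance to the cluster it belongs to
        return len(own) * dist(point, own)
    # small cluster: size x distance to the closest large cluster
    large = [c for c in clusters if len(c) >= large_min]
    return len(own) * min(dist(point, c) for c in large)

big = [(i * 0.1, 0.0) for i in range(10)]      # hypothetical large cluster
small = [(5.0, 5.0), (5.1, 5.0)]               # hypothetical small cluster
normal_score = cblof((0.5, 0.0), [big, small])
outlier_score = cblof((5.0, 5.0), [big, small])
```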
Taxonomy
[Taxonomy figure: point anomaly detection — statistical techniques]

Statistical Techniques
• Drawbacks
– With high dimensions, it is difficult to estimate parameters and to construct hypothesis tests
– Parametric assumptions might not hold true for real data sets
Types of Statistical Techniques
• Parametric Techniques
– Assume that the normal (and possibly anomalous) data is generated
from an underlying parametric distribution.
– Learn the parameters from the training sample.
• Non-parametric Techniques
– Do not assume any knowledge of parameters.
– Use non-parametric techniques to estimate the density of the distribution – e.g., histograms, Parzen window estimation.
Using Chi-square Statistic*
Ye, N. and Chen, Q. 2001. An anomaly detection technique based on a chi-square statistic for detecting
intrusions into information systems. Quality and Reliability Engineering International 17, 105-112.
SmartSifter (SS)*
• Statistical modeling of data with continuous and categorical attributes.
– Histogram density used to represent a probability density for categorical
attributes.
– Finite mixture model used to represent a probability density for continuous
attributes.
• For a test instance, SS estimates the probability of the test instance to
be generated by the learnt statistical model – pt-1
• The test instance is then added to the sample, and the model is re-
estimated.
• The probability of the test instance to be generated from the new model
is estimated – pt.
• Anomaly score for the test instance is the difference |pt – pt-1|.
* K. Yamanishi, On-line unsupervised outlier detection using finite mixtures with discounting learning
algorithms, KDD 2000
Modeling Normal and Anomalous Data*
• Distribution for the data D is given by:
– D = (1-λ)·M + λ·A
M - majority distribution, A - anomalous distribution.
– M, A : sets of normal, anomalous elements respectively.
– Step 1 : Assign all instances to M, A is initially empty.
– Step 2 : For each instance x in M,
• Step 2.1 : Estimate parameters for M and A.
• Step 2.2 : Compute the log-likelihood L of distribution D.
• Step 2.3 : Remove x from M and insert it in A.
• Step 2.4 : Re-estimate parameters for M and A.
• Step 2.5 : Compute the log-likelihood L′ of distribution D.
• Step 2.6 : If L′ – L > c (a user-chosen threshold), x is an anomaly; otherwise x is moved back to M.
– Step 3 : Go back to Step 2.
* E. Eskin, Anomaly Detection over Noisy Data using Learned Probability Distributions, ICML 2000
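The steps above can be sketched in one dimension with illustrative (not the paper's) choices: M is a Gaussian re-fit to the current normal set, A is uniform on a fixed range, and the values of λ and the threshold c are hypothetical.

```python
import math

LAM, C, LO, HI = 0.05, 1.0, -10.0, 10.0      # hypothetical lambda, threshold, A-range

def log_likelihood(M, A):
    """log L of D = (1-lam)*M + lam*A; M Gaussian, A uniform on [LO, HI]."""
    mu = sum(M) / len(M)
    var = sum((x - mu) ** 2 for x in M) / len(M) + 1e-9
    ll = sum(math.log((1 - LAM) * math.exp(-(x - mu) ** 2 / (2 * var))
                      / math.sqrt(2 * math.pi * var)) for x in M)
    ll += sum(math.log(LAM / (HI - LO)) for _ in A)
    return ll

data = [1.0, 1.1, 0.9, 1.05, 0.95, 8.0]      # 8.0 is the planted anomaly
M, A, anomalies = list(data), [], []
for x in list(M):                            # Step 2: try moving each x to A
    L = log_likelihood(M, A)                 # Step 2.2
    M.remove(x); A.append(x)                 # Step 2.3
    L2 = log_likelihood(M, A)                # Steps 2.4-2.5
    if L2 - L > C:                           # Step 2.6
        anomalies.append(x)
    else:
        A.pop(); M.append(x)                 # move x back to M
```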
Taxonomy
[Taxonomy figure: point anomaly detection — information theoretic techniques]

Information Theoretic Techniques
• Drawbacks
– Require an information theoretic measure sensitive enough to detect the irregularity induced by very few anomalies
Using Entropy*
• Find a k-sized subset whose removal leads to
the maximal decrease in entropy of the data set.
He, Z., Xu, X., and Deng, S. 2005. An optimization model for outlier detection in categorical data. In
Proceedings of International Conference on Intelligent Computing. Vol. 3644. Springer.
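A greedy simplification of this formulation (He et al. use a local-search heuristic; one-at-a-time greedy removal is an assumption here):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of categorical values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def k_entropy_outliers(data, k):
    """Greedily remove the record whose removal most decreases entropy."""
    data, removed = list(data), []
    for _ in range(k):
        i = min(range(len(data)),
                key=lambda i: entropy(data[:i] + data[i + 1:]))
        removed.append(data.pop(i))
    return removed

records = ["a"] * 10 + ["b"] * 8 + ["c"]     # "c" is the rare categorical value
```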
Spectral Techniques
• Analysis based on the eigendecomposition of the data.
• Key Idea
– Find combination of attributes that capture bulk of
variability.
– Reduced set of attributes can explain normal data well,
but not necessarily the anomalies.
• Advantage
– Can operate in an unsupervised mode.
• Drawback
– Based on the assumption that anomalies and normal
instances are distinguishable in the reduced space.
Using Robust PCA*
• Compute the principal components of the dataset
• For each test point, compute its projection on these components
• If yi denotes the projection of a (centered) test point on the i-th principal component, and λi the variance along that component, then Σi yi²/λi has a chi-squared distribution; unusually large values indicate anomalies
* Ide, T. and Kashima, H. Eigenspace-based anomaly detection in computer systems. KDD, 2004
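A sketch of the chi-squared score on synthetic correlated data. The data set and mixing matrix are made up; a robust variant would use robust estimates of the mean and covariance rather than the plain sample estimates used here.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical training data: strongly correlated 2-D points.
X = rng.standard_normal((500, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])
mu = X.mean(axis=0)

# Principal components V and their variances lam (eigendecomposition of cov).
lam, V = np.linalg.eigh(np.cov(X - mu, rowvar=False))

def chi2_score(x):
    """sum_i y_i^2 / lambda_i, with y_i = projection on the i-th component."""
    y = V.T @ (np.asarray(x) - mu)
    return float(np.sum(y ** 2 / lam))

on_axis = chi2_score([5.0, 0.5])    # roughly along the dominant direction
off_axis = chi2_score([-0.5, 5.0])  # similar magnitude, off the normal subspace
```

The off-axis point is poorly explained by the reduced set of components and scores far higher.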
Visualization Based Techniques
• Use visualization tools to observe the data.
• Provide alternate views of data for manual inspection.
• Anomalies are detected visually.
• Advantages
– Keeps a human in the loop.
• Drawbacks
– Works well for low dimensional data.
– Anomalies might not be identifiable in aggregated or partial views of high dimensional data.
– Not suitable for real-time anomaly detection.
Visual Data Mining*
• Detecting telecommunication fraud.
• Display telephone call
patterns as a graph.
• Use colors to identify
fraudulent telephone
calls (anomalies).
* Cox et al 1997. Visual data mining: Recognizing telephone calling fraud. Journal of Data Mining and Knowledge Discovery.
Taxonomy
[Taxonomy figure: contextual anomaly detection]
* Xiuyao Song, Mingxi Wu, Christopher Jermaine, Sanjay Ranka, Conditional Anomaly Detection, IEEE Transactions on Knowledge and Data Engineering, 2006.
Taxonomy
[Taxonomy figure: collective anomaly detection — sequential data]
• Multiple sub-formulations
– Detect anomalous sequences in a database of
sequences, or
– Detect anomalous subsequence within a
sequence.
Sequence Time Delay Embedding (STIDE)*
• Assumes training data containing normal sequences
• Training
– Extracts fixed length (k) subsequences by sliding a window over the
training data.
– Maintain counts for all subsequences observed in the training data.
• Testing
– Extract fixed length subsequences from the test sequence.
– Find empirical probability of each test subsequence from the above
counts.
– If probability for a subsequence is below a threshold, the
subsequence is declared as anomalous.
– Number of anomalous subsequences in a test sequence is its
anomaly score.
• Applied for system call intrusion detection.
* Warrender, Christina, Stephanie Forrest, and Barak Pearlmutter, Detecting Intrusions Using System Calls: Alternative Data Models, IEEE Symposium on Security and Privacy, 1999.
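The STIDE training and testing procedure above can be sketched over strings, with one character standing in for one system call (in Warrender et al. the symbols are actual system calls; the probability threshold is simplified here to "never seen in training"):

```python
from collections import Counter

def train_stide(normal_traces, k):
    """Count all length-k windows slid over the normal training traces."""
    counts = Counter()
    for s in normal_traces:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def anomaly_score(trace, counts, k):
    """Number of test windows never observed in training (threshold = 0)."""
    windows = (trace[i:i + k] for i in range(len(trace) - k + 1))
    return sum(1 for w in windows if counts[w] == 0)

counts = train_stide(["abcdabcdabcd"], k=3)   # hypothetical normal trace
```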
Sequential Anomaly Detection – Current State of the Art

Technique families: State Based (FSA, PST, SMT, HMM), Model Based (Ripper, Clustering), Kernel Based (kNN)
Data/Applications (with representative references):
– Operating system calls: [4][7], [3], [4][5], [11], [4][8]
– Univariate symbolic sequences: [10], [12]
– Protein data: [9]
– Flight safety data (aircraft safety): [14], [13]

[Figure: data stream blocks Di, Di+1 arriving over time — incremental outlier detection]
* D. Pokrajac, A. Lazarevic, and L. J. Latecki. Incremental local outlier detection for data streams. In
Proceedings of IEEE Symposium on Computational Intelligence and Data Mining, 2007.
Taxonomy
[Taxonomy figure]

Case Study: Intrusion Detection
[Figure: number of reported computer security incidents per year, 1990–2003, growing steeply]
[Figure: attack scenario — an attacker machine performs scanning activity on the computer network and compromises a machine with a vulnerability]
IDS - Analysis Strategy
• Misuse detection is based on extensive knowledge of patterns
associated with known attacks provided by human experts
– Existing approaches: pattern (signature) matching, expert systems, state
transition analysis, data mining
– Major limitations:
• Unable to detect novel & unanticipated attacks
• Signature database has to be revised for each new type of discovered attack
• Anomaly detection is based on profiles that represent normal behavior of
users, hosts, or networks, and detecting attacks as significant deviations
from this profile
– Major benefit - potentially able to recognize unforeseen attacks.
– Major limitation - possible high false alarm rate, since detected deviations do
not necessarily represent actual attacks
– Major approaches: statistical methods, expert systems, clustering, neural
networks, support vector machines, outlier detection schemes
Intrusion Detection
Intrusion Detection System
– combination of software
and hardware that attempts
to perform intrusion detection
– raises the alarm when possible
intrusion happens
Traditional intrusion detection (IDS) tools (e.g. SNORT, www.snort.org) are based
on signatures of known attacks
– Example of a SNORT rule (MS-SQL “Slammer” worm):
any -> udp port 1434 (content:"|81 F1 03 01 04 9B 81 F1 01|";
content:"sock"; content:"send")
Limitations
– Signature database has to be manually revised for each new type of
discovered intrusion
– They cannot detect emerging cyber threats
– Substantial latency in deployment of newly created signatures across the
computer system
• Data Mining can alleviate these limitations
Data Mining for Intrusion Detection
Increased interest in data mining based intrusion detection
– Attacks for which it is difficult to build signatures
– Attack stealthiness
– Unforeseen/Unknown/Emerging attacks
– Distributed/coordinated attacks
Data mining approaches for intrusion detection
– Misuse detection
Building predictive models from labeled data sets (instances
are labeled as “normal” or “intrusive”) to identify known intrusions
High accuracy in detecting many kinds of known attacks
Cannot detect unknown and emerging attacks
– Anomaly detection
Detect novel attacks as deviations from “normal” behavior
Potential high false alarm rate - previously unseen (yet legitimate) system
behaviors may also be recognized as anomalies
– Summarization of network traffic
Data Mining for Intrusion Detection
Misuse Detection – Building Predictive Models
[Table: labeled network connection records with categorical attributes (SrcIP, Dest IP, Dest Port), continuous attributes (Start time, Number of bytes), and a class label Attack; e.g. Tid 1: SrcIP 206.135.38.95, Start time 11:07:20, Dest IP 160.94.179.223, Dest Port 139, 192 bytes, Attack: No. A model built from such records classifies new connections as normal or intrusive.]
Anomaly Detection on Real Network Data
• Anomaly detection was used at U of Minnesota and Army Research Lab to
detect various intrusive/suspicious activities
• Many of these could not be detected using widely used intrusion detection
tools like SNORT
• Anomalies/attacks picked by MINDS
– Scanning activities
– Non-standard behavior
– Policy violations
– Worms

MINDS – Minnesota Intrusion Detection System
[Figure: MINDS architecture — network data feeds anomaly scores, association pattern analysis, and summary and characterization of attacks]
• Policy Violations
– August 8, 2005: Identified a machine running a Microsoft PPTP VPN server on non-standard ports (ranked #1)
• Undetected by SNORT since the collected GRE traffic was part of the normal traffic
– August 10, 2005 & October 30, 2005: Identified compromised machines running FTP servers on non-standard ports, which is a policy violation (ranked #1)
• An example of anomalous behavior following a successful Trojan horse attack
– February 6, 2006: The IP address 128.101.X.0 (not a real computer, but a network itself) was targeted with IP Protocol 0 traffic from Korea (61.84.X.97) (bad, since IP Protocol 0 is not legitimate)
– February 6, 2006: Detected a computer on the network apparently communicating with a computer in California over a VPN or on IPv6
• Worms
– October 10, 2005: Detected several instances of the slapper worm that were not identified by SNORT since they were variations of existing worm code
– February 6, 2006: Detected unsolicited ICMP ECHOREPLY messages to a computer previously infected with the Stacheldraht worm (a DDoS agent)
Conclusions
• Questions?
References
• Ling, C., Li, C. Data mining for direct marketing: Problems and solutions, KDD, 1998.
• Kubat M., Matwin, S., Addressing the Curse of Imbalanced Training Sets: One-Sided Selection, ICML 97.
• N. Chawla et al., SMOTE: Synthetic Minority Over-Sampling Technique, JAIR, 2002.
• W. Fan et al, Using Artificial Anomalies to Detect Unknown and Known Network Intrusions, ICDM 2001
• N. Abe, et al, Outlier Detection by Active Learning, KDD 2006
• C. Cardie, N. Howe, Improving Minority Class Prediction Using Case specific feature weighting, ICML 97.
• J. Grzymala et al, An Approach to Imbalanced Data Sets Based on Changing Rule Strength, AAAI
Workshop on Learning from Imbalanced Data Sets, 2000.
• George H. John. Robust linear discriminant trees. AI&Statistics, 1995
• Barbara, D., Couto, J., Jajodia, S., and Wu, N. Adam: a testbed for exploring the use of data mining in
intrusion detection. SIGMOD Rec., 2001
• Otey, M., Parthasarathy, S., Ghoting, A., Li, G., Narravula, S., and Panda, D. Towards nic-based
intrusion detection. KDD 2003
• He, Z., Xu, X., and Deng, S. 2005. An optimization model for outlier detection in categorical data. In
Proceedings of International Conference on Intelligent Computing. Vol. 3644. Springer.
• Lee, W., Stolfo, S. J., and Mok, K. W. Adaptive intrusion detection: A data mining approach. Artificial
Intelligence Review, 2000
• Qin, M. and Hwang, K. Frequent episode rules for internet anomaly detection. In Proceedings of the 3rd
IEEE International Symposium on Network Computing and Applications, 2004
• Ide, T. and Kashima, H. Eigenspace-based anomaly detection in computer systems. KDD, 2004
• Sun, J. et al., Less is more: Compact matrix representation of large sparse graphs. SDM 2007
References
• Lee, W. and Xiang, D. Information-theoretic measures for anomaly detection. In Proceedings of the IEEE
Symposium on Security and Privacy. IEEE Computer Society, 2001
• Ratsch, G., Mika, S., Scholkopf, B., and Muller, K.-R. Constructing boosting algorithms from SVMs: An
application to one-class classification. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2002
• Tax, D. M. J. One-class classification; concept-learning in the absence of counter-examples. Ph.D.
thesis, Delft University of Technology, 2001
• Eskin, E., Arnold, A., Prerau, M., Portnoy, L., and Stolfo, S. A geometric framework for unsupervised
anomaly detection. In Proceedings of Applications of Data Mining in Computer Security, 2002
• A. Lazarevic, et al., A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection,
SDM 2003
• Scholkopf, B., Platt, O., Shawe-Taylor, J., Smola, A., and Williamson, R. Estimating the support of a
high-dimensional distribution. Tech. Rep. 99-87, Microsoft Research, 1999
• Baker, D. et al., A hierarchical probabilistic model for novelty detection in text. ICML 1999
• Das, K. and Schneider, J. Detecting anomalous records in categorical datasets. KDD 2007
• Augusteijn, M. and Folkert, B. Neural network classification and novelty detection. International Journal
on Remote Sensing, 2002
• Sykacek, P. Equivalent error bars for neural network classifiers trained by Bayesian inference. In
Proceedings of the European Symposium on Artificial Neural Networks. 121–126, 1997
• Vasconcelos, G. C., Fairhurst, M. C., and Bisset, D. L. Investigating feedforward neural networks with
respect to the rejection of spurious patterns. Pattern Recognition Letter, 1995
References
• S. Hawkins, et al. Outlier detection using Replicator neural networks, DaWaK02 2002.
• Jagota, A. Novelty detection on a very large number of memories stored in a Hopfield-style network. In
Proceedings of the International Joint Conference on Neural Networks, 1991
• Crook, P. and Hayes, G. A robot implementation of a biologically inspired method for novelty detection.
In Proceedings of Towards Intelligent Mobile Robots Conference, 2001
• Dasgupta, D. and Nino, F. 2000. A comparison of negative and positive selection algorithms in novel
pattern detection. IEEE International Conference on Systems, Man, and Cybernetics, 2000
• Caudell, T. and Newman, D. An adaptive resonance architecture to define normality and detect novelties
in time series and databases. World Congress on Neural Networks, 1993
• Albrecht, S. et al. Generalized radial basis function networks for classification and novelty detection: self-
organization of optional Bayesian decision. Neural Networks, 2000
• Steinwart, I., Hush, D., and Scovel, C. A classification framework for anomaly detection. JMLR, 2005
• Srinivas Mukkamala et al. Intrusion Detection Systems Using Adaptive Regression Splines. ICEIS 2004
• Li, Y., Pont et al. Improving the performance of radial basis function classifiers in condition monitoring
and fault diagnosis applications where unknown faults may occur. Pattern Recognition Letters, 2002
• Borisyuk, R. et al. An oscillatory neural network model of sparse distributed memory and novelty
detection. Biosystems, 2000
• Ho, T. V. and Rouat, J. Novelty detection based on relaxation time of a network of integrate-and-fire
neurons. Proceedings of Second IEEE World Congress on Computational Intelligence, 1998
• J. Vaidya and C. Clifton. Privacy-preserving outlier detection. In Proceedings of the 4th IEEE
International Conference on Data Mining, pages 233–240, 2004.
References
• R. Blender, K. Fraedrich, and F. Lunkeit. Identification of cyclone track regimes in the north atlantic.
Quarterly Journal of the Royal Meteorological Society, 123(539):727–741, 1997.
• Y. Bu, T.-W. Leung, A. Fu, E. Keogh, J. Pei, and S. Meshkin. Wat: Finding top-k discords in time series
database. In Proceedings of 7th SIAM International Conference on Data Mining, 2007.
• E. Eskin and S. Stolfo. Modeling system call for intrusion detection using dynamic window sizes. In
Proceedings of DARPA Information Survivability Conference and Exposition, 2001.
• S. Forrest, C. Warrender, and B. Pearlmutter. Detecting intrusions using system calls: Alternate data
models. In Proceedings of the 1999 IEEE Symposium on Security and Privacy, pages 133–145,
Washington, DC, USA, 1999. IEEE Computer Society.
• B. Gao, H.-Y. Ma, and Y.-H. Yang. Hmms (hidden markov models) based on anomaly intrusion
detection method. In Proceedings of International Conference on Machine Learning and Cybernetics,
pages 381–385. IEEE Computer Society, 2002.
• R. Gwadera, M. J. Atallah, and W. Szpankowski. Detection of significant sets of episodes in event
sequences. In Proceedings of the Fourth IEEE International Conference on Data Mining, pages 3–10,
Washington, DC, USA, 2004. IEEE Computer Society.
• S. A. Hofmeyr, S. Forrest, and A. Somayaji. Intrusion detection using sequences of system calls.
Journal of Computer Security, 6(3):151–180, 1998.
• E. Keogh, J. Lin, S.-H. Lee, and H. V. Herle. Finding the most unusual time series subsequence:
algorithms and applications. Knowledge and Information Systems, 11(1):1–27, 2006.
References
• W. Lee and S. Stolfo. Data mining approaches for intrusion detection. In Proceedings of the 7th
USENIX Security Symposium, San Antonio, TX, 1998.
• P. Sun, S. Chawla, and B. Arunasalam. Mining for outliers in sequential databases. In Proceedings of
SIAM Conference on Data Mining, 2006.
• N. Ye. A markov chain model of temporal behavior for anomaly detection. In Proceedings of the 5th
Annual IEEE Information Assurance Workshop. IEEE, 2004.
• X. Zhang, P. Fan, and Z. Zhu. A new anomaly detection method based on hierarchical hmm. In
Proceedings of the 4th International Conference on Parallel and Distributed Computing, Applications
and Technologies, pages 249–252, 2003.
• C. C. Michael and A. Ghosh. Two state-based approaches to program based anomaly detection. In
Proceedings of the 16th Annual Computer Security Applications Conference, page 21, 2000.
• S. Budalakoti, A. Srivastava, R. Akella, and E. Turkov. Anomaly detection in large sets of high-
dimensional symbol sequences. Technical Report NASA TM-2006-214553, NASA Ames Research
Center, 2006.
• A. N. Srivastava. Discovering system health anomalies using data mining techniques. In Proceedings of
2005 Joint Army Navy NASA Airforce Conference on Propulsion, 2005.
• P. K. Chan and M. V. Mahoney. Modeling multiple time series for anomaly detection. In Proceedings of
the Fifth IEEE International Conference on Data Mining, pages 90–97, Washington, USA, 2005.
Backup Slides