Outlier Detection
Outlier Detection
Detection
Method is unsupervised
Validation can be quite challenging (just like for
clustering)
Finding needle in a haystack
Working assumption:
There are considerably more “normal”
Profile can be patterns or summary statistics for the
overall population
Use the “normal” profile to detect anomalies
Anomalies are observations whose characteristics
differ significantly from the normal profile
Distance-based
Model-based
Limitations
Time consuming
Subjective
Density based
Clustering based
The top n data points whose distance to the kth
nearest neighbor is greatest
The top n data points whose average distance to the k
nearest neighbors is greatest
In the NN approach, p2
is not considered as
outlier, while LOF
approach find both p1
p2 and p2 as outliers
p1
of different density
Choose points in small
cluster as candidate
outliers
Compute the distance