Outlier Detection
Outlier Detection
Challenges
How many outliers are there in the data?
Method is unsupervised
Working assumption:
There are considerably more “normal”
Distance-based
Model-based
Limitations
Time consuming
Subjective
Likelihood at time t:
N |At |
Lt ( D ) PD ( xi ) (1 ) PM t ( xi ) PAt ( xi )
|M t |
i 1 xi M t xiAt
LLt ( D ) M t log( 1 ) log PM t ( xi ) At log log PAt ( xi )
xi M t xi At
Density based
Clustering based
In the NN approach, p2 is
not considered as outlier,
while LOF approach find
both p1 and p2 as outliers
p2
p1
different density
Choose points in small cluster
as candidate outliers
Compute the distance between