
Understanding Outlier Detection Methods

The document discusses outlier detection techniques used to identify abnormal data points that differ from expected behavior. It covers global vs local outlier detection, labeling vs scoring outliers, and classification of approaches based on modeling properties such as depth, deviation, distance, density and angles.

Uploaded by

Rehman Ali

Outlier

An outlier is an observation which deviates so much from the other observations as to arouse suspicions
that it was generated by a different mechanism.

Sample applications of outlier detection

Fraud detection → a change in the purchasing behavior of a credit card

Medicine → unusual symptoms

Public health → an unusual occurrence of a particular disease

Sports statistics → sometimes players show abnormal values only on a subset or a special combination of the recorded parameters

General application scenarios


• Supervised scenario: Training data is available with both normal and abnormal examples, sometimes from multiple categories. One class usually has far more examples than the other, and this imbalance makes accurate classification difficult.
• Semi-supervised scenario: Training data is available for only one class, either the normal examples or the abnormal ones, without the counterpart.
• Unsupervised scenario: No training data is available at all. Patterns and anomalies have to be found without any labeled examples as a guide.

Classification of approaches
1. Global vs. Local Outlier Detection:
• Global Outlier Detection: In global outlier detection, each data point's "outlierness" is
assessed relative to the entire dataset or a global reference set. It looks at the dataset as a
whole to identify outliers.
• Local Outlier Detection: In local outlier detection, each data point's "outlierness" is
assessed relative to a local neighborhood or subset of the data. It focuses on detecting
outliers within smaller, more localized groups of data points.
2. Labeling vs. Scoring Outliers:
• Labeling Outliers: Labeling outliers involves categorizing data points as either outliers or
non-outliers based on the output of an algorithm. It assigns a binary label (outlier or not) to
each data point.
• Scoring Outliers: Scoring outliers involves assigning a score or a numerical measure of
outlierness to each data point. It provides a continuous measure indicating how much a
data point deviates from the norm.
3. Modeling Properties:
• Approaches can also be classified by the property of the data that is used to model outlierness, that is, by the definition of what makes a data point an outlier. This could be based on statistical properties, distance from the centroid, or other relevant features of the data.
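
The difference between scoring and labeling can be sketched in a few lines. This is only an illustration, not a method from these notes: it assumes the absolute z-score as a global outlier score, and the cutoff of 2.0 and the sample values are made up for the example.

```python
import statistics

def zscore_scores(values):
    """Score each value by its absolute z-score (a global outlier score).

    Assumes the values are not all identical, otherwise the standard
    deviation is zero and the division fails.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [abs(v - mean) / std for v in values]

def label_outliers(scores, threshold):
    """Turn continuous scores into binary labels with a cutoff."""
    return [s > threshold for s in scores]

# Hypothetical sample: five similar values and one value far away.
values = [10.0, 11.0, 9.0, 10.5, 9.5, 30.0]
scores = zscore_scores(values)                   # scoring: one number per point
labels = label_outliers(scores, threshold=2.0)   # labeling: outlier or not
```

The scores keep the ranking of all points, while the labels depend on the chosen threshold. Because the mean and standard deviation are computed over the whole dataset, this is a global approach.
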
Approaches classified by the properties of the underlying modeling approach

1. Model-based approaches: a model of the data is defined, and the data points that do not fit the model are considered outliers.
Examples: probabilistic tests based on statistical models, depth-based approaches, deviation-based approaches.
2. Proximity-based approaches: proximity means the state or quality of being near or close to something. These approaches measure the distance from a point to its neighbors; a point that lies far from its neighbors is considered an outlier.
Examples: distance-based approaches, density-based approaches.
3. Angle-based approaches: examine the spectrum of pairwise angles between a given point and all other points. Outliers are points whose spectrum shows low fluctuation, because most other points lie in similar directions; for points inside the data the angles vary widely.
Example: linear angle-based approach.
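
The angle-based idea can be sketched as follows. In the angle-based outlier detection (ABOD) formulation, a point far from the rest sees all other points in a similar direction, so the variance of its angles is small, while a point inside the data sees the others in many directions. The code is a simplified illustration for 2D points: it takes the plain variance of the angles and leaves out the distance weighting used by ABOD. The function name and the sample points are assumptions made for this example.

```python
import math
from itertools import combinations
from statistics import pvariance

def angle_variance(points, i):
    """Variance of the angles under which point i sees all pairs of other points.

    Simplified: unweighted angles in radians. Assumes all points are distinct,
    otherwise a difference vector has length zero.
    """
    p = points[i]
    others = [q for j, q in enumerate(points) if j != i]
    angles = []
    for a, b in combinations(others, 2):
        va = (a[0] - p[0], a[1] - p[1])
        vb = (b[0] - p[0], b[1] - p[1])
        cos = (va[0] * vb[0] + va[1] * vb[1]) / (math.hypot(*va) * math.hypot(*vb))
        # Clamp to [-1, 1] to guard against floating-point rounding.
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return pvariance(angles)

# Hypothetical sample: a small cluster and one distant point (index 5).
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
variances = [angle_variance(points, i) for i in range(len(points))]
most_outlying = min(range(len(points)), key=variances.__getitem__)
```

Here the smallest variance marks the most outlying point. The pairwise loop over all other points makes this cubic in the number of points, so it is only practical for small datasets.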

The following sections look at some of these approaches in more detail.

Depth-based approaches


• General idea: search for outliers at the border of the data space.
• Data objects are organized in convex hull layers. The outermost layer has the lowest depth, and depth increases toward the center of the data.
• Outliers are located at the border of the data space, on the outer layers.
• Normal objects are in the center of the data space.
• Algorithms: ISODEPTH, FDC
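
The convex hull layers can be illustrated by repeatedly peeling the hull off a 2D point set. This is a plain hull-peeling sketch, not an implementation of ISODEPTH or FDC. It assumes 2D points given as tuples and uses the monotone chain algorithm for the hull; the function names and the sample points are made up for the example.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices of 2D points (monotone chain algorithm).

    Points that lie on a hull edge but are not corners are not returned,
    so they end up in a deeper layer.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_depths(points):
    """Assign depth 1 to the outermost hull layer, 2 to the next, and so on."""
    remaining = set(points)
    depth = {}
    layer = 1
    while remaining:
        hull = convex_hull(list(remaining))
        for p in hull:
            depth[p] = layer
        remaining -= set(hull)
        layer += 1
    return depth

# Hypothetical sample: an outer square, an inner square, and a center point.
points = [(0, 0), (4, 0), (4, 4), (0, 4),
          (1, 1), (3, 1), (3, 3), (1, 3),
          (2, 2)]
depth = hull_depths(points)
```

Points with a small depth lie on the outer layers and are the outlier candidates; the choice of how many layers to report is left to the user.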

Deviation-based approaches
• Given a set of data points (a local group or the global set)
• Outliers are points that do not fit the general characteristics of that set, i.e., the variance of the set is minimized when the outliers are removed
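
As a rough illustration of the variance criterion, the sketch below removes each point in turn and measures how much the variance of the remaining set drops. It only handles the removal of a single point from one-dimensional data; the function name and the sample values are assumptions made for this example.

```python
from statistics import pvariance

def variance_reduction(values):
    """For each point, how much the variance drops when that point is removed.

    Assumes at least two values, so that the remaining set is never empty.
    """
    base = pvariance(values)
    return [base - pvariance(values[:i] + values[i + 1:])
            for i in range(len(values))]

# Hypothetical sample: five similar values and one value far away.
values = [5.0, 6.0, 5.5, 6.5, 5.2, 20.0]
reductions = variance_reduction(values)
# The point whose removal reduces the variance the most fits the set least.
most_deviating = max(range(len(values)), key=reductions.__getitem__)
```

Finding the best subset of several outliers to remove is a much larger search problem; this sketch sidesteps it by considering one point at a time.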

Distance-based approaches


• Judge a point based on the distance(s) to its neighbors
• Normal data objects have a dense neighborhood
• Outliers are far apart from their neighbors, i.e., have a less dense neighborhood
• Algorithms: index-based, nested-loop-based, grid-based
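
A nested-loop version of this idea can be sketched as below. It assumes one common choice of score, the distance to the k-th nearest neighbor; the notes do not fix a particular definition. The function name, the value k=2, and the sample points are made up for the example.

```python
import math

def knn_distance_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Nested loop: every point is compared with every other point, so the
    cost grows quadratically with the number of points.
    Assumes 1 <= k <= len(points) - 1.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical sample: a small cluster and one distant point (index 5).
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (8, 8)]
scores = knn_distance_scores(points, k=2)
most_outlying = max(range(len(points)), key=scores.__getitem__)
```

Index-based and grid-based algorithms aim to avoid this full pairwise comparison by pruning points that cannot be outliers.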

Density-based approaches


• Compare the density around a point with the density around its local neighbors.
• The density around a normal data object is similar to the density around its neighbors.
• The density around an outlier is considerably different from the density around its neighbors.
• Algorithm: Local Outlier Factor (LOF)
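
The density comparison can be sketched with a small pure-Python version of the local outlier factor. It follows the usual LOF definitions (k-distance, reachability distance, local reachability density), with two simplifications that are assumptions of this sketch: each point uses exactly its k nearest neighbors even when distances tie, and all points are distinct. The function name, k=3, and the sample points are made up for the example.

```python
import math

def local_outlier_factor(points, k):
    """Compute a LOF score per point; values well above 1 suggest an outlier."""
    n = len(points)
    # For each point, its k nearest neighbors as (distance, index) pairs.
    neighbors = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        neighbors.append(dists[:k])
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i])
            for i in range(n)]

# Hypothetical sample: a 3x3 grid of points and one distant point (last).
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
scores = local_outlier_factor(points, k=3)
```

Points inside the grid get scores close to 1 because their density matches that of their neighbors, while the distant point gets a much larger score.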
