MBA Analytics For Finance 08
08 Anomaly Detection
Names of Sub-Units
Overview
The unit begins by discussing the concept of anomaly detection. Further, the unit explains the
importance of anomaly detection. Towards the end, the unit outlines the practical implementation of
anomaly detection.
Learning Objectives
Learning Outcomes
http://cucis.ece.northwestern.edu/projects/DMS/publications/AnomalyDetection.pdf
8.1 INTRODUCTION
For a long time, the buzzword has been data. Whether it is data generated by major corporations or data
generated by individuals, every piece of data must be studied to reap its benefits. Data analytics is used
to discover hidden insights, generate reports, perform market analysis and refine business requirements,
and it plays a crucial part in developing a business.
The techniques used to evaluate data in order to improve efficiency and business gain are referred
to as data analytics. To assess distinct behavioural patterns, data is extracted from multiple sources,
cleaned and categorised. Depending on the business or individual, different strategies and instruments
are used.
When the volume of information is large, the dataset inevitably becomes larger and needs to be
analysed properly. Data analysis relies on tools and techniques such as programming languages and
algorithms, and the larger the dataset, the more complicated it becomes to analyse it and achieve
the expected outcomes. In the recent past, data analysis has increasingly been performed by computerised
systems equipped with various techniques. The terms ‘Artificial Intelligence’ and ‘Machine Learning’ came
into wide use with the rise of computerised analytical activities. Machine learning is a field of artificial
intelligence that gives computers the ability to learn automatically from data and make decisions without
human intervention. Machine learning helps computers identify patterns in data that are difficult for a
human to spot.
During the analysis, datasets often contain observations that do not fit the expected pattern, such as
errors, rare events or other irregularities. In the language of AI and ML, such observations are termed
anomalies. They need to be detected and handled so that the analysis of the dataset produces reliable,
expected results.
UNIT 08: Anomaly Detection JGI JAIN
DEEMED-TO-BE UNIVERSITY
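Anomaly detection (also known as outlier detection) is the recognition of unforeseen events, inferences
or aspects that vary considerably from the standard. It is frequently applied by data scientists to
unlabelled data in a process known as unsupervised anomaly detection, which is based on two basic
assumptions: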
Anomalies in data are extremely infrequent.
The characteristics of data anomalies differ greatly from those of regular occurrences.
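The two assumptions can be illustrated with a minimal sketch in Python. The function name mad_outliers and the sample amounts are hypothetical, chosen only for illustration. Because anomalies are rare, the median and the median absolute deviation (MAD) of the data mostly describe normal behaviour; because anomalies differ greatly, they receive a large robust score. The 3.5 cut-off follows the commonly used modified z-score rule and would need tuning on real data.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Flag values that sit far from the bulk of the data, without using labels."""
    med = statistics.median(values)
    # Median absolute deviation: a spread measure that a few rare anomalies
    # cannot inflate (assumption 1: anomalies are infrequent).
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Most values are identical, so any differing value is unusual.
        return [v != med for v in values]
    # Modified z-score; 0.6745 makes it comparable to an ordinary z-score
    # for normally distributed data. Large scores mark values that differ
    # greatly from the rest (assumption 2).
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]

# Hypothetical transaction amounts with one unusually large payment.
amounts = [52, 48, 50, 51, 49, 47, 53, 50, 400]
print(mad_outliers(amounts))
```

Here only the final amount, 400, is flagged. The median and MAD were chosen instead of the mean and standard deviation because a single extreme value can distort the latter two.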
Anomaly data is usually associated with a problem or an uncommon event, such as hacking, bank fraud,
malfunctioning equipment, structural faults/infrastructural breakdowns or typographical errors. As a
result, from a commercial standpoint, recognising true anomalies rather than false positives or data
noise is critical.
Anomaly detection is the process of identifying events, items or observations that are unusual in
comparison to standard behaviours or patterns. Deviations, outliers, noise, novelties and exceptions
are all terms used to describe data anomalies. Anomaly detection is critical in the development of
reliable distributed software systems. Anomaly detection can be used to:
Improve communication about system behaviour.
Enhance root cause analysis.
Reduce threats to the software ecosystem.
Traditionally, anomaly detection has been done by hand. Machine learning approaches, however, are
improving the accuracy of anomaly detectors. Of course, machine learning has start-up costs, such as
data requirements and engineering talent.
Anomaly detection approaches are divided into three categories: unsupervised, semi-supervised and
supervised. Essentially, the most suitable approach is determined by the labels available in the dataset.
These categories are summarised in Figure 1:
Figure 1: Categories of anomaly detection
Supervised: Requires a dataset with a complete set of ‘normal’ and ‘abnormal’ labels for a classification algorithm.
Semi-supervised: Builds a model reflecting normal behaviour using a normal, labelled training dataset.
Unsupervised: Discovers anomalies in an unlabelled test set based solely on the data’s inherent qualities, assuming that most of the instances in the dataset are normal.
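The semi-supervised setting in Figure 1 can be sketched as follows. The sketch assumes a single numeric feature and uses the mean and standard deviation of the normal training data as a very simple ‘model of normal behaviour’. The function names, the sample values and the three-standard-deviation threshold are hypothetical choices for illustration, not a prescribed method.

```python
import statistics

def fit_normal_profile(normal_training):
    """Learn a simple model of normal behaviour from normal-labelled data only."""
    return statistics.mean(normal_training), statistics.stdev(normal_training)

def flag_test_points(profile, test_values, k=3.0):
    """Flag test values lying more than k standard deviations from the normal mean."""
    mean, std = profile
    return [abs(v - mean) > k * std for v in test_values]

# Hypothetical training data: every value is labelled 'normal'.
normal_training = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0]
profile = fit_normal_profile(normal_training)

# Unseen test data, scored against the learned profile.
test_values = [10.2, 11.5, 14.0]
print(flag_test_points(profile, test_values))
```

Only 14.0 is flagged here. A supervised approach would also need labelled anomalies to train a classifier. An unsupervised approach would instead have to estimate the profile from unlabelled data that may itself contain anomalies.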
Having discussed anomaly detection, let us now look at what anomalies are. The term ‘anomaly’ can be
defined as something that deviates from the common or standard rule: something different, irregular,
abnormal or not easily classified. The word derives from the Greek anomalos, which means “uneven” or
“irregular.”
Financial Analytics
Anomaly detection is essential for extracting critical business insights and ensuring the continuity of
vital processes. Consider the following scenarios, all of which require the ability to distinguish
between normal and abnormal behaviour with pinpoint accuracy:
An online retailer must forecast whether discounts, events or new products will result in sales spikes,
putting more strain on their web servers.
An IT security team must recognise anomalous login patterns and user activities to avoid hacking.
A cloud provider must allocate traffic and services, as well as evaluate infrastructure improvements
in light of traffic trends and previous resource failures.
A well-constructed behavioural model based on evidence can help users not only describe data behaviour
but also detect outliers and perform meaningful predictive analysis. Static alerts and thresholds are
insufficient, because the volume of operating parameters is overwhelming and abnormalities can easily be
missed or misreported as false positives or negatives.
To overcome these operational limitations, newer systems use algorithms that spot outliers in seasonal
time-series data and accurately forecast periodic data patterns.
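The difference between a static threshold and a seasonal baseline can be sketched as below. The example assumes daily traffic with a weekly cycle (period=7). The traffic figures, the function name seasonal_flags and the static limit of 1200 are made up for illustration. Production systems would normally use a proper forecasting model rather than simple per-day averages.

```python
import statistics

def seasonal_flags(history, new_values, period=7, k=3.0):
    """Flag new values that are unusual for their position in the cycle.

    history must start at cycle position 0 and hold at least two full cycles;
    new_values continue directly from the end of history.
    """
    # One baseline (mean, standard deviation) per position in the cycle,
    # e.g. one per day of the week when period=7.
    baselines = []
    for pos in range(period):
        slot = history[pos::period]
        baselines.append((statistics.mean(slot), statistics.stdev(slot)))
    flags = []
    for i, value in enumerate(new_values, start=len(history)):
        mean, std = baselines[i % period]
        flags.append(abs(value - mean) > k * std)
    return flags

# Three weeks of hypothetical daily traffic: busy weekdays, quiet weekends.
history = [1000, 1010, 990, 1005, 995, 300, 310,
           1020, 1000, 980, 1000, 1000, 290, 300,
           990, 1005, 1010, 995, 1005, 310, 290]
week4 = [1000, 1000, 1000, 1000, 1000, 900, 300]

static_flags = [v > 1200 for v in week4]  # hypothetical static limit
seasonal = seasonal_flags(history, week4)
print(static_flags)
print(seasonal)
```

In this sketch, the static check raises no alert because 900 is well below 1200. The seasonal baseline flags the sixth day of the new week, because 900 is far above what that day of the week normally sees.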
Managing and monitoring the functioning of distributed systems is a chore, albeit a necessary one, in
today’s world. With hundreds or thousands of items to monitor, anomaly detection can help identify where
an error is occurring, improve root cause investigation and allow for faster technical support. Machine
learning can be used to perform the task of anomaly detection.
The following are some of the areas/domains which require the application of anomaly detection in the
business world:
Intrusion detection, which detects anomalies in network traffic
Patient health monitoring and control systems
Detection of fraud in credit card transactions by banks
Fault detection in real-world environments
Detection of fake news and misinformation on the internet
Damage detection in industry
Safety, security and surveillance monitoring
According to statistician Pierre Lafaye de Micheaux, “Outliers are not always a bad thing. These are just
observations that are not following the same pattern as the other ones. But it can be the case that an
outlier is very interesting. For example, if in a biological experiment, a rat is not dead whereas all others
are, then it would be very interesting to understand why. This could lead to new scientific discoveries.
So, it is important to detect outliers.”
Anomaly detection is a method for locating an out-of-the-ordinary point or pattern in a batch of data. The
term outlier is sometimes used to describe an abnormality. Outliers are data objects that stand out from
the rest of the data collection and do not follow the expected pattern of behaviour. Anomaly detection
is a data science application that combines several data science tasks, such as classification,
regression and clustering. Figure 3 shows how anomalies can be pointed out in a dataset:
Figure 3: Anomalies in a dataset (scatter plot)
To simplify the process of anomaly detection, anomalies are classified into the following three
categories:
Point anomalies: If a single data point differs significantly from the rest of the data, it is considered
anomalous. Credit card fraud detection based on “amount spent” is an example.
Contextual anomalies: The abnormality is context-specific, since contextual information determines
whether a value is anomalous. This form of anomaly is common in time-series data. For example, people
spend a lot of money around the holidays, so spending that is normal then may be anomalous at other times.
Collective anomalies: A group of related data instances is abnormal when viewed as a whole, even though
the individual values are not. For example, someone unexpectedly attempts to copy data from a remote
machine to a local host (a potential cyber-attack).
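A collective anomaly can be sketched with a sliding window. The example uses hypothetical per-minute outbound transfer volumes from one host. No single value breaches the point limit of 100, but a run of consecutive high values pushes the window average above a limit of 50. The function name, window length and both limits are illustrative assumptions.

```python
def collective_anomalies(values, window, window_limit):
    """Return start indices of windows whose average exceeds window_limit."""
    starts = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        if sum(chunk) / window > window_limit:
            starts.append(start)
    return starts

# Hypothetical outbound transfer volumes (MB per minute) from one host.
transfer = [10, 90, 5, 12, 8, 80, 82, 85, 81, 84, 9, 11]

point_flags = [v > 100 for v in transfer]  # point check: nothing fires
windows = collective_anomalies(transfer, window=4, window_limit=50)
print(any(point_flags))
print(windows)
```

The windows starting at indices 4 to 7 are flagged because they cover the sustained run of high values. The isolated value 90 at index 1 is not flagged, because a single spike does not lift a four-minute average above the limit.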
There are various approaches toward anomaly detection in machine learning. Some of them are as
follows:
Clustering based anomaly detection: This technique, which relies on unsupervised learning, assumes that
similar data points belong to the same groups or clusters, as indicated by their distance from local
centroids.
The k-means technique can be used to divide a dataset into a predetermined number of clusters.
Anomalies are data points that do not fit well into any of these clusters, that is, points lying far
from every cluster centroid.
Density based anomaly detection: This method uses the k-nearest neighbours algorithm. Normal data points
typically occur in a densely populated neighbourhood, while abnormal data points lie far away from their
neighbours. Depending on the type of data you have, you can use Euclidean distance or a similar measure
to determine the closest group of data points.
Support vector machine-based anomaly detection: A support vector machine is another successful tool for
finding abnormalities. One-Class SVMs were created for situations in which only one class is known and
the task is to identify anything outside that class. This is known as novelty detection, and it refers to
the automatic detection of unexpected or aberrant phenomena, often known as outliers, within a huge
volume of typical data.
Supervised deep anomaly detection: In deep learning, supervised deep anomaly detection entails using
labels from both normal and anomalous data examples to train a deep supervised binary or multi-class
classifier.
For example, multi-class supervised deep anomaly detection models aid in spotting unusual brands,
mentions of forbidden medicine names and fraudulent health-care transactions.
Semi-supervised deep anomaly detection: Semi-supervised deep anomaly detection (DAD) methods are used more
widely than supervised DAD methods because labels for normal instances are far easier to acquire than
labels for anomalies. These methods use the existing labels of a single (normal) class to separate
outliers.
One-Class Neural Networks: The one-class neural network (OC-NN) approaches are based on kernel-
based one-class classification, which combines deep networks’ capacity to extract progressively
richer representations of data with the one-class goal of constructing a tight envelope around
normal data.
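The clustering-based and density-based approaches above can be sketched in plain Python on a small two-dimensional dataset. The sketch assumes Euclidean distance. It scores each point by its distance to the nearest k-means centroid or by its distance to its k-th nearest neighbour. The initial centroids are passed in explicitly so that the run is deterministic; library implementations usually choose them randomly or with k-means++ and repeat the run. The dataset and function names are hypothetical. For the One-Class SVM approach, scikit-learn provides sklearn.svm.OneClassSVM.

```python
import math

def kmeans(points, init_centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm) from explicitly supplied centroids."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids

def centroid_distance_scores(points, centroids):
    """Clustering-based score: Euclidean distance to the nearest centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]

def knn_distance_scores(points, k):
    """Density-based score: Euclidean distance to the k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical 2-D data: two tight groups plus one isolated point.
points = [(0.10, 0.10), (0.12, 0.11), (0.11, 0.09), (0.09, 0.12),
          (0.50, 0.50), (0.52, 0.49), (0.49, 0.51), (0.51, 0.52),
          (0.30, 0.90)]

centroids = kmeans(points, init_centroids=[points[0], points[4]])
cluster_scores = centroid_distance_scores(points, centroids)
knn_scores = knn_distance_scores(points, k=2)

# The index with the largest score is the strongest anomaly candidate.
print(max(range(len(points)), key=cluster_scores.__getitem__))
print(max(range(len(points)), key=knn_scores.__getitem__))
```

Both scores rank the isolated point (0.30, 0.90), at index 8, highest. The isolated point also pulls the centroid of the cluster it joins towards itself, which is one known weakness of the clustering-based score.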
The techniques used to evaluate data to improve efficiency and business gain are referred to as data
analytics.
The terms ‘Artificial Intelligence’ and ‘Machine Learning’ came into wide use with the rise of
computerised analytical activities.
Machine learning is a field of artificial intelligence that gives computers the ability to learn
automatically from data and make decisions without human intervention.
Machine learning helps computers identify patterns in data that are difficult for a human to spot.
Anomaly detection (also known as outlier detection) is the recognition of unforeseen events,
inferences or aspects that vary considerably from the standard.
Anomaly data is usually associated with a problem or an uncommon event, such as hacking, bank
fraud, malfunctioning equipment, structural faults/infrastructural breakdowns or typographical
errors.
Anomaly detection is critical in the development of reliable distributed software systems.
Anomaly detection approaches are divided into three categories: unsupervised, semi-supervised
and supervised.
Network administrators must be able to recognise and respond to changing operational conditions.
Anomaly detection is essential for extracting critical business insights and ensuring the continuity of
vital processes.
Any procedure that detects the outliers in a dataset, that is, the objects that do not belong, is known as
anomaly detection.
Outliers are data objects that stand out from the rest of the data collection and do not follow the
expected pattern of behaviour.
8.6 GLOSSARY
Data analytics: The techniques used to evaluate data to improve efficiency and business gain
Machine learning: A field of artificial intelligence that gives computers the ability to learn
automatically from data and make decisions without human intervention
Anomaly detection: The recognition of unforeseen events, inferences or aspects that vary
considerably from the standard
Outliers: Data objects that stand out from the rest of the data collection and do not follow the
expected pattern of behaviour
1. Anomaly detection is the recognition of unforeseen events, inferences or aspects that vary
considerably from the standard. Any sort of anomaly detection, which is frequently applied to
unlabelled data by data scientists in a process known as unsupervised anomaly detection, is based
on two basic assumptions:
Anomalies in data are extremely infrequent.
The characteristics of data anomalies differ greatly from those of regular occurrences.
Refer to Section Introduction to Anomaly Detection
2. Anomaly detection approaches are divided into three categories: unsupervised, semi-supervised
and supervised. Essentially, the best anomaly detection approach is determined by the dataset’s
labels. Refer to Section Introduction to Anomaly Detection
3. Network administrators must be able to recognise and respond to changing operational conditions.
Any variation in the operating conditions of a data centre or cloud application can indicate an
excessive level of business risk. Some divergences, on the other hand, may indicate healthy growth.
Refer to Section Importance of Anomaly Detection
4. For simplifying the process of anomaly detection, the anomalies are classified into the following
three categories:
Point anomalies: If a single data point differs significantly from the rest of the data, it is
considered anomalous. Credit card fraud detection based on “amount spent” is an example.
Refer to Section Practical Implementation in Identifying the Anomaly Detection
5. There are various approaches to anomaly detection in machine learning. Some of them are as
follows:
Clustering based anomaly detection: This technique, which relies on unsupervised learning, assumes
that similar data points belong to the same groups or clusters, as indicated by their distance from
local centroids.
Refer to Section Practical Implementation in Identifying the Anomaly Detection
https://datrics.ai/anomaly-detection-best-practices
https://journalofbigdata.springeropen.com/articles/10.1186/s40537-020-00320-x
https://zindi.africa/learn/introduction-to-anomaly-detection-using-machine-learning-with-a-case-study
Research the process of anomaly detection with the help of machine learning and discuss it with your
classmates. Also, find websites or tools that are helpful for anomaly detection.