0% found this document useful (0 votes)
11 views

MBA Analytics For Finance 08

The document discusses anomaly detection including introducing the concept, explaining the importance, and outlining practical implementation. Anomaly detection recognizes unusual events that differ from normal patterns and is important for issues like security, infrastructure problems, and improving systems. It can be unsupervised, semi-supervised, or supervised depending on available data labels.

Uploaded by

JOJO
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views

MBA Analytics For Finance 08

The document discusses anomaly detection including introducing the concept, explaining the importance, and outlining practical implementation. Anomaly detection recognizes unusual events that differ from normal patterns and is important for issues like security, infrastructure problems, and improving systems. It can be unsupervised, semi-supervised, or supervised depending on available data labels.

Uploaded by

JOJO
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

UNIT

08 Anomaly Detection

Names of Sub-Units

Introduction to anomaly detection, Importance of Anomaly Detection, Practical Implementation in


Identifying the Anomaly Detection

Overview
The unit begins by discussing the concept of anomaly detection. Further, the unit explains the
importance of anomaly detection. Towards the end, the unit outlines the practical implementation in
identifying the anomaly detection.

Learning Objectives

In this unit, you will learn to:


 Explain the concept of anomaly detection
 Define the significance of anomaly detection
 Describe the importance of anomaly detection
 Explain the practical implementation in identifying the anomaly detection
JGI JAIN
DEEMED-TO-BE UNI VE RSI TY
Financial Analytics

Learning Outcomes

At the end of this unit, you would:


 Assess the knowledge about the anomaly detection
 Examine the role of anomaly detection
 Analyse the importance of anomaly detection
 Evaluate the practical implementation in identifying the anomaly detection

Pre-Unit Preparatory Material

 http://cucis.ece.northwestern.edu/projects/DMS/publications/AnomalyDetection.pdf

8.1 INTRODUCTION
For a long time, the buzzword has been data. Whether its data generated by major corporations or data
generated by individuals, every piece of data must be studied to reap the benefits. Data analytics is used
to discover hidden insights, generate reports, do market analysis and improve business requirements
and it plays a crucial part in developing your business.

The techniques used to evaluate data in order to improve efficiency and business gain are referred
to as data analytics. To assess distinct behavioural patterns, data is extracted from multiple sources,
cleaned and categorised. Depending on the business or individual, different strategies and instruments
are used.

When the volume of information is large, the dataset also for sure becomes larger and needs to be
analysed properly. The tools and techniques for data analysis such as various programming languages
and algorithms are used. Larger the dataset, it can become complicated to analyse the same and achieve
the expected outcomes. In the recent past, the task of data analysis is performed by computerised
systems and those are equipped with various techniques. The terms ‘Artificial Intelligence’ and Machine
Learning came into existence with the rise of computerised analytical activities. Machine learning is a
field of artificial intelligence that provides the ability to computers to learn automatically from data
and make a decision without the intervention of humans. Machine learning helps computers to identify
the pattern easily from the data which is difficult for a human.

During the analysis of the data, the machines or the computer systems also go through some errors or
hindrances and those need to be overcome for achievement of the expected results. In the language of
AI and ML, these errors are termed anomalies. Those anomalies need to be detected and surpassed, as a
result of which the dataset will be able to generate the required results which are expected.

8.2 INTRODUCTION TO ANOMALY DETECTION


Anomaly detection(also known as outlier detection) is the recognition of unforeseen events, inferences
or aspects that vary considerably from the standard. Any sort of anomaly detection, which is frequently

2
UNIT 08: Anomaly Detection JGI JAIN
DEEMED-TO-BE UNI VE RSI TY

applied to unlabelled data by data scientists in a process known as unsupervised anomaly detection, is
based on two basic assumptions:
 Anomalies in data are extremely infrequent.
 The characteristics of data anomalies differ greatly from those of regular occurrences.

Anomaly data is usually associated with a problem or an uncommon event, such as hacking, bank fraud,
malfunctioning equipment, structural faults/infrastructural breakdowns or typographical errors. As a
result, from a commercial standpoint, recognising true anomalies rather than false positives or data
noise is critical.

Anomaly detection is the process of identifying unusual events, items or observations that are unusual
in comparison to standard behaviours or patterns. Standard deviations, outliers, noise, novelty and
exceptions are all terms used to describe data anomalies. Anomaly detection is critical in the development
of reliable distributed software systems. Anomaly detection can be used to:
 Improve system behaviour communication.
 Enhance your root cause analysis.
 Threats to the software ecosystem should be reduced.

Anomaly detection has always been done by hand. Machine learning approaches, on the other hand,
are enhancing the accuracy of anomaly detectors. Of course, there are start-up costs associated with
machine learning, such as data requirements and engineering talent.

Anomaly detection approaches are divided into three categories: unsupervised, semi-supervised and
supervised. Essentially, the best anomaly detection approach is determined by the dataset’s labels.
These categories are elaborated in the detailed form with the help of Figure 1:

It discovers anomalies in
It requires a dataset with It builds a model reflecting an unlabeled test set of data
complete set of ‘normal’ normal behaviour using a simply based on the data’s
and ‘abnormal’ labels for normal, labelled training inherent qualities assuming
classification algorithm data set. that most of the instances in
the dataset are normal.

The model is utilised to spot The anomaly detection


It involves training the
anomalies by determining algorithm will then look
classifier, and it is similar
how likely it is for the model for occurrences that do not
to traditional pattern
to generate any given seem to fit in with the rest of
recognition.
instance. the data set.

Figure 1: Anomaly Detection Approaches

After all the discussion above on anomaly detection, let us now check about what are anomalies. The
term, ‘Anomaly’ can be defined as something anomalous that deviates from the common or standard
rule and something different, irregular, abnormal or not easily classified. Anomaly is a descendant of
the Greek word anomalous, which means “uneven” or “irregular.”

3
JGI JAIN
DEEMED-TO-BE UNI VE RSI TY
Financial Analytics

The classification of anomalies can be in several ways as shown in Figure 2:

These systems monitor Any other unusual or


The detection of anomalies
how applications work, suspect web application
in network behaviour
gathering data on any behaviour that could
necessitates ongoing
issues that arise, such as compromise security, such
network monitoring for
supporting infrastructure as CSS or DDOS attacks,
unusual patterns or events.
and app dependencies. falls into this category.

Figure 2: Classification of Anomalies

8.3 IMPORTANCE OF ANOMALY DETECTION


Network administrators must be able to recognise and respond to changing operational conditions.
Any variations in a data centre or cloud application operational circumstances can indicate excessive
levels of business risk. Some divergences, on the other hand, may indicate good growth.

As a result, anomaly detection is critical for extracting critical business insights and ensuring the
continuity of vital processes. Consider these patterns, which all necessitate the capacity to distinguish
between normal and pathological behaviour with pinpoint accuracy:
 An online retailer must forecast whether discounts, events or new products will result in sales spikes,
putting more strain on their web servers.
 An IT security team must recognise anomalous login patterns and user activities to avoid hacking.
 A cloud provider must allocate traffic and services, as well as evaluate infrastructure improvements
in light of traffic trends and previous resource failures.

Awell-constructedbehaviouralmodelbasedonevidencecanassistusersnotonlydescribedatabehaviour, but
also detect outliers and doing significant prediction analysis. Because of the overwhelming volume of
the operating parameters and the ease with which abnormalities in false positives or negatives can be
missed, static alerts and thresholds are insufficient.

Newer systems use clever algorithms for spotting outliers in seasonal time series data and properly
forecasting periodic data patterns to overcome these types of operational restrictions.

8.4 PRACTICAL IMPLEMENTATION IN IDENTIFYING THE ANOMALY DETECTION


Any procedure that detects the outliers in a dataset; those objects that do not belong, is known as
anomaly detection. These anomalies could indicate unexpected network traffic, reveal a malfunctioning
sensor or simply identify data that has to be cleaned before analysis.

4
UNIT 08: Anomaly Detection JGI JAIN
DEEMED-TO-BE UNI VE RSI TY

Managing and monitoring the functioning of distributed systems is a chore—albeit a necessary one—
in today’s society. With hundreds or thousands of items to monitor, anomaly detection can assist in
identifying where an error is occurring, improving root cause investigation and allowing for faster
tech assistance. The task of anomaly detection can be performed with the help of Machine learning.
The following are some of the areas/domains which require the application of anomaly detection in the
business world:
 Intrusion detection, for example, detects anomalies in network traffic
 The patient’s health monitoring and control system
 Banks efficiently detect frauds in credit card transactions
 Fault detection in real-world atmospheres
 Internet detects the presence of fake news and miscommunication
 Damage detection in the industry
 Monitoring safety, security and surveillance

According to statistician Pierre Lafaye de Micheaux, “Outliers are not always a bad thing. These are just
observations that are not following the same pattern as the other ones. But it can be the case that an
outlier is very interesting. For example, if in a biological experiment, a rat is not dead whereas all others
are, then it would be very interesting to understand why. This could lead to new scientific discoveries.
So, it is important to detect outliers.”

Anomaly detection is a method for locating an out-of-the-ordinary point or pattern in a batch of data. The
term outlier is sometimes used to describe an abnormality. Outliers are data objects that stand out from
the rest of the data collection and do not follow the expected pattern of behaviour. Anomaly detection
is a data science application that incorporates numerous data science tasks, such as classification,
regression and clustering, into a single application. The anomaly in the dataset can be pointed out as
shown in Figure 3:

Anomaly data point


0.6 Normal data point

0.5

0.4

0.3

0.2

0.1

0.0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

Figure 3: Anomaly of Dataset

5
JGI JAIN
DEEMED-TO-BE UNI VE RSI TY
Financial Analytics

For simplifying the process of anomaly detection, the anomalies are classified into the following three
categories:
 Point anomalies: If a single data point differs significantly from the rest of the data, it is considered
anomalous. Credit card fraud detection based on “amount spent” is an example.
 Contextual anomalies: The abnormality is context-specific since it is dependent on contextual
information to determine if it is the anomaly. In time-series data, this form of anomaly is common.
People, for example, spend a lot of money around the holidays, but it can be different otherwise.
 Collective anomalies: If a group of connected data examples is abnormal when viewed as a whole,
but not as individual values. Example Unexpectedly, someone is attempting to copy data from a
faraway machine to a local host (a potential cyber-attack).

There are various approaches toward anomaly detection in machine learning. Some of them are as
follows:
 Clustering based anomaly detection: Comparable data points tend to belong to similar groups or
clusters, as indicated by their distance from local centroids, under this technique, which focuses on
unsupervised learning.
The k-means technique can be used to divide a dataset into a predetermined number of clusters.
Anomalies are data points that do not fall into one of these clusters.
 Density based anomaly detection: The K-nearest neighbours algorithm is used in this method.
Normal data points always occur near a populated neighbourhood, while abnormal data points
wander far away. You can use Euclidian distance or a similar measure, depending on the sort of data
you have, to determine the closest group of data points.
 Support vector machine-based anomaly detection: Another successful tool for finding
abnormalities is a support vector machine. One-Class SVMs were created for situations when only
one class is known and the difficulty is identifying anything outside of that class. This is known as
novelty detection and it pertains to the automatic detection of unexpected or aberrant phenomena,
often known as outliers, within a huge volume of typical data.
 Supervised deep anomaly detection: When it comes to deep learning, supervised deep anomaly
detection entails using labels from both normal and anomalous data examples to train a deep
supervised binary or multi-class classifier.
For example, multi-class classifier supervised Deep anomaly detection models aid in spotting
unusual brands, forbidden medicine name mentions and fraudulent health-care transactions.
 Semi supervised deep anomaly detection: semi-supervised DAD methods are used more in
comparison to the supervised DAD methods because labels for frequent patterns have been far
easier to acquire than labels for anomalies. Outliers are separated using existing labels from a
single class.
 One-Class Neural Networks: The one-class neural network (OC-NN) approaches are based on kernel-
based one-class classification, which combines deep networks’ capacity to extract progressively
richer representations of data with the one-class goal of constructing a tight envelope around
normal data.

6
UNIT 08: Anomaly Detection JGI JAIN
DEEMED-TO-BE UNI VE RSI TY

Conclusion 8.5 CONCLUSION

 The techniques used to evaluate data to improve efficiency and business gain are referred to as data
analytics.
 The terms ‘Artificial Intelligence’ and Machine Learning came into existence with the rise of
computerised analytical activities.
 Machine learning is a field of artificial intelligence that provides the ability to computers to learn
automatically from data and make a decision without the intervention of humans.
 Machine learning helps computers to identify the pattern easily from the data that is difficult for a
human.
 Anomaly detection (also known as outlier detection) is the recognition of unforeseen events,
inferences or aspects that vary considerably from the standard.
 Anomaly data is usually associated with a problem or an uncommon event, such as hacking, bank
fraud, malfunctioning equipment, structural faults/infrastructural breakdowns or typographical
errors.
 Anomaly detection is critical in the development of reliable distributed software systems.
 Anomaly detection approaches are divided into three categories: unsupervised, semi-supervised
and supervised.
 Network administrators must be able to recognise and respond to changing operational conditions.
 Anomaly detection is critical for extracting critical business insights and ensuring the continuity of
vital processes.
 Any procedure that detects the outliers in a dataset; those objects that do not belong, is known as
anomaly detection.
 Outliers are data objects that stand out from the rest of the data collection and do not follow the
expected pattern of behaviour.

8.6 GLOSSARY

 Data analytics: The techniques used to evaluate data to improve efficiency and business gain
 Machine learning: A field of artificial intelligence that provides the ability for computers to learn
automatically from data and make decision without the intervention of human
 Anomaly detection: It is the recognition of unforeseen events, inferences or aspects that vary
considerably from the standard
 Outliers: Data objects that stand out from the rest of the data collection and do not follow the
expected pattern of behaviour

7
JGI JAINDEEMED-TO-BE UNI VE RSI TY
Financial Analytics

8.7 SELF-ASSESSMENT QUESTIONS

A. Essay Type Questions


1. Explain the concept ‘Anomaly Detection’.
2. Describe the approaches of anomaly detection.
3. Outline the importance of anomaly detection.
4. Define outliers and enlist the three categories of anomalies.
5. Discuss the anomaly detection approaches in machine learning.

8.8 ANSWERS AND HINTS FOR SELF-ASSESSMENT QUESTIONS

A. Answers to Multiple Choice Questions

B. Hints for Essay Type Questions

1. Anomaly detection is the recognition of unforeseen events, inferences or aspects that vary
considerably from the standard. Any sort of anomaly detection, which is frequently applied to
unlabelled data by data scientists in a process known as unsupervised anomaly detection, is based
on two basic assumptions:
 Anomalies in data are extremely infrequent.
 The characteristics of data anomalies differ greatly from those of regular occurrences.
Refer to Section Introduction to Anomaly Detection
2. Anomaly detection approaches are divided into three categories: unsupervised, semi-supervised
and supervised. Essentially, the best anomaly detection approach is determined by the dataset’s
labels. Refer to Section Introduction to Anomaly Detection
3. Network administrators must be able to recognise and respond to changing operational conditions.
Any variations in a data centre or cloud application operational circumstances can indicate
excessive levels of business risk. Some divergences, on the other hand, may indicate good growth.
Refer to Section Importance of Anomaly Detection
4. For simplifying the process of anomaly detection, the anomalies are classified into the following
three categories:
 Point anomalies: If a single data point differs significantly from the rest of the data, it is
considered anomalous. Credit card fraud detection based on “amount spent” is an example.
Refer to Section Practical Implementation in Identifying the Anomaly Detection
5. There are various approaches to anomaly detection in machine learning. Some of them are as
follows:
8
UNIT 08: Anomaly Detection JGI JAIN
DEEMED-TO-BE UNI VE RSI TY

 Clustering based anomaly detection: Comparable data points tend to belong to similar groups
or clusters, as indicated by their distance from local centroids, under this technique, which
focuses on unsupervised learning.
Refer to Section Practical Implementation in Identifying the Anomaly Detection

@ 8.9 POST-UNIT READING MATERIAL

 https://datrics.ai/anomaly-detection-best-practices
 https://journalofbigdata.springeropen.com/articles/10.1186/s40537-020-00320-x
 https://zindi.africa/learn/introduction-to-anomaly-detection-using-machine-learning-with-a- case-
study

8.10 TOPICS FOR DISCUSSION FORUMS

 Make a research about the process of anomaly detection with the help of machine learning and
discuss it with you Classmates. Also, find out the various websites or tools which prove helpful for
the same task of anomaly detection.

You might also like