Docs Slides Lecture15
Anomaly detection: Problem motivation
Machine Learning
Anomaly detection example
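Aircraft engine features:
$x_1$ = heat generated
$x_2$ = vibration intensity
…
Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
New engine: $x_{test}$
[Figure: scatter plot with axes $x_1$ (heat) and $x_2$ (vibration)]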
Andrew Ng
Density estimation
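Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
Is $x_{test}$ anomalous?
Model $p(x)$ from the dataset; flag $x_{test}$ as anomalous if $p(x_{test}) < \varepsilon$, otherwise treat it as normal.
[Figure: scatter plot with axes $x_1$ (heat) and $x_2$ (vibration)]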
Anomaly detection example
Fraud detection:
$x^{(i)}$ = features of user $i$'s activities
Model $p(x)$ from data.
Identify unusual users by checking which have $p(x) < \varepsilon$
Manufacturing
Monitoring computers in a data center.
$x^{(i)}$ = features of machine $i$
$x_1$ = memory use, $x_2$ = number of disk accesses/sec,
$x_3$ = CPU load, $x_4$ = CPU load/network traffic.
…
Anomaly detection: Gaussian distribution
Gaussian (Normal) distribution
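Say $x \in \mathbb{R}$. If $x$ is Gaussian-distributed with mean $\mu$ and variance $\sigma^2$, we write $x \sim \mathcal{N}(\mu, \sigma^2)$, with density
$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$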
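A minimal sketch of evaluating the univariate Gaussian density with mean $\mu$ and variance $\sigma^2$, using only the standard library. The function name `gaussian_pdf` is illustrative and does not come from the slides.

```python
import math

def gaussian_pdf(x: float, mu: float, sigma2: float) -> float:
    """Density of N(mu, sigma2) at x; sigma2 is the variance, not the std. dev."""
    if sigma2 <= 0:
        raise ValueError("variance must be positive")
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))

# The density peaks at the mean and is symmetric around it.
peak = gaussian_pdf(0.0, mu=0.0, sigma2=1.0)
left = gaussian_pdf(-1.0, mu=0.0, sigma2=1.0)
right = gaussian_pdf(1.0, mu=0.0, sigma2=1.0)
```

A smaller variance gives a narrower, taller curve, because the area under the density must stay equal to 1.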
Gaussian distribution example
Parameter estimation
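Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}$
$$\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}, \qquad \sigma^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2$$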
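As a sketch, the mean and the $1/m$ (maximum-likelihood) variance of a one-dimensional dataset can be estimated with Python's `statistics` module. The helper name `fit_gaussian` and the sample values are made up for illustration.

```python
import statistics

def fit_gaussian(xs: list[float]) -> tuple[float, float]:
    """Return (mu, sigma2) using the 1/m variance estimate."""
    mu = statistics.fmean(xs)
    sigma2 = statistics.pvariance(xs, mu)  # population variance: divides by m
    return mu, sigma2

# Made-up sample values, chosen so the estimates are easy to check by hand.
mu, sigma2 = fit_gaussian([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Using `statistics.variance` instead would divide by $m - 1$; for large $m$ the difference is negligible.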
Anomaly detection: Algorithm
Density estimation
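Training set: $\{x^{(1)}, \dots, x^{(m)}\}$
Each example is $x \in \mathbb{R}^n$
$$p(x) = p(x_1; \mu_1, \sigma_1^2)\, p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$$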
Anomaly detection algorithm
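1. Choose features $x_j$ that you think might be indicative of anomalous examples.
2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$:
   $$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2$$
3. Given a new example $x$, compute $p(x)$:
   $$p(x) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$$
   Anomaly if $p(x) < \varepsilon$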
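The algorithm can be sketched in a few lines of plain Python: fit a mean and variance per feature, multiply the per-feature Gaussian densities, and compare against a threshold. The function names, the training values, and the threshold `epsilon` are all hypothetical.

```python
import math

def fit(X):
    """Per-feature mean and 1/m variance for a list of equal-length rows."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2 = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, sigma2

def p(x, mu, sigma2):
    """Product over features of the univariate Gaussian densities."""
    return math.prod(
        math.exp(-((xj - mj) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)
        for xj, mj, s2 in zip(x, mu, sigma2)
    )

def is_anomaly(x, mu, sigma2, epsilon):
    return p(x, mu, sigma2) < epsilon

# Hypothetical (heat, vibration) readings; values and epsilon are made up.
X_train = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.0], [0.8, 2.0]]
mu, sigma2 = fit(X_train)
epsilon = 1e-3
```

With many features the product can underflow; summing log-densities and comparing against $\log \varepsilon$ is a common workaround.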
Anomaly detection example
Anomaly detection: Developing and evaluating an anomaly detection system
The importance of real-number evaluation
When developing a learning algorithm (choosing features, etc.),
making decisions is much easier if we have a way of evaluating
our learning algorithm.
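Assume we have some labeled data of anomalous and non-anomalous examples ($y = 0$ if normal, $y = 1$ if anomalous).
Training set: $x^{(1)}, x^{(2)}, \dots, x^{(m)}$ (assume normal examples/not anomalous)
Cross validation set: $(x_{cv}^{(1)}, y_{cv}^{(1)}), \dots, (x_{cv}^{(m_{cv})}, y_{cv}^{(m_{cv})})$
Test set: $(x_{test}^{(1)}, y_{test}^{(1)}), \dots, (x_{test}^{(m_{test})}, y_{test}^{(m_{test})})$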
Aircraft engines motivating example
10000 good (normal) engines
20 flawed engines (anomalous)
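Training set: 6000 good engines
CV: 2000 good engines ($y = 0$), 10 anomalous ($y = 1$)
Test: 2000 good engines ($y = 0$), 10 anomalous ($y = 1$)
Alternative (the CV and test sets reuse the same 4000 good engines, so the test estimate is less trustworthy):
Training set: 6000 good engines
CV: 4000 good engines ($y = 0$), 10 anomalous ($y = 1$)
Test: 4000 good engines ($y = 0$), 10 anomalous ($y = 1$)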
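A sketch of partitioning the 10000 good and 20 flawed engines into disjoint sets. The 6000/2000/2000 partition of the good engines is an assumption of this sketch, and the tuples stand in for real feature vectors.

```python
import random

good = [("good", i) for i in range(10000)]    # label y = 0
flawed = [("flawed", i) for i in range(20)]   # label y = 1

rng = random.Random(0)   # fixed seed so the split is reproducible
rng.shuffle(good)
rng.shuffle(flawed)

train = good[:6000]      # unlabeled, assumed normal
cv = [(e, 0) for e in good[6000:8000]] + [(e, 1) for e in flawed[:10]]
test = [(e, 0) for e in good[8000:]] + [(e, 1) for e in flawed[10:]]
```

Keeping the CV and test sets disjoint means the threshold chosen on the CV set is then judged on examples it has never seen.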
Algorithm evaluation
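Fit model $p(x)$ on training set $\{x^{(1)}, \dots, x^{(m)}\}$
On a cross validation/test example $x$, predict
$$y = \begin{cases} 1 & \text{if } p(x) < \varepsilon \text{ (anomaly)} \\ 0 & \text{if } p(x) \geq \varepsilon \text{ (normal)} \end{cases}$$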
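Because anomalies are rare, plain accuracy is a poor score here (always predicting "normal" already scores very high). One common choice for skewed classes is the F1 score, which can also be used to pick $\varepsilon$ on the cross validation set. The sketch below assumes that metric; the densities, labels, and candidate thresholds are made up.

```python
def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def choose_epsilon(p_cv, y_cv, candidates):
    """Pick the candidate threshold with the best F1 on the CV set."""
    def score(eps):
        return f1_score(y_cv, [1 if p < eps else 0 for p in p_cv])
    return max(candidates, key=score)

# Made-up densities p(x) for six CV examples and their labels.
p_cv = [0.30, 0.25, 0.002, 0.28, 0.0005, 0.04]
y_cv = [0, 0, 1, 0, 1, 0]
best_eps = choose_epsilon(p_cv, y_cv, candidates=[1e-4, 1e-2, 1e-1])
```

The final performance figure should then be computed once on the test set with the chosen threshold.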
Anomaly detection: Choosing what features to use
Non-Gaussian features
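When a feature's histogram is strongly skewed, a common remedy is to transform it so that it looks closer to Gaussian before fitting $\mu$ and $\sigma^2$. The sketch below assumes a $\log(x + c)$ transform with $c = 1$ and uses sample skewness as a rough check; the transform, the constant, and the data are illustrative choices, not taken from the slides.

```python
import math
import statistics

def skewness(xs):
    """Sample skewness (third standardized moment); near 0 for symmetric data."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs, mu)
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)

raw = [1, 1, 2, 2, 3, 4, 6, 10, 25, 80]        # made-up, right-skewed feature
transformed = [math.log(x + 1) for x in raw]   # log(x + c) with c = 1

skew_before = skewness(raw)
skew_after = skewness(transformed)
```

In practice, plotting a histogram of the feature before and after the transform is the quickest way to judge whether it helped.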
Error analysis for anomaly detection
Want $p(x)$ large for normal examples $x$,
and $p(x)$ small for anomalous examples $x$.
Most common problem:
$p(x)$ is comparable (say, both large) for normal and anomalous examples
Monitoring computers in a data center
Choose features that might take on unusually large or
small values in the event of an anomaly.
$x_1$ = memory use of computer
$x_2$ = number of disk accesses/sec
$x_3$ = CPU load
$x_4$ = network traffic
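A derived feature can combine existing ones so that an unusual combination produces an extreme value. The sketch below builds the ratio CPU load / network traffic that appears in the earlier data center example; the function name and the numeric values are made up.

```python
def load_per_traffic(cpu_load: float, network_traffic: float) -> float:
    """Ratio feature: large when the CPU is busy but little traffic is served."""
    return cpu_load / network_traffic

# (cpu_load, network_traffic) pairs in arbitrary units.
healthy = [(0.2, 20.0), (0.5, 50.0), (0.9, 90.0)]
stuck = (0.9, 5.0)   # high load with low traffic, e.g. a job stuck in a loop

healthy_ratios = [load_per_traffic(c, t) for c, t in healthy]
stuck_ratio = load_per_traffic(*stuck)
```

Neither the load nor the traffic of the stuck machine is extreme on its own, but the ratio is far outside the healthy range. A real pipeline would also need to guard against zero traffic.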
Anomaly detection: Multivariate Gaussian distribution
Motivating example: Monitoring machines in a data center
[Figure: plots with axes CPU Load and Memory Use]
Multivariate Gaussian (Normal) distribution
$x \in \mathbb{R}^n$. Don't model $p(x_1), p(x_2)$, etc. separately.
Model $p(x)$ all in one go.
Parameters: $\mu \in \mathbb{R}^n$, $\Sigma \in \mathbb{R}^{n \times n}$ (covariance matrix)
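A sketch of the multivariate Gaussian density $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$ in NumPy. The function name and the example covariance matrix are assumptions made for illustration.

```python
import numpy as np

def multivariate_gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at an n-dimensional point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    n = mu.shape[0]
    diff = x - mu
    # Solve Sigma z = diff instead of forming the explicit inverse.
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
    return float(np.exp(-0.5 * quad) / norm)

mu = [0.0, 0.0]
Sigma = [[1.0, 0.8], [0.8, 1.0]]   # positively correlated features
along = multivariate_gaussian_pdf([1.0, 1.0], mu, Sigma)
against = multivariate_gaussian_pdf([1.0, -1.0], mu, Sigma)
```

Both points are equally far from the mean in each coordinate, yet the one that contradicts the positive correlation receives a much lower density. This is exactly what modelling each feature separately cannot capture.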
Multivariate Gaussian (Normal) examples
Anomaly detection: Anomaly detection using the multivariate Gaussian distribution
Multivariate Gaussian (Normal) distribution
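Parameters $\mu$, $\Sigma$
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
Parameter fitting:
Given training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m} (x^{(i)} - \mu)(x^{(i)} - \mu)^T$$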
Anomaly detection with the multivariate Gaussian
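1. Fit model $p(x)$ by setting
   $$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m} (x^{(i)} - \mu)(x^{(i)} - \mu)^T$$
2. Given a new example $x$, compute
   $$p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
   Flag an anomaly if $p(x) < \varepsilon$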
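The two steps can be sketched with NumPy: estimate the mean vector and the $1/m$ covariance matrix from the training set, then threshold the density of a new example. The synthetic data, the helper names, and the threshold `epsilon` are assumptions for illustration.

```python
import numpy as np

def fit_multivariate(X):
    """Mean vector and 1/m covariance matrix of the rows of X (shape m x n)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    centered = X - mu
    Sigma = centered.T @ centered / X.shape[0]
    return mu, Sigma

def density(x, mu, Sigma):
    n = mu.shape[0]
    diff = np.asarray(x, dtype=float) - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
    return float(np.exp(-0.5 * quad) / norm)

# Synthetic, correlated (CPU load, memory use) data; not from the slides.
rng = np.random.default_rng(0)
cpu = rng.normal(0.5, 0.1, size=500)
mem = cpu + rng.normal(0.0, 0.02, size=500)
X_train = np.column_stack([cpu, mem])
mu, Sigma = fit_multivariate(X_train)

epsilon = 1e-3                          # hypothetical threshold
typical = density([0.5, 0.5], mu, Sigma)
odd = density([0.4, 0.6], mu, Sigma)    # each value is ordinary, the pair is not
```

Here `odd` falls below `epsilon` even though 0.4 and 0.6 are both unremarkable values for either feature taken alone.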
Relationship to original model
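Original model: $p(x) = p(x_1; \mu_1, \sigma_1^2) \times p(x_2; \mu_2, \sigma_2^2) \times \cdots \times p(x_n; \mu_n, \sigma_n^2)$
Corresponds to the multivariate Gaussian
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
where $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2)$, i.e. all off-diagonal entries are zero.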
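A quick numerical check of this relationship: with a diagonal covariance matrix built from the per-feature variances, the multivariate density should equal the product of univariate densities. The point, means, and variances below are arbitrary test values, and the function names are illustrative.

```python
import math
import numpy as np

def product_of_univariates(x, mu, sigma2):
    return math.prod(
        math.exp(-((xj - mj) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)
        for xj, mj, s2 in zip(x, mu, sigma2)
    )

def multivariate(x, mu, Sigma):
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    n = diff.shape[0]
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
    return float(np.exp(-0.5 * quad) / norm)

x, mu, sigma2 = [1.0, -0.5, 2.0], [0.0, 0.0, 1.0], [1.0, 0.25, 4.0]
p_original = product_of_univariates(x, mu, sigma2)
p_multivariate = multivariate(x, mu, np.diag(sigma2))
```

The two values agree up to floating-point rounding, which is why the original model can be seen as the multivariate Gaussian restricted to axis-aligned contours.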
Original model vs. Multivariate Gaussian