@Craigacp
Member

Description

Adds a configurable data source for anomaly detection demos and testing, and updates the anomaly detection tutorial to use it.

Motivation

The reproducibility system we're adding in 4.2 should be able to reproduce all the tutorial models. Unfortunately, several of those models are based on static data generators, which would require special casing in the reproducibility system. Additionally, those data generators aren't flexible enough to be useful demos, nor complex enough to stress more complicated algorithms (in terms of both statistical complexity and runtime speed). This PR is the first in a series that will add configurable data sources which generate data for anomaly detection, clustering, multi-class classification and multi-label classification. We'll update some test code, and any tutorials which depend upon the static data generators, to use the new configurable data generators. The old static generators won't go away; they are still useful for unit testing basic feature handling functionality, where it's useful to have something extremely simple.

Regression already has two such configurable generators, added in 4.1 as part of the test harness for the output scaling feature; those will remain unchanged.

@Craigacp added the "Oracle employee" label (This PR is from an Oracle employee) on Aug 17, 2021
Member

@jhalexand left a comment

Looks good!
