Massive Online Analysis: Manual
Albert Bifet and Richard Kirkby
August 2009
Contents

1 Introduction
1.1 Data streams Evaluation
2 Installation
3 Using the GUI
4 Using the command line
5 Tasks in MOA
5.1 WriteStreamToARFFFile
5.2 MeasureStreamSpeed
5.3 LearnModel
5.4 EvaluateModel
5.5 EvaluatePeriodicHeldOutTest
5.6 EvaluateInterleavedTestThenTrain
5.7 EvaluatePrequential
6 Evolving data streams
7 Classifiers
7.1 Classifiers for static streams
7.1.1 MajorityClass
7.1.2 Naive Bayes
7.1.3 DecisionStump
7.1.4 HoeffdingTree
7.1.5 HoeffdingTreeNB
7.1.6 HoeffdingTreeNBAdaptive
7.1.7 HoeffdingOptionTree
7.1.8 HoeffdingOptionTreeNB
7.1.9 HoeffdingTreeOptionNBAdaptive
7.1.10 OzaBag
7.1.11 OzaBoost
7.1.12 OCBoost
7.2 Classifiers for evolving streams
7.2.1 OzaBagASHT
7.2.2 OzaBagADWIN
7.2.3 SingleClassifierDrift
7.2.4 AdaHoeffdingOptionTree
8 Writing a classifier
8.1 Creating a new classifier
8.2 Compiling a classifier
9 Bi-directional interface with WEKA
1 Introduction
Massive Online Analysis (MOA) is a software environment for imple-
menting algorithms and running experiments for online learning from
evolving data streams.
1. The algorithm is passed the next available example from the stream
(requirement 1).
1.1 Data streams Evaluation
than one million training examples. In the context of data streams this is
disappointing, because to be truly useful at data stream classification the
algorithms need to be capable of handling very large (potentially infinite)
streams of examples. Demonstrating systems only on small amounts of
data does not build a convincing case for capacity to solve more demand-
ing data stream applications.
MOA makes it possible to evaluate data stream classification algorithms on large streams, in the order of tens of millions of examples where possible, and under explicit memory limits. Anything less than this does not actually test algorithms in a realistically challenging setting.
2 Installation
The following manual is based on a Unix/Linux system with Java 5 SDK
or greater installed. Other operating systems such as Microsoft Windows
will be similar but may need adjustments to suit.
MOA needs the following files:
moa.jar
weka.jar
sizeofag.jar
They are available from
http://sourceforge.net/projects/moa-datastream/
http://sourceforge.net/projects/weka/
http://www.jroller.com/resources/m/maxim/sizeofag.jar
These files are needed to run the MOA software, either from the command line or through the graphical interface. For example, the following command runs the LearnModel task from the command line:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
LearnModel -l DecisionStumpTutorial \
-s generators.WaveformGenerator -m 1000000 -O model1.moa
3 Using the GUI
A graphical user interface for configuring and running tasks is available
with the command:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar \
moa.gui.TaskLauncher
Click ’Configure’ to set up a task, and when ready click ’Run’ to launch it.
Several tasks can be run concurrently. Click on different tasks
in the list and control them using the buttons below. If textual output of a
task is available it will be displayed in the bottom half of the GUI, and can
be saved to disk.
Note that the command line text box displayed at the top of the win-
dow represents textual commands that can be used to run tasks on the
command line as described in the next chapter. The text can be selected
then copied onto the clipboard.
4 Using the command line
In this chapter we are going to show some examples of tasks performed
using the command line.
The first example will command MOA to train the HoeffdingTree
classifier and create a model. The moa.DoTask class is the main class for
running tasks on the command line. It will accept the name of a task fol-
lowed by any appropriate parameters. The first task used is the LearnModel
task. The -l parameter specifies the learner, in this case the HoeffdingTree
class. The -s parameter specifies the stream to learn from, in this case
generators.WaveformGenerator is specified, which is a data stream
generator that produces a three-class learning problem of identifying three
types of waveform. The -m option specifies the maximum number of ex-
amples to train the learner with, in this case one million examples. The -O
option specifies a file to output the resulting model to:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
LearnModel -l HoeffdingTree \
-s generators.WaveformGenerator -m 1000000 -O model1.moa
This will create a file named model1.moa that contains the Hoeffding tree
model that was induced during training.
The next example will evaluate the model to see how accurate it is on
a set of examples that are generated using a different random seed. The
EvaluateModel task is given the parameters needed to load the model
produced in the previous step, generate a new waveform stream with a
random seed of 2, and test on one million examples:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
"EvaluateModel -m file:model1.moa \
-s (generators.WaveformGenerator -i 2) -i 1000000"
Note that the above two steps can be achieved by rolling them into one,
avoiding the need to create an external file, as follows:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
"EvaluateModel -m (LearnModel -l HoeffdingTree \
-s generators.WaveformGenerator -m 1000000) \
-s (generators.WaveformGenerator -i 2) -i 1000000"
4.1 Comparing two classifiers
[Figure: classification accuracy (% correct) against the number of examples processed (up to 10 million) for DecisionStumpTutorial and HoeffdingTree.]
For this problem it is obvious that a full tree can achieve higher accu-
racy than a single stump, and that a stump has very stable accuracy that
does not improve with further training.
5 Tasks in MOA
5.1 WriteStreamToARFFFile
Outputs a stream to an ARFF file. Example:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
"WriteStreamToARFFFile -s generators.WaveformGenerator \
-f Wave.arff -m 100000"
Parameters:
• -s : Stream to write
• -f : Destination ARFF file
• -m : Maximum number of instances to write to file
• -h : Suppress header from output
5.2 MeasureStreamSpeed
Measures the speed of a stream. Example:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
"MeasureStreamSpeed -s generators.WaveformGenerator \
-g 100000"
Parameters:
• -s : Stream to measure
• -g : Number of examples
• -O : File to save the final result of the task to
5.3 LearnModel
Learns a model from a stream. Example:

java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
LearnModel -l HoeffdingTree \
-s generators.WaveformGenerator -m 1000000 -O model1.moa
Parameters:
• -l : Classifier to train
5.4 EvaluateModel
Evaluates a static model on a stream. Example:

java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
"EvaluateModel -m file:model1.moa \
-s (generators.WaveformGenerator -i 2) -i 1000000"
Parameters:
• -l : Classifier to evaluate
• -s : Stream to evaluate on
5.5 EvaluatePeriodicHeldOutTest
Evaluates a classifier on a stream by periodically testing on a heldout set.
Example:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
"EvaluatePeriodicHeldOutTest -l HoeffdingTree \
-s generators.WaveformGenerator \
-n 100000 -i 100000000 -f 1000000" > htresult.csv
Parameters:
• -l : Classifier to train
5.6 EvaluateInterleavedTestThenTrain
Evaluates a classifier on a stream by testing then training with each exam-
ple in sequence. Example:
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
"EvaluateInterleavedTestThenTrain -l HoeffdingTree \
-s generators.WaveformGenerator \
-i 100000000 -f 1000000" > htresult.csv
Parameters:
• -l : Classifier to train
5.7 EvaluatePrequential
Evaluates a classifier on a stream by testing then training with each exam-
ple in sequence. It may use a sliding window or a fading factor forgetting
mechanism.
This evaluation method using sliding windows and a fading factor was
presented in
[C] João Gama, Raquel Sebastião and Pedro Pereira Rodrigues. Issues in
evaluation of stream learning algorithms. In KDD’09, pages 329–338.
    Ei = Si / Bi

with

    Si = Li + α × Si−1
    Bi = ni + α × Bi−1

where ni is the number of examples used to compute the loss function Li. Here ni = 1, since the loss Li is computed for every single example.
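As an illustration, a minimal sketch (not MOA's own evaluator) that maintains this fading-factor error for a 0-1 loss:

// Minimal sketch of the fading-factor prequential error described above:
// Ei = Si / Bi with Si = Li + alpha * S(i-1) and Bi = ni + alpha * B(i-1), ni = 1.
public final class FadingFactorErrorSketch {
    private final double alpha;   // fading factor, e.g. 0.999
    private double s = 0.0;       // faded sum of losses Si
    private double b = 0.0;       // faded count of examples Bi

    public FadingFactorErrorSketch(double alpha) {
        this.alpha = alpha;
    }

    // loss = 0.0 if the example was classified correctly, 1.0 otherwise
    public double update(double loss) {
        s = loss + alpha * s;
        b = 1.0 + alpha * b;
        return s / b;             // current estimate Ei
    }
}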
Examples:
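A plausible invocation, mirroring the EvaluateInterleavedTestThenTrain example above (the evaluator then falls back to its default, which is an assumption here):

java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask \
"EvaluatePrequential -l HoeffdingTree \
-s generators.WaveformGenerator \
-i 100000000 -f 1000000" > htresult.csv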
Parameters:

The classification performance evaluator to use may be one of the following:

– WindowClassificationPerformanceEvaluator
– FadingFactorClassificationPerformanceEvaluator
– EWMAFactorClassificationPerformanceEvaluator
6 Evolving data streams
MOA streams are built using generators, by reading ARFF files, by joining several streams, or by filtering streams. MOA stream generators make it possible to simulate a potentially infinite sequence of data. The following generators are available:
• Random Tree Generator
• SEA Concepts Generator
• STAGGER Concepts Generator
• Rotating Hyperplane
• Random RBF Generator
• LED Generator
• Waveform Generator
• Function Generator
6.1 Streams
Classes available in MOA to obtain input streams are the following:
6.1.1 ArffFileStream
A stream read from an ARFF file. Example:
ArffFileStream -f elec.arff
Parameters:
• -f : ARFF file to load
• -c : Class index of data. 0 for none or -1 for last attribute in file
[Figure 6.1: A sigmoid function f(t) = 1/(1 + e^(−s(t−t0))). The tangent of angle α at t0 defines the length W of the change interval.]
6.1.2 ConceptDriftStream
Generator that adds concept drift to examples in a stream.
Considering data streams as data generated from pure distributions, MOA models a concept drift event as a weighted combination of two pure distributions that characterize the target concepts before and after the drift. MOA uses the sigmoid function as an elegant and practical solution to define the probability that every new instance of the stream belongs to the new concept after the drift.
We see from Figure 6.1 that the sigmoid function

    f(t) = 1/(1 + e^(−s(t−t0)))

has a derivative at the point t0 equal to f′(t0) = s/4. The tangent of the angle α is equal to this derivative, tan α = s/4. We observe that tan α = 1/W, and as s = 4 tan α, then s = 4/W. So the parameter s in the sigmoid determines both the length of W and the angle α. In this sigmoid model we only need to specify two parameters: t0, the point of change, and W, the length of change. Note that for any positive real number β

    f(t0 + β · W) = 1 − f(t0 − β · W),

and that f(t0 + β · W) and f(t0 − β · W) are constant values that don't depend on t0 and W:

    f(t0 + W/2) = 1 − f(t0 − W/2) = 1/(1 + e^(−2)) ≈ 88.08%
    f(t0 + W) = 1 − f(t0 − W) = 1/(1 + e^(−4)) ≈ 98.20%
    f(t0 + 2W) = 1 − f(t0 − 2W) = 1/(1 + e^(−8)) ≈ 99.97%
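A minimal sketch of this mixing idea (an illustration only, not MOA's ConceptDriftStream code; the two concept methods are placeholders for arbitrary stream generators, and t0 and W correspond to the -p and -w options in the example below):

import java.util.Random;

// Sketch: draw each new example from the old or the new concept with probability
// given by the sigmoid f(t) = 1/(1 + e^(-s(t - t0))), where s = 4/W.
public final class SigmoidDriftSketch {
    private final double t0;                 // centre of change (the -p option)
    private final double w;                  // width of change (the -w option)
    private final Random random = new Random(1);
    private long t = 0;                      // examples generated so far

    public SigmoidDriftSketch(double t0, double w) {
        this.t0 = t0;
        this.w = w;
    }

    public String nextExample() {
        t++;
        double probNew = 1.0 / (1.0 + Math.exp(-4.0 * (t - t0) / w));
        return random.nextDouble() < probNew ? nextFromNewConcept() : nextFromOldConcept();
    }

    private String nextFromOldConcept() { return "example from the old concept"; } // placeholder
    private String nextFromNewConcept() { return "example from the new concept"; } // placeholder
}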
20
6.1. STREAMS
Example:
ConceptDriftStream -s (generators.AgrawalGenerator -f 7)
-d (generators.AgrawalGenerator -f 2) -w 1000000 -p 900000
Parameters:
• -s : Stream
6.1.3 ConceptDriftRealStream
Generator that adds concept drift to examples in a stream with different classes and attributes, for example real datasets.
Example:
Parameters:
• -s : Stream
6.1.4 FilteredStream
A stream that is filtered.
Parameters:
• -s : Stream to filter
6.1.5 AddNoiseFilter
Adds random noise to examples in a stream. To be used only with FilteredStream.
Parameters:
6.2 Streams generators

6.2.1 generators.AgrawalGenerator

Generates one of ten different pre-defined loan functions.
It was introduced by Agrawal et al. in
[A] R. Agrawal, T. Imielinski, and A. Swami. Database mining: A per-
formance perspective. IEEE Trans. on Knowl. and Data Eng., 5(6):914–
925, 1993.
It was a common source of data for early work on scaling up decision
tree learners. The generator produces a stream containing nine attributes,
six numeric and three categorical. Although not explicitly stated by the
authors, a sensible conclusion is that these attributes describe hypothetical
loan applications. There are ten functions defined for generating binary
class labels from the attributes. Presumably these determine whether the
loan should be approved.
Public C source code is available. The built-in functions are based on the paper (page 924); they turn out to be functions pred20 through pred29.
6.2.2 generators.HyperplaneGenerator
Generates a problem of predicting class of a rotating hyperplane.
It was used as a testbed for CVFDT versus VFDT. A hyperplane in d-dimensional space is the set of points x that satisfy

    ∑_{i=1}^{d} w_i x_i = w_0 = ∑_{i=1}^{d} w_i

where x_i is the ith coordinate of x. Examples for which ∑_{i=1}^{d} w_i x_i ≥ w_0 are labeled positive, and examples for which ∑_{i=1}^{d} w_i x_i < w_0 are labeled negative. Hyperplanes are useful for simulating time-changing concepts, because we can change the orientation and position of the hyperplane in a smooth manner by changing the relative size of the weights. We introduce change to this dataset by adding drift to each weight attribute, w_i = w_i + dσ, where σ is the probability that the direction of change is reversed and d is the change applied to every example.
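A minimal sketch of the labeling and drift mechanism (an illustration only, not MOA's HyperplaneGenerator; the threshold w0 is taken as a parameter here):

import java.util.Random;

// Sketch of the weight drift described above: every example, each weight moves by d
// in its current direction, and the direction is reversed with probability sigma.
public final class HyperplaneDriftSketch {
    private final Random random = new Random(1);
    private final double[] w;          // hyperplane weights
    private final int[] direction;     // +1 or -1 per weight
    private final double d;            // change applied to every example
    private final double sigma;        // probability that the direction of change is reversed

    public HyperplaneDriftSketch(double[] initialWeights, double d, double sigma) {
        this.w = initialWeights.clone();
        this.direction = new int[w.length];
        java.util.Arrays.fill(this.direction, 1);
        this.d = d;
        this.sigma = sigma;
    }

    // Label a point x against the current hyperplane, then apply drift to the weights.
    public boolean labelAndDrift(double[] x, double w0) {
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * x[i];
        }
        for (int i = 0; i < w.length; i++) {
            if (random.nextDouble() < sigma) {
                direction[i] = -direction[i];   // reverse direction with probability sigma
            }
            w[i] += direction[i] * d;           // w_i = w_i + d (in the current direction)
        }
        return sum >= w0;                       // positive iff the point lies on or above the hyperplane
    }
}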
Parameters:
6.2.3 generators.LEDGenerator
Generates a problem of predicting the digit displayed on a 7-segment LED
display.
This data source originates from the CART book. An implementation
in C was donated to the UCI machine learning repository by David Aha.
The goal is to predict the digit displayed on a seven-segment LED display,
where each attribute has a 10% chance of being inverted. It has an optimal
Bayes classification rate of 74%. The particular configuration of the gener-
ator used for experiments (led) produces 24 binary attributes, 17 of which
are irrelevant.
Parameters:
6.2.4 generators.LEDGeneratorDrift
Generates a problem of predicting the digit displayed on a 7-segment LED
display with drift.
Parameters:
6.2.5 generators.RandomRBFGenerator
Generates a random radial basis function stream.
This generator was devised to offer an alternate complex concept type
that is not straightforward to approximate with a decision tree model. The
RBF (Radial Basis Function) generator works as follows: A fixed number
of random centroids are generated. Each center has a random position, a
single standard deviation, class label and weight. New examples are gen-
erated by selecting a center at random, taking weights into consideration
so that centers with higher weight are more likely to be chosen. A random
direction is chosen to offset the attribute values from the central point.
The length of the displacement is randomly drawn from a Gaussian dis-
tribution with standard deviation determined by the chosen centroid. The
chosen centroid also determines the class label of the example. This effec-
tively creates a normally distributed hypersphere of examples surround-
ing each central point with varying densities. Only numeric attributes are
generated.
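A minimal sketch of this generation process (an illustration, not MOA's RandomRBFGenerator):

import java.util.Random;

// Sketch: pick a centroid with probability proportional to its weight, then offset it
// in a random direction by a Gaussian distance scaled by the centroid's standard deviation.
public final class RandomRBFSketch {
    static final class Centroid {
        double[] centre;
        double stdDev;
        int classLabel;
        double weight;
    }

    static final class Example {
        final double[] attributes;
        final int classLabel;
        Example(double[] attributes, int classLabel) {
            this.attributes = attributes;
            this.classLabel = classLabel;
        }
    }

    private final Centroid[] centroids;
    private final double totalWeight;
    private final Random random = new Random(1);

    public RandomRBFSketch(Centroid[] centroids) {
        this.centroids = centroids;
        double sum = 0.0;
        for (Centroid c : centroids) sum += c.weight;
        this.totalWeight = sum;
    }

    public Example nextExample() {
        Centroid c = pickWeighted();
        int d = c.centre.length;
        double[] offset = new double[d];
        double norm = 0.0;
        for (int i = 0; i < d; i++) {                       // random direction
            offset[i] = random.nextGaussian();
            norm += offset[i] * offset[i];
        }
        norm = Math.sqrt(norm);
        double length = random.nextGaussian() * c.stdDev;   // displacement length
        double[] x = new double[d];
        for (int i = 0; i < d; i++) {
            x[i] = c.centre[i] + offset[i] / norm * length;
        }
        return new Example(x, c.classLabel);                // class comes from the chosen centroid
    }

    private Centroid pickWeighted() {
        double r = random.nextDouble() * totalWeight;
        for (Centroid c : centroids) {
            r -= c.weight;
            if (r <= 0.0) return c;
        }
        return centroids[centroids.length - 1];
    }
}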
Parameters:
6.2.6 generators.RandomRBFGeneratorDrift
Generates a random radial basis function stream with drift. Drift is intro-
duced by moving the centroids with constant speed.
Parameters:
6.2.7 generators.RandomTreeGenerator
Generates a stream based on a randomly generated tree.
This generator is based on that proposed in
Parameters:

• -l : The first level of the tree above maxTreeDepth that can have leaves
6.2.8 generators.SEAGenerator
Generates SEA concept functions. This dataset contains abrupt concept drift, and was first introduced in the paper:

It is generated using three attributes, where only the first two attributes are relevant. All three attributes have values between 0 and 10. The points of the dataset are divided into four blocks with different concepts. In each block, the classification is done using f1 + f2 ≤ θ, where f1 and f2 represent the first two attributes and θ is a threshold value. The threshold values used are 9, 8, 7 and 9.5 for the four data blocks.
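A minimal sketch of the SEA labeling function (an illustration, not MOA's SEAGenerator; the per-block thresholds are taken from the text above):

// The SEA concept labels a point by whether f1 + f2 <= theta, where theta changes
// between blocks of the stream.
public final class SEAConceptSketch {
    public static boolean label(double f1, double f2, double theta) {
        return f1 + f2 <= theta;   // class "true" iff the sum is below the threshold
    }

    public static void main(String[] args) {
        double[] thresholds = {9.0, 8.0, 7.0, 9.5};          // one threshold per block
        System.out.println(label(3.5, 4.0, thresholds[0]));  // true:  7.5 <= 9
        System.out.println(label(6.0, 4.0, thresholds[2]));  // false: 10.0 > 7
    }
}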
Parameters:
6.2.9 generators.STAGGERGenerator
Generates STAGGER Concept functions. They were introduced by Schlim-
mer and Granger in
6.2.10 generators.WaveformGenerator
Generates a problem of predicting one of three waveform types.
It shares its origins with LED, and was also donated by David Aha to
the UCI repository. The goal of the task is to differentiate between three
different classes of waveform, each of which is generated from a combi-
nation of two or three base waves. The optimal Bayes classification rate is
known to be 86%. There are two versions of the problem, wave21 which
has 21 numeric attributes, all of which include noise, and wave40 which
introduces an additional 19 irrelevant attributes.
Parameters:
6.2.11 generators.WaveformGeneratorDrift
Generates a problem of predicting one of three waveform types with drift.
Parameters:
7 Classifiers

MOA contains several classifiers, including:

• Naive Bayes
• Decision Stump
• Hoeffding Tree
• Bagging
• Boosting
7.1 Classifiers for static streams

7.1.2 Naive Bayes

Given an instance I with attribute values x1 = v1, . . . , xk = vk, the Naive Bayes classifier computes the probability of class c as

    Pr[C = c | I] ≅ ∏_{i=1}^{k} Pr[x_i = v_i | C = c]
                 = Pr[C = c] · ∏_{i=1}^{k} Pr[x_i = v_i ∧ C = c] / Pr[C = c]
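A small sketch of this computation (an illustration, not MOA's NaiveBayes class; the count arrays and the absence of smoothing are simplifying assumptions):

// Score class c for a discrete instance using the product of per-attribute
// conditional probabilities, as in the formula above. The counts are assumed
// to be maintained incrementally elsewhere.
public final class NaiveBayesScoreSketch {
    // classCount[c] = number of examples of class c seen so far
    // attValueAndClassCount[i][v][c] = examples with attribute i = v and class c
    public static double score(int c, int[] instance,
                               double[] classCount,
                               double[][][] attValueAndClassCount,
                               double totalSeen) {
        double score = classCount[c] / totalSeen;              // Pr[C = c]
        for (int i = 0; i < instance.length; i++) {
            int v = instance[i];
            // Pr[x_i = v | C = c] estimated from counts (no smoothing in this sketch)
            score *= attValueAndClassCount[i][v][c] / classCount[c];
        }
        return score;                                          // proportional to Pr[C = c | I]
    }
}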
7.1.3 DecisionStump
Decision trees of one level.
Parameters:
• -g : The number of instances to observe between model changes
• -b : Only allow binary splits
• -c : Split criterion to use. Example : InfoGainSplitCriterion
• -r : Seed for random behaviour of the classifier
7.1.4 HoeffdingTree
Decision tree for streaming data.
A Hoeffding tree is an incremental, anytime decision tree induction al-
gorithm that is capable of learning from massive data streams, assuming
that the distribution generating examples does not change over time. Ho-
effding trees exploit the fact that a small sample can often be enough to
choose an optimal splitting attribute. This idea is supported mathemat-
ically by the Hoeffding bound, which quantifies the number of observa-
tions (in our case, examples) needed to estimate some statistics within a
prescribed precision (in our case, the goodness of an attribute). More pre-
cisely, the Hoeffding bound states that with probability 1−δ, the true mean
of a random variable of range R will not differ from the estimated mean
after n independent observations by more than:
    ε = √( R² ln(1/δ) / (2n) ).
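As a quick illustration, a minimal sketch (not MOA's code) that computes this bound:

// Compute the Hoeffding bound epsilon = sqrt(R^2 * ln(1/delta) / (2n)), used to decide
// whether the best split attribute can be chosen after n observations.
public final class HoeffdingBound {
    public static double epsilon(double range, double delta, long n) {
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    public static void main(String[] args) {
        // Example: information gain on a 3-class problem has range log2(3).
        double range = Math.log(3) / Math.log(2);
        System.out.println(epsilon(range, 1e-7, 1000));    // bound after 1,000 examples
        System.out.println(epsilon(range, 1e-7, 100000));  // shrinks as n grows
    }
}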
A theoretically appealing feature of Hoeffding Trees not shared by other incremental decision tree learners is that they have sound guarantees of performance. Using the Hoeffding bound one can show that their output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. See for details:
Parameters:
• -p : Disable pre-pruning
7.1.5 HoeffdingTreeNB
Decision tree for streaming data with Naive Bayes classification at leaves.
7.1.6 HoeffdingTreeNBAdaptive
Decision tree for streaming data with adaptive Naive Bayes classification
at leaves. This adaptive Naive Bayes prediction method monitors the error
rate of majority class and Naive Bayes decisions in every leaf, and chooses
to employ Naive Bayes decisions only where they have been more accu-
rate in past cases.
7.1.7 HoeffdingOptionTree
Decision option tree for streaming data.
Hoeffding Option Trees are regular Hoeffding trees containing additional
option nodes that allow several tests to be applied, leading to multiple
Hoeffding trees as separate paths. They consist of a single structure that
efficiently represents multiple trees. A particular example can travel down
multiple paths of the tree, contributing, in different ways, to different op-
tions.
See for details:
Parameters:
7.1.8 HoeffdingOptionTreeNB
Decision option tree for streaming data with Naive Bayes classification at
leaves.
Parameters:
• Same parameters as HoeffdingOptionTree
• -q : The number of instances a leaf should observe before permitting
Naive Bayes
7.1.9 HoeffdingTreeOptionNBAdaptive
Decision option tree for streaming data with adaptive Naive Bayes classi-
fication at leaves. This adaptive Naive Bayes prediction method monitors
the error rate of majority class and Naive Bayes decisions in every leaf,
and chooses to employ Naive Bayes decisions only where they have been
more accurate in past cases.
Parameters:
• Same parameters as HoeffdingOptionTreeNB
7.1.10 OzaBag
Incremental on-line bagging of Oza and Russell.
Oza and Russell developed online versions of bagging and boosting
for Data Streams. They show how the process of sampling bootstrap repli-
cates from training data can be simulated in a data stream context. They
observe that the probability that any individual example will be chosen
for a replicate tends to a Poisson(1) distribution.
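A minimal sketch of this online bagging scheme (an illustration, not MOA's OzaBag class; the BaseLearner interface is a stand-in for any incremental classifier):

import java.util.List;
import java.util.Random;

// Each base model is trained k times on the incoming example, where k is drawn
// from a Poisson(1) distribution, simulating bootstrap sampling on a stream.
public final class OnlineBaggingSketch<E> {
    public interface BaseLearner<T> {
        void trainOnInstance(T example);
    }

    private final List<BaseLearner<E>> ensemble;
    private final Random random = new Random(1);

    public OnlineBaggingSketch(List<BaseLearner<E>> ensemble) {
        this.ensemble = ensemble;
    }

    public void trainOnInstance(E example) {
        for (BaseLearner<E> model : ensemble) {
            int k = poisson1();                  // each example is used k ~ Poisson(1) times
            for (int i = 0; i < k; i++) {
                model.trainOnInstance(example);
            }
        }
    }

    // Draw from a Poisson distribution with mean 1 (Knuth's method).
    private int poisson1() {
        double limit = Math.exp(-1.0);
        double product = random.nextDouble();
        int k = 0;
        while (product > limit) {
            product *= random.nextDouble();
            k++;
        }
        return k;
    }
}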
Parameters:
• -l : Classifier to train
• -s : The number of models in the bag
7.1.11 OzaBoost
Incremental on-line boosting of Oza and Russell.
See details in:
For the boosting method, Oza and Russell note that the weighting pro-
cedure of AdaBoost actually divides the total example weight into two
halves – half of the weight is assigned to the correctly classified examples,
and the other half goes to the misclassified examples. They use the Poisson
distribution for deciding the random probability that an example is used
for training, only this time the parameter changes according to the boost-
ing weight of the example as it is passed through each model in sequence.
Parameters:
• -l : Classifier to train
• -s : The number of models to boost
• -p : Boost with weights only; no poisson
7.1.12 OCBoost
Online Coordinate Boosting.
Pelossof et al. presented Online Coordinate Boosting, a new online
boosting algorithm for adapting the weights of a boosted classifier, which
yields a closer approximation to Freund and Schapire’s AdaBoost algo-
rithm. The weight update procedure is derived by minimizing AdaBoost’s
loss when viewed in an incremental form. This boosting method may be
reduced to a form similar to Oza and Russell’s algorithm.
See details in:
[PJ] Raphael Pelossof, Michael Jones, Ilia Vovsha, and Cynthia Rudin.
Online coordinate boosting. 2008.
Example:
OCBoost -l HoeffdingTreeNBAdaptive -e 0.5
Parameters:
• -l : Classifier to train
• -e : Smoothing parameter
7.2 Classifiers for evolving streams

7.2.1 OzaBagASHT

Bagging using Adaptive-Size Hoeffding Trees (ASHT). Each ASHT tree has a maximum size (a maximum number of split nodes), which is enforced in one of two ways:

• after one node splits, if the number of split nodes of the ASHT tree is higher than the maximum value, then it deletes some nodes to reduce its size, or
• delete all the nodes of the tree, i.e., restart from a new root.
The maximum allowed size for the n-th ASHT tree is twice the maxi-
mum allowed size for the (n − 1)-th tree. Moreover, each tree has a weight
proportional to the inverse of the square of its error, and it monitors its er-
ror with an exponential weighted moving average (EWMA) with α = .01.
The size of the first tree is 2.
This method attempts to improve bagging performance by increasing tree diversity. It has been observed that boosting tends to produce a more diverse set of classifiers than bagging, and this has been cited as a factor in increased performance.
See more details in:
Parameters:
7.2.2 OzaBagADWIN
Bagging using ADWIN. ADWIN is a change detector and estimator that solves
in a well-specified way the problem of tracking the average of a stream
of bits or real-valued numbers. ADWIN keeps a variable-length window
of recently seen items, with the property that the window has the maxi-
mal length statistically consistent with the hypothesis “there has been no
change in the average value inside the window”.
More precisely, an older fragment of the window is dropped if and only if there is enough evidence that its average value differs from that of the rest of the window. This has two consequences: one, that change is reliably declared whenever the window shrinks; and two, that at any time the average over the existing window can be reliably taken as an estimate of the current average in the stream (barring a very small or very recent change that is still not statistically visible). A formal and quantitative statement of these two points (a theorem) appears in
[BG07c] Albert Bifet and Ricard Gavaldà. Learning from time-changing data
with adaptive windowing. In SIAM International Conference on Data
Mining, 2007.
Example:
OzaBagAdwin -l HoeffdingTreeNBAdaptive -s 10
Parameters:
• -l : Classifier to train
7.2.3 SingleClassifierDrift
Class for handling concept drift datasets with a wrapper on a classifier.
The drift detection method (DDM) proposed by Gama et al. controls
the number of errors produced by the learning model during prediction.
It compares the statistics of two windows: the first one contains all the
data, and the second one contains only the data from the beginning until
the number of errors increases. Their method doesn’t store these windows
in memory. It keeps only statistics and a window of recent errors.
They consider that the number of errors in a sample of examples is modeled by a binomial distribution. A significant increase in the error of the algorithm suggests that the class distribution is changing and, hence, that the current decision model is likely to be inappropriate. They check for a warning level and a drift level. Beyond these levels, a change of context is considered.
The number of errors in a sample of n examples is modeled by a binomial distribution. For each point i in the sequence that is being sampled, the error rate is the probability of misclassifying (pi), with standard deviation given by si = √(pi (1 − pi)/i). A significant increase in the error of the algorithm suggests that the class distribution is changing and, hence, that the current decision model is likely to be inappropriate. Thus, they store the values of pi and si when pi + si reaches its minimum value during the process (obtaining pmin and smin), and check when the following conditions trigger:
• pi + si ≥ pmin + 2 · smin for the warning level. Beyond this level, the
examples are stored in anticipation of a possible change of context.
• pi + si ≥ pmin + 3 · smin for the drift level. Beyond this level, the
model induced by the learning method is reset and a new model is
learnt using the examples stored since the warning level triggered.
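A minimal sketch of these checks (an illustration, not MOA's drift detection code; strict inequalities and a small minimum sample are used here to avoid spurious triggers before any errors have been seen):

// Track the error rate p_i and deviation s_i, remember the minimum of p_i + s_i,
// and report the warning or drift level as described above.
public final class DDMSketch {
    public enum Level { NONE, WARNING, DRIFT }

    private long n = 0;                       // examples seen since the last reset
    private double p = 0.0;                   // running error rate p_i
    private double pMin = Double.MAX_VALUE;   // p at the minimum of p_i + s_i
    private double sMin = Double.MAX_VALUE;   // s at the minimum of p_i + s_i

    public Level update(boolean misclassified) {
        n++;
        p += ((misclassified ? 1.0 : 0.0) - p) / n;        // incremental mean of 0-1 errors
        double s = Math.sqrt(p * (1.0 - p) / n);           // s_i = sqrt(p_i (1 - p_i) / i)
        if (n < 30) {
            return Level.NONE;                             // wait for a small sample (assumption)
        }
        if (p + s < pMin + sMin) {                         // record the minimum of p_i + s_i
            pMin = p;
            sMin = s;
        }
        if (p + s > pMin + 3.0 * sMin) {                   // drift level: reset the statistics
            n = 0; p = 0.0; pMin = Double.MAX_VALUE; sMin = Double.MAX_VALUE;
            return Level.DRIFT;
        }
        if (p + s > pMin + 2.0 * sMin) {
            return Level.WARNING;                          // warning level
        }
        return Level.NONE;
    }
}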
Example:
SingleClassifierDrift -d EDDM -l HoeffdingTreeNBAdaptive
Parameters:
• -l : Classifier to train
7.2.4 AdaHoeffdingOptionTree
Adaptive decision option tree for streaming data with adaptive Naive
Bayes classification at leaves.
An Adaptive Hoeffding Option Tree is a Hoeffding Option Tree with the
following improvement: each leaf stores an estimation of the current error.
It uses an EWMA estimator with α = .2. The weight of each node in the
voting process is proportional to the square of the inverse of the error.
Example:
AdaHoeffdingOptionTree -o 50
Parameters:
8 Writing a classifier
8.1 Creating a new classifier
To demonstrate the implementation and operation of learning algorithms
in the system, the Java code of a simple decision stump classifier is studied.
The classifier monitors the result of splitting on each attribute and chooses
the attribute that seems to best separate the classes, based on information
gain. The decision is revisited many times, so the stump has potential to
change over time as more examples are seen. In practice it is unlikely to
change after sufficient training.
To describe the implementation, relevant code fragments are discussed
in turn, with the entire code listed (Listing 8.7) at the end. The line num-
bers from the fragments match up with the final listing.
A simple approach to writing a classifier is to extend
moa.classifiers.AbstractClassifier (line 10), which will take
care of certain details to ease the task.
The first standard parameter of an option constructor is a short name used to identify the option. The second is a character in-
tended to be used on the command line. It should be unique—a command
line character cannot be repeated for different options otherwise an excep-
tion will be thrown. The third standard parameter is a string describing
the purpose of the option. Additional parameters to option constructors
allow things such as default values and valid ranges to be specified.
The first option specified for the decision stump classifier is the “grace
period”. The option is expressed with an integer, so the option has the type
IntOption. The parameter will control how frequently the best stump is
reconsidered when learning from a stream of examples. This increases
the efficiency of the classifier—evaluating after every single example is ex-
pensive, and it is unlikely that a single example will change the decision of
the current best stump. The default value of 1000 means that the choice of
stump will be re-evaluated only after 1000 examples have been observed
since the last evaluation. The last two parameters specify the range of val-
ues that are allowed for the option—it makes no sense to have a negative
grace period, so the range is restricted to integers 0 or greater.
The second option is a flag, or a binary switch, represented by a
FlagOption. By default all flags are turned off, and will be turned on
only when a user requests so. This flag controls whether the decision
stumps should only be allowed to split two ways. By default the stumps
are allowed to have more than two branches.
The third option determines the split criterion that is used to decide
which stumps are the best. This is a ClassOption that requires a particu-
lar Java class of the type SplitCriterion. If the required class happens
to be an OptionHandler then those options will be used to configure the
object that is passed in.
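A sketch of how these three options might be declared as fields of the classifier (the constructor arguments follow the option API as described above, and the default split criterion name is taken from the DecisionStump parameters in Section 7.1.3; imports are omitted and the exact signatures should be checked against the real listing):

// Option declarations for the decision stump, as fields of the classifier class
// (a sketch; argument order and defaults are assumptions based on the text above).
public IntOption gracePeriodOption = new IntOption("gracePeriod", 'g',
        "The number of instances to observe between model changes.",
        1000, 0, Integer.MAX_VALUE);

public FlagOption binarySplitsOption = new FlagOption("binarySplits", 'b',
        "Only allow binary splits.");

public ClassOption splitCriterionOption = new ClassOption("splitCriterion", 'c',
        "Split criterion to use.", SplitCriterion.class, "InfoGainSplitCriterion");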
Four global variables are used to maintain the state of the classifier.
The bestSplit field maintains the current stump that has been cho-
sen by the classifier. It is of type AttributeSplitSuggestion, a class
used to split instances into different subsets.
The observedClassDistribution field remembers the overall distribution of class labels that have been observed by the classifier.
This function is called before any learning begins, so it should set the
default state when no information has been supplied, and no training ex-
amples have been seen. In this case, the four global fields are set to sensible
defaults.
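A sketch of what this reset might look like (field names follow the surrounding text and Listing 8.5; the exact types, and the fourth field tracking the weight seen, are assumptions):

// Sketch of the reset function: clear the chosen stump, the class distribution,
// the per-attribute observers and the weight counter (types are assumptions).
public void resetLearningImpl() {
    this.bestSplit = null;
    this.observedClassDistribution = new DoubleVector();
    this.attributeObservers = new AutoExpandVector<AttributeClassObserver>();
    this.weightSeenAtLastSplit = 0.0;
}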
This is the main function of the learning algorithm, called for every
training example in a stream. The first step, lines 47-48, updates the over-
all recorded distribution of classes. The loop from line 49 to line 59 repeats
for every attribute in the data. If no observations for a particular attribute
have been seen previously, then lines 53-55 create a new observing object.
Lines 57-58 update the observations with the values from the new example. Lines 60-61 check to see if the grace period has expired. If so, the best
split is re-evaluated.
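A sketch of this training function (an illustration only, not the manual's listing; the Instance accessors are the standard WEKA ones, and the grace-period bookkeeping is simplified to a plain example counter, which is an assumption):

// Sketch of the training step: update the class distribution, feed each attribute
// observer, and after every grace period re-evaluate the best split.
public void trainOnInstanceImpl(Instance inst) {
    this.observedClassDistribution.addToValue((int) inst.classValue(), inst.weight());
    for (int i = 0; i < inst.numAttributes(); i++) {
        if (i == inst.classIndex()) {
            continue;                                  // skip the class attribute
        }
        AttributeClassObserver obs = this.attributeObservers.get(i);
        if (obs == null) {                             // first observation of this attribute
            obs = inst.attribute(i).isNominal()
                    ? newNominalClassObserver()
                    : newNumericClassObserver();
            this.attributeObservers.set(i, obs);
        }
        obs.observeAttributeClass(inst.value(i), (int) inst.classValue(), inst.weight());
    }
    this.examplesSeenSinceEvaluation++;                // simplified grace-period counter (assumption)
    if (this.examplesSeenSinceEvaluation >= this.gracePeriodOption.getValue()) {
        this.bestSplit = findBestSplit(
                (SplitCriterion) getPreparedClassOption(this.splitCriterionOption));
        this.examplesSeenSinceEvaluation = 0;
    }
}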
Listing 8.5: Functions used during training

79 protected AttributeClassObserver newNominalClassObserver() {
80     return new NominalAttributeClassObserver();
81 }
82
83 protected AttributeClassObserver newNumericClassObserver() {
84     return new GaussianNumericAttributeClassObserver();
85 }
86
87 protected AttributeSplitSuggestion findBestSplit(SplitCriterion criterion) {
88     AttributeSplitSuggestion bestFound = null;
89     double bestMerit = Double.NEGATIVE_INFINITY;
90     double[] preSplitDist = this.observedClassDistribution.getArrayCopy();
91     for (int i = 0; i < this.attributeObservers.size(); i++) {
92         AttributeClassObserver obs = this.attributeObservers.get(i);
93         if (obs != null) {
94             AttributeSplitSuggestion suggestion =
95                 obs.getBestEvaluatedSplitSuggestion(
96                     criterion,
97                     preSplitDist,
98                     i,
99                     this.binarySplitsOption.isSet());
100             if (suggestion.merit > bestMerit) {
101                 bestMerit = suggestion.merit;
102                 bestFound = suggestion;
103             }
104         }
105     }
106     return bestFound;
107 }
Listing 8.7: The complete listing (final part)

87 protected AttributeSplitSuggestion findBestSplit(SplitCriterion criterion) {
88     AttributeSplitSuggestion bestFound = null;
89     double bestMerit = Double.NEGATIVE_INFINITY;
90     double[] preSplitDist = this.observedClassDistribution.getArrayCopy();
91     for (int i = 0; i < this.attributeObservers.size(); i++) {
92         AttributeClassObserver obs = this.attributeObservers.get(i);
93         if (obs != null) {
94             AttributeSplitSuggestion suggestion =
95                 obs.getBestEvaluatedSplitSuggestion(
96                     criterion,
97                     preSplitDist,
98                     i,
99                     this.binarySplitsOption.isSet());
100             if (suggestion.merit > bestMerit) {
101                 bestMerit = suggestion.merit;
102                 bestFound = suggestion;
103             }
104         }
105     }
106     return bestFound;
107 }
108
109 public void getModelDescription(StringBuilder out, int indent) {
110 }
111
112 protected moa.core.Measurement[] getModelMeasurementsImpl() {
113     return null;
114 }
115
116 }
8.2 Compiling a classifier

The following files are assumed to be in the working directory:

DecisionStumpTutorial.java
moa.jar
weka.jar
sizeofag.jar

The example source code can be compiled with the following command:
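A plausible compile command, assuming only the two jars are needed on the classpath (this exact invocation is an assumption rather than the manual's own text):

javac -cp moa.jar:weka.jar DecisionStumpTutorial.java

The commands that follow then copy the compiled class into a directory layout that mirrors the moa.classifiers package: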
mkdir moa
mkdir moa/classifiers
cp DecisionStumpTutorial.class moa/classifiers/
9 Bi-directional interface with WEKA
Now, it is easy to use MOA classifiers and streams from WEKA, and WEKA
classifiers from MOA. The main difference between using incremental clas-
sifiers in WEKA and in MOA will be the evaluation method used.
To use the Weka classifiers from MOA it is necessary to use one of the
following classes:
9.1 WEKA classifiers from MOA
9.1.1 WekaClassifier
A classifier to use classifiers from WEKA.
WekaClassifier builds a model on a window of W instances every T instances, but only for non-incremental methods. For WEKA incremental methods, the WEKA classifier is trained on every instance.
Example:
WekaClassifier -l weka.classifiers.trees.J48
-w 10000 -i 1000 -f 100000
Parameters:
• -l : Classifier to train
9.1.2 SingleClassifierDrift
Class for handling concept drift datasets with a wrapper on a classifier.
The drift detection method (DDM) proposed by Gama et al. controls
the number of errors produced by the learning model during prediction.
It compares the statistics of two windows: the first one contains all the
data, and the second one contains only the data from the beginning until
the number of errors increases. Their method doesn’t store these windows
in memory. It keeps only statistics and a window of recent errors.
They consider that the number of errors in a sample of examples is modeled by a binomial distribution. A significant increase in the error of the algorithm suggests that the class distribution is changing and, hence, that the current decision model is likely to be inappropriate. They check for a warning level and a drift level. Beyond these levels, a change of context is considered.
• pi + si ≥ pmin + 2 · smin for the warning level. Beyond this level, the
examples are stored in anticipation of a possible change of context.
• pi + si ≥ pmin + 3 · smin for the drift level. Beyond this level, the
model induced by the learning method is reset and a new model is
learnt using the examples stored since the warning level triggered.
Example:
SingleClassifierDrift -d EDDM
-l weka.classifiers.bayes.NaiveBayesUpdateable
Parameters:
• -l : Classifier to train
You can use MOA streams within the WEKA framework using the
weka.datagenerators.classifiers.classification.MOA data gen-
erator. For example:
weka.datagenerators.classifiers.classification.MOA
-B moa.streams.generators.LEDGenerator
9.2 MOA classifiers from WEKA
3. Restart WEKA from the MOA project, e.g., using the ”run-explorer”
target of the ANT build file.