Adaboost
Derek Hoiem
March 31, 2004
Adaboost Algorithm
Theory/Interpretations
Practical Issues
Simple to implement
Bootstrapping
Bagging
For i = 1 .. M
    Draw n* < n samples from D with replacement
    Learn classifier Ci
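A minimal sketch of the loop above, assuming numpy arrays and a scikit-learn decision tree as the base classifier Ci; the helper names (bagging_fit, bagging_predict) are illustrative, not from the slides:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, sample_frac=0.8, seed=0):
    """Train M classifiers, each on n* < n examples drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_star = int(sample_frac * n)
    classifiers = []
    for _ in range(M):
        idx = rng.integers(0, n, size=n_star)   # draw n* samples with replacement
        clf = DecisionTreeClassifier().fit(X[idx], y[idx])
        classifiers.append(clf)
    return classifiers

def bagging_predict(classifiers, X):
    """Majority vote of C1 .. CM (assumes labels in {0, 1})."""
    votes = np.mean([clf.predict(X) for clf in classifiers], axis=0)
    return (votes >= 0.5).astype(int)
```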
[Figure: x and o training points split into three partitions P1, P2, P3, with per-partition weights
Wx1 = 5/13, Wo1 = 0/13; Wx2 = 1/13, Wo2 = 2/13; Wx3 = 1/13, Wo3 = 4/13]
Z = 2 (sqrt(Wx1*Wo1) + sqrt(Wx2*Wo2) + sqrt(Wx3*Wo3)) = 2 (0 + sqrt(2)/13 + 2/13) ≈ 0.525
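This matches Schapire and Singer's confidence-rated formulation with domain-partitioning weak hypotheses; under that reading, the confidence assigned to partition j and the normalizer Z are

```latex
c_j = \frac{1}{2}\ln\frac{W_+^{j}}{W_-^{j}},
\qquad
Z = 2\sum_{j}\sqrt{W_+^{j}\,W_-^{j}}
```

where W_+^j and W_-^j are the total weights of the x and o examples falling in partition j. Note that P1 has W_-^1 = 0, which would give infinite confidence; this is what the smoothing on the next slide addresses.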
Smoothing Predictions
Equivalent to adding prior in partitioning case
Confidence is bounded (see the sketch below), so no partition gets infinite confidence
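A sketch of the smoothed prediction, assuming the Schapire–Singer form with smoothing constant ε (e.g., something on the order of 1/(2n)):

```latex
c_j = \frac{1}{2}\ln\frac{W_+^{j} + \varepsilon}{W_-^{j} + \varepsilon},
\qquad
|c_j| \le \frac{1}{2}\ln\frac{1 + \varepsilon}{\varepsilon}
```

Adding ε to both weight totals acts like a pseudo-count prior, which is the "adding a prior" equivalence noted above, and it caps the confidence of any single partition.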
Justification for the Error Function
Adaboost minimizes the exponential loss Σ_i exp(−y_i f(x_i)), which upper-bounds the training error
Proof:
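The formula and proof were presumably the standard exponential-loss bound; a sketch of that argument, for completeness:

```latex
\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\big[\operatorname{sign}(f(x_i)) \neq y_i\big]
\;\le\; \frac{1}{n}\sum_{i=1}^{n} e^{-y_i f(x_i)}
\;=\; \prod_{t=1}^{T} Z_t
```

The inequality uses 1[z ≤ 0] ≤ e^{−z}; the equality follows by unrolling the weight-update recursion. Each round chooses h_t and α_t to minimize Z_t, so Adaboost greedily minimizes this upper bound on training error.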
Misinterpretations of the Probabilistic Interpretation
Misinterpretation: Lemma 1 applies to the true distribution
In fact, it only applies to the training set
Note that P(y|x) ∈ {0, 1} for the training set in most cases
[Figure: XOR-style example on (x1, x2) ∈ {0,1}²:
P(o | 0,0) = 1, P(x | 0,1) = 1, P(x | 1,0) = 1, P(o | 1,1) = 1]
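"Lemma 1" here presumably refers to the result of Friedman, Hastie, and Tibshirani that the population minimizer of the exponential loss is half the log-odds:

```latex
\arg\min_{f(x)} \; E\!\left[e^{-y f(x)} \,\middle|\, x\right]
= \frac{1}{2}\ln\frac{P(y=1 \mid x)}{P(y=-1 \mid x)}
```

On the training set, where P(y|x) is 0 or 1 (as in the XOR example above), this minimizer is ±∞, so the fitted f(x) cannot be read as a calibrated probability estimate for the true distribution.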
Too loose to be of practical value
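The bound being called too loose is presumably the margin bound of Schapire, Freund, Bartlett, and Lee (see the reference at the end); roughly, with probability at least 1 − δ, for every margin threshold θ > 0,

```latex
P_{D}\big[y f(x) \le 0\big]
\;\le\;
P_{S}\big[y f(x) \le \theta\big]
+ O\!\left(\sqrt{\frac{d\,\log^{2}(m/d)}{m\,\theta^{2}} + \frac{\log(1/\delta)}{m}}\right)
```

where m is the training-set size, d the VC dimension of the weak-hypothesis class, and f the normalized vote. For typical m and θ the second term is large, which is the sense in which the bound is too loose to be of practical value.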
Maximizing the margin…
But Adaboost doesn't necessarily maximize the margin on the test set (Rätsch)
Rätsch proposes an algorithm (Adaboost*) that does so
Adaboost and Noisy Data
Examples with the largest gap between the label and the classifier prediction get the most weight
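To make the claim concrete: in Adaboost the weight of example i after m rounds satisfies

```latex
w_i^{(m)} \;\propto\; e^{-y_i f_m(x_i)}
```

so a mislabeled example, whose noisy label y_i disagrees with the ensemble's increasingly confident prediction f_m(x_i), receives exponentially growing weight and can dominate later rounds.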
GentleBoost
Update is fm(x) = P(y=1 | x) – P(y=0 | x) instead of the log-ratio ½ ln[P(y=1 | x) / P(y=0 | x)]
Bounded in [-1, 1], whereas the log-ratio is unbounded
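A minimal sketch of the two per-region updates (Gentle vs. Real Adaboost style), using weighted class probabilities within one region of the weak learner; the function name and eps smoothing are illustrative:

```python
import numpy as np

def region_updates(w, y, eps=1e-6):
    """Per-region update given example weights w and labels y in {0, 1}.

    Returns (gentle, real):
      gentle = P_w(y=1) - P_w(y=0)          -> bounded in [-1, 1]
      real   = 0.5 * ln(P_w(y=1)/P_w(y=0))  -> unbounded as either prob -> 0
    """
    w = np.asarray(w, dtype=float)
    y = np.asarray(y)
    p1 = w[y == 1].sum() / w.sum()
    p0 = 1.0 - p1
    gentle = p1 - p0
    real = 0.5 * np.log((p1 + eps) / (p0 + eps))  # eps smooths the log-ratio
    return gentle, real

# A pure region: the Gentle update stays at +1, the Real update blows up without smoothing.
print(region_updates(w=[1, 1, 1], y=[1, 1, 1]))
```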
Comparison (Friedman)
Example:
Weak learner: stumps (DimVC = 2)
Input space: R^N; strong classifier DimVC = N+1
N+1 partitions can have arbitrary confidences assigned
Complexity of the Weak Learner
[Plot: results (%) vs. Num Iterations for weak learners of varying complexity; series include "Trees - 2 bins (L2)"]
Jin, Liu, et al. (CMU) – "A New Boosting Algorithm Using Input-Dependent Regularizer"
Schapire, Freund, et al. – "Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods"