Bagging and Boosting: 9.520 Class 10, 13 March 2006 Sasha Rakhlin
• for regression,
\[
\bar{f}(x) = \frac{1}{T} \sum_{i=1}^{T} f_i(x),
\]
the average of the f_i for i = 1, ..., T;
• for classification,
\[
\bar{f}(x) = \mathrm{sign}\left( \sum_{i=1}^{T} f_i(x) \right)
\]
or the majority vote
\[
\bar{f}(x) = \mathrm{sign}\left( \sum_{i=1}^{T} \mathrm{sign}(f_i(x)) \right).
\]
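The two combination rules above can be sketched in a few lines of NumPy. The toy predictors below are placeholders of my own (the slides leave the base learners unspecified):

```python
import numpy as np

def bagged_regression(predictors, x):
    """Average the T base predictors: f_bar(x) = (1/T) * sum_i f_i(x)."""
    return np.mean([f(x) for f in predictors], axis=0)

def bagged_classification(predictors, x):
    """Majority vote: f_bar(x) = sign(sum_i sign(f_i(x)))."""
    return np.sign(np.sum([np.sign(f(x)) for f in predictors], axis=0))

# Toy base predictors that disagree on the sign at x = 0
fs = [lambda x: x - 0.2, lambda x: x + 0.1, lambda x: x - 0.05]
x0 = np.array([0.0])
print(bagged_regression(fs, x0))        # average of -0.2, 0.1, -0.05
print(bagged_classification(fs, x0))    # majority of signs -1, +1, -1
```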
Variation I: Sub-sampling methods
Let
\[
I[f] = \int (f(x) - y)^2 \, p(x, y) \, dx \, dy
\]
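A minimal sketch of one sub-sampling variant: train T predictors on random subsamples drawn without replacement and average them. The subsample fraction, the number of rounds, and the constant-fit base learner are all illustrative choices of mine, not prescribed by the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def subsample_ensemble(X, y, fit, T=10, frac=0.5):
    """Train T predictors on random subsamples (without replacement)
    and return their average f_bar."""
    n = len(y)
    m = int(frac * n)
    predictors = []
    for _ in range(T):
        idx = rng.choice(n, size=m, replace=False)
        predictors.append(fit(X[idx], y[idx]))
    return lambda x: np.mean([f(x) for f in predictors], axis=0)

# Toy base learner: predict the subsample's mean label everywhere
fit_const = lambda X, y: (lambda x, c=y.mean(): np.full(np.shape(x)[0], c))

X = np.arange(20.0).reshape(-1, 1)
y = np.ones(20)
f_bar = subsample_ensemble(X, y, fit_const)
# every constant predictor fits mean(y) = 1, so f_bar is 1 everywhere
```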
AdaBoost (Freund and Schapire, 1996)
2. Compute the weighted error
\[
\epsilon_t = \sum_{i=1}^{n} w_t(i) \, I(y_i \neq f_t(x_i));
\]
3. Compute the importance of f_t as
\[
\alpha_t = \frac{1}{2} \ln\left( \frac{1 - \epsilon_t}{\epsilon_t} \right);
\]
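Steps 2 and 3 can be sketched as one boosting round. The weight-update line is the standard AdaBoost reweighting rule, which does not appear in this excerpt:

```python
import numpy as np

def adaboost_step(w, y, preds):
    """One AdaBoost round: weights w sum to 1, labels y and base-learner
    predictions preds take values in {-1, +1}."""
    eps = np.sum(w * (y != preds))          # step 2: weighted error
    alpha = 0.5 * np.log((1 - eps) / eps)   # step 3: importance of f_t
    w_new = w * np.exp(-alpha * y * preds)  # standard reweighting (assumed)
    return eps, alpha, w_new / w_new.sum()  # renormalize to sum to 1

y = np.array([1, 1, -1, -1])
preds = np.array([1, -1, -1, -1])           # one mistake, at index 1
w = np.full(4, 0.25)
eps, alpha, w_new = adaboost_step(w, y, preds)
# eps = 0.25; the misclassified example's weight rises to 0.5
```

Note that after reweighting, the misclassified example carries exactly half the total weight, so the next round's weighted error for the same f_t would be 1/2.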
Solution to a)?
Gradient descent view of boosting
Let
\[
C_\phi(g) = \frac{1}{n} \sum_{i=1}^{n} \phi(y_i g(x_i)).
\]
We wish to find f_t \in \mathcal{F} to add to g such that C_\phi(g + \epsilon f_t) decreases. The desired direction is -\nabla C_\phi(g). We choose the new function f_t so that it has the greatest inner product with -\nabla C_\phi(g), i.e. it maximizes
\[
-\langle \nabla C_\phi(g), f_t \rangle.
\]
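This inner product is easy to evaluate on the sample. As an illustration I instantiate phi with the exponential loss phi(v) = exp(-v); the slides leave phi generic, so this choice is an assumption:

```python
import numpy as np

def neg_inner_product(y, g_vals, f_vals):
    """Compute -<grad C_phi(g), f> on the sample, where the i-th
    coordinate of the gradient is (1/n) * y_i * phi'(y_i g(x_i)).
    Here phi(v) = exp(-v), so phi'(v) = -exp(-v)."""
    n = len(y)
    grad = (1.0 / n) * y * (-np.exp(-y * g_vals))
    return -np.dot(grad, f_vals)

y = np.array([1.0, -1.0, 1.0])
g_vals = np.zeros(3)     # current ensemble outputs g(x_i)
f_good = y.copy()        # candidate agreeing with every label
f_bad = -y               # candidate disagreeing with every label
# f_good attains the larger value, so it is the chosen descent direction
```

With the exponential loss, maximizing this quantity weights each example by exp(-y_i g(x_i)), which recovers the AdaBoost-style emphasis on currently misclassified points.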