
An Introduction to Boosting

Plan of talk
• Generative vs. non-generative modeling
• Boosting
• Alternating decision trees
• Boosting and over-fitting
• Applications
Toy Example
• Computer receives telephone call
• Measures pitch of voice
• Decides gender of caller
[Figure: male and female human voice distributions over voice pitch]

Generative modeling
[Figure: class-conditional probability curves over voice pitch, with parameters mean1, var1 and mean2, var2]
Discriminative approach
[Figure: number of mistakes as a function of the decision threshold on voice pitch]

Ill-behaved data
[Figure: class-conditional probabilities over voice pitch with fitted means mean1 and mean2, and the resulting number of mistakes]
Traditional Statistics vs. Machine Learning
[Diagram: Statistics maps Data to an estimated world state, Decision Theory maps the estimated world state to Actions; Machine Learning maps Data to Actions (predictions) directly]

Comparison of methodologies

Model                 Generative              Discriminative
Goal                  Probability estimates   Classification rule
Performance measure   Likelihood              Misclassification rate
Mismatch problems     Outliers                Misclassifications
A weak learner

A weak learner takes as input a weighted training set
(x1,y1,w1),(x2,y2,w2), … ,(xn,yn,wn)
– instances x1,x2,…,xn with binary labels y1,y2,…,yn and non-negative example weights that sum to 1 –
and outputs a weak rule h.

The boosting process

(x1,y1,1/n), … ,(xn,yn,1/n)   → weak learner → h1
(x1,y1,w1), … ,(xn,yn,wn)     → weak learner → h2
(x1,y1,w1), … ,(xn,yn,wn)     → weak learner → h3
          …
(x1,y1,w1), … ,(xn,yn,wn)     → weak learner → hT

Final rule: Sign[ α1 h1 + α2 h2 + … + αΤ hT ]

The weak requirement: the weak learner MUST do better than
random (error less than 50-50 on binary classification)!
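A minimal sketch of this loop in Python, for illustration only (not code from the talk): it uses a depth-1 decision tree ("stump") from scikit-learn as the weak learner, and assumes binary labels in {-1, +1}.

```python
# Illustrative AdaBoost-style boosting loop (a sketch, not the talk's code).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, T=50):
    y = np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                      # uniform initial weights, summing to 1
    rules, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        eps = np.clip(np.sum(w * (pred != y)), 1e-12, None)   # weighted training error
        if eps >= 0.5:                           # weak requirement violated: stop
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps)  # weight of this weak rule
        w = w * np.exp(-alpha * y * pred)        # up-weight mistakes, down-weight correct
        w = w / w.sum()                          # renormalize so the weights sum to 1
        rules.append(h)
        alphas.append(alpha)

    def final_rule(X_new):
        # Final rule: Sign[ alpha_1 h_1 + alpha_2 h_2 + ... + alpha_T h_T ]
        score = sum(a * h.predict(X_new) for a, h in zip(alphas, rules))
        return np.sign(score)

    return final_rule
```

The weight update concentrates the next round's attention on the examples the current combined rule gets wrong, which is exactly the role of the reweighted training sets above.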
Adaboost
• Binary labels y = -1,+1
• margin(x,y) = y [Σt αt ht(x)]
• P(x,y) = (1/Z) exp(-margin(x,y))
• Given ht, we choose αt to minimize Σ(x,y) exp(-margin(x,y))
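As a worked step (standard AdaBoost algebra, filled in here rather than taken from the slide): writing w(x,y) for the normalized weights before round t and εt for the weighted error of ht, the quantity to minimize is

\[
\sum_{(x,y)} w(x,y)\, e^{-\alpha_t\, y\, h_t(x)}
  \;=\; (1-\varepsilon_t)\, e^{-\alpha_t} \;+\; \varepsilon_t\, e^{\alpha_t},
\]

and setting its derivative with respect to \(\alpha_t\) to zero gives the closed form

\[
\alpha_t \;=\; \tfrac{1}{2}\,\ln\!\frac{1-\varepsilon_t}{\varepsilon_t}.
\]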
Main property of Adaboost
• If the advantages of the weak rules over random guessing are γ1,γ2,…,γT, then the in-sample error of the final rule is at most exp(-2 Σt γt²) (w.r.t. the initial weights).

Adaboost as gradient descent
• Discriminator class: a linear discriminator in the space of “weak hypotheses”
• Original goal: find the hyperplane with the smallest number of mistakes
  – Known to be an NP-hard problem (no algorithm that runs in time polynomial in d, where d is the dimension of the space)
• Computational method: use the exponential loss as a surrogate, perform gradient descent.
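A one-line justification of this view (standard, not spelled out on the slide): with combined score F(x) = Σt αt ht(x), the gradient of the exponential loss with respect to the score on example i is

\[
\frac{\partial}{\partial F(x_i)} \sum_{j} e^{-y_j F(x_j)}
  \;=\; -\,y_i\, e^{-y_i F(x_i)} \;\propto\; -\,y_i\, w_i ,
\]

so handing the weak learner the current example weights is exactly handing it the direction of steepest descent, one coordinate (one weak hypothesis) at a time.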
Margins view
• x, w ∈ R^n ; y ∈ {-1,+1}
• Prediction = sign(w • x)
• Margin = y (w • x) / (||w|| ||x||)
[Figure: + and - examples on either side of the separating hyperplane w; mistakes and correct classifications projected onto the margin axis]

Adaboost et al.
• Loss as a function of the margin: Adaboost loss = e^(-y (w • x))
• Logitboost, Brownboost, and the 0-1 loss
[Figure: loss vs. margin curves for Adaboost, Logitboost, Brownboost and the 0-1 loss; cumulative # examples over mistakes (negative margin) and correct (positive margin)]
One coordinate at a time
• Adaboost performs gradient descent on exponential loss
• Adds one coordinate (“weak learner”) at each iteration.
• Weak learning in binary classification = slightly better than random guessing.
• Weak learning in regression – unclear.
• Uses example-weights to communicate the gradient direction to the weak learner
• Solves a computational problem

What is a good weak learner?
• The set of weak rules (features) should be flexible enough to be (weakly) correlated with most conceivable relations between feature vector and label.
• Small enough to allow exhaustive search for the minimal weighted training error.
• Small enough to avoid over-fitting.
• Should be able to calculate predicted label very efficiently.
• Rules can be “specialists” – predict only on a small subset of the input space and abstain from predicting on the rest (output 0).
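To make "small enough to allow exhaustive search for the minimal weighted training error" concrete, here is a sketch of a threshold-rule ("decision stump") weak learner; the function name and interface are my own illustration, not from the talk.

```python
import numpy as np

def best_stump(X, y, w):
    """Exhaustive search over rules of the form s * sign(x_j > theta):
    every feature j, every observed threshold theta, both orientations s.
    X: (n, d) array, y: labels in {-1, +1}, w: non-negative weights summing to 1."""
    n, d = X.shape
    best_err, best_rule = np.inf, None
    for j in range(d):
        for theta in np.unique(X[:, j]):
            base = np.where(X[:, j] > theta, 1, -1)
            for s in (1, -1):                       # allow +1 on either side of the threshold
                err = np.sum(w * (s * base != y))   # weighted training error of this rule
                if err < best_err:
                    best_err, best_rule = err, (j, theta, s)
    j, theta, s = best_rule
    predict = lambda X_new: s * np.where(X_new[:, j] > theta, 1, -1)
    return predict, best_err
```

A class of this size can be searched exhaustively at every boosting round yet is still weakly correlated with many feature-label relationships; the "Boost Stumps" results later in the talk boost rules of roughly this form.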
Decision Trees
[Figure: a decision tree that splits on X>3 and then Y>5, with leaf predictions +1/-1, and the corresponding partition of the X-Y plane]

Decision tree as a sum
[Figure: the same tree rewritten as a sum of real-valued node contributions (-0.2, +0.2, -0.1, +0.1, -0.3); the classification is the sign of the summed contributions along the path an instance follows]
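A small sketch of the "decision tree as a sum" idea (the constants are illustrative, loosely borrowed from the figure, and not guaranteed to match it exactly): each node contributes a real value, and the class is the sign of the total contribution along the path an instance follows.

```python
def tree_as_sum(x, y):
    # Root contributes a constant; each branch taken adds its own contribution.
    score = -0.2                              # root contribution (illustrative value)
    if x > 3:
        score += +0.2                         # contribution of the X > 3 branch
        score += +0.1 if y > 5 else -0.3      # contribution of the Y > 5 test
    else:
        score += -0.1                         # contribution of the X <= 3 branch
    return 1 if score > 0 else -1             # the sign plays the role of the leaf label
```

Written this way, an ordinary decision tree is already a sum of simple rules whose sign is taken at the end, which is the form the alternating decision trees on the next slides generalize.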
An alternating decision tree
[Figure: an alternating decision tree with a real-valued root prediction and decision nodes X>3, Y<1 and Y>5, each with real-valued prediction nodes (e.g. -0.2, +0.2, 0.0, +0.7, -0.1, +0.1, -0.3); the classification is the sign of the sum of predictions along all paths consistent with the instance, shown together with the induced partition of the X-Y plane]

Example: Medical Diagnostics
• Cleve dataset from UC Irvine database.
• Heart disease diagnostics (+1 = healthy, -1 = sick)
• 13 features from tests (real valued and discrete).
• 303 instances.
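A minimal sketch of how an alternating decision tree is evaluated (my own data structure, not the ADtree implementation behind the talk): unlike an ordinary tree, an instance can satisfy the preconditions of several decision nodes at once, every prediction node it reaches contributes to the score, and the class is the sign of the total.

```python
# Each rule: (precondition, condition, value_if_true, value_if_false).
# Preconditions and conditions are predicates over the instance x.
def adtree_score(x, root_value, rules):
    score = root_value                         # the root contributes a constant prediction
    for precondition, condition, v_true, v_false in rules:
        if precondition(x):                    # the rule only fires where its precondition holds
            score += v_true if condition(x) else v_false
    return score

# Example in the spirit of the X>3 / Y>5 figure (constants are illustrative):
rules = [
    (lambda x: True,       lambda x: x["X"] > 3, +0.2, -0.1),
    (lambda x: x["X"] > 3, lambda x: x["Y"] > 5, +0.1, -0.3),
]
label = 1 if adtree_score({"X": 4, "Y": 6}, -0.2, rules) > 0 else -1
```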
ADtree for the Cleveland heart-disease diagnostics problem
[Figure: the learned alternating decision tree]

Cross-validated accuracy

Learning algorithm   Number of splits   Average test error   Test error variance
ADtree               6                  17.0%                0.6%
C5.0                 27                 27.2%                0.5%
C5.0 + boosting      446                20.2%                0.5%
Boost Stumps         16                 16.5%                0.8%
Curious phenomenon: Boosting decision trees

Using <10,000 training examples we fit >2,000,000 parameters.
Explanation using margins
[Figure: 0-1 loss plotted against the margin]

Explanation using margins
[Figure: 0-1 loss plotted against the margin – no examples with small margins!!]
Experimental Evidence
[Figure]

Theorem (Schapire, Freund, Bartlett & Lee, Annals of Statistics ’98)
For any convex combination and any threshold, the probability of mistake is bounded by the fraction of training examples with small margin plus a term that depends on the size of the training sample and the VC dimension of the weak rules – with no dependence on the number of weak rules that are combined!!!
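For reference, the bound behind the theorem (restated, up to constants, from the Schapire, Freund, Bartlett & Lee paper, since the slide's formula is an image): if f(x) = Σt αt ht(x) with αt ≥ 0 and Σt αt = 1 is a convex combination of weak rules from a class of VC dimension d, then with probability at least 1-δ over a training sample S of size m, for every such f and every threshold θ > 0,

\[
\Pr_{\mathcal D}\big[\, y f(x) \le 0 \,\big]
\;\le\;
\Pr_{S}\big[\, y f(x) \le \theta \,\big]
\;+\;
O\!\left(\sqrt{\frac{d\,\log^2(m/d)}{m\,\theta^{2}} \;+\; \frac{\log(1/\delta)}{m}}\right).
\]

The left side is the probability of a mistake, the first term on the right is the fraction of training examples with small margin, and the remainder depends only on the sample size m and the VC dimension d of the weak rules – not on the number of weak rules combined.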
Suggested optimization problem
[Figure: an optimization problem over the margin involving the training sample size m and the VC dimension d]

Idea of Proof
[Figure]
Applications of Boosting
• Academic research
• Applied research
• Commercial deployment

Academic research
% test error rates

Database     Other              Boosting   Error reduction
Cleveland    27.2 (DT)          16.5       39%
Promoters    22.0 (DT)          11.8       46%
Letter       13.8 (DT)          3.5        74%
Reuters 4    5.8, 6.0, 9.8      2.95       ~60%
Reuters 8    11.3, 12.1, 13.4   7.4        ~40%
Applied research (Schapire, Singer, Gorin 98)
• “AT&T, How may I help you?”
• Classify voice requests
• Voice -> text -> category
• Fourteen categories: Area code, AT&T service, billing credit, calling card, collect, competitor, dial assistance, directory, how to dial, person to person, rate, third party, time charge, time

Examples
• “Yes I’d like to place a collect call long distance please” → collect
• “Operator I need to make a call but I need to bill it to my office” → third party
• “Yes I’d like to place a call on my master card please” → calling card
• “I just called a number in Sioux city and I musta rang the wrong number because I got the wrong party and I would like to have that taken off my bill” → billing credit
Weak rules generated by “boostexter”
[Table: example weak rules for the categories calling card, collect call and third party, each keyed on whether a particular word occurs or does not occur in the utterance]

Results
• 7844 training examples – hand transcribed
• 1000 test examples – hand / machine transcribed
• Accuracy with 20% rejected
  – Machine transcribed: 75%
  – Hand transcribed: 90%
Commercial deployment (Freund, Mason, Rogers, Pregibon, Cortes 2000)
• Distinguish business/residence customers
• Using statistics from call-detail records
• Alternating decision trees
  – Similar to boosting decision trees, more flexible
  – Combines very simple rules
  – Can over-fit, cross validation used to stop

Massive datasets
• 260M calls / day
• 230M telephone numbers
• Label unknown for ~30%
• Hancock: software for computing statistical signatures.
• 100K randomly selected training examples, ~10K is enough
• Training takes about 2 hours.
• Generated classifier has to be both accurate and efficient
Alternating tree for “bizocity”
[Figure: the learned alternating decision tree]

Alternating Tree (Detail)
[Figure: detail of the tree]
Precision/recall graphs
[Figure: accuracy as a function of the score threshold]

Business impact
• Increased coverage from 44% to 56%
• Accuracy ~94%
• Saved AT&T $15M in the year 2000 in operations costs and missed opportunities.
Summary
• Boosting is a computational method for learning accurate classifiers
• Resistance to over-fit explained by margins
• Underlying explanation – large “neighborhoods” of good classifiers
• Boosting has been applied successfully to a variety of classification problems

Come talk with me!
• [email protected]
• http://www.cs.huji.ac.il/~yoavf
