Lecture 9
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya
Motivation 1
• Many models can be trained on the same data
• Typically, none is strictly better than the others
• Recall the "no free lunch" theorem
• Can we “combine” predictions from multiple models?
Motivation 2
• Combined prediction using Adaptive Basis Functions
$$f(x) = \sum_{m=1}^{M} w_m\,\phi_m(x; v_m)$$
• Decision Trees
• Bagging
• Boosting
• Committee / Mixture of Experts
• Feed forward neural nets / Multi-layer perceptrons
• …
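As a concrete reading of the formula above, here is a minimal Python sketch (the stump parameters, weights, and function names are made up for illustration) that evaluates a weighted combination of basis models:

```python
import numpy as np

# Hypothetical basis models: two decision stumps, each with its own
# parameters v_m (split feature and threshold) baked in.
def make_stump(feature, threshold):
    return lambda X: np.where(X[:, feature] > threshold, 1.0, -1.0)

def combined_predict(X, weights, basis_models):
    """Evaluate f(x) = sum_m w_m * phi_m(x; v_m) for each row of X."""
    return sum(w * phi(X) for w, phi in zip(weights, basis_models))

X = np.array([[0.2, 1.5], [0.9, 0.3]])
f = combined_predict(X, weights=[0.7, 0.3],
                     basis_models=[make_stump(0, 0.5), make_stump(1, 1.0)])
print(f)  # weighted vote of the two stumps for each input
```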
Decision Trees
• Partition input space into cuboid regions
• Simple model for each region
• Classification: Single label; Regression: Constant real value
• A sequential decision process selects the region (and hence the prediction) for each instance
• This process is represented as a decision tree
Learning Decision Trees
• Decision for each region
• Regression: Average of training data for the region
• Classification: Most likely label in the region
• Greedy algorithm: find the (node, dimension, value) split with the largest reduction in "error" (sketched after this list)
• Regression error: residual sum of squares
• Classification error: misclassification rate, entropy, …
• Stopping condition
• Preventing overfitting: Pruning using cross validation
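A minimal sketch of the greedy split search for regression, as referenced above (all names are ours; real implementations sort each feature once and update statistics incrementally rather than re-scanning):

```python
import numpy as np

def best_split(X, y):
    """Find the (feature, threshold) pair that most reduces the
    residual sum of squares (RSS) of a constant fit per side."""
    def rss(t):
        return 0.0 if len(t) == 0 else np.sum((t - t.mean()) ** 2)

    best = (None, None, rss(y))  # (feature, threshold, error)
    for d in range(X.shape[1]):
        for v in np.unique(X[:, d]):
            left, right = y[X[:, d] <= v], y[X[:, d] > v]
            err = rss(left) + rss(right)
            if err < best[2]:
                best = (d, v, err)
    return best
```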
Pros and Cons of Decision Trees
• Pros: easily interpretable decision process; widely used in practice, e.g. medical diagnosis
• Cons: greedy training is unstable, and trees tend to have high variance (motivating bagging, below)
Mixture of Supervised Models
$$f(x) = \sum_{k=1}^{K} \pi_k\,\phi_k(x, w)$$
• Examples: mixture of linear regression models; mixture of logistic regression models
• Training using EM
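A minimal EM sketch for a mixture of K linear regression models, under simplifying assumptions not stated on the slide (fixed noise variance, global mixing weights, no bias term; all names are ours). The E-step computes responsibilities under each component's Gaussian likelihood; the M-step re-fits each component by responsibility-weighted least squares:

```python
import numpy as np

def em_mixture_linreg(X, y, K, n_iter=50, sigma2=1.0, seed=0):
    """EM for a mixture of K linear regressions (fixed noise variance)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    W = rng.normal(size=(K, D))          # one weight vector per component
    pi = np.full(K, 1.0 / K)             # global mixing weights
    for _ in range(n_iter):
        # E-step: r[n, k] proportional to pi_k * N(y_n | w_k^T x_n, sigma2)
        resid = y[:, None] - X @ W.T                  # shape (N, K)
        log_r = np.log(pi) - 0.5 * resid ** 2 / sigma2
        log_r -= log_r.max(axis=1, keepdims=True)     # for stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component
        for k in range(K):
            Rk = r[:, k]
            A = X.T @ (Rk[:, None] * X)
            b = X.T @ (Rk * y)
            W[k] = np.linalg.solve(A + 1e-8 * np.eye(D), b)
        pi = r.mean(axis=0)
    return W, pi
```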
Conditional Mixture of Supervised Models
• Mixture of experts
$$f(x) = \sum_{k=1}^{K} \pi_k(x)\,\phi_k(x, w)$$
• The mixing coefficients ("gates") $\pi_k(x)$ now depend on the input
Bootstrap Aggregation / Bagging
• Individual models (e.g. decision trees) may have high variance along with low bias
• Bagging: train the same learner on bootstrap resamples of the training data and average the predictions, reducing variance (sketch below)
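A hedged sketch of bootstrap aggregation (the `fit(X, y) -> model` interface is an assumption made for this sketch, not something from the slides):

```python
import numpy as np

def bag_predict(X_train, y_train, X_test, fit, n_bags=25, seed=0):
    """Fit the same learner on bootstrap resamples and average
    the resulting predictions."""
    rng = np.random.default_rng(seed)
    N = len(y_train)
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, N, size=N)   # sample N points with replacement
        model = fit(X_train[idx], y_train[idx])
        preds.append(model.predict(X_test))
    return np.mean(preds, axis=0)          # average for regression;
                                           # use a majority vote for classification
```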
Random Forests
• Training the same algorithm on bootstrap resamples yields correlated errors, which limits the variance reduction from averaging
• Random forests decorrelate the trees by restricting each split search to a random subset of the features
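Relative to the bagging sketch above, the only change is inside the split search: each call may consider only a random subset of features. A sketch reusing the earlier `best_split` (the subset size m ≈ √D is a common heuristic, not from the slide):

```python
import numpy as np

def random_subspace_split(X, y, rng, m=None):
    """Split search restricted to a random subset of m features:
    the decorrelation trick used by random forests."""
    D = X.shape[1]
    m = m or max(1, int(np.sqrt(D)))       # common default: m ~ sqrt(D)
    features = rng.choice(D, size=m, replace=False)
    return best_split(X[:, features], y)   # reuses the earlier sketch;
                                           # returned index is into `features`
```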
Boosting
• Combines weak learners: models only slightly better than random guessing
• E.g. decision stumps (one-level decision trees)
Example loss functions and algorithms
• Squared error → L2 boosting
• Absolute error → gradient boosting
• 0-1 loss → what we ultimately care about in classification, but hard to optimize directly
• Exponential loss → AdaBoost
• Logloss → LogitBoost
• Hinge loss → support vector machines
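To make the classification losses concrete, a small sketch evaluating each as a function of the margin m = y·f(x), with labels y in {-1, +1} (the base-2 logloss follows the common convention that makes it an upper bound on the 0-1 loss):

```python
import numpy as np

def classification_losses(margin):
    """Classification losses as functions of the margin y * f(x)."""
    return {
        "0-1":         (margin <= 0).astype(float),
        "exponential": np.exp(-margin),               # AdaBoost
        "logloss":     np.log2(1 + np.exp(-margin)),  # base 2, so it equals
                                                      # the 0-1 loss at m = 0
        "hinge":       np.maximum(0.0, 1 - margin),   # SVM
    }

print(classification_losses(np.array([-1.0, 0.0, 2.0])))
```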
Example: AdaBoost
• Binary classification ($y_n \in \{-1, +1\}$) + exponential loss
1. Initialize data weights $w_n^{(1)} = 1/N$
2. For $m = 1, \dots, M$: train a classifier $\phi_m(x)$ minimizing the weighted error $J_m = \sum_n w_n^{(m)}\,\mathbb{1}[\phi_m(x_n) \ne y_n]$
3. Evaluate $\epsilon_m = J_m \big/ \sum_n w_n^{(m)}$ and $\alpha_m = \ln\{(1-\epsilon_m)/\epsilon_m\}$
4. Update weights: $w_n^{(m+1)} = w_n^{(m)} \exp\{\alpha_m\,\mathbb{1}[\phi_m(x_n) \ne y_n]\}$
5. Predict: $\hat{y}(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m\,\phi_m(x)\right)$
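A compact sketch of these five steps using decision stumps as the weak learners (function names are ours, and the stump search is deliberately naive):

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: best (feature, threshold, sign)
    under weighted 0-1 error."""
    best = None
    for d in range(X.shape[1]):
        for v in np.unique(X[:, d]):
            for s in (+1, -1):
                pred = s * np.where(X[:, d] > v, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, d, v, s)
    err, d, v, s = best
    return lambda Z, d=d, v=v, s=s: s * np.where(Z[:, d] > v, 1, -1)

def adaboost(X, y, M=20):
    """AdaBoost for labels y in {-1, +1}, following the slide's steps."""
    N = len(y)
    w = np.full(N, 1.0 / N)                        # step 1: uniform weights
    stumps, alphas = [], []
    for _ in range(M):
        phi = fit_stump(X, y, w)                   # step 2: weighted fit
        miss = phi(X) != y
        eps = w[miss].sum() / w.sum()              # step 3: eps and alpha
        alpha = np.log((1 - eps) / (eps + 1e-12))
        w = w * np.exp(alpha * miss)               # step 4: upweight mistakes
        stumps.append(phi)
        alphas.append(alpha)
    # step 5: sign of the weighted committee vote
    return lambda Z: np.sign(sum(a * phi(Z) for a, phi in zip(alphas, stumps)))
```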
Neural networks: Multilayer Perceptrons
Logistic regression and regression remembered …
• Linear models with fixed basis functions
$$y(x, w) = f\left(\sum_{j=1}^{M} w_j\,\phi_j(x)\right)$$
where $f(\cdot)$ is the identity for regression and a sigmoid for classification
• Basis functions $\phi_j$ are fixed non-linear transformations of the input, chosen before training
• Neural networks instead learn the basis functions from data (adaptive basis functions)
Feed-forward network functions
• M linear combinations of input variables
$$a_j = \sum_{i=1}^{D} v_{ji}\,x_i$$
• Apply non-linear activation function
$$z_j = g(a_j)$$
• Linear combinations to get output activations
$$b_k = \sum_{j=1}^{M} w_{kj}\,z_j$$
• Apply output activation function to get outputs
$$y_k = h(b_k)$$
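A direct transcription of these four equations into numpy (bias terms omitted, as in the slide; tanh and the identity are example choices for g and h, not prescribed by the slide):

```python
import numpy as np

def forward(x, V, W, g=np.tanh, h=lambda b: b):
    """Two-layer feed-forward pass in the slide's notation:
    a = V x, z = g(a), b = W z, y = h(b)."""
    a = V @ x          # hidden pre-activations, shape (M,)
    z = g(a)           # hidden units
    b = W @ z          # output pre-activations, shape (K,)
    return h(b)        # network outputs; use sigmoid/softmax h
                       # for classification

# Example with random weights (D=3 inputs, M=4 hidden, K=2 outputs)
rng = np.random.default_rng(0)
y = forward(rng.normal(size=3), rng.normal(size=(4, 3)), rng.normal(size=(2, 4)))
```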
Network Representation
[Figure: two-layer network. Inputs $x_1, \dots, x_D$ connect to hidden units via weights $v_{ji}$, giving $a_j$ and $z_j = g(a_j)$; hidden units connect to outputs via weights $w_{kj}$, giving $b_k$ and $y_k = h(b_k)$ for $k = 1, \dots, K$. Easy to generalize to multiple layers.]
Power of feed-forward networks
• Universal approximators: a two-layer network with sufficiently many hidden units can approximate any continuous function on a compact input domain to arbitrary accuracy
Training
• Formulate error function in terms of weights
$$E(w, v) = \sum_{n=1}^{N} \left\lVert \hat{y}(x_n; w, v) - y_n \right\rVert^2$$
Error Backpropagation
• Full gradient: sequence of local computations and
propagations over the network
• Output layer:
$$\frac{\partial E_n}{\partial w_{kj}} = \frac{\partial E_n}{\partial b_{nk}}\,\frac{\partial b_{nk}}{\partial w_{kj}} = \delta^{w}_{nk}\,z_{nj}, \qquad \delta^{w}_{nk} = \hat{y}_{nk} - y_{nk}$$
• Hidden layer:
$$\frac{\partial E_n}{\partial v_{ji}} = \frac{\partial E_n}{\partial a_{nj}}\,\frac{\partial a_{nj}}{\partial v_{ji}} = \delta^{v}_{nj}\,x_{ni}, \qquad \delta^{v}_{nj} = \sum_k \frac{\partial E_n}{\partial b_{nk}}\,\frac{\partial b_{nk}}{\partial a_{nj}} = g'(a_{nj})\sum_k \delta^{w}_{nk}\,w_{kj}$$
• Batch gradient: sum the per-example gradients, $\dfrac{\partial E}{\partial w} = \sum_n \dfrac{\partial E_n}{\partial w}$
Backpropagation Algorithm
1. Apply input vector $x_n$ and compute the derived variables $a_j$, $z_j$, $b_k$, $\hat{y}_k$ (forward pass)
2. Compute $\delta^{w}_{nk}$ at all output nodes
3. Back-propagate to compute $\delta^{v}_{nj}$ at all hidden nodes
4. Compute derivatives $\partial E_n / \partial w_{kj}$ and $\partial E_n / \partial v_{ji}$
5. Batch: sum derivatives over all input vectors
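The five steps for a single example, transcribed into numpy under the slide's squared-error, linear-output assumption (so delta_w = y_hat - y; function names are ours):

```python
import numpy as np

def backprop(x, y, V, W, g=np.tanh, g_prime=lambda a: 1 - np.tanh(a) ** 2):
    """One forward + backward pass for the two-layer network above.
    Returns gradients dE/dW and dE/dV for a single example."""
    # Step 1: forward pass
    a = V @ x                      # hidden pre-activations
    z = g(a)                       # hidden units
    b = W @ z                      # output pre-activations
    y_hat = b                      # linear output activation
    # Step 2: output-node errors
    delta_w = y_hat - y
    # Step 3: back-propagate to hidden nodes
    delta_v = g_prime(a) * (W.T @ delta_w)
    # Step 4: derivatives
    dW = np.outer(delta_w, z)      # dE/dw_kj = delta_w_k * z_j
    dV = np.outer(delta_v, x)      # dE/dv_ji = delta_v_j * x_i
    return dW, dV                  # Step 5: sum these over the batch
```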
Neural Network Regularization
• Given such a large number of parameters,
preventing overfitting is vitally important
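One standard remedy, not spelled out on this slide, is weight decay: add a quadratic penalty on the weights to the error function,

$$\tilde{E}(w, v) = E(w, v) + \frac{\lambda}{2}\left(\lVert w \rVert^2 + \lVert v \rVert^2\right)$$

with $\lambda$ chosen by cross validation. Early stopping on a validation set is another common choice.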
So… Which classifier is the best in practice?
• Extensive experiments by Caruana et al. (2006)
• See more recent study here
• Low dimensions (9-200):
  1. Boosted decision trees
  2. Random forests
  3. Bagged decision trees
  4. SVM
  5. Neural nets
  6. K nearest neighbors
  7. Boosted stumps
  8. Decision tree
  9. Logistic regression
  10. Naïve Bayes
• High dimensions (500-100K):
  1. HMC MLP
  2. Boosted MLP
  3. Bagged MLP
  4. Boosted trees
  5. Random forests
• Remember the “No Free Lunch” theorem …