Lecture 9
Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya
Motivation 1
• Many models can be trained on the same data
• Typically, none is strictly better than the others
• Recall the "no free lunch" theorem
• Can we “combine” predictions from multiple models?
Motivation 2
• Combined prediction using Adaptive Basis Functions
$$f(x) = \sum_{m=1}^{M} w_m\,\phi_m(x; v_m)$$
• Decision Trees
• Bagging
• Boosting
• Committee / Mixture of Experts
• Feed forward neural nets / Multi-layer perceptrons
• …
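As a concrete reading of the formula above, here is a minimal Python sketch (the stump parameters, weights, and function names are made up for illustration) that evaluates a weighted combination of basis models:

```python
import numpy as np

# Hypothetical basis models: two decision stumps, each with its own
# parameters v_m (split feature and threshold) baked in.
def make_stump(feature, threshold):
    return lambda X: np.where(X[:, feature] > threshold, 1.0, -1.0)

def combined_predict(X, weights, basis_models):
    """Evaluate f(x) = sum_m w_m * phi_m(x; v_m) for each row of X."""
    return sum(w * phi(X) for w, phi in zip(weights, basis_models))

X = np.array([[0.2, 1.5], [0.9, 0.3]])
f = combined_predict(X, weights=[0.7, 0.3],
                     basis_models=[make_stump(0, 0.5), make_stump(1, 1.0)])
print(f)  # weighted vote of the two stumps for each input
```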
Decision Trees
• Partition input space into cuboid regions
• Simple model for each region
• Classification: Single label; Regression: Constant real value
• A sequential decision process selects the region (and hence the prediction) for each instance
• This process is represented as a decision tree
Learning Decision Trees
• Decision for each region
• Regression: Average of training data for the region
• Classification: Most likely label in the region
• Greedy algorithm: find the (node, dimension, value) split with the largest reduction in "error" (sketched after this list)
• Regression error: residual sum of squares
• Classification error: misclassification rate, entropy, …
• Stopping condition
• Preventing overfitting: Pruning using cross validation
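A minimal sketch of the greedy split search for regression, as referenced above (all names are ours; real implementations sort each feature once and update statistics incrementally rather than re-scanning):

```python
import numpy as np

def best_split(X, y):
    """Find the (feature, threshold) pair that most reduces the
    residual sum of squares (RSS) of a constant fit per side."""
    def rss(t):
        return 0.0 if len(t) == 0 else np.sum((t - t.mean()) ** 2)

    best = (None, None, rss(y))  # (feature, threshold, error)
    for d in range(X.shape[1]):
        for v in np.unique(X[:, d]):
            left, right = y[X[:, d] <= v], y[X[:, d] > v]
            err = rss(left) + rss(right)
            if err < best[2]:
                best = (d, v, err)
    return best
```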
Pros and Cons of Decision Trees
• Pros: easily interpretable decision process; widely used in practice, e.g. medical diagnosis
• Cons: greedy training is unstable, and trees tend to have high variance (motivating bagging, below)
Mixture of Supervised Models
$$f(x) = \sum_{k=1}^{K} \pi_k\,\phi_k(x, w)$$
• Examples: mixture of linear regression models; mixture of logistic regression models
• Training using EM
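A minimal EM sketch for a mixture of K linear regression models, under simplifying assumptions not stated on the slide (fixed noise variance, global mixing weights, no bias term; all names are ours). The E-step computes responsibilities under each component's Gaussian likelihood; the M-step re-fits each component by responsibility-weighted least squares:

```python
import numpy as np

def em_mixture_linreg(X, y, K, n_iter=50, sigma2=1.0, seed=0):
    """EM for a mixture of K linear regressions (fixed noise variance)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    W = rng.normal(size=(K, D))          # one weight vector per component
    pi = np.full(K, 1.0 / K)             # global mixing weights
    for _ in range(n_iter):
        # E-step: r[n, k] proportional to pi_k * N(y_n | w_k^T x_n, sigma2)
        resid = y[:, None] - X @ W.T                  # shape (N, K)
        log_r = np.log(pi) - 0.5 * resid ** 2 / sigma2
        log_r -= log_r.max(axis=1, keepdims=True)     # for stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component
        for k in range(K):
            Rk = r[:, k]
            A = X.T @ (Rk[:, None] * X)
            b = X.T @ (Rk * y)
            W[k] = np.linalg.solve(A + 1e-8 * np.eye(D), b)
        pi = r.mean(axis=0)
    return W, pi
```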
Conditional Mixture of Supervised Models
• Mixture of experts
$$f(x) = \sum_{k=1}^{K} \pi_k(x)\,\phi_k(x, w)$$
• The mixing coefficients ("gates") $\pi_k(x)$ now depend on the input
Bootstrap Aggregation / Bagging
• Individual models (e.g. decision trees) may have high variance along with low bias
• Bagging: train the same learner on bootstrap resamples of the training data and average the predictions, reducing variance (sketch below)
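A hedged sketch of bootstrap aggregation (the `fit(X, y) -> model` interface is an assumption made for this sketch, not something from the slides):

```python
import numpy as np

def bag_predict(X_train, y_train, X_test, fit, n_bags=25, seed=0):
    """Fit the same learner on bootstrap resamples and average
    the resulting predictions."""
    rng = np.random.default_rng(seed)
    N = len(y_train)
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, N, size=N)   # sample N points with replacement
        model = fit(X_train[idx], y_train[idx])
        preds.append(model.predict(X_test))
    return np.mean(preds, axis=0)          # average for regression;
                                           # use a majority vote for classification
```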
Random Forests
• Training the same algorithm on bootstrap resamples yields correlated errors, which limits the variance reduction from averaging
• Random forests decorrelate the trees by restricting each split search to a random subset of the features
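Relative to the bagging sketch above, the only change is inside the split search: each call may consider only a random subset of features. A sketch reusing the earlier `best_split` (the subset size m ≈ √D is a common heuristic, not from the slide):

```python
import numpy as np

def random_subspace_split(X, y, rng, m=None):
    """Split search restricted to a random subset of m features:
    the decorrelation trick used by random forests."""
    D = X.shape[1]
    m = m or max(1, int(np.sqrt(D)))       # common default: m ~ sqrt(D)
    features = rng.choice(D, size=m, replace=False)
    return best_split(X[:, features], y)   # reuses the earlier sketch;
                                           # returned index is into `features`
```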
Boosting
• Combines weak learners: models only slightly better than random guessing
• E.g. decision stumps (one-level decision trees)
Example loss functions and algorithms
• Squared error → L2 boosting
• Absolute error → gradient boosting
• 0-1 loss → what we ultimately care about in classification, but hard to optimize directly
• Exponential loss → AdaBoost
• Logloss → LogitBoost
• Hinge loss → support vector machines
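To make the classification losses concrete, a small sketch evaluating each as a function of the margin m = y·f(x), with labels y in {-1, +1} (the base-2 logloss follows the common convention that makes it an upper bound on the 0-1 loss):

```python
import numpy as np

def classification_losses(margin):
    """Classification losses as functions of the margin y * f(x)."""
    return {
        "0-1":         (margin <= 0).astype(float),
        "exponential": np.exp(-margin),               # AdaBoost
        "logloss":     np.log2(1 + np.exp(-margin)),  # base 2, so it equals
                                                      # the 0-1 loss at m = 0
        "hinge":       np.maximum(0.0, 1 - margin),   # SVM
    }

print(classification_losses(np.array([-1.0, 0.0, 2.0])))
```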
Example: AdaBoost
• Binary classification ($y_n \in \{-1, +1\}$) + exponential loss
1. Initialize data weights $w_n^{(1)} = 1/N$
2. For $m = 1, \dots, M$: train a classifier $\phi_m(x)$ minimizing the weighted error $J_m = \sum_n w_n^{(m)}\,\mathbb{1}[\phi_m(x_n) \ne y_n]$
3. Evaluate $\epsilon_m = J_m \big/ \sum_n w_n^{(m)}$ and $\alpha_m = \ln\{(1-\epsilon_m)/\epsilon_m\}$
4. Update weights: $w_n^{(m+1)} = w_n^{(m)} \exp\{\alpha_m\,\mathbb{1}[\phi_m(x_n) \ne y_n]\}$
5. Predict: $\hat{y}(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m\,\phi_m(x)\right)$
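A compact sketch of these five steps using decision stumps as the weak learners (function names are ours, and the stump search is deliberately naive):

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: best (feature, threshold, sign)
    under weighted 0-1 error."""
    best = None
    for d in range(X.shape[1]):
        for v in np.unique(X[:, d]):
            for s in (+1, -1):
                pred = s * np.where(X[:, d] > v, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, d, v, s)
    err, d, v, s = best
    return lambda Z, d=d, v=v, s=s: s * np.where(Z[:, d] > v, 1, -1)

def adaboost(X, y, M=20):
    """AdaBoost for labels y in {-1, +1}, following the slide's steps."""
    N = len(y)
    w = np.full(N, 1.0 / N)                        # step 1: uniform weights
    stumps, alphas = [], []
    for _ in range(M):
        phi = fit_stump(X, y, w)                   # step 2: weighted fit
        miss = phi(X) != y
        eps = w[miss].sum() / w.sum()              # step 3: eps and alpha
        alpha = np.log((1 - eps) / (eps + 1e-12))
        w = w * np.exp(alpha * miss)               # step 4: upweight mistakes
        stumps.append(phi)
        alphas.append(alpha)
    # step 5: sign of the weighted committee vote
    return lambda Z: np.sign(sum(a * phi(Z) for a, phi in zip(alphas, stumps)))
```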
Neural networks: Multilayer Perceptrons
Logistic regression and regression remembered …
• Linear models with fixed basis functions
$$y(x, w) = f\left(\sum_{j=1}^{M} w_j\,\phi_j(x)\right)$$
where $f(\cdot)$ is the identity for regression and a sigmoid for classification
• Basis functions $\phi_j$ are fixed non-linear transformations of the input, chosen before training
• Neural networks instead learn the basis functions from data (adaptive basis functions)
Feed-forward network functions
• M linear combinations of input variables
$$a_j = \sum_{i=1}^{D} v_{ji}\,x_i$$
• Apply non-linear activation function
$$z_j = g(a_j)$$
• Linear combinations to get output activations
$$b_k = \sum_{j=1}^{M} w_{kj}\,z_j$$
• Apply output activation function to get outputs
$$y_k = h(b_k)$$
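A direct transcription of these four equations into numpy (bias terms omitted, as in the slide; tanh and the identity are example choices for g and h, not prescribed by the slide):

```python
import numpy as np

def forward(x, V, W, g=np.tanh, h=lambda b: b):
    """Two-layer feed-forward pass in the slide's notation:
    a = V x, z = g(a), b = W z, y = h(b)."""
    a = V @ x          # hidden pre-activations, shape (M,)
    z = g(a)           # hidden units
    b = W @ z          # output pre-activations, shape (K,)
    return h(b)        # network outputs; use sigmoid/softmax h
                       # for classification

# Example with random weights (D=3 inputs, M=4 hidden, K=2 outputs)
rng = np.random.default_rng(0)
y = forward(rng.normal(size=3), rng.normal(size=(4, 3)), rng.normal(size=(2, 4)))
```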
Network Representation
[Figure: two-layer network. Inputs $x_1, \dots, x_D$ connect to hidden units via weights $v_{ji}$, giving $a_j$ and $z_j = g(a_j)$; hidden units connect to outputs via weights $w_{kj}$, giving $b_k$ and $y_k = h(b_k)$ for $k = 1, \dots, K$. Easy to generalize to multiple layers.]
Power of feed-forward networks
• Universal approximators: a two-layer network with sufficiently many hidden units can approximate any continuous function on a compact input domain to arbitrary accuracy
Training
• Formulate error function in terms of weights
$$E(w, v) = \sum_{n=1}^{N} \left\lVert \hat{y}(x_n; w, v) - y_n \right\rVert^2$$
Error Backpropagation
• Full gradient: sequence of local computations and
propagations over the network
• Output layer:
$$\frac{\partial E_n}{\partial w_{kj}} = \frac{\partial E_n}{\partial b_{nk}}\,\frac{\partial b_{nk}}{\partial w_{kj}} = \delta^{w}_{nk}\,z_{nj}, \qquad \delta^{w}_{nk} = \hat{y}_{nk} - y_{nk}$$
• Hidden layer:
$$\frac{\partial E_n}{\partial v_{ji}} = \frac{\partial E_n}{\partial a_{nj}}\,\frac{\partial a_{nj}}{\partial v_{ji}} = \delta^{v}_{nj}\,x_{ni}, \qquad \delta^{v}_{nj} = \sum_k \frac{\partial E_n}{\partial b_{nk}}\,\frac{\partial b_{nk}}{\partial a_{nj}} = g'(a_{nj})\sum_k \delta^{w}_{nk}\,w_{kj}$$
• Batch gradient: sum the per-example gradients, $\dfrac{\partial E}{\partial w} = \sum_n \dfrac{\partial E_n}{\partial w}$
Backpropagation Algorithm
1. Apply input vector $x_n$ and compute the derived variables $a_j$, $z_j$, $b_k$, $\hat{y}_k$ (forward pass)
2. Compute $\delta^{w}_{nk}$ at all output nodes
3. Back-propagate to compute $\delta^{v}_{nj}$ at all hidden nodes
4. Compute derivatives $\partial E_n / \partial w_{kj}$ and $\partial E_n / \partial v_{ji}$
5. Batch: sum derivatives over all input vectors
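The five steps for a single example, transcribed into numpy under the slide's squared-error, linear-output assumption (so delta_w = y_hat - y; function names are ours):

```python
import numpy as np

def backprop(x, y, V, W, g=np.tanh, g_prime=lambda a: 1 - np.tanh(a) ** 2):
    """One forward + backward pass for the two-layer network above.
    Returns gradients dE/dW and dE/dV for a single example."""
    # Step 1: forward pass
    a = V @ x                      # hidden pre-activations
    z = g(a)                       # hidden units
    b = W @ z                      # output pre-activations
    y_hat = b                      # linear output activation
    # Step 2: output-node errors
    delta_w = y_hat - y
    # Step 3: back-propagate to hidden nodes
    delta_v = g_prime(a) * (W.T @ delta_w)
    # Step 4: derivatives
    dW = np.outer(delta_w, z)      # dE/dw_kj = delta_w_k * z_j
    dV = np.outer(delta_v, x)      # dE/dv_ji = delta_v_j * x_i
    return dW, dV                  # Step 5: sum these over the batch
```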
Neural Network Regularization
• Given such a large number of parameters,
preventing overfitting is vitally important
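One standard remedy, not spelled out on this slide, is weight decay: add a quadratic penalty on the weights to the error function,

$$\tilde{E}(w, v) = E(w, v) + \frac{\lambda}{2}\left(\lVert w \rVert^2 + \lVert v \rVert^2\right)$$

with $\lambda$ chosen by cross validation. Early stopping on a validation set is another common choice.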
So… Which classifier is the best in practice?
• Extensive experiments by Caruana et al. (2006)
• See more recent study here
• Low dimensions (9-200):
  1. Boosted decision trees
  2. Random forests
  3. Bagged decision trees
  4. SVM
  5. Neural nets
  6. K nearest neighbors
  7. Boosted stumps
  8. Decision tree
  9. Logistic regression
  10. Naïve Bayes
• High dimensions (500-100K):
  1. HMC MLP
  2. Boosted MLP
  3. Bagged MLP
  4. Boosted trees
  5. Random forests
• Remember the “No Free Lunch” theorem …