L09-An Introduction To Machine Learning
▪ Training-Testing Concepts
J. H. Friedman, Data Mining and Statistics: What's the Connection? http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf
▪ While we don’t know how our brain converts input to output, we know what the
output should be for every input.
▪ We can use this knowledge to teach the machine.
"A computer program is said to learn from experience 𝐸 with respect to some class of
tasks 𝑇 and performance measure 𝑃, if its performance at tasks in 𝑇, as measured by
𝑃, improves with experience 𝐸."
(Tom Mitchell, Machine Learning, 1997)
▪ Semi-supervised learning
▪ Reinforcement learning
▪ e.g., Q-Learning, Temporal Difference (TD), and Deep Adversarial Networks
▪ Regression: $f: \mathbb{R}^d \to \mathbb{R}$, where $f$ is called a regressor
▪ Clustering/segmentation: $f: \mathbb{R}^d \to \{C_1, \dots, C_k\}$, a set of clusters (both task types are sketched below)
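A minimal Python sketch of these two output types, assuming NumPy and scikit-learn are available (the data is synthetic and purely illustrative): a regressor returning real values, and a clustering assigning each point to one of $k$ clusters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 points in R^d with d = 3

# Regression: f maps R^d -> R (synthetic real-valued target)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
regressor = LinearRegression().fit(X, y)
print(regressor.predict(X[:2]))               # real-valued outputs

# Clustering/segmentation: f maps R^d -> {C_1, ..., C_k}
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])                    # cluster index per point
```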
[Figure: supervised learning pipeline. An ML algorithm is trained on labeled data and produces a model f. Input features (income, gender, age, family status, zipcode) are fed to the model f, which outputs either a credit amount in $ (regression) or a credit decision yes/no (classification); during training, each prediction is compared against the true label.]
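The pipeline in the figure can be sketched in a few lines, with synthetic stand-ins for the credit features (hypothetical names and data, not a real scoring model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Hypothetical numeric encodings of: income, gender, age, family status, zipcode
X = rng.normal(size=(200, 5))
y_true = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # synthetic "credit yes/no" true labels

model = LogisticRegression().fit(X, y_true)  # the ML algorithm produces the model f
y_pred = model.predict(X)                    # f maps features -> credit yes/no
print("agreement with true labels:", (y_pred == y_true).mean())
```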
▪ Training error (in-sample error) over the $n$ training examples:
$E_{\text{train}}(f) = \sum_{i=1}^{n} \text{loss}(y_i, f(x_i))$
▪ Examples of loss functions (a computational sketch follows this list):
▪ Classification error:
$\text{loss}(y_i, f(x_i)) = \begin{cases} 1 & \text{if } \operatorname{sign}(y_i) \neq \operatorname{sign}(f(x_i)) \\ 0 & \text{otherwise} \end{cases}$
▪ Least squares loss:
$\text{loss}(y_i, f(x_i)) = (y_i - f(x_i))^2$
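A minimal NumPy sketch of these definitions, assuming a linear model $f(x) = w \cdot x$ fit by least squares (one possible choice of $f$):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
y = np.sign(X[:, 0] - X[:, 1])                 # labels in {-1, +1}

# Fit a linear model f(x) = w . x by least squares
w, *_ = np.linalg.lstsq(X, y, rcond=None)
f_x = X @ w

# Classification error: 1 if sign(y_i) != sign(f(x_i)), else 0
clf_loss = (np.sign(y) != np.sign(f_x)).astype(float)
# Least squares loss: (y_i - f(x_i))^2
sq_loss = (y - f_x) ** 2

# Training error E_train(f): sum of the per-example losses
print("E_train (classification):", clf_loss.sum())
print("E_train (least squares):", sq_loss.sum())
```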
▪ We aim to make $E_{\text{train}}(f)$ small, i.e., we minimize $E_{\text{train}}(f)$.
▪ We hope that the out-of-sample error $E_{\text{test}}(f)$ (test/true error) will be small too.
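A short sketch of this train/test gap, assuming scikit-learn's train_test_split and a deliberately simple linear model on synthetic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=200)

# Hold out 30% of the data to estimate the out-of-sample error
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
f = LinearRegression().fit(X_tr, y_tr)

print("E_train:", mean_squared_error(y_tr, f.predict(X_tr)))
print("E_test :", mean_squared_error(y_te, f.predict(X_te)))  # typically larger
```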
[Figure: prediction error vs. model complexity: the training error decreases as complexity grows, while the test error first decreases and then rises again (overfitting). Source: https://towardsdatascience.com/]
▪ Do model selection.
▪ Use regularization: keep all the features, but reduce their influence by forcing the parameter values to be small, i.e., minimize the penalized objective below.
$\sum_{i=1}^{n} \text{loss}(y_i, f(x_i)) + C \times R(f)$
where $R(f)$ is a regularization term penalizing complex models and the constant $C$ controls the strength of the penalty.
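Ridge regression is one standard instance of this penalized objective: squared loss plus an L2 penalty $R(f) = \lVert w \rVert^2$, with scikit-learn's alpha playing the role of the constant $C$. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 10))                  # few examples, many features
y = X[:, 0] + rng.normal(scale=0.1, size=30)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)            # alpha ~ C, penalty R(f) = ||w||^2

print("unregularized |w|:", np.abs(plain.coef_).sum())
print("ridge         |w|:", np.abs(ridge.coef_).sum())  # smaller parameter values
```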
▪ k-fold cross-validation error: average the per-fold errors $E_{D_j}$ over the validation folds $D_1, \dots, D_k$:
$E_{\text{CV}} = \frac{1}{k} \sum_{j=1}^{k} E_{D_j}$
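A minimal k-fold sketch, assuming scikit-learn's KFold and mean squared error as the per-fold error $E_{D_j}$:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.3, size=120)

k = 5
fold_errors = []                      # E_{D_j} for each held-out fold D_j
for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
    f = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    fold_errors.append(mean_squared_error(y[val_idx], f.predict(X[val_idx])))

print("E_CV =", np.mean(fold_errors))  # (1/k) * sum over j of E_{D_j}
```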
▪ T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer, 2009.