Lec-01-Introduction to Statistical Learning
Dr. Sayak Roychowdhury
Department of Industrial and Systems Engineering,
IIT Kharagpur
Statistical Learning
• IBM Watson wins Jeopardy! in 2011 (a game show)
https://www.youtube.com/watch?v=P18EdAKuC1U
• Predicting elections
• Regression problem: Stock Market Prediction
• Classification problem: Gene Expression Data
f(5) = E(Y | X = 5)
Regression Function
• The ideal f(x) = E(Y | X = x) is called the regression function.
• It is also defined for a vector X, e.g. f(x) = E(Y | X1 = x1, X2 = x2).
• f(x) = E(Y | X = x) is called the optimal predictor because it minimizes the mean squared error E[(Y − q(X))² | X = x] over all functions q at X = x.
• ε = Y − f(X) is the irreducible error.
• For any estimate f̂(x) of f(x), the expected prediction error decomposes as
  E[(Y − f̂(x))² | X = x] = [f(x) − f̂(x)]² + Var(ε)
  where [f(x) − f̂(x)]² is the reducible part and Var(ε) is the irreducible part.
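This decomposition can be checked numerically. The sketch below assumes a simple true regression function f(x) = 2x + 1, Gaussian noise, and an arbitrary imperfect estimate f̂; all of these are illustrative choices, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                              # true regression function (assumed)
    return 2.0 * x + 1.0

def f_hat(x):                          # some fixed, imperfect estimate of f
    return 2.2 * x + 0.5

x0 = 5.0                               # fixed query point X = x0
sigma = 1.0                            # sd of the irreducible noise epsilon

# Monte Carlo draws of Y at X = x0:  Y = f(x0) + eps
y = f(x0) + rng.normal(0.0, sigma, size=200_000)

mse = np.mean((y - f_hat(x0)) ** 2)    # E[(Y - f_hat(x0))^2 | X = x0]
reducible = (f(x0) - f_hat(x0)) ** 2   # [f(x0) - f_hat(x0)]^2
irreducible = sigma ** 2               # Var(eps)

print(round(mse, 3), round(reducible + irreducible, 3))
```

The simulated mean squared error matches the sum of the reducible and irreducible terms, as the identity predicts.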
How to estimate f̂
• There may be no observation at exactly X = 12.5.
• One way to approximate:
  • select a neighbourhood N(x)
  • f̂(x) = Avg(Y | X ∈ N(x))
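A minimal sketch of this neighbourhood-average estimator, using hypothetical training data (a noisy linear signal) since no dataset is given in the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data: noisy observations of an unknown f
x_train = rng.uniform(0.0, 25.0, size=500)
y_train = 0.5 * x_train + rng.normal(0.0, 1.0, size=500)

def f_hat(x, width=1.0):
    """Neighbourhood average: f_hat(x) = Avg(Y | X in N(x)),
    taking N(x) to be the interval [x - width, x + width]."""
    mask = np.abs(x_train - x) <= width
    return y_train[mask].mean()

# Even with no observation at exactly X = 12.5, nearby points give an estimate
print(f_hat(12.5))
```

Widening the neighbourhood averages over more points (lower variance) but pulls in points where f differs from f(x) (higher bias), foreshadowing the trade-off discussed later.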
Supervised Learning — example methods:
• ANN
• K nearest neighbours
• Thin-plate splines
• SVM
• Tree-based methods
(several of these are non-parametric)
Parametric Methods
• Parametric methods take a model-based approach.
• Step 1: Assume a functional form or shape of f, e.g. the linear model
  f(X) = β0 + β1X1 + β2X2 + ⋯ + βpXp
• To estimate f, we only need to estimate the p + 1 parameters β0, β1, …, βp.
• In contrast, a rough (highly flexible) thin-plate spline fit to the Income data makes zero errors on the training data, a sign of overfitting.
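Fitting the assumed linear form reduces to estimating the p + 1 coefficients, e.g. by ordinary least squares. The sketch below uses synthetic data with p = 2 predictors and made-up true coefficients, since the slides do not specify a dataset:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data from a linear model with p = 2 predictors (assumed values)
n = 1000
X = rng.normal(size=(n, 2))
beta_true = np.array([1.0, 2.0, -3.0])        # beta0, beta1, beta2
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0.0, 0.5, size=n)

# Estimate the p + 1 = 3 parameters by ordinary least squares
A = np.column_stack([np.ones(n), X])          # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(beta_hat, 2))
```

The recovered coefficients are close to the true ones; the whole problem of estimating a function f has been reduced to estimating three numbers.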
Flexibility and Interpretability
• Linear regression is a relatively inflexible approach, because it can only generate linear functions.
• Bias(f̂(x0)) = E[f̂(x0)] − f(x0)
• The expectation averages over the variability of 𝑦0 as well as the variability in the
training dataset
• Typically, as 𝑓መ becomes more flexible, the variability increases and the bias
decreases
• Choosing flexibility based on average test error amounts to a bias-variance trade-off
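The trade-off can be simulated directly: fit models of increasing flexibility (here, polynomial degree, an illustrative stand-in for the flexibility axis) to many training sets drawn from an assumed true function f(x) = sin(x), and measure bias and variance of the prediction at a fixed point.

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):                            # true nonlinear regression function (assumed)
    return np.sin(x)

x0, sigma, n = 1.5, 0.3, 30          # query point, noise sd, training-set size
xs = np.linspace(0.0, np.pi, n)

def fit_predict(degree):
    # Draw one noisy training set and return the fitted prediction at x0
    ys = f(xs) + rng.normal(0.0, sigma, size=n)
    return np.polyval(np.polyfit(xs, ys, degree), x0)

results = {}
for degree in (1, 3, 9):             # increasing flexibility
    preds = np.array([fit_predict(degree) for _ in range(2000)])
    bias = preds.mean() - f(x0)      # Bias(f_hat(x0)) = E[f_hat(x0)] - f(x0)
    var = preds.var()                # variability over training sets
    results[degree] = (bias, var)
    print(degree, round(bias, 3), round(var, 4))
```

As the slides state: the inflexible degree-1 fit has large bias and small variance, while the flexible degree-9 fit has nearly zero bias but higher variance.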
Bias Variance Trade-off