2.b Applied Machine Learning Secret Sauce - Slides
3. Prediction vs Estimation
Prediction problem set-up
Given:
• Training data set (y_1, x_1), …, (y_n, x_n) (assume iid)
• Usually called “regression” when y is continuous, “classification” when y is discrete
• Loss function ℓ(ŷ, y)

Terminology:
      Econometrics        ML
  y   Outcome variable    Label
  x   Covariate           Feature

Goal:
• Prediction function f̂ with low average loss (“risk”)
      L(f̂) = E_{(y,x)}[ ℓ(f̂(x), y) ]
  where (y, x) is distributed the same as the training data
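In practice the risk L(f̂) is not observed and is approximated by an average loss over a sample. A short sketch of that sample analogue (the notation for the m evaluation observations is ours, not the slides'):

\[
  \hat{L}(\hat{f}) \;=\; \frac{1}{m} \sum_{i=1}^{m} \ell\!\left(\hat{f}(x_i^{*}),\, y_i^{*}\right)
\]

Computed on the training observations themselves, this average tends to understate L(f̂) for flexible f̂, which is why a separate evaluation sample appears later in the deck.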
Squared-error loss for regression
“Regression”: continuous outcome, y ∈ ℝ
Squared-error loss: ℓ(ŷ, y) = (ŷ − y)²
Resulting risk: L(f̂) = E[ (f̂(x) − y)² ]
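One standard fact worth adding here (a well-known result, not a claim about what the slide shows): under squared-error loss the risk-minimizing prediction function is the conditional expectation, so this loss implicitly targets E[y | x].

\[
  f^{*}(x) \;=\; \arg\min_{f}\; E\!\left[(f(x) - y)^2\right] \;=\; E[\,y \mid x\,]
\]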
Hence:
1. Flexible functional forms
2. Limit expressiveness (regularization)
Regularization for linear regression
• Rather than OLS
      min_β (1/n) ∑_{i=1}^n (y_i − β'x_i)²
• Fit the constrained problem
      min_β (1/n) ∑_{i=1}^n (y_i − β'x_i)²   s.t.   ‖β‖ ≤ c
• where the norm ‖β‖ can be chosen as
      ‖β‖_0 = ∑_{j=1}^p 1{β_j ≠ 0}
      ‖β‖_1 = ∑_{j=1}^p |β_j|
      ‖β‖_2² = ∑_{j=1}^p β_j²
• Throughout, assume β' = (β_0, β_1, …, β_p)
• Normalize x; the intercept β_0 is not penalized
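A minimal R sketch of this idea on simulated data (all object names and the data-generating process are illustrative, not from the slides); glmnet solves the penalized counterpart of the constrained problem, and alpha = 0 selects the squared (L2) penalty:

library(glmnet)

# Simulated data: 50 covariates, only the first two matter
set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 2 * X[, 2] + rnorm(n)

ols_fit   <- lm(y ~ X)                  # unregularized benchmark
ridge_fit <- glmnet(X, y, alpha = 0)    # L2 penalty, fit over a path of lambda values
coef(ridge_fit, s = 0.5)                # shrunken coefficients at lambda = 0.5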
LASSO regression
• Constrained form:
      min_β (1/n) ∑_{i=1}^n (y_i − β'x_i)²   s.t.   ∑_{j=1}^p |β_j| ≤ c
• Equivalent penalized (Lagrangian) form:
      min_β (1/n) ∑_{i=1}^n (y_i − β'x_i)² + λ ∑_{j=1}^p |β_j|
• Compare the squared penalty (ridge):
      min_β (1/n) ∑_{i=1}^n (y_i − β'x_i)² + λ ∑_{j=1}^p β_j²
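A matching R sketch for the lasso itself (same simulated setup as above; names illustrative); glmnet's default alpha = 1 applies the L1 penalty, which sets many coefficients exactly to zero:

library(glmnet)

set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 2 * X[, 2] + rnorm(n)

lasso_fit <- glmnet(X, y, alpha = 1)    # alpha = 1 (the default): lasso / L1 penalty
coef(lasso_fit, s = 0.1)                # sparse: most coefficients exactly 0 at lambda = 0.1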
Hence:
1. Flexible functional forms
2. Limit expressiveness (regularization)
3. Learn how much to regularize (tuning)
Firewall principle:
• Obtain a function f̂ on the training sample
• Estimate L(f̂) on a separate sample, kept behind a “firewall” from the fitting step
Illustration: Databricks
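A minimal R sketch of the firewall idea on simulated data (all names and the 80/20 split are illustrative): obtain f̂ on the training part only, then estimate L(f̂) on observations never used in fitting.

set.seed(1)
n  <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- 1 + df$x1 - df$x2 + rnorm(n)

train_idx <- sample(n, 0.8 * n)     # 80/20 split
train     <- df[train_idx, ]
holdout   <- df[-train_idx, ]

f_hat <- lm(y ~ x1 + x2, data = train)                    # obtain f_hat on training data only
mean((holdout$y - predict(f_hat, newdata = holdout))^2)   # estimate of L(f_hat), squared-error loss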
Random forest
[Figure: fitted surfaces from OLS, a single tree, and a random forest]
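A minimal R sketch of the three fits compared in the figure, on simulated data (names and data-generating process are illustrative); rpart fits a single regression tree and randomForest averages many of them:

library(rpart)
library(randomForest)

set.seed(1)
n  <- 500
df <- data.frame(x1 = runif(n), x2 = runif(n))
df$y <- sin(6 * df$x1) + df$x2^2 + rnorm(n, sd = 0.1)

ols_fit    <- lm(y ~ x1 + x2, data = df)                         # linear benchmark
tree_fit   <- rpart(y ~ x1 + x2, data = df)                      # one regression tree
forest_fit <- randomForest(y ~ x1 + x2, data = df, ntree = 500)  # average of 500 trees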
Boosting / boosted trees
• Iteratively fit a simple tree to the residuals of the current fit, then add it to the ensemble (see the sketch below)
Source: medium.com/mlreview
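A minimal hand-rolled sketch of that idea for squared-error loss (simulated data; the tree depth, learning rate, and number of rounds are illustrative choices, not the slides' specification):

library(rpart)

set.seed(1)
n  <- 500
df <- data.frame(x1 = runif(n), x2 = runif(n))
df$y <- sin(6 * df$x1) + df$x2^2 + rnorm(n, sd = 0.1)

X        <- df[, c("x1", "x2")]
n_rounds <- 100                         # boosting iterations
eta      <- 0.1                         # learning rate (shrinkage)
pred     <- rep(mean(df$y), n)          # start from the constant prediction
trees    <- vector("list", n_rounds)

for (b in seq_len(n_rounds)) {
  fit_df     <- cbind(X, resid_b = df$y - pred)          # current residuals as the outcome
  trees[[b]] <- rpart(resid_b ~ ., data = fit_df,
                      control = rpart.control(maxdepth = 2, cp = 0))
  pred <- pred + eta * predict(trees[[b]], newdata = X)  # take a small step toward the residuals
}

mean((df$y - pred)^2)   # in-sample MSE; in practice the number of rounds is tuned on held-out data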
Bayesian regularization
• Bayesian methods shrink towards a prior
• Powerful way of constructing regularized predictions,
e.g. ridge regression, Bayesian trees
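To make the ridge example concrete, here is the standard connection as a sketch (a textbook result, not a quote from the slides): with a Gaussian likelihood and independent Gaussian priors on the coefficients, the posterior mode is exactly the ridge estimator, and the penalty is determined by the prior and noise variances.

% Model: y_i | x_i, beta ~ N(beta' x_i, sigma^2); prior: beta_j ~ N(0, tau^2) iid
\[
  \hat{\beta}_{\text{post.\ mode}}
  \;=\; \arg\min_{\beta}\;
  \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \beta' x_i\right)^2
  \;+\; \lambda \sum_{j=1}^{p} \beta_j^{2},
  \qquad \lambda = \frac{\sigma^2}{n\,\tau^2}
\]

A tighter prior (smaller tau) means a larger lambda and hence more shrinkage toward zero.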
ML basics recap
1. Flexible functional forms
2. Limit expressiveness (regularization)
3. Learn how much to regularize (tuning)
[Figure: validation RMSE vs. log lambda (reverse scale) for the lasso;
 marked values log lambda = −4.69, −6.46, −6.65]

cv_lasso_fit <- cv.glmnet(x = XVars,
                          y = house_train$Sale_Price)

[Figure: validation RMSE vs. mtry for the random forest]

tune_rf_res <- tune_grid(
  tune_wf,
  resamples = cv_folds,
  grid = rf_grid
)
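A minimal sketch of how the tuned penalty is then used (cv_lasso_fit and XVars are the objects from the snippet above; XVars_test is a hypothetical holdout design matrix): cv.glmnet stores the cross-validated lambda choices, and its coef() and predict() methods accept them directly.

best_lambda <- cv_lasso_fit$lambda.min              # lambda minimizing cross-validated error
coef(cv_lasso_fit, s = "lambda.min")                # sparse coefficients at that lambda
predict(cv_lasso_fit, newx = XVars_test,            # XVars_test: hypothetical holdout matrix
        s = "lambda.min")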
So what is new?
Statistics and econometrics
• Dominance of regularization: James and Stein (1961)
• Random forests: Breiman (2001)
• Non- and semiparametrics, sieve estimation
• What do these features imply for the properties of f̂?
• And how can we therefore use f̂ in applied work?
Structure of first chapter of webinar
1. Introduction
2. Applied Machine Learning Secret Sauce
3. Prediction vs Estimation