2.b Applied Machine Learning Secret Sauce - Slides

Machine Learning:

An Applied Econometric Approach


Jann Spiess

based on work with Sendhil Mullainathan

in collaboration with Susan Athey and Niall Keleher

2. The Secret Sauce of Machine Learning


Structure of first chapter of webinar
1. Introduction

Training data $(y, x)$  →  $\hat{f}$  →  Application data $(\hat{y} = \hat{f}(x), x)$

2. The Secret Sauce of Machine Learning

3. Prediction vs Estimation
Prediction problem set-up
Given:
• Training data set $(y_1, x_1), \ldots, (y_n, x_n)$ (assume iid)
• Usually called "regression" when $y$ is continuous, "classification" when $y$ is discrete
• Loss function $\ell(\hat{y}, y)$

       Econometrics        ML
  y    Outcome variable    Label
  x    Covariate           Feature

Goal:
• Prediction function $\hat{f}$ with low average loss ("risk")
  $L(\hat{f}) = E_{(y,x)}[\ell(\hat{f}(x), y)]$,
  where $(y, x)$ is distributed as in the training data
Squared-error loss for regression
"Regression": Continuous outcome, $y \in \mathbb{R}$

Squared-error loss: $\ell(\hat{y}, y) = (\hat{y} - y)^2$, so $L(\hat{f}) = E[(\hat{f}(x) - y)^2]$

• Predict log house price $y$ of a new home from its characteristics $x$ based on survey data from homes with the same distribution (Mullainathan and Spiess, 2017)

• Predict log consumption $y$ for a new household $x$ based on data on similar households (Adelman et al.)
Loss measures for classification
"Classification": Binary outcome, $y \in \{0,1\}$

If the prediction itself is binary, $\hat{y} \in \{0,1\}$:

                    y = 1                       y = 0
  $\hat{y} = 1$     True positive               False positive (Type I)
  $\hat{y} = 0$     False negative (Type II)    True negative

[ROC curve; source: Aiken et al. (2020)]
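A minimal R sketch of these loss measures (simulated data; all object names are illustrative): cross-tabulating a binary prediction against the outcome gives exactly the table above, and the 0–1 loss is the misclassification rate.

set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(2 * x))   # binary outcome
p_hat <- plogis(1.8 * x)                         # some predicted probability
y_hat <- as.integer(p_hat > 0.5)                 # threshold to a binary prediction

table(prediction = y_hat, outcome = y)           # TP, FP (Type I), FN (Type II), TN
mean(y_hat != y)                                 # misclassification rate (0-1 loss)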


Standard regression solution
Goal: small $E[(\hat{f}(x) - y)^2]$

E.g. use linear functions $\hat{f}(x) = \hat{\beta}' x = \hat{\beta}_0 + \sum_{j=1}^{k} \hat{\beta}_j x_j$

• From training data, pick the $\hat{\beta}$ that provides the best in-sample fit:
  $\min_{\beta} E[(y - \beta' x)^2]$  →  $\min_{\hat{\beta}} \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}' x_i)^2$

• Which optimality properties does OLS have?

• Is this optimal for prediction?
Bias–variance decomposition
• Loss at a new point $y = \beta' x + \epsilon$ (with $E[\epsilon \mid x] = 0$):
  $(\hat{y} - y)^2 = (\hat{\beta}' x - \beta' x - \epsilon)^2$
• Average over draws of the training sample $T$ (and $\epsilon$):
  $E_{T,\epsilon}[(\hat{y} - y)^2] = E_T[(\hat{\beta}' x - \beta' x)^2] + E_\epsilon[\epsilon^2]$
  $= (E_T[\hat{\beta}]' x - \beta' x)^2 + x' V_T(\hat{\beta})\, x + V_\epsilon(\epsilon \mid x)$
     bias (approximation)        variance (overfit)       irreducible noise
• Important framing within econometrics and statistics
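A minimal simulation sketch of this decomposition (simulated linear model, base R; the deliberate misspecification below is an illustrative assumption): average the prediction at a fixed new point over many training draws and compare bias² + variance + noise to the expected squared prediction error.

set.seed(1)
p <- 5; n <- 50; sigma <- 1
beta <- rep(1, p)
x_new <- rep(0.5, p)                              # fixed new point at which we predict
reps <- 5000
pred <- numeric(reps)
for (r in seq_len(reps)) {
  X <- matrix(rnorm(n * p), n, p)
  y <- as.vector(X %*% beta) + rnorm(n, sd = sigma)
  b_hat <- coef(lm(y ~ X[, 1:3] - 1))             # deliberately misspecified: only 3 of 5 covariates
  pred[r] <- sum(b_hat * x_new[1:3])              # prediction at x_new from this training draw
}
bias2    <- (mean(pred) - sum(beta * x_new))^2    # approximation error (squared bias)
variance <- var(pred)                             # variation across training draws (overfit)
c(bias2 = bias2, variance = variance, noise = sigma^2, total = bias2 + variance + sigma^2)
mean((sum(beta * x_new) + rnorm(reps, sd = sigma) - pred)^2)   # direct Monte Carlo check of the total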
Approximation–overfit trade-off

Source: Hastie et al. (2009)


Approximation–overfit trade-off
As the model becomes more complex:
1. It fits the true function better (approximation)
2. It fits the noise better (overfit) – see the sketch below

Hence:
1. Flexible functional forms
2. Limit expressiveness (regularization)
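A minimal sketch of the trade-off (simulated data, polynomial regressions of increasing degree; everything here is illustrative): the in-sample error keeps falling as the model becomes more complex, while the out-of-sample error eventually rises.

set.seed(1)
n <- 100
x <- runif(n, -2, 2); x_test <- runif(10000, -2, 2)
f_true <- function(z) sin(2 * z)
y <- f_true(x) + rnorm(n, sd = 0.5)
y_test <- f_true(x_test) + rnorm(length(x_test), sd = 0.5)

degrees <- 1:15
errs <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d))                                            # polynomial of degree d
  c(train = mean(residuals(fit)^2),                                    # in-sample MSE
    test  = mean((predict(fit, data.frame(x = x_test)) - y_test)^2))   # out-of-sample MSE
}))
cbind(degree = degrees, errs)   # training error falls monotonically; test error is U-shaped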
Regularization for linear regression
• Rather than OLS
  $\min_{\hat{\beta}} \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}' x_i)^2$
• Fit the constrained problem
  $\min_{\hat{\beta}} \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}' x_i)^2$  s.t.  $\|\hat{\beta}\| \le c$
  where, for example,
  $\|\hat{\beta}\|_0 = \sum_{j=1}^{k} 1\{\hat{\beta}_j \ne 0\}$,  $\|\hat{\beta}\|_1 = \sum_{j=1}^{k} |\hat{\beta}_j|$,  $\|\hat{\beta}\|_2^2 = \sum_{j=1}^{k} \hat{\beta}_j^2$
• Throughout, assume $\hat{\beta}' = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)$, with the intercept $\hat{\beta}_0$ not penalized
• Normalize!
LASSO regression

$\min_{\hat{\beta}} \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}' x_i)^2 + \lambda \sum_{j=1}^{k} |\hat{\beta}_j|$
(equivalent to minimizing subject to $\sum_{j=1}^{k} |\hat{\beta}_j| \le c$)
/

• Selects and shrinks


• “Capitalist” – in doubt give all to one
• Produces sparse solutions

Illustration: Afshine Amidi and Shervine Amidi


Ridge regression

$\min_{\hat{\beta}} \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}' x_i)^2 + \lambda \sum_{j=1}^{k} \hat{\beta}_j^2$

• Shrink towards zero, but never quite


• “Socialist” – in doubt distribute to multiple
• Can be interpreted as a Bayesian posterior mean under a normal prior

Illustration: Afshine Amidi and Shervine Amidi


Regularization for linear regression

Source: Afshine Amidi and Shervine Amidi
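A minimal glmnet sketch contrasting the two penalties above (simulated data; object names and the penalty level are illustrative): alpha = 1 gives the LASSO, which sets most coefficients exactly to zero, while alpha = 0 gives ridge, which shrinks all coefficients but keeps them nonzero.

library(glmnet)
set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- as.vector(X %*% c(rep(2, 5), rep(0, p - 5))) + rnorm(n)   # only 5 covariates matter

lasso_fit <- glmnet(X, y, alpha = 1)    # L1 penalty: selects and shrinks
ridge_fit <- glmnet(X, y, alpha = 0)    # L2 penalty: shrinks, never exactly zero

# compare coefficients at an (illustrative) penalty level lambda = 0.1
b_lasso <- as.vector(coef(lasso_fit, s = 0.1))[-1]
b_ridge <- as.vector(coef(ridge_fit, s = 0.1))[-1]
c(lasso_nonzero = sum(b_lasso != 0), ridge_nonzero = sum(b_ridge != 0))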


Structure of supervised learners
• A function class
• A regularizer
• An optimization algorithm that gets us there
Poverty targeting

Example source: Adelman et al.


Reference point: OLS

Example source: Adelman et al.


Fitted vs actual values in sample

Example source: Adelman et al.


Regression trees

Example source: Adelman et al.
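A minimal regression-tree sketch with rpart; hh_train and log_consumption are hypothetical stand-ins for the Adelman et al. poverty-targeting data, not objects from these slides.

library(rpart)
# hh_train / log_consumption: hypothetical household data with a log-consumption outcome
tree_fit <- rpart(log_consumption ~ .,
                  data = hh_train,
                  method = "anova",                       # squared-error (regression) splits
                  control = rpart.control(cp = 0.001))    # grow a fairly deep tree
tree_fit                                                  # prints the split rules
pred_tree <- predict(tree_fit, newdata = hh_train)        # piecewise-constant fitted values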


OLS vs tree

Example source: Adelman et al.


How to find optimal tree?

Example source: Adelman et al.
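One standard answer, continuing the hypothetical tree_fit above: rpart cross-validates the complexity parameter cp internally, so the grown tree can be pruned back at the cp value with the smallest cross-validated error.

printcp(tree_fit)                                          # cross-validated error for each cp
best_cp <- tree_fit$cptable[which.min(tree_fit$cptable[, "xerror"]), "CP"]
tree_pruned <- prune(tree_fit, cp = best_cp)               # prune to the cv-optimal subtree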


Structure of supervised learners
• A function class
• A regularizer
• An optimization algorithm that gets us there
Choosing regularization parameter
• Hold-out: create an out-of-sample set within the sample
• Cross-validation: create repeated hold-outs (see the sketch below)

Hence:
1. Flexible functional forms
2. Limit expressiveness (regularization)
3. Learn how much to regularize (tuning)

Illustration: Afshine Amidi and Shervine Amidi
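A minimal sketch of both ideas with glmnet (simulated data; object names are illustrative): tune the LASSO penalty on a single hold-out, then let cv.glmnet automate the repeated (cross-validated) version.

library(glmnet)
set.seed(1)
n <- 400; p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- as.vector(X %*% c(rep(1, 5), rep(0, p - 5))) + rnorm(n)

holdout <- sample(n, n / 4)                              # 25% hold-out, rest for fitting
fit <- glmnet(X[-holdout, ], y[-holdout], alpha = 1)     # LASSO path on the fitting sample

pred <- predict(fit, newx = X[holdout, ])                # hold-out predictions, one column per lambda
rmse <- sqrt(colMeans((pred - y[holdout])^2))
best_lambda <- fit$lambda[which.min(rmse)]               # lambda tuned by hold-out RMSE

cv_fit <- cv.glmnet(X, y, alpha = 1, nfolds = 5)         # cross-validation: repeated hold-outs
cv_fit$lambda.min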



Structure of ML exercise

  fitting sample:   engineering with econometric guidance  →  obtain a function $\hat{f}$
  hold-out sample:  econometric guarantees  →  estimate $L(\hat{f})$

  "Firewall principle": keep the hold-out sample strictly separate from the fitting sample

Illustration: Afshine Amidi and Shervine Amidi


ML basics recap
1. Flexible functional forms
2. Limit expressiveness (regularization)
3. Learn how much to regularize (tuning)

• Important researcher choices:


• Loss function
• Data management/splitting
• Feature representation
• Function class and regularizer
From LASSO to neural nets
  Function class               Regularizer
  Linear                       LASSO, ridge, elastic net
  Decision/regression trees    Depth, leaves, leaf size, information gain
  Random forest                Trees, variables per tree, sample sizes, complexity
  Nearest neighbors            Number of neighbors
  Kernel regression            Bandwidth
  Splines                      Number of knots, order
  Neural nets                  Layers, sizes, connectivity, drop-out, early stopping
Regularizing neural nets

Image source: Nielsen (2015)


Model combination: ensembles
$\hat{f}(x) = w_1 \hat{f}_1(x) + \cdots + w_M \hat{f}_M(x)$

• Can combine across different model classes


• How to choose weights?
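One common option, sketched on simulated data (all names are illustrative): fit the candidate models on a fitting sample and choose the weights that minimize hold-out loss, for example by regressing the hold-out outcome on the candidate predictions (a simple stacking scheme).

library(rpart)
set.seed(1)
n <- 600
x1 <- runif(n); x2 <- runif(n)
y <- 2 * x1 + sin(6 * x2) + rnorm(n, sd = 0.3)
df <- data.frame(y, x1, x2)
fitting <- 1:400; holdout <- 401:600                     # simple split

ols_fit  <- lm(y ~ x1 + x2, data = df[fitting, ])
tree_fit <- rpart(y ~ x1 + x2, data = df[fitting, ],
                  control = rpart.control(cp = 0.001))

p_ols  <- predict(ols_fit,  newdata = df[holdout, ])     # candidate predictions on the hold-out
p_tree <- predict(tree_fit, newdata = df[holdout, ])

w <- coef(lm(df$y[holdout] ~ p_ols + p_tree))            # weights that minimize hold-out squared error
w                                                        # intercept plus one weight per candidate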
Model combination: bagging / random forest

Illustration: Databricks
Random forest

[Figure panels: OLS, Tree, Forest]
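A minimal sketch with the randomForest package (simulated data; this is one of several implementations): average many trees grown on bootstrap samples, with each split searching only a random subset of mtry variables.

library(randomForest)
set.seed(1)
n <- 500; p <- 10
X <- data.frame(matrix(runif(n * p), n, p))
y <- sin(3 * X[, 1]) + 2 * X[, 2] + rnorm(n, sd = 0.3)

rf_fit <- randomForest(x = X, y = y,
                       ntree = 500,      # number of bootstrap trees to average (bagging)
                       mtry = 3,         # variables considered at each split
                       nodesize = 5)     # minimum leaf size (regularization)
rf_fit$mse[500]                          # out-of-bag estimate of the MSE
predict(rf_fit, newdata = X[1:5, ])      # forest prediction = average over the trees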
Boosting / boosted trees
• Iteratively fit a simple tree to the residuals of the current model (see the sketch below)

Source: medium.com/mlreview
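A minimal hand-rolled sketch of the idea (simulated data, squared-error loss; not the implementation behind any particular package): repeatedly fit a shallow rpart tree to the current residuals and add a small multiple of it to the running prediction. Packages such as xgboost or gbm implement refined versions of this loop.

library(rpart)
set.seed(1)
n <- 500
x <- matrix(runif(n * 3), n, 3)
y <- sin(3 * x[, 1]) + x[, 2]^2 + rnorm(n, sd = 0.3)
df <- data.frame(y = y, x)

eta <- 0.1                                    # learning rate: take small steps
n_rounds <- 100
pred <- rep(mean(df$y), n)                    # start from the constant prediction
for (m in seq_len(n_rounds)) {
  df$res <- df$y - pred                       # residuals of the current ensemble
  stump <- rpart(res ~ X1 + X2 + X3, data = df,
                 control = rpart.control(maxdepth = 2, cp = 0))
  pred <- pred + eta * predict(stump, df)     # move a little toward the residual fit
}
mean((df$y - pred)^2)                         # in-sample MSE of the boosted ensemble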
Bayesian regularization
• Bayesian methods shrink towards a prior
• Powerful way of constructing regularized predictions,
e.g. ridge regression, Bayesian trees
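A minimal sketch of the ridge case (simulated data, no intercept for simplicity): with a normal prior β ~ N(0, τ²I) and normal noise of variance σ², the posterior mean coincides with the ridge estimator at λ = σ²/τ².

set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- as.vector(X %*% rep(1, p)) + rnorm(n)

sigma2 <- 1; tau2 <- 0.5
lambda <- sigma2 / tau2

beta_ridge <- solve(t(X) %*% X + lambda * diag(p), t(X) %*% y)                   # ridge estimator
beta_post  <- solve(t(X) %*% X / sigma2 + diag(p) / tau2, t(X) %*% y / sigma2)   # posterior mean
cbind(beta_ridge, beta_post)                  # identical columns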
ML basics recap
1. Flexible functional forms
2. Limit expressiveness (regularization)
3. Learn how much to regularize (tuning)

• Important researcher choices:


• Loss function
• Data management/splitting
• Feature representation
• Function class and regularizer
Implementation: R

# LASSO: cross-validate the penalty lambda with glmnet
cv_lasso_fit <- cv.glmnet(x = XVars,
                          y = house_train$Sale_Price)

[Plots: validation RMSE (roughly 0.09–0.18) against log lambda (reverse scale; marked values near −4.69, −6.46, −6.65) and against the number of non-zero covariates (0–300)]

# Random forest: tune mtry and min_n by 5-fold cross-validation (tidymodels)
cv_folds <- vfold_cv(house_train, v = 5)

rf_grid <- grid_regular(
  mtry(range = c(10, 100)),
  min_n(range = c(4, 20)),
  levels = 5
)

tune_rf_res <- tune_grid(
  tune_wf,
  resamples = cv_folds,
  grid = rf_grid
)

[Plot: validation RMSE (roughly 0.059–0.062) against mtry (20–80), by min_n (4, 8, 12, 16)]
So what is new?
Statistics and econometrics
• Dominance of regularization: James and Stein (1961)
• Random forests: Breiman (2001)
• Non- and semiparametrics, sieve estimation

But still, something has happened


• Data
• Computation
• Functional forms that work
• Prediction focus that turns it into engineering competition
• Some new theoretical insights and developments,
e.g. double descent, deep learning
ML basics recap
1. Flexible functional forms
2. Limit expressiveness (regularization)
3. Learn how much to regularize (tuning)

• What do these features imply for the properties of $\hat{f}$?
• And how can we therefore use $\hat{f}$ in applied work?
Structure of first chapter of webinar
1. Introduction

Training data $(y, x)$  →  $\hat{f}$  →  Application data $(\hat{y} = \hat{f}(x), x)$

2. The Secret Sauce of Machine Learning

3. Prediction vs Estimation
