DDMA05_ModelSelection
Marketing analytics lecture note

Model Selection

Inseong Song, SNU Business School


Predictive vs. Descriptive Modeling

• In common usage, prediction means to forecast a future event. In data science,
prediction more generally means to estimate an unknown value. The value could be
something in the future, in the present, or even in the past.

– Predictive models for credit scoring estimate the likelihood that a potential
customer will default.
– Predictive models for spam filtering estimate whether a given piece of email is
spam.
– Predictive models for fraud detection judge whether an account has been
defrauded.
• This is in contrast to descriptive modeling, where the primary purpose is to gain
insight into the underlying phenomenon or process.

– A descriptive model of churn would tell us what churning customers look like.



• A descriptive model must be judged in part on its intelligibility, and a less accurate
model may be preferred if it is easier to understand. Researchers’ preference for
model parsimony is therefore implemented by adding penalties for model complexity.
• A predictive model may be judged solely on its predictive performance, although
intelligibility is nonetheless important.
• However, the difference between these model types is not so strict. Sometimes
much of the value of a predictive model is in the understanding gained from looking
at it rather than in the predictions it makes.



Variable Selection as model selection
• Typical models
Y = f(X1, ..., Xk) + ε
Y: variable (value) being predicted (e.g., sales volume; customer responses;
customer lifetime value; ...)
X: potential predictor variables
k: number of variables, including the intercept

• Why not all available variables (say 300 variables)?


– Computation time (especially in nonlinear models)
– Feasibility (there would be no observations in most nodes of a decision tree model
with 300 predictors)
– Overfitting: the calibration can be done perfectly, but the result may not apply
well to other comparable data sets
– Interpretation: the effects of 300 variables are hard to interpret, difficult to
communicate, and meaningless for finding strategic insights



All-Possible Subset Regression

• Regressions with all possible subsets of predictor variables: e.g., if there are N
variables, then we need to run 2^N regressions
• Find the best regression model out of the 2^N regression outputs, based on evaluation
criteria such as adjusted R², AIC, or BIC.
"#$
Adjusted 𝑅 ! = 1 − (1 − 𝑅 ! ),
"#%
AIC = −2 log 𝐿 + 2𝑘,
BIC = −2 log 𝐿 + 𝑘 ∗ log 𝑇
T: the number of observations,
k: the number of predictor variables included,
log 𝐿: log likelihood
• This approach is impractical when N is large: e.g., for N = 50 there are 2^50 ≈ 1.1 × 10^15
possible models
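As an illustration (not from the lecture), a minimal base-R sketch of all-subsets regression with three hypothetical predictors x1, x2, x3 and response y in a data frame dat:

  # Enumerate every subset (including the intercept-only model) and record the criteria.
  subsets <- c(list(character(0)),
               unlist(lapply(1:3, function(m) combn(c("x1", "x2", "x3"), m,
                                                    simplify = FALSE)),
                      recursive = FALSE))
  results <- t(sapply(subsets, function(s) {
    rhs <- if (length(s) == 0) "1" else paste(s, collapse = " + ")
    fit <- lm(as.formula(paste("y ~", rhs)), data = dat)
    c(adjR2 = summary(fit)$adj.r.squared, AIC = AIC(fit), BIC = BIC(fit))
  }))
  rownames(results) <- sapply(subsets, function(s)
    if (length(s) == 0) "(intercept only)" else paste(s, collapse = " + "))
  results   # 2^3 = 8 candidate models, compared by adjusted R2, AIC, BIC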



Stepwise Selection

• Find the optimal set of predictor variables utilizing both forward selection with
backward elimination
• Forward Selection
– Estimate models with one predictor variable (N such models)
– The variable with the largest F-statistic becomes the candidate
– If the candidate’s F is larger than a predetermined level F0, then the variable
associated with this F is added to the predictor set (otherwise the process stops with
no variables in the model)
– If the process does not stop, run regressions with two predictor variables: the selected
variable (say X1) and one (say Xk) of the remaining N−1 variables (so N−1 regression models)
– Compute the partial F: a statistic testing bk = 0 when both X1 and Xk are in the model. If
the largest partial F exceeds a predetermined level F0, then the variable with this F is
added. Otherwise stop.
– Repeat this process
– Forward selection continues until no further predictor can be added



• Backward Elimination
– Estimate a model in which all N variables are included
– Compute the partial F for each of the variables in the predictor set – the variable
with the smallest partial F is the candidate for deletion
– Partial F for Xk:
F = [(SSE1 − SSE2)/(df1 − df2)] / (SSE2/df2)
SSE2: sum of squared errors of the larger model (Xk is included)
SSE1: sum of squared errors of the smaller model (Xk is not included)
df1, df2: degrees of freedom for the error terms of the smaller and larger models

– If the candidate’s partial-F is smaller than the predetermined value F1, then the
variable is removed. Otherwise stop.
– Continue backward elimination until no further variable can be eliminated.
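In R, the partial F for a single variable can be obtained by comparing the two nested fits. A sketch under assumed names (dat, y, x1, x2, x3; not from the lecture):

  larger  <- lm(y ~ x1 + x2 + x3, data = dat)    # Xk = x3 is included
  smaller <- lm(y ~ x1 + x2,      data = dat)    # Xk = x3 is dropped
  anova(smaller, larger)          # the F column is the partial F for x3
  drop1(larger, test = "F")       # partial F for each variable in one call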



• Combination
– Begin with forward selection.
– When 2 variables are included, apply backward elimination.
– Apply the forward-backward scheme until no further variable can be added
or removed. (Some variables could be included at an early stage and
dropped subsequently.)
• Stepwise selection is computationally efficient, but it can end up at a
suboptimal solution because of its sequential-search nature.
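A minimal sketch using base R's step(), which performs the same add/drop search but uses AIC rather than partial-F thresholds (an assumption; dat and its variables are illustrative):

  null <- lm(y ~ 1, data = dat)                   # intercept-only starting point
  full <- lm(y ~ ., data = dat)                   # all candidate predictors
  sel  <- step(null, scope = formula(full), direction = "both", trace = FALSE)
  summary(sel)                                    # the selected model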



Evaluation of Model
• Assess if the developed model can explain data well and also predict well
• The model should be good at explaining the data used in calibration. Moreover, the
model should also work well on other, comparable data.
• Partition of samples into subgroups
– Training Set (Calibration sample): observations used in model estimation
– Validation set: observations to which estimation results are applied, to select
the best performing model (model selection)
– Test set: observations to which estimation results are applied, to assess the
performance of the selected model (model assessment)
– A large calibration sample may lead to accurate estimation, but the
improvement in accuracy becomes smaller as the sample size continues to
increase.
– A large validation sample makes model comparison easier.
→ Since the size of the whole dataset is fixed, we have a trade-off between
the calibration and validation samples.
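A minimal sketch of such a random partition (the 60/20/20 proportions and the data frame dat are assumptions, not from the lecture):

  set.seed(1)
  grp <- sample(c("train", "valid", "test"), nrow(dat), replace = TRUE,
                prob = c(0.6, 0.2, 0.2))
  train <- dat[grp == "train", ]   # used to fit the models
  valid <- dat[grp == "valid", ]   # used to select the best model
  test  <- dat[grp == "test",  ]   # used to assess the chosen model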



Total Data
– Training (Calibration): used to fit the models
– Validation: used to estimate prediction error for model selection
– Test: used for assessment of the generalization error of the final chosen model


Loss Function

• Consider a target variable (Y), a vector of inputs (X), and a prediction model f̂(X)
that has been estimated from a training set.
• Loss function: L(Y, f̂(X)) = (Y − f̂(X))²  (squared error)
                 or |Y − f̂(X)|  (absolute error)

• Various types of loss functions exist (depending upon the nature of the data).
• Test error, also referred to as generalization error, is the prediction error over an
independent test sample
• Training error is the average loss over the training sample. Unfortunately, training
error is not a good estimate of the test error



Bias-Variance Decomposition

• Assume Y = f(X) + ε where E(ε) = 0 and Var(ε) = σ_ε².
• Expected prediction error of a regression fit f̂(X) at an input point X = x0 using
squared-error loss:
Err(x0) = E[(Y − f̂(x0))² | X = x0]
        = σ_ε² + [E f̂(x0) − f(x0)]² + E[f̂(x0) − E f̂(x0)]²
        = irreducible error + Bias² + Variance
– Irreducible error (σ_ε²): variance of the target around its true mean (cannot be avoided)
– Bias: difference between the average of my estimate and the true mean
– Variance: the expected squared deviation of f̂(x0) around its mean
• The more complex we make the model f̂, the lower the (squared) bias but the
higher the variance.



Training error, Test error, and Model Complexity
(Hastie, Tibshirani, Friedman 2013)

[Figure: test error and training error curves plotted against model complexity]



• Suppose your model does not perform well. (say, large validation/test errors).
What to do with your model? Simpler or more complex?
• Case 1: validation(test) error >> training error (training error is small)
– Then your model is overfitting. Make it simpler.
• Case 2: validation(test) error ≈ training error (training error is large)
– Then it is underfitting. A more complex model may help

[Figure: Case 1 (overfitting) and Case 2 (underfitting) regions marked on the
training/test error curves]



Simple Example
• n=20, training=14, test=6
• True model: y = β0 + β1·x + β2·x² + β3·x³ + ε with β′ = (6, 9, −6, 1)
– simulate x ~ U(0, 5) and ε ~ N(0, 1); then we have data (y, x)
• Estimate the regression model y = θ0 + θ1·x + ⋯ + θk·x^k + ε for k = 1, ⋯, 7 using the
training data
• Predict y in the test data based on the regression results
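A minimal R sketch of this experiment (the seed is an assumption, so the numbers will differ somewhat from the table below):

  set.seed(123)
  x <- runif(20, 0, 5)
  y <- 6 + 9 * x - 6 * x^2 + 1 * x^3 + rnorm(20)      # true cubic model
  train <- 1:14; test <- 15:20
  for (k in 1:7) {
    fit  <- lm(y ~ poly(x, k, raw = TRUE), subset = train)
    pred <- predict(fit, newdata = data.frame(x = x[test]))
    cat(sprintf("k = %d  training MSE = %6.2f  test MSE = %9.2f\n",
                k, mean((fitted(fit) - y[train])^2), mean((pred - y[test])^2)))
  }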



• Mean squared errors

  k   Training Error   Test Error
  1       2.41             9.02     underfit
  2       1.41             6.03     underfit
  3       0.93             1.44     true model
  4       0.62            10.70     overfit
  5       0.43           187.15     overfit
  6       0.43            82.80     overfit
  7       0.41          3514.03     overfit



Calibration vs. Validation Sample
① Holdout method
• Partition the data into two mutually exclusive subgroups: calibration vs. holdout
• How much to allocate on calibration?
– Typically, 2/3 for calibration. For small data (n<100), maybe 3/4.
• Assigning each observation to a group: through randomization
• When the data set is large, hold-out method should be okay.
• When we have a small data set, the inefficiency from reserving a large portion of
samples matters in the hold-out method.

② K-Fold Cross-validation
• Randomly partition the data into K equal-sized subsets
• We estimate/validate the model K times: in each turn, use K−1 subsets as the
calibration sample and the remaining subset as the validation sample.
• Models are assessed based on the average validation error across the K predictions
• Rule of thumb: K = 10 or 20
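A minimal K-fold cross-validation sketch for a linear model (K = 10; the data frame dat and the formula y ~ x1 + x2 are illustrative assumptions):

  set.seed(1)
  K    <- 10
  fold <- sample(rep(1:K, length.out = nrow(dat)))
  cv_err <- sapply(1:K, function(k) {
    fit  <- lm(y ~ x1 + x2, data = dat[fold != k, ])
    pred <- predict(fit, newdata = dat[fold == k, ])
    mean((dat$y[fold == k] - pred)^2)      # validation MSE for fold k
  })
  mean(cv_err)                             # average validation error across the K folds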



③ Leave-One-Out method
• Same idea as K-Fold cross-validation, but K=Number of Observations

④ Bootstrap
• When the size of the data is n, select a sample of size n “with replacement”
• The same observation can be included more than once.
• It is known that bootstrapping performs better than cross-validation when the size
of the data is small
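One common way (not spelled out in the lecture) to turn bootstrap resamples into an error estimate is to fit on each resample and evaluate on the observations it left out. A sketch under the same assumed data frame dat:

  set.seed(1)
  n <- nrow(dat)
  boot_err <- sapply(1:200, function(b) {          # 200 resamples is an assumption
    idx <- sample(1:n, n, replace = TRUE)          # bootstrap sample, with replacement
    out <- setdiff(1:n, idx)                       # observations left out of this resample
    fit <- lm(y ~ x1 + x2, data = dat[idx, ])
    mean((dat$y[out] - predict(fit, newdata = dat[out, ]))^2)
  })
  mean(boot_err)                                   # bootstrap estimate of prediction error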



Evaluation Criteria (Choice of Loss Function)

• Criterion: “goodness-of-fit” or “badness-of-fit” – the distance between what really
happened and what the model predicts to happen
• The way of quantifying “goodness-of-fit” depends on the purpose of the model
and on the nature of the dependent variable.

① Continuous Variable
• mean squared error (MSE):  Σ_i e_i² / n = Σ_i (y_i − ŷ_i)² / n
• mean absolute deviation (MAD):  Σ_i |e_i| / n = Σ_i |y_i − ŷ_i| / n
• mean absolute percentage deviation (MAPD):  (Σ_i |y_i − ŷ_i| / |y_i|) / n × 100%
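A minimal sketch computing the three criteria from actual values y and predictions yhat (the vector names are illustrative):

  mse  <- mean((y - yhat)^2)
  mad_ <- mean(abs(y - yhat))                 # named mad_ to avoid masking stats::mad
  mapd <- mean(abs(y - yhat) / abs(y)) * 100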



② Discrete Dependent Variable
• When Yi takes on discrete values, the prediction is typically expressed in probability
terms, Pr(Yi = 1)
• Prediction: first determine the cut-off value “c” beforehand
ŷ = 1 if Pr(y = 1) > c
ŷ = 0 if Pr(y = 1) ≤ c

                              Actual data
                               0        1
Model's     predicted as 0    t_n      f_n
prediction  predicted as 1    f_p      t_p

– true positive (t_p): predicted 1, actually 1
– true negative (t_n): predicted 0, actually 0
– false positive (f_p): predicted 1, actually 0
– false negative (f_n): predicted 0, actually 1



• Hit Ratio = Σ_i H_i / n, where H_i = 1 if the prediction is correct and 0 if not
– So, Hit Ratio = (t_p + t_n) / (t_p + t_n + f_p + f_n)

• F1 accuracy measure (harmonic mean of precision and recall)
– Precision = t_p / (t_p + f_p)
– Recall = t_p / (t_p + f_n)
– F1 = 2 · Precision × Recall / (Precision + Recall)

• Predictive Likelihood
∏_i P̂_i^(y_i) · (1 − P̂_i)^(1 − y_i)
• Predictive Log Likelihood
Σ_i [ y_i log P̂_i + (1 − y_i) log(1 − P̂_i) ]
• Note: We prefer larger Hit Ratio, F1, Log Likelihood. So when it comes to loss, for
example, you need to minimize the negative of log likelihood.
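A minimal sketch of these criteria given actual labels y (0/1) and predicted probabilities phat (the names and the 0.5 cut-off are assumptions):

  cutoff <- 0.5
  yhat   <- as.integer(phat > cutoff)
  tp <- sum(yhat == 1 & y == 1); tn <- sum(yhat == 0 & y == 0)
  fp <- sum(yhat == 1 & y == 0); fn <- sum(yhat == 0 & y == 1)
  hit_ratio <- (tp + tn) / (tp + tn + fp + fn)
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  f1        <- 2 * precision * recall / (precision + recall)
  pred_loglik <- sum(y * log(phat) + (1 - y) * log(1 - phat))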



Regularization: Avoiding Overfitting for Parameter Optimization

• Avoiding overfitting involves complexity control: the “right” balance between
the fit to the data and the complexity of the model
• The general strategy for complexity control is that instead of just
optimizing the fit to the data, we optimize some combination of fit and
simplicity: Models will be better if they fit the data better, but they also
will be better if they are simpler. This general methodology is called
regularization.
• Complexity control via regularization works by adding a penalty for
complexity
arg max_w [ fit(x, w) − λ · penalty(w) ]
where w refers to the model and λ is the importance weight on the penalty.



Shrinkage Methods

• The subset selection procedure is a discrete process – variables are either
retained or discarded – so it often exhibits high variance and therefore doesn’t
reduce the prediction error of the full model. Shrinkage methods are
more continuous and don’t suffer as much from high variability.
• Ridge regression
• Lasso
• Elastic net



Ridge regression

β̂_ridge = arg min_β { Σ_i (y_i − β0 − Σ_j x_ij β_j)² + λ Σ_j β_j² }

• The larger the value of 𝜆, the greater the amount of shrinkage


• Ridge regression can be a solution for obtaining an estimate when the design
matrix is not of full rank (i.e., under multicollinearity).

• The ridge solutions are not equivariant under scaling of the inputs and so one
normally standardizes the inputs before solving the minimization
• In addition, the intercept has been left out of the penalty. So use a
reparametrization with centered inputs (x_ij − x̄_j) and estimate β̂0 = ȳ.

• Then, after centering, the input matrix has k (rather than k+1) columns.
𝑅𝑆𝑆 𝜆 = 𝑦 − 𝑋𝛽 " 𝑦 − 𝑋𝛽 + 𝜆𝛽 " 𝛽

𝛽2 6.(78 = 𝑋 " 𝑋 + 𝜆𝐼 #$ 𝑋 " 𝑦
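A minimal base-R sketch of this closed form (X is an n × k input matrix and y the response; both are assumed to be prepared, with λ chosen separately):

  ridge_beta <- function(X, y, lambda) {
    Xc <- scale(X, center = TRUE, scale = FALSE)      # centered inputs: k columns, no intercept
    yc <- y - mean(y)                                 # so that beta0_hat = mean(y)
    solve(t(Xc) %*% Xc + lambda * diag(ncol(Xc)), t(Xc) %*% yc)
  }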



Lasso

β̂_lasso = arg min_β { (1/2) Σ_i (y_i − β0 − Σ_j x_ij β_j)² + λ Σ_j |β_j| }

• The 𝐿$ lasso penalty makes the solutions nonlinear and there is no closed form
expression (unlike ridge)
• Computing the lasso solution is a quadratic programming problem. (Practically, the
computational burden for the lasso is comparable to that of ridge.)

• R package ‘lars’ (check the package manual, it can do least angle regression,
lasso, and forward stagewise regression)



Elastic Net

β̂_elastic net = arg min_β { (1/2) Σ_i (y_i − β0 − Σ_j x_ij β_j)² + λ Σ_j [ α|β_j| + (1 − α) β_j² ] }

• Elastic net is a compromise between ridge and lasso.


• The parameter α determines the mix of the penalties and is often pre-chosen on
qualitative grounds.
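As a hedged illustration, the widely used R package 'glmnet' (not mentioned in the lecture, and using a slightly different penalty parameterization than the formula above) fits ridge, lasso, and elastic net in one interface; X is a numeric predictor matrix and y the response:

  library(glmnet)                            # assumes glmnet is installed
  fit_ridge <- glmnet(X, y, alpha = 0)       # alpha = 0: ridge penalty
  fit_lasso <- glmnet(X, y, alpha = 1)       # alpha = 1: lasso penalty
  fit_enet  <- glmnet(X, y, alpha = 0.5)     # 0 < alpha < 1: elastic net
  cv_enet   <- cv.glmnet(X, y, alpha = 0.5)  # cross-validated choice of lambda
  coef(cv_enet, s = "lambda.min")            # coefficients at the lambda with lowest CV error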



Implementing Ridge regression

Estimation (on the training observations):
β̂(λ) = arg min_β { Σ_i (y_i − β0 − Σ_j x_ij β_j)² + λ Σ_j β_j² }

Validation error for model evaluation (summed over the validation observations):
Σ_i (y_i − β̂0 − Σ_j x_ij β̂_j)²

• Consider multiple candidate values for λ, say {0, 0.01, 0.02, 0.04, 0.08, ..., 10}. Prepare the
training data and the validation/test data.
• For each candidate value of λ, estimate β̂_ridge(λ) and then compute the
validation error.
• Find the value of λ at which the validation error is minimized.
• Then compute the test error for that value of λ.

• R package ‘ridge’
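A sketch of this search using the ridge_beta() helper sketched earlier (training X, y and validation Xv, yv are assumed to be prepared; the λ grid is illustrative):

  lambdas <- c(0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10)
  val_err <- sapply(lambdas, function(lam) {
    b  <- ridge_beta(X, y, lam)                        # fit on the training data
    b0 <- as.numeric(mean(y) - colMeans(X) %*% b)      # recover the intercept
    mean((yv - (b0 + Xv %*% b))^2)                     # validation error (MSE)
  })
  best <- lambdas[which.min(val_err)]                  # lambda minimizing validation error
  # Finally, compute the test error once, at lambda = best, on the held-out test set.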



Bias-Variance in Shrinkage methods

Consider ridge: β̂(λ) = arg min_β { Σ_i (y_i − β0 − Σ_j x_ij β_j)² + λ Σ_j β_j² }

• If λ = 0, we are including all possible variables with no shrinkage, so the model overfits
(high variance).
• If λ is a very large value, the coefficients β_j are shrunk heavily toward 0 for many j, so the
model underfits (high bias).
• So there should be an intermediate value of λ that optimally trades off between them.

[Figure: error vs. λ – high variance at small λ, high bias at large λ]

