Assessing Predictive Models

• Assessing predictive models involves evaluating how well a model predicts future outcomes based on historical data.
• Key aspects include selecting appropriate metrics, splitting data into
training and testing sets, and using techniques like cross-validation to
ensure robustness.
• This process helps determine if the model is accurate, reliable, and
generalizable to new, unseen data.
Key steps and considerations in assessing
predictive models:
1. Define the problem and objectives: Clearly articulate the problem the
model is intended to solve and the specific goals for prediction.
2. Data understanding and preparation: Thoroughly analyze the data,
understand its characteristics, and prepare it for modeling. This includes
handling missing values, outliers, and potentially creating new features.
3. Model selection: Choose a suitable predictive modeling technique based
on the problem type, data characteristics, and desired model complexity.
4. Data splitting: Divide the data into training, validation, and test sets.
• Training set: Used to train the model.
• Validation set: Used to tune hyperparameters and select the best model.
• Test set: Used to evaluate the final model's performance on unseen data.
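
A minimal sketch of step 4 using scikit-learn, with the built-in Iris dataset standing in for the project data and a 60/20/20 train/validation/test split chosen purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as the final test set.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Split the remainder into training (60% overall) and validation (20% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp)
```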
Contd…..

5. Model evaluation: Assess the model's performance using appropriate metrics for the specific problem type (see the code sketch after this list).
Classification: Accuracy, precision, recall, F1-score, AUC-ROC.
Regression: Mean squared error, R-squared.
6. Cross-validation: Employ techniques like k-fold cross-validation to
evaluate the model's performance more robustly and assess its
generalization ability.
7. Hyperparameter tuning: Optimize the model's hyperparameters
using techniques like grid search or random search.
8. Model selection and deployment: Choose the best model based
on the evaluation results and deploy it for use.
9. Monitoring and maintenance : Continuously monitor the model's
performance after deployment and retrain it as needed to maintain
accuracy.
10. External validation: Consider validating the model on an independent, external dataset to confirm that its performance holds beyond the original data source.
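
A minimal sketch of steps 5–7 (evaluation metrics, k-fold cross-validation, and grid search) using scikit-learn; the breast-cancer dataset, logistic regression model, and parameter grid are illustrative assumptions, not prescriptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=5000)

# Step 6: 5-fold cross-validation on the training data.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print("CV accuracy:", cv_scores.mean())

# Step 7: grid search over a small, illustrative hyperparameter grid.
grid = GridSearchCV(model, param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

# Step 5: evaluate the tuned model on the held-out test set.
y_pred = grid.best_estimator_.predict(X_test)
print("Test accuracy :", accuracy_score(y_test, y_pred))
print("Test precision:", precision_score(y_test, y_pred))
print("Test recall   :", recall_score(y_test, y_pred))
print("Test F1-score :", f1_score(y_test, y_pred))
```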
Model Ensembling
• Model ensembling, also known as ensemble learning, is a
machine learning technique that combines multiple individual
models to produce a more accurate and robust prediction than
any single model could achieve alone.
• It leverages the collective intelligence of several models, each
potentially with different strengths and weaknesses, to improve
overall performance, reduce errors, and enhance generalization
to unseen data.
• Instead of relying on a single model, ensemble methods train
multiple models (often called base learners or estimators) on the
same data or different subsets of the data.
• These individual models are then combined to produce a final
prediction, which is typically more accurate and reliable than
any single model's output.
Benefits of Ensemble Learning:
• Improved Accuracy: Ensemble methods can often
achieve higher accuracy than individual models,
especially when the base models are diverse.
• Reduced Generalization Error: By combining
multiple models, ensembles can reduce the risk of
overfitting to the training data and improve
performance on new, unseen data.
• Increased Robustness: Ensembles are less
susceptible to errors or biases from any single model,
making them more robust to noisy or outlier data.
• Better Generalization: Ensembles can capture more
complex patterns and relationships in the data, leading
to better generalization to new data.
Bagging & Boosting
• Bagging and boosting are powerful ensemble learning
techniques that combine multiple models to improve predictive
performance.
• Bagging, like Random Forests, creates diverse models by
training on random subsets of the data, then averaging their
predictions.
• Boosting, on the other hand, builds models sequentially, with
each new model focusing on correcting errors made by its
predecessors, leading to a final model that is more accurate and
less biased.
Bagging (Bootstrap Aggregating):
• Creates multiple models by training them on different random
subsets of the training data (with replacement, known as
bootstrapping).
• Each model is trained independently, and their predictions are
combined (usually by averaging for regression or taking a
majority vote for classification) to produce the final prediction.
• Example: Random Forests, which build an ensemble of decision trees.
• Reduce variance (overfitting) and improve model stability by
averaging out individual model errors.
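
A minimal bagging sketch in scikit-learn: BaggingClassifier trains many decision trees on bootstrap samples and combines them by majority vote; the dataset and number of estimators are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

bagging = BaggingClassifier(
    n_estimators=100,   # number of independently trained trees
    bootstrap=True,     # each tree sees a bootstrap sample (with replacement)
    random_state=0)     # the default base learner is a decision tree

print("Bagging CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
```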
Boosting
• Builds models sequentially, with each model attempting to
correct the errors of its predecessors.
• Models are trained in a way that gives higher weight to
misclassified instances from previous models, focusing on
improving accuracy on those hard-to-predict cases.
• Examples: AdaBoost, Gradient Boosting (XGBoost, LightGBM).
• Reduce bias (underfitting) by iteratively improving model
performance on the training data.
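
A minimal boosting sketch with scikit-learn's AdaBoostClassifier, which re-weights misclassified examples so each new weak learner focuses on the cases earlier learners got wrong; the dataset and hyperparameters are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Sequential ensemble of weak learners (shallow trees by default).
boosting = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
print("AdaBoost CV accuracy:", cross_val_score(boosting, X, y, cv=5).mean())
```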
Bagging Vs Boosting
Feature         | Bagging                                    | Boosting
Model Training  | Independent, parallel                      | Sequential, dependent
Error Reduction | Primarily variance reduction (overfitting) | Primarily bias reduction (underfitting)
Weighting       | All models have equal weight               | Models have different weights based on performance
Data Sampling   | Bootstrapping (sampling with replacement)  | Can use bootstrapping or other sampling methods
Example         | Random Forests                             | Gradient Boosting, AdaBoost

Bagging is like having multiple experts independently analyze the data and then combining their opinions, while boosting is like having a team of experts who learn from each other's mistakes to arrive at a better final answer.
Random Forest

• Random Forest is an ensemble learning method that builds multiple decision trees during training and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
• It leverages the power of
multiple decision trees to
improve accuracy and
robustness compared to a
single decision tree.
Working Principle…………
• 1. Building the Forest:
• Bootstrapping: Random Forest uses a technique called bootstrapping
to create multiple training datasets from the original data. This involves
randomly sampling with replacement, meaning some data points might
appear multiple times in a subset, while others are excluded.
• Random Feature Selection: At each node of a decision tree, instead
of considering all features, a random subset of features is selected. This
introduces further diversity among the trees in the forest.
• Decision Tree Construction:
• For each bootstrapped dataset, a decision tree is built using the
randomly selected features. Each tree learns to make predictions based
on its specific training data and feature subset.
Contd……
• 2. Ensemble Prediction:
• Classification: When used for classification, the Random Forest
aggregates the predictions from all the decision trees. The final
prediction is the class that receives the majority vote (most
frequently predicted by the individual trees).
• Regression: For regression tasks, the average of the predictions
from all the individual trees is taken as the final prediction
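
A minimal Random Forest sketch in scikit-learn showing both ingredients described above (bootstrap samples plus a random feature subset at each split) and the feature-importance output; the dataset and hyperparameters are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,      # number of bootstrapped trees
    max_features="sqrt",   # random feature subset considered at each split
    random_state=0)
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
print("Largest feature importances:",
      sorted(forest.feature_importances_, reverse=True)[:3])
```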
Key Benefits……
• Improved Accuracy: By combining the predictions of multiple
trees, Random Forest can often achieve higher accuracy than a
single decision tree.
• Reduced Overfitting: The randomness introduced through bootstrapping and feature selection helps to reduce overfitting, making the model more generalizable to unseen data.
• Handles High Dimensionality: Random Forest can effectively handle datasets with many features, even when some features are irrelevant.
• Handles Missing Values: Random Forest can be more robust to missing data compared to some other algorithms.
• Feature Importance: Random Forest provides a measure of feature importance, which can be useful for understanding which features contribute most to the predictions.
Applications
• Finance: Fraud detection, credit risk assessment.
• Healthcare: Disease diagnosis, drug discovery.
• E-commerce: Product recommendation, customer
segmentation.
• Environmental Studies: Predicting weather patterns, analyzing
ecological data.
• Image Classification: Recognizing objects in images.
Gradient Boosting

• Gradient Boosting combines multiple "weak" learners (often decision trees) to create a strong predictive model.
• Models are trained sequentially,
with each new model correcting
the errors of the previous ones.
• The algorithm focuses on
minimizing the residuals
(differences between predicted
and actual values) at each
step.
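
A minimal gradient boosting sketch for regression with scikit-learn: shallow trees are added sequentially, each fitted to the current residuals and scaled by a learning rate; the synthetic dataset and hyperparameters are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

gbr = GradientBoostingRegressor(
    n_estimators=300,    # number of sequentially added trees
    learning_rate=0.05,  # shrinkage applied to each tree's contribution
    max_depth=3,         # shallow trees act as weak learners
    random_state=0)
gbr.fit(X_train, y_train)

print("Test MSE:", mean_squared_error(y_test, gbr.predict(X_test)))
```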
Stochastic Gradient Boosting
• Stochastic Gradient Boosting (SGB) is a machine learning
technique that builds an ensemble of weak prediction models
sequentially, introducing randomness to reduce overfitting and
improve generalization.
• It's a variant of Gradient Boosting that incorporates
subsampling, where each model is trained on a random subset
of the training data, and sometimes also with random feature
selection.
SGB Enhancements……
• Subsampling: Instead of using the entire training set to train
each model, SGB randomly selects a subset of the data.
• Feature Randomness: In addition to subsampling, some implementations also randomly select a subset of features to consider when splitting nodes in decision trees.
• Reduced Overfitting: The introduction of randomness through subsampling and feature selection helps to reduce overfitting, allowing the model to generalize better to unseen data.
• Increased Variance Among Models: SGB can increase the variance of the individual models (they differ more from one another), but the overall ensemble prediction is more robust.
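
A minimal stochastic gradient boosting sketch: in scikit-learn's GradientBoostingRegressor, setting subsample below 1.0 trains each tree on a random fraction of the rows, and max_features limits the features considered per split; the specific values are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

sgb = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,
    subsample=0.7,        # row subsampling: the "stochastic" part
    max_features="sqrt",  # random feature subset per split
    random_state=0)

print("SGB cross-validated R^2:", cross_val_score(sgb, X, y, cv=5).mean())
```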
Heterogeneous Ensembles
• Heterogeneous ensembles in machine learning combine
multiple models (also known as base learners or
predictors) of different types to make predictions.
• This contrasts with homogeneous ensembles, which use
the same type of model.
• By incorporating diverse models, heterogeneous
ensembles aim to improve accuracy, robustness, and
generalization compared to using a single model or a
homogeneous ensemble.
Key Characteristics
• Diversity of Base Learners: Heterogeneous ensembles
leverage different algorithms, architectures, or even different
hyperparameters of the same algorithm to create diverse
models.
• Complementary Biases: The goal is for these different models
to have complementary biases, meaning that when their
predictions are combined, they can compensate for each other's
weaknesses.
• Combination Strategies: Various methods are used to combine
the predictions of individual models, including weighted
averaging, voting, and meta-learning.
• Examples of Base Learners in Heterogeneous Ensembles:
Decision Trees, Support Vector Machines (SVMs), Neural
Networks, K-Nearest Neighbors (KNN), and Logistic Regression.
How a Heterogeneous Ensemble Works?
• 1. Train Diverse Models: Different base learners are
trained independently using the same or different
training data.
• 2. Combine Predictions: The predictions of the
individual models are combined to produce a final
prediction.
• 3. Evaluate Performance: The performance of the
heterogeneous ensemble is evaluated and compared to
that of individual models or homogeneous ensembles.
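
A minimal heterogeneous ensemble sketch using scikit-learn's VotingClassifier with three different model types, compared against its individual members via cross-validation; the choice of base learners, hyperparameters, and dataset are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Step 1 & 2: train diverse base learners and combine them by soft voting
# (averaging predicted class probabilities).
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("dt", DecisionTreeClassifier(max_depth=5)),
        ("knn", KNeighborsClassifier(n_neighbors=7)),
    ],
    voting="soft")

# Step 3: evaluate the ensemble against its individual members.
for name, model in [("ensemble", ensemble),
                    ("logistic regression", LogisticRegression(max_iter=5000)),
                    ("decision tree", DecisionTreeClassifier(max_depth=5)),
                    ("knn", KNeighborsClassifier(n_neighbors=7))]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```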
Advantages of Heterogeneous Ensembles
• Improved Accuracy: By leveraging diverse models,
heterogeneous ensembles can achieve higher accuracy than
single models or homogeneous ensembles.
• Enhanced Robustness: They can be more robust to noisy or
outlier data because different models may react differently to
these cases.
• Increased Generalization: Heterogeneous ensembles can
generalize better to unseen data due to the diversity of their
component models.
• Flexibility and Reusability: They can easily incorporate pre-
trained models or models built using different algorithms,
making them flexible and adaptable.
Challenges
• Complexity: Designing and managing heterogeneous
ensembles can be more complex than working with
homogeneous ensembles.
• Integration: Choosing the right combination strategy for
different models can be challenging.
• Computational Cost: Training and combining multiple
diverse models can be computationally more expensive.
Applications
• Medical Diagnosis: Combining models trained on different
types of medical data (e.g., images, genomics) can improve
diagnostic accuracy.
• Financial Forecasting: Integrating models trained on various
financial data sources can enhance the reliability of predictions.
• Natural Language Processing: Combining models that excel
in different aspects of language processing can improve the
accuracy of tasks like sentiment analysis and machine
translation.
Prescriptive Analytics
• Prescriptive analytics is a type of data analysis that goes beyond simply
describing what happened or predicting what might happen.
• It focuses on recommending the best course of action to take in a given
situation by analyzing data and simulating potential outcomes.
• Essentially, it answers the question, "What should we do?"
• Focus on recommendations: It doesn't just provide insights; it suggests
specific actions and their potential consequences.
• Utilizes data and algorithms: It leverages data from various sources,
including historical data, real-time feeds, and external information, along with
algorithms, machine learning, and optimization techniques.
• Simulates scenarios: It models different courses of action and their potential
outcomes, helping decision-makers understand the trade-offs.
• Operationalizes insights: It translates analytical findings into actionable
recommendations that can be implemented through business tools and
systems.
• Drives decision-making: It empowers organizations to make more informed
and strategic decisions based on data-driven insights.
Prescriptive Vs Predictive
Aspect           | Predictive                                                                      | Prescriptive
Goal             | To forecast future outcomes based on historical data and statistical modeling. | To recommend the best course of action to achieve desired outcomes.
Focus            | Identifying patterns, trends, and probabilities to predict what might happen.  | Recommending specific actions to take based on predicted outcomes and constraints.
Example          | Predicting customer churn, forecasting sales, identifying potential fraud.      | Determining the optimal inventory levels, recommending the best delivery route, suggesting the best pricing strategy.
Decision Support | Provides insights to support strategic decision-making by evaluating different scenarios and potential outcomes. | Actively guides decision-makers in choosing the best course of action.
In General       | Predictive analytics tells you what might happen.                               | Prescriptive analytics tells you what you should do.
