Assessing Predictive Models

• Assessing predictive models involves evaluating how well a model predicts future outcomes based on historical data.
• Key aspects include selecting appropriate metrics, splitting data into
training and testing sets, and using techniques like cross-validation to
ensure robustness.
• This process helps determine if the model is accurate, reliable, and
generalizable to new, unseen data.
Key steps and considerations in assessing
predictive models:
1. Define the problem and objectives: Clearly articulate the problem the
model is intended to solve and the specific goals for prediction.
2. Data understanding and preparation: Thoroughly analyze the data,
understand its characteristics, and prepare it for modeling. This includes
handling missing values, outliers, and potentially creating new features.
3. Model selection: Choose a suitable predictive modeling technique based
on the problem type, data characteristics, and desired model complexity.
4. Data splitting: Divide the data into training, validation, and test sets.
• Training set: Used to train the model.
• Validation set: Used to tune hyperparameters and select the best model.
• Test set: Used to evaluate the final model's performance on unseen data.
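
A minimal sketch of step 4 using scikit-learn, with the built-in Iris dataset standing in for the project data and a 60/20/20 train/validation/test split chosen purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as the final test set.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Split the remainder into training (60% overall) and validation (20% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp)
```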
Contd…..

5. Model evaluation: Assess the model's performance using appropriate metrics for the specific problem type (see the code sketch after this list).
Classification: Accuracy, precision, recall, F1-score, AUC-ROC.
Regression: Mean squared error, R-squared.
6. Cross-validation: Employ techniques like k-fold cross-validation to
evaluate the model's performance more robustly and assess its
generalization ability.
7. Hyperparameter tuning: Optimize the model's hyperparameters
using techniques like grid search or random search.
8. Model selection and deployment: Choose the best model based
on the evaluation results and deploy it for use.
9. Monitoring and maintenance : Continuously monitor the model's
performance after deployment and retrain it as needed to maintain
accuracy.
10. External validation: Consider validating the model on an independent, external dataset to confirm that its performance holds beyond the original data source.
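
A minimal sketch of steps 5–7 (evaluation metrics, k-fold cross-validation, and grid search) using scikit-learn; the breast-cancer dataset, logistic regression model, and parameter grid are illustrative assumptions, not prescriptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=5000)

# Step 6: 5-fold cross-validation on the training data.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print("CV accuracy:", cv_scores.mean())

# Step 7: grid search over a small, illustrative hyperparameter grid.
grid = GridSearchCV(model, param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

# Step 5: evaluate the tuned model on the held-out test set.
y_pred = grid.best_estimator_.predict(X_test)
print("Test accuracy :", accuracy_score(y_test, y_pred))
print("Test precision:", precision_score(y_test, y_pred))
print("Test recall   :", recall_score(y_test, y_pred))
print("Test F1-score :", f1_score(y_test, y_pred))
```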
Model Ensembling
• Model ensembling, also known as ensemble learning, is a
machine learning technique that combines multiple individual
models to produce a more accurate and robust prediction than
any single model could achieve alone.
• It leverages the collective intelligence of several models, each
potentially with different strengths and weaknesses, to improve
overall performance, reduce errors, and enhance generalization
to unseen data.
• Instead of relying on a single model, ensemble methods train
multiple models (often called base learners or estimators) on the
same data or different subsets of the data.
• These individual models are then combined to produce a final
prediction, which is typically more accurate and reliable than
any single model's output.
Benefits of Ensemble Learning:
• Improved Accuracy: Ensemble methods can often
achieve higher accuracy than individual models,
especially when the base models are diverse.
• Reduced Generalization Error: By combining
multiple models, ensembles can reduce the risk of
overfitting to the training data and improve
performance on new, unseen data.
• Increased Robustness: Ensembles are less
susceptible to errors or biases from any single model,
making them more robust to noisy or outlier data.
• Better Generalization: Ensembles can capture more
complex patterns and relationships in the data, leading
to better generalization to new data.
Bagging & Boosting
• Bagging and boosting are powerful ensemble learning
techniques that combine multiple models to improve predictive
performance.
• Bagging, like Random Forests, creates diverse models by
training on random subsets of the data, then averaging their
predictions.
• Boosting, on the other hand, builds models sequentially, with
each new model focusing on correcting errors made by its
predecessors, leading to a final model that is more accurate and
less biased.
Bagging (Bootstrap Aggregating):
• Creates multiple models by training them on different random
subsets of the training data (with replacement, known as
bootstrapping).
• Each model is trained independently, and their predictions are
combined (usually by averaging for regression or taking a
majority vote for classification) to produce the final prediction.
• Example: Random Forests, which build an ensemble of decision trees.
• Reduce variance (overfitting) and improve model stability by
averaging out individual model errors.
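
A minimal bagging sketch in scikit-learn: BaggingClassifier trains many decision trees on bootstrap samples and combines them by majority vote; the dataset and number of estimators are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

bagging = BaggingClassifier(
    n_estimators=100,   # number of independently trained trees
    bootstrap=True,     # each tree sees a bootstrap sample (with replacement)
    random_state=0)     # the default base learner is a decision tree

print("Bagging CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
```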
Boosting
• Builds models sequentially, with each model attempting to
correct the errors of its predecessors.
• Models are trained in a way that gives higher weight to
misclassified instances from previous models, focusing on
improving accuracy on those hard-to-predict cases.
• Examples: AdaBoost, Gradient Boosting (XGBoost, LightGBM).
• Reduce bias (underfitting) by iteratively improving model
performance on the training data.
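
A minimal boosting sketch with scikit-learn's AdaBoostClassifier, which re-weights misclassified examples so each new weak learner focuses on the cases earlier learners got wrong; the dataset and hyperparameters are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Sequential ensemble of weak learners (shallow trees by default).
boosting = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
print("AdaBoost CV accuracy:", cross_val_score(boosting, X, y, cv=5).mean())
```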
Bagging Vs Boosting
Feature         | Bagging                                    | Boosting
Model Training  | Independent, parallel                      | Sequential, dependent
Error Reduction | Primarily variance reduction (overfitting) | Primarily bias reduction (underfitting)
Weighting       | All models have equal weight               | Models have different weights based on performance
Data Sampling   | Bootstrapping (sampling with replacement)  | Can use bootstrapping or other sampling methods
Example         | Random Forests                             | Gradient Boosting, AdaBoost

Bagging is like having multiple experts independently analyze the data and then combining their opinions, while boosting is like having a team of experts who learn from each other's mistakes to arrive at a better final answer.
Random Forest

• Random Forest is an ensemble learning method that builds multiple decision trees during training and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
• It leverages the power of
multiple decision trees to
improve accuracy and
robustness compared to a
single decision tree.
Working Principle…………
• 1. Building the Forest:
• Bootstrapping: Random Forest uses a technique called bootstrapping
to create multiple training datasets from the original data. This involves
randomly sampling with replacement, meaning some data points might
appear multiple times in a subset, while others are excluded.
• Random Feature Selection: At each node of a decision tree, instead
of considering all features, a random subset of features is selected. This
introduces further diversity among the trees in the forest.
• Decision Tree Construction:
• For each bootstrapped dataset, a decision tree is built using the
randomly selected features. Each tree learns to make predictions based
on its specific training data and feature subset.
Contd……
• 2. Ensemble Prediction:
• Classification: When used for classification, the Random Forest
aggregates the predictions from all the decision trees. The final
prediction is the class that receives the majority vote (most
frequently predicted by the individual trees).
• Regression: For regression tasks, the average of the predictions
from all the individual trees is taken as the final prediction
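
A minimal Random Forest sketch in scikit-learn showing both ingredients described above (bootstrap samples plus a random feature subset at each split) and the feature-importance output; the dataset and hyperparameters are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,      # number of bootstrapped trees
    max_features="sqrt",   # random feature subset considered at each split
    random_state=0)
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
print("Largest feature importances:",
      sorted(forest.feature_importances_, reverse=True)[:3])
```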
Key Benefits……
• Improved Accuracy: By combining the predictions of multiple
trees, Random Forest can often achieve higher accuracy than a
single decision tree.
• Reduced Overfitting: The randomness introduced through bootstrapping and feature selection helps to reduce overfitting, making the model more generalizable to unseen data.
• Handles High Dimensionality: Random Forest can effectively handle datasets with many features, even when some features are irrelevant.
• Handles Missing Values: Random Forest can be more robust to missing data compared to some other algorithms.
• Feature Importance: Random Forest provides a measure of feature importance, which can be useful for understanding which features contribute most to the predictions.
Applications
• Finance: Fraud detection, credit risk assessment.
• Healthcare: Disease diagnosis, drug discovery.
• E-commerce: Product recommendation, customer
segmentation.
• Environmental Studies: Predicting weather patterns, analyzing
ecological data.
• Image Classification: Recognizing objects in images.
Gradient Boosting

• Gradient Boosting combines multiple "weak" learners (often decision trees) to create a strong predictive model.
• Models are trained sequentially,
with each new model correcting
the errors of the previous ones.
• The algorithm focuses on
minimizing the residuals
(differences between predicted
and actual values) at each
step.
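
A minimal gradient boosting sketch for regression with scikit-learn: shallow trees are added sequentially, each fitted to the current residuals and scaled by a learning rate; the synthetic dataset and hyperparameters are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

gbr = GradientBoostingRegressor(
    n_estimators=300,    # number of sequentially added trees
    learning_rate=0.05,  # shrinkage applied to each tree's contribution
    max_depth=3,         # shallow trees act as weak learners
    random_state=0)
gbr.fit(X_train, y_train)

print("Test MSE:", mean_squared_error(y_test, gbr.predict(X_test)))
```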
Stochastic Gradient Boosting
• Stochastic Gradient Boosting (SGB) is a machine learning
technique that builds an ensemble of weak prediction models
sequentially, introducing randomness to reduce overfitting and
improve generalization.
• It's a variant of Gradient Boosting that incorporates
subsampling, where each model is trained on a random subset
of the training data, and sometimes also with random feature
selection.
SGB Enhancements……
• Subsampling: Instead of using the entire training set to train
each model, SGB randomly selects a subset of the data.
• Feature Randomness: In addition to subsampling, some implementations also randomly select a subset of features to consider when splitting nodes in decision trees.
• Reduced Overfitting: The introduction of randomness through subsampling and feature selection helps to reduce overfitting, allowing the model to generalize better to unseen data.
• Increased Variance Among Models: SGB can increase the variance of the individual models (they differ more from one another), but the overall ensemble prediction is more robust.
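
A minimal stochastic gradient boosting sketch: in scikit-learn's GradientBoostingRegressor, setting subsample below 1.0 trains each tree on a random fraction of the rows, and max_features limits the features considered per split; the specific values are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

sgb = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,
    subsample=0.7,        # row subsampling: the "stochastic" part
    max_features="sqrt",  # random feature subset per split
    random_state=0)

print("SGB cross-validated R^2:", cross_val_score(sgb, X, y, cv=5).mean())
```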
Heterogeneous Ensembles
• Heterogeneous ensembles in machine learning combine
multiple models (also known as base learners or
predictors) of different types to make predictions.
• This contrasts with homogeneous ensembles, which use
the same type of model.
• By incorporating diverse models, heterogeneous
ensembles aim to improve accuracy, robustness, and
generalization compared to using a single model or a
homogeneous ensemble.
Key Characteristics
• Diversity of Base Learners: Heterogeneous ensembles
leverage different algorithms, architectures, or even different
hyperparameters of the same algorithm to create diverse
models.
• Complementary Biases: The goal is for these different models
to have complementary biases, meaning that when their
predictions are combined, they can compensate for each other's
weaknesses.
• Combination Strategies: Various methods are used to combine
the predictions of individual models, including weighted
averaging, voting, and meta-learning.
• Examples of Base Learners in Heterogeneous Ensembles:
Decision Trees, Support Vector Machines (SVMs), Neural
Networks, K-Nearest Neighbors (KNN), and Logistic Regression.
How a Heterogeneous Ensemble Works?
• 1. Train Diverse Models: Different base learners are
trained independently using the same or different
training data.
• 2. Combine Predictions: The predictions of the
individual models are combined to produce a final
prediction.
• 3. Evaluate Performance: The performance of the
heterogeneous ensemble is evaluated and compared to
that of individual models or homogeneous ensembles.
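
A minimal heterogeneous ensemble sketch using scikit-learn's VotingClassifier with three different model types, compared against its individual members via cross-validation; the choice of base learners, hyperparameters, and dataset are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Step 1 & 2: train diverse base learners and combine them by soft voting
# (averaging predicted class probabilities).
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("dt", DecisionTreeClassifier(max_depth=5)),
        ("knn", KNeighborsClassifier(n_neighbors=7)),
    ],
    voting="soft")

# Step 3: evaluate the ensemble against its individual members.
for name, model in [("ensemble", ensemble),
                    ("logistic regression", LogisticRegression(max_iter=5000)),
                    ("decision tree", DecisionTreeClassifier(max_depth=5)),
                    ("knn", KNeighborsClassifier(n_neighbors=7))]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```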
Advantages of Heterogeneous Ensembles
• Improved Accuracy: By leveraging diverse models,
heterogeneous ensembles can achieve higher accuracy than
single models or homogeneous ensembles.
• Enhanced Robustness: They can be more robust to noisy or
outlier data because different models may react differently to
these cases.
• Increased Generalization: Heterogeneous ensembles can
generalize better to unseen data due to the diversity of their
component models.
• Flexibility and Reusability: They can easily incorporate pre-
trained models or models built using different algorithms,
making them flexible and adaptable.
Challenges
• Complexity: Designing and managing heterogeneous
ensembles can be more complex than working with
homogeneous ensembles.
• Integration: Choosing the right combination strategy for
different models can be challenging.
• Computational Cost: Training and combining multiple
diverse models can be computationally more expensive.
Applications
• Medical Diagnosis: Combining models trained on different
types of medical data (e.g., images, genomics) can improve
diagnostic accuracy.
• Financial Forecasting: Integrating models trained on various
financial data sources can enhance the reliability of predictions.
• Natural Language Processing: Combining models that excel
in different aspects of language processing can improve the
accuracy of tasks like sentiment analysis and machine
translation.
Prescriptive Analytics
• Prescriptive analytics is a type of data analysis that goes beyond simply
describing what happened or predicting what might happen.
• It focuses on recommending the best course of action to take in a given
situation by analyzing data and simulating potential outcomes.
• Essentially, it answers the question, "What should we do?"
• Focus on recommendations: It doesn't just provide insights; it suggests
specific actions and their potential consequences.
• Utilizes data and algorithms: It leverages data from various sources,
including historical data, real-time feeds, and external information, along with
algorithms, machine learning, and optimization techniques.
• Simulates scenarios: It models different courses of action and their potential
outcomes, helping decision-makers understand the trade-offs.
• Operationalizes insights: It translates analytical findings into actionable
recommendations that can be implemented through business tools and
systems.
• Drives decision-making: It empowers organizations to make more informed
and strategic decisions based on data-driven insights.
Prescriptive Vs Predictive
Aspect           | Predictive                                                                      | Prescriptive
Goal             | To forecast future outcomes based on historical data and statistical modeling. | To recommend the best course of action to achieve desired outcomes.
Focus            | Identifying patterns, trends, and probabilities to predict what might happen.  | Recommending specific actions to take based on predicted outcomes and constraints.
Example          | Predicting customer churn, forecasting sales, identifying potential fraud.      | Determining the optimal inventory levels, recommending the best delivery route, suggesting the best pricing strategy.
Decision Support | Provides insights to support strategic decision-making by evaluating different scenarios and potential outcomes. | Actively guides decision-makers in choosing the best course of action.
In General       | Predictive analytics tells you what might happen.                               | Prescriptive analytics tells you what you should do.
