Gradient Boosted Trees Explained

The document discusses Gradient Boosted Trees, a machine learning technique that builds predictive models through an ensemble of weak learners, typically decision trees. It outlines the importance of data mining, various data mining techniques, and the principles of boosting, particularly focusing on how Gradient Boosting minimizes loss functions using gradient descent. The pros and cons of Gradient Boosted Trees are also highlighted, emphasizing their effectiveness and sensitivity to overfitting.

Uploaded by

cemisouth


Gradient boosted trees

Dr. Geetha Kuntoji

Assistant Professor

Department of Civil Engineering

BMS College of Engineering

Bengaluru-19

28 Feb 2025 (10.30am to 5.00pm)

Department of Civil Engineering NIE College of Engineering Mysore

Data Mining: the process of extracting patterns from data. The patterns should be:
Valid: they hold on new data with some certainty.
Novel: they are non-obvious to the system.
Useful: it should be possible to act on them.
Understandable: humans should be able to interpret them.

Also known as Knowledge Discovery in Databases (KDD).

Data Mining might mean:

Statistics
Visualization
Artificial Intelligence
Information Retrieval
Machine Learning
Knowledge-based systems
Knowledge acquisition
Pattern Recognition
And so on....

What's needed?

Suitable data
Computing power
Data mining software
Someone who knows both the nature of data and the software tools
Reason, theory or hunch

Typical applications of Data Mining and KDD

Data Mining and KDD have widespread applications. Some examples include:
Marketing
Financial services
Health care
And so on....

Some basic techniques

Predictive model: It describes what is likely to happen in the future by analyzing the current data. It uses statistical analysis, machine learning algorithms and other forecasting techniques to predict what might happen. It is not exact, as it is essentially a prediction into the future based on the data and the given statistical/Machine Learning techniques. Eg- Performance Analysis.
Descriptive model: It gives a view into the past and tells what happened. It involves Data Aggregation and Data Mining. It is accurate, as it describes exactly what happened in the past. Eg- Sentiment Analysis.
Prescriptive model: This is a relatively new field in Data Mining. It is a step above the predictive and descriptive models. It provides a viable solution to the problem at hand and assesses the future impact of adopting that solution. It is still an evolving technique. Eg- Google self driving car.

Some basic techniques

Predictive: Regression, Classification, Collaborative Filtering
Descriptive: Clustering, Association rules and variants, Deviation detection

Key data mining tasks

Classification: mapping data into predefined groups or classes.
Regression: mapping a data item to a real-valued prediction variable.
Clustering: grouping similar data together into clusters.

Key learning tasks in Machine Learning

Supervised learning: A set of well-labeled data is given with defined input and output variables (training data), and the algorithms learn to predict the output from the input data.
Unsupervised learning: The data given is not labeled, i.e. only input variables are given, with no corresponding output variables. The algorithms find patterns and draw inferences from the given data. This is "pure Data Mining".
Semi-supervised: Some data is labeled but most of it is unlabeled, and a mixture of supervised and unsupervised techniques can be used.

Some basic Data Mining Methods

Decision Trees
Neural Networks
Cluster/Nearest Neighbour
Genetic Algorithms/Evolutionary Computing
Bayesian Networks
Statistics
Hybrids

We are interested in Gradient boosted trees.

We would use Rapidminer (possibly Python?)

Decision Trees

We will discuss decision trees a little first.
A decision tree is a tree in which each node represents a feature (attribute), each link (branch) represents a decision (rule), and each leaf represents an outcome (a categorical or continuous value).
A decision tree takes a set of input features and splits the input data recursively based on those features.
The process is repeated until some stopping condition is met, e.g. a maximum tree depth is reached or no more information gain is possible.

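The recursive splitting described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used by Rapidminer or any library: the function names, the entropy-based information gain, the `max_depth` default and the toy rainfall/slope data are all assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, feature_index, threshold) for the best binary split, or None."""
    base = entropy(labels)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively; stop at max_depth or when no split gains information."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None or split[0] <= 0:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow branches until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Made-up toy data: [rainfall_mm, slope_deg] -> "stable" or "slide"
X = [[10, 5], [20, 8], [80, 30], [90, 35], [15, 32], [85, 6]]
y = ["stable", "stable", "slide", "slide", "stable", "stable"]
tree = build_tree(X, y)
```

Each internal node is stored as a dictionary holding the feature index and threshold, and each leaf is simply the majority label, which mirrors the node/branch/leaf description above.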
Decision trees have been around for a long time and are known to suffer from bias and variance.
Simple trees have a large bias, and complex trees have a large variance.
Ensemble methods combine several decision trees to produce better predictive performance than a single decision tree.
The main principle behind an ensemble model is that a group of weak learners come together to form a strong learner.
Two common ensemble methods are Bagging and Boosting. We will look at each of them.

Bagging

Bagging is used when the goal is to reduce the variance of a decision tree.
The idea is to draw several subsets of the training sample at random, with replacement.
Each subset is then used to train its own decision tree.
We end up with an ensemble of different models, and the average of their predictions is much more robust than the prediction of a single decision tree.
Random Forest is an extension of Bagging.

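A minimal sketch of bagging, assuming a one-split regression tree (a "stump") on a single feature as the base model. The helper names, the number of models, the seed and the toy data are all hypothetical choices for illustration.

```python
import random

def fit_stump(xs, ys):
    """One-split regression tree: returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x identical in this sample: constant prediction
        m = sum(ys) / len(ys)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_bagged(models, x):
    """Aggregate by averaging the individual predictions."""
    return sum(predict_stump(m, x) for m in models) / len(models)

# Made-up toy data with a jump between x = 4 and x = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
models = bagged_stumps(xs, ys)
```

Because every stump sees a slightly different resample, the individual thresholds vary, and averaging them smooths out that variation.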
Random Forest

It is a collection, or ensemble, of numerous decision trees. A collection of trees is generally called a forest.
It is also a bagging technique, with one key difference: it considers a random subset of features at each split, and a stopping criterion for node splits limits tree growth.
Each tree is grown as large as that criterion allows.
The above steps are repeated, and the prediction is an aggregation of the predictions from n trees.
It is used for both classification and regression.
It handles high-dimensional data and missing values well and maintains accuracy, but it does not give precise values for regression, because the final prediction is the mean of the predictions from the individual trees.

Boosting

Boosting refers to a family of algorithms that convert weak learners into strong learners.
It learns sequentially from the errors of a prior random sample (in our case, a tree).
The weak learners are trained sequentially, each trying to correct its predecessor.
The early learners fit simple models to the data and then analyze the data for errors.
Each weak learner has an error rate only slightly below that of random guessing (0.5); the weak learners are combined in some way to get a strong classifier with higher accuracy.
When an input is misclassified by a hypothesis, its weight is increased so that the next hypothesis is more likely to classify it correctly.
By combining the whole set at the end, the weak learners are converted into a better-performing model.

Start from a weak classifier and learn to linearly combine weak classifiers so that the error is reduced.
The result is a strong classifier built by boosting weak classifiers.

Types of boosting

AdaBoost: short for Adaptive boosting.
We train an algorithm, say a decision tree, on data in which all data points have been given equal weights.
A model is built on a subset of the data, predictions are made on the whole dataset, and errors are calculated from the predictions and the actual values.

AdaBoost

While creating the next model, higher weights are given to the data points that were predicted incorrectly, i.e. misclassified.
Weights can be determined from the error value: the higher the error, the larger the weight assigned to the observation.
This process is repeated until the error function does not change, or the maximum number of estimators is reached.
It is used for both classification and regression problems. Decision stumps are mostly used with AdaBoost, but any machine learning algorithm that accepts weights on the training data set can be used as a base learner.
One of the applications of AdaBoost is face recognition systems.

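The reweighting loop can be sketched as follows. The slides do not give the exact weight formula, so this sketch assumes the standard discrete AdaBoost update for labels in {+1, -1}, with decision stumps on a single feature. All names and the toy data are illustrative.

```python
import math

def best_stump(xs, ys, w):
    """Weighted decision stump: predicts `sign` for x <= t and -sign otherwise."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (sign if x <= t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, n_rounds=3):
    n = len(xs)
    w = [1.0 / n] * n                            # start with equal weights
    ensemble = []
    for _ in range(n_rounds):
        err, t, sign = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak learner
        ensemble.append((alpha, t, sign))
        # raise the weights of misclassified points, lower the rest, renormalise
        w = [wi * math.exp(-alpha * y * (sign if x <= t else -sign))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble, w

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(a * (s if x <= t else -s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# Made-up data that no single stump can classify perfectly
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]
ensemble, weights = adaboost(xs, ys, n_rounds=3)
```

On this toy set the best single stump still misclassifies three points, while the weighted vote of three stumps classifies every training point correctly.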
Types of Boosting

Gradient Boosting

We will cover this in detail now.
There are other implementations of Gradient boosting, like XGBoost and LightGBM.

Gradient Boost

It is also a machine learning technique, one that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
Thus, such models may be referred to as Gradient boosted trees.
Like other boosting methods, it builds a model in a sequential, stage-wise fashion.

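We shall now see some maths behind it.

The objective of any supervised learning algorithm is to define a loss function and minimize it.
We use the mean squared error (MSE), the average squared difference between the actual values $y_i$ and the predictions $\hat{y}_i$ over $n$ samples:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

We want the loss (MSE) of our predictions to be a minimum, which we achieve by using gradient descent and updating our predictions based on a learning rate.
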
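As a worked step, assume the usual definition of the MSE as the mean of squared residuals over $n$ samples, with $y_i$ the actual value, $\hat{y}_i$ the prediction and $\alpha$ the learning rate (these symbols are introduced here for illustration). Differentiating with respect to a single prediction, then taking one gradient descent step, gives:

```latex
\frac{\partial\,\mathrm{MSE}}{\partial \hat{y}_i}
  = -\frac{2}{n}\left(y_i - \hat{y}_i\right),
\qquad
\hat{y}_i \;\leftarrow\; \hat{y}_i - \alpha\,\frac{\partial\,\mathrm{MSE}}{\partial \hat{y}_i}
  = \hat{y}_i + \frac{2\alpha}{n}\left(y_i - \hat{y}_i\right)
```

The negative gradient is therefore proportional to the residual $y_i - \hat{y}_i$, so each step moves a prediction toward its actual value.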
We will see what the learning rate is.

The learning rate is a hyperparameter that controls how much we adjust the weights of our network with respect to the loss gradient. It affects how quickly our model can converge to a local minimum (i.e. arrive at the best accuracy).
The relationship is given by the formula: new weight = existing weight - learning rate * gradient
In gradient boosted trees, the learning rate $\alpha$ scales the update applied to each prediction $\hat{y}_i$, where $L$ is the loss:

$$\hat{y}_i \leftarrow \hat{y}_i - \alpha\,\frac{\partial L}{\partial \hat{y}_i}$$

We update the predictions so that the sum of our residuals is close to zero (or at a minimum) and the predicted values are sufficiently close to the actual values.
Learning rates are tuned so as to prevent the overfitting to which gradient boosted trees are prone.

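To make the role of the learning rate concrete, the sketch below applies the update rule directly to a vector of predictions under an MSE loss. It is a simplified illustration rather than a full boosting algorithm: the function names, the two learning rates and the target values are assumptions chosen for the example.

```python
def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def gradient_step(y_true, y_pred, learning_rate):
    """One update: prediction <- prediction - learning_rate * dMSE/dprediction."""
    n = len(y_true)
    grads = [-2.0 * (t - p) / n for t, p in zip(y_true, y_pred)]
    return [p - learning_rate * g for p, g in zip(y_pred, grads)]

y_true = [3.0, 5.0, 8.0, 13.0]   # made-up targets
results = {}
for lr in (0.1, 1.0):
    y_pred = [0.0] * len(y_true)  # start from all-zero predictions
    for _ in range(20):
        y_pred = gradient_step(y_true, y_pred, lr)
    results[lr] = mse(y_true, y_pred)

for lr, loss in results.items():
    print(f"learning_rate={lr}: MSE after 20 steps = {loss:.6f}")
```

With the smaller rate the residuals shrink slowly, so more steps are needed. With the larger rate they shrink quickly, which in a real boosted model also means noise in the training data is fitted faster.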
In Gradient boosted trees, models are trained sequentially, and each model minimizes the loss function of the whole system (for y = ax + b + e, the error term e needs special attention) using the Gradient descent method, as explained earlier.
The learning procedure consecutively fits new models to provide a more accurate estimate of the response variable.
The principal idea behind this algorithm is to construct new base learners that are maximally correlated with the negative gradient of the loss function associated with the whole ensemble.
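The sequential procedure can be sketched from scratch, assuming a squared-error loss, for which the negative gradient is proportional to the residual. Each new stump is fitted to the current residuals and added with a shrinkage factor. The function names, the defaults (`n_trees=50`, `learning_rate=0.1`) and the toy data are hypothetical. Libraries such as XGBoost and LightGBM implement far more refined versions of the same idea.

```python
def fit_stump(xs, residuals):
    """Regression stump minimising squared error against the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # keep at least one point on the right side
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Fit stumps one after another, each on the residuals of the ensemble so far."""
    base = sum(ys) / len(ys)              # initial model: the mean of y
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient direction
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Made-up data: y = x^2 (assumes at least two distinct x values)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0]
model = gradient_boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
```

Comparing `boosted_mse` with `baseline_mse` shows how much the sequence of weak stumps improves on the constant starting model. The comparison is on training data only, so it says nothing about overfitting.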

Pros of Gradient boosted trees: fast, easy to tune, not sensitive to scale (features can be a mix of continuous and categorical data), good performance, lots of software available (well supported and tested).
Cons: sensitive to overfitting and noise (should always cross-validate).
