Lecture 2.1 - AML
Uploaded by Vivek Sreekar

Advanced Machine Learning with TensorFlow (22TCSE532)
Lecture 2.1: Introduction to Ensemble Methods
Ensemble methods use multiple models together to make better predictions than a single model can.

Bagging
● Trains several models on different parts of the data.
● Combines their predictions to make a final decision (e.g., Random Forest).

Boosting
● Trains models one after another, each trying to fix the mistakes of the previous one.
● Combines their predictions to make a stronger model (e.g., AdaBoost, Gradient Boosting).
Advantages of Ensemble Methods:

● Reduces Overfitting:
○ Because many models are combined, the final prediction is less likely to be overly tailored to the training data, so it performs better on new data.

● Improves Accuracy:
○ Ensemble methods usually give more accurate results than a single model because they combine the strengths of multiple models.
Row Sampling with Replacement

● From a dataset D with n records, each model (M1, M2, …, Mn) is given a sample of m records (m < n); model M1 gets sample d1`, and so on.
● For M2 we resample and pick a different set of records to give as its input.
● This is basically called row sampling with replacement.
● d1` is not equal to d2`, although some records may get repeated.
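The sampling step above can be sketched with Python's standard library; the dataset here is a made-up list of record indices, purely for illustration:

```python
import random

random.seed(42)  # only so the illustration is reproducible

D = list(range(10))  # toy dataset D with n = 10 record indices
m = 6                # each model receives m < n records

# Bootstrap samples for two models: drawn WITH replacement,
# so d1 and d2 generally differ and a record may repeat within a sample.
d1 = random.choices(D, k=m)
d2 = random.choices(D, k=m)
```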

● Take the test data d`` and get each model's prediction (output): e.g., M1 → 1, M2 → 0, M3 → 1, …, Mn → 1.
● Once we get the output for all the different models, we apply a voting classifier: the majority of the votes is considered the final output (here, 1).

This scheme is called BOOTSTRAP AGGREGATION.
d`` 1
M1
m<n
d1`m

d`` (Test Data)


M2 0 Feature(column) sampling
with replacement is also
Dataset d2`m 1 done in RF.
M3 1
n

BOOTSTRAP AGGREGATION
Mn 1

IN RANDOM FOREST M1, M2… Mn ARE REPLACED WITH DECISION TREES.
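The voting-classifier step can be sketched as a plain majority vote over the per-model outputs shown on the slide (1, 0, 1, …, 1):

```python
from collections import Counter

# predictions of M1, M2, M3, ..., Mn on the test record d``
predictions = [1, 0, 1, 1]

# majority vote: the most common output is the final prediction
final_output = Counter(predictions).most_common(1)[0][0]
print(final_output)  # → 1
```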


Random Forest

● The base learner is the Decision Tree.
● The dataset has r rows and n columns; each tree DT1, DT2, …, DTn is trained on a row and feature sample of it.
● The test data d`` goes to every tree, and the majority vote of their outputs (e.g., DT1 → 1, DT2 → 0, DT3 → 1, …, DTn → 1) is the final prediction.
Whenever we create a decision tree to its complete depth, it has 2 properties:

• Low BIAS: it gets trained so well on the training dataset that the training error is very low.
• High VARIANCE: for the test data, such a tree is prone to give a larger amount of errors.

THAT IS WHY, WHEN A DECISION TREE IS CREATED TO ITS COMPLETE DEPTH, IT LEADS TO OVERFITTING.
Now what is happening in Random Forest?

In RF we use multiple decision trees, and as we discussed on the last slide, each individual decision tree has high variance. But when we combine these decision trees with respect to the majority vote, what happens?

HIGH VARIANCE → LOW VARIANCE

When we combine the decision trees via majority vote, the high variance gets converted into low variance: errors made by individual trees tend to cancel out in the vote.
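This variance reduction can be illustrated with a quick simulation. Each "model" below is just a noisy predictor of the same target; averaging 10 of them visibly shrinks the spread. This is a toy sketch of the averaging effect, not an actual forest:

```python
import random
import statistics

random.seed(1)

def noisy_prediction():
    # one high-variance model: true value 0 plus Gaussian noise
    return random.gauss(0, 1)

# spread of a single model's predictions
single = [noisy_prediction() for _ in range(2000)]

# spread of an ensemble averaging 10 independent models
ensemble = [statistics.mean(noisy_prediction() for _ in range(10))
            for _ in range(2000)]

print(statistics.pstdev(single) > statistics.pstdev(ensemble))  # → True
```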
Due to feature and row sampling, newly added data (say 200 new rows on top of a dataset of 1000) will be split across all the models, so the data change will not impact the score of any individual model much; the forest will still generalise.
What if we are handling a regression problem?

Each tree now outputs a number (e.g., DT1 → 1.14, DT2 → 0.95, DT3 → 1.05, …, DTn → 0.87). We take either the mean or the median of the outputs; which one to use depends on the distribution of the outputs. The number of decision trees is a hyperparameter.
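Using the tree outputs from the slide, the two aggregation options look like this:

```python
from statistics import mean, median

# outputs of the individual trees on the test record d`` (from the slide)
tree_outputs = [1.14, 0.95, 1.05, 0.87]

print(mean(tree_outputs))    # → 1.0025
print(median(tree_outputs))  # → 1.0 (average of the two middle values)
```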
What is Out-of-Bag evaluation in Random Forest (Bagging)?

● Each tree is trained on a bootstrap sample of k < n rows; the rows never drawn for a tree are its Out-of-Bag (OOB) data.
● Normally, the data is split into Train and Test, and the Train part is further split into Train (≈ ⅔ · n) and Validation (≈ ⅓ · n).
● If we set the OOB parameter to TRUE, the OOB data becomes (is considered as) the validation data.

What is the OOB score?

It is nothing but the accuracy with respect to the validation dataset.
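The OOB rows for one tree can be sketched with the standard library; in scikit-learn's RandomForestClassifier this idea corresponds to passing oob_score=True. The dataset here is hypothetical:

```python
import random

random.seed(0)
n = 10
rows = list(range(n))

# bootstrap sample for one tree: n draws WITH replacement
bag = random.choices(rows, k=n)

# out-of-bag (OOB) rows: never drawn for this tree,
# so they can serve as validation data for it
oob = [r for r in rows if r not in bag]
```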


What is Boosting?

● Definition: sequentially trains models to correct the errors of previous models.
● Process:
○ Models are trained one after another, each trying to correct the mistakes of the previous one.
○ Weak learners are combined to form a strong learner.
● Types: AdaBoost, Gradient Boosting, XGBoost, CatBoost.

Process:

1. Initialization:
○ Start with an initial model trained on the data.

2. Sequential Training:
○ Train a series of models sequentially.
○ Each new model focuses on the errors made by
the previous models.

3. Weight Adjustment:
○ Increase the weight of incorrectly predicted
examples to emphasize their importance in
subsequent training.

4. Combination:
○ Combine the predictions of all models to make
the final prediction (e.g., weighted sum for
regression, majority voting for classification).
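Step 4 (combination) for classification can be sketched as a weighted vote, where each model's weight (alpha) reflects its accuracy. The numbers below are made up purely for illustration:

```python
# hypothetical performance weights and ±1 predictions of three weak learners
alphas = [0.8, 0.3, 0.5]
predictions = [1, -1, 1]

# weighted sum of the votes; its sign gives the final classification
score = sum(a * p for a, p in zip(alphas, predictions))  # 0.8 - 0.3 + 0.5 = 1.0
final = 1 if score > 0 else -1
print(final)  # → 1
```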
What is AdaBoost? (Adaptive Boosting)

AdaBoost adjusts the weights of incorrectly classified examples so that subsequent models focus more on difficult cases.

In AdaBoost each decision tree is created with a depth of only 1; such one-level trees are called stumps.

Calculating the sample weights:
Initially, every record gets the same sample weight, w = 1/n (for n records).
Selecting a base learner:
Entropy or the Gini coefficient (or both) can be used to select the stump: a stump is built for each feature (f1, f2, f3), and the one with the least value is selected. Say the stump with f1 (employee id) is selected; it classifies 5 records correctly and 1 incorrectly.
Finding the total error:
Now we need to find the total error for the records which are incorrectly classified.

Total error = sum of the sample weights of the wrong outputs
Total error (TE) = 1/6

Finding the performance of the stump:

performance of the stump = ½ ln((1 − TE) / TE) = ½ ln(5) ≈ 0.804
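The calculation above, written out (n = 6 records, one misclassified):

```python
import math

n = 6
total_error = 1 / n  # one of six equally weighted records is wrong

# performance of the stump = ½ ln((1 - TE) / TE)
performance = 0.5 * math.log((1 - total_error) / total_error)
print(round(performance, 3))  # → 0.805 (the slide truncates this to 0.804)
```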

Why have we calculated the total error and the performance of the stump? Because we need to update the sample weights: the weights for the correct predictions will be reduced and those for the wrong predictions increased before sending the data to the 2nd base learner.
Update the weight of the incorrectly classified point:

New sample weight = weight × e^(performance) = 1/6 × e^(0.804) ≈ 0.372

Update the weight of the correctly classified points:

New sample weight = weight × e^(−performance) = 1/6 × e^(−0.804) ≈ 0.07

We can see that the original sample weights add up to 1, but the updated weights do not: their sum is ≈ 0.72. We therefore normalise by dividing each updated weight by the sum of the updated weights (0.72). Using the normalised weights, the sample for the second base learner is formed so that the misclassified records are emphasised for it to learn from.
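Using the rounded per-record weights from the slides (0.07 for each of the five correct records, 0.372 for the wrong one), the update and normalisation steps look like this:

```python
import math

performance = 0.804
w = 1 / 6

# increased weight for the one wrong record, decreased for the five right ones
wrong = round(w * math.exp(performance), 3)   # 0.372
right = round(w * math.exp(-performance), 2)  # 0.07

updated = [right] * 5 + [wrong]
total = sum(updated)  # ≈ 0.72, as on the slide

# normalise so the weights again sum to 1
normalised = [u / total for u in updated]
print(round(sum(normalised), 6))  # → 1.0
```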
Choosing the Second Base Learner

1. Selecting the second learner:
○ In the second iteration, you again train a weak learner on the re-weighted dataset.
○ The new learner will focus more on the samples that were misclassified in the first iteration, because their weights have been increased.
2. Identifying wrong records:
○ The misclassified records from the first iteration are those where the prediction of the first weak learner does not match the true label.
○ These samples now have higher weights, meaning the second weak learner will give them more importance during training.

Iterative Process
● This process repeats for a specified number of iterations or until a certain error threshold is reached.
● Each weak learner contributes to the final strong classifier through a weighted vote based on its accuracy.
Model Validation vs. Model Testing: Overview
Model Validation: This step involves tuning and evaluating the model's performance
during the training phase. It uses a validation set (distinct from the training data) to
assess the model's accuracy and adjust hyperparameters to improve its
generalization capabilities. The goal is to ensure that the model is not overfitting to the
training data.

Model Testing: This is the final evaluation step, performed after the model is trained
and validated. It uses a test set (which the model has never seen before) to measure
the model's true performance in a real-world scenario. The results on the test set
provide an unbiased estimate of how the model will perform on new, unseen data.
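The three datasets described above can be carved out with a minimal stdlib sketch; the sizes and 80/20 then ⅔/⅓ proportions here are illustrative, not prescribed:

```python
import random

random.seed(7)
data = list(range(100))  # 100 hypothetical records
random.shuffle(data)

# hold out 20% as the final test set, never touched during training
test = data[:20]
rest = data[20:]

# split the remainder into train (≈ 2/3) and validation (≈ 1/3)
cut = len(rest) * 2 // 3
train, val = rest[:cut], rest[cut:]

print(len(train), len(val), len(test))  # → 53 27 20
```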
