TM Adaboost

This Text Mining coursework shows how model accuracy can be improved with AdaBoost, which combines multiple weak learners. AdaBoost gives misclassified samples more weight with each iteration to focus learning. It trains an ensemble of models on weighted versions of the data and combines their predictions, increasing predictive performance with less risk of overfitting than a single model. AdaBoost is demonstrated on a banking dataset involving phone calls to clients about bank term deposits. Hyperparameters such as the number of estimators and the learning rate can be tuned for optimal results.


Text Mining

ADA BOOST IN BANKING DOMAIN


GROUP 5
Bagging vs Boosting
SIGNIFICANCE OF ADABOOST

Boosting improves the accuracy of the final model by combining several weak models: their predictions are averaged for regression or voted over for classification.

AdaBoost gives more weightage to misclassified samples during every iteration.

AdaBoost combines multiple weak learners to achieve strong predictive performance.

AdaBoost is also less prone to overfitting. In addition to boosting weak learners, we can fine-tune the hyperparameters of these ensemble techniques to get even better accuracy.
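As a minimal sketch of these points (not code from the deck), the snippet below uses scikit-learn's AdaBoostClassifier on a synthetic dataset and compares a single decision stump with a boosted ensemble of stumps; the dataset and parameter values are illustrative assumptions.

```python
# Minimal sketch: a single decision stump vs. an AdaBoost ensemble of stumps.
# Synthetic data and parameter values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One weak learner on its own
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)

# An ensemble of weak learners, combined by AdaBoost
boosted = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
boosted.fit(X_train, y_train)

print("Single stump accuracy:", stump.score(X_test, y_test))
print("AdaBoost accuracy    :", boosted.score(X_test, y_test))
```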
Initially, all the samples have equal weights.
Depending upon the number of features, that many stumps are made (3 features, 3 stumps).
The Gini index of each of those stumps is calculated; the Gini index is the probability of a class getting misclassified (a sketch of this calculation follows the list).
The stump with the lowest Gini index is chosen as the base learner for that iteration.
The total error of that stump is calculated, then its amount of say, and with the amount of say the new sample weights are calculated.
For the misclassified samples the amount of say enters with a (+) sign so their new weight is greater, and for correctly classified samples it enters with a (−) sign so they get less weightage in the next iteration.
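As a hedged illustration of the Gini step above (not code from the deck), the helper below computes the weighted Gini impurity of a stump's split; the function name and inputs are assumptions for illustration.

```python
import numpy as np

def stump_gini(y_left, y_right):
    """Weighted Gini index of a stump that splits the labels into two leaves.

    Gini of a leaf = 1 - sum(p_k^2); the stump's Gini is the
    size-weighted average of its two leaves.
    """
    def leaf_gini(y):
        if len(y) == 0:
            return 0.0
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * leaf_gini(y_left) + (len(y_right) / n) * leaf_gini(y_right)

# Example: a stump whose leaves each contain one misclassified sample
print(stump_gini(np.array([0, 0, 0, 1]), np.array([1, 1, 1, 0])))  # 0.375
```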
Significance of Alpha and Error Rate

Total error = the sum of the weights of the misclassified samples.

Calculating Amount of Say (Alpha)
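The formula on this slide does not survive the text export; the standard AdaBoost expression, which it presumably showed, is

alpha = (1/2) × ln((1 − Total Error) / Total Error)

so a stump with a low total error gets a large positive amount of say, a stump with error 0.5 gets zero say, and a stump worse than chance gets a negative say.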

Calculating New Sample Weight
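Again the slide's formula is not in the export; the standard update, shown here as a hedged sketch, is new weight = old weight × e^(+alpha) for misclassified samples and old weight × e^(−alpha) for correctly classified ones. A small NumPy sketch of the whole update (variable names are illustrative assumptions):

```python
import numpy as np

def adaboost_weight_update(weights, misclassified):
    """One AdaBoost round: amount of say + new (normalized) sample weights.

    weights       : current sample weights (sum to 1), with 0 < total error < 1
    misclassified : boolean array, True where the chosen stump was wrong
    """
    total_error = weights[misclassified].sum()
    alpha = 0.5 * np.log((1 - total_error) / total_error)    # amount of say

    # e^{+alpha} inflates misclassified samples, e^{-alpha} shrinks correct ones
    new_weights = weights * np.exp(np.where(misclassified, alpha, -alpha))
    return alpha, new_weights / new_weights.sum()             # re-normalize

# Example: 5 samples with equal initial weights, sample 2 misclassified
w = np.full(5, 1 / 5)
alpha, w_new = adaboost_weight_update(w, np.array([False, False, True, False, False]))
print(alpha, w_new)   # the misclassified sample's weight grows, the rest shrink
```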


After getting the new weights:

What happens in training data
We will normalize the weights so that their range goes from 0 to 1.
We will create buckets based on the normalized weights.
Random numbers are generated.
Based on the random numbers generated and where they lie in the buckets, the new dataset is formed; heavily weighted (previously misclassified) samples occupy wider buckets and are picked more often.
This process is repeated till we achieve the desired training error or the number of iterations you want.

What happens in testing data
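The testing-side slide content is not preserved in this export. As a hedged sketch of the standard AdaBoost prediction rule (each test sample is passed through every stump and the class with the largest total amount of say wins), with illustrative names and labels assumed to be −1/+1:

```python
import numpy as np

def adaboost_predict(stumps, alphas, X):
    """Weighted vote over stumps: each prediction (in {-1, +1}) is weighted by its amount of say."""
    votes = sum(alpha * stump.predict(X) for stump, alpha in zip(stumps, alphas))
    return np.sign(votes)   # class with the larger total amount of say
```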
CASE STUDY
The data is related to the direct marketing campaigns of a banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed ('yes') or not ('no').

Dataset Link
ADABOOST ALGORITHM AND ITS HYPERPARAMETERS

Steps:
Data cleaning: check for null values and drop them.
Feature selection: select the important features based on their correlation with the target variable.
Normalization of the data.
Splitting of the dataset into train and test sets.

Hyperparameters:
n_estimators: int, default=50. The number of weak learners to train iteratively.
learning_rate: float, default=1.0. Controls the contribution of each classifier; there is a trade-off between learning_rate and n_estimators.
random_state
base_estimator: object, default=None.
Use GridSearchCV to tune them.
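A hedged end-to-end sketch of these steps with scikit-learn; the file name (bank.csv), separator, target column ('y'), and the "top five correlated features" rule are assumptions about the bank marketing dataset, not details confirmed by the deck.

```python
# Hedged sketch of the listed steps; file name, separator and column names are assumptions.
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("bank.csv", sep=";")           # assumed file and separator
df = df.dropna()                                # data cleaning: drop null values

# Feature selection: keep the numeric features most correlated with the target (assumed column 'y')
df["y"] = (df["y"] == "yes").astype(int)
numeric = df.select_dtypes("number")
selected = numeric.corr()["y"].abs().sort_values(ascending=False).index[1:6]

X = MinMaxScaler().fit_transform(df[selected])  # normalization of the data
y = df["y"]

# Splitting of the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```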
Hyperparameters (grid 1, tuning the learning rate):
n_estimators: int, variable
learning_rate: 0.01, 0.1, 1
random_state: 42
base_estimator: Decision Tree (default)
Hyperparameters (grid 2, tuning the number of estimators):
n_estimators: 50, 500, 1000
learning_rate: variable
random_state: 42
base_estimator: Decision Tree (default)
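A hedged sketch of how these two grids could be run with GridSearchCV; it reuses X_train and y_train from the previous sketch and searches both grids in one pass, which is an assumption about how the search was actually organised.

```python
# Hedged sketch: searching the grids above with GridSearchCV.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "learning_rate": [0.01, 0.1, 1],     # grid 1
    "n_estimators": [50, 500, 1000],     # grid 2
}

search = GridSearchCV(
    AdaBoostClassifier(random_state=42),  # default base estimator: a decision stump
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)              # X_train, y_train from the previous sketch

print("Best parameters :", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```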
Confusion Matrices
[Figure: confusion matrices of the fitted models]
Output
[Figure: model output and scores]
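As a hedged sketch of how such a confusion matrix can be produced for the tuned model (continuing the earlier sketches; not the group's actual plotting code):

```python
# Hedged sketch: confusion matrix for the best model found by the grid search above.
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

y_pred = search.best_estimator_.predict(X_test)   # search, X_test from earlier sketches
print(confusion_matrix(y_test, y_pred))

ConfusionMatrixDisplay.from_predictions(y_test, y_pred)  # plotted version (requires matplotlib)
```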
