Bagging and Boosting Regression Algorithms
When you want to buy a new product, do you go by the first opinion you come across and purchase it straight away?
The straight answer is no. You will browse some hundred reviews before buying a new product.
Ensemble models in machine learning operate on a similar idea: they combine several individual models to obtain a better result.
A diverse set of models generally gives better results than any single model.
Max Voting:
In this technique, multiple models are used to make predictions for each data point.
The prediction made by the majority of the models is used as the final prediction.
For example, suppose we asked 5 colleagues to rate a movie (out of 5): three of them rated it as 4 and two rated it as 5.
Colleague:   1    2    3    4    5
Rating:      5    4    5    4    4
Since the majority of the colleagues gave a rating of 4, the final rating is 4.
EXAMPLE CODE:
x_train consists of the independent variables of the training data.
y_train is the target variable of the training data.
The validation set consists of x_test (independent variables) and y_test (target variable).
from sklearn import tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier
from statistics import mode
import numpy as np
model1 = tree.DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()
model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)
pred1 = model1.predict(x_test)
pred2 = model2.predict(x_test)
pred3 = model3.predict(x_test)
# max voting: the final prediction for each point is the mode of the three predictions
final_pred = np.array([])
for i in range(len(x_test)):
    final_pred = np.append(final_pred, mode([pred1[i], pred2[i], pred3[i]]))
# alternatively, sklearn's VotingClassifier performs the voting internally
model1 = LogisticRegression(random_state=1)
model2 = tree.DecisionTreeClassifier(random_state=1)
model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')
model.fit(x_train, y_train)
model.score(x_test, y_test)
AVERAGING:
Similar to max voting, multiple predictions are made for each data point in averaging.
We take an average of the predictions from all the models and use it to make the final prediction.
Averaging can be used for making predictions in regression problems or for averaging probabilities in classification problems.
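For instance, a minimal sketch of simple averaging, reusing model1, model2, model3 and x_test from the max-voting code above (for classification we average the predicted class probabilities):
pred1 = model1.predict_proba(x_test)
pred2 = model2.predict_proba(x_test)
pred3 = model3.predict_proba(x_test)
# every model gets equal weight; the final prediction is the plain mean
finalpred = (pred1 + pred2 + pred3) / 3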
WEIGHTED AVERAGE:
This is an extension of the averaging method: all the models are assigned different weights defining the importance of each model for the prediction.
For instance, if two of the colleagues are critics while the others have no prior experience in this field, then the answers by these two colleagues are given more importance than the answers of the other people.
Colleague:   1      2      3      4      5
Weight:      0.23   0.23   0.18   0.18   0.18
Rating:      5      4      5      4      4
The result is calculated as [(5*0.23) + (4*0.23) + (5*0.18) + (4*0.18) + (4*0.18)] = 4.41.
model1.fit(x_train,y_train)
model2.fit(x_train,y_train)
model3.fit(x_train,y_train)
pred1=model1.predict_proba(x_test)
pred2=model2.predict_proba(x_test)
pred3=model3.predict_proba(x_test)
finalpred=(pred1*0.3+pred2*0.3+pred3*0.4)
Advanced Ensemble Techniques
Stacking:
It is an ensemble learning technique that uses the predictions from multiple models (for example, a decision tree, knn and svm) to build a new model.
This new model is used for making predictions on the test set.
The following steps are used to create a stacked ensemble:
Steps:
1. The train set is split into 10 parts.
2. A base model is fitted on 9 parts and predictions are made on the 10th part. This is done for each part of the train set.
3. The base model (in this case, a decision tree) is then fitted on the whole train dataset.
4. Using this model, predictions are made on the test set.
5. Steps 2 to 4 are repeated for another base model (say knn), resulting in another set of predictions for the train set and the test set.
6. The predictions from the train set are used as features to build a new model.
7. This model is used to make the final predictions on the test prediction set.
Sample Code:
First, we define a function to make predictions on n folds of the train and test datasets.
This function returns the predictions on train and test for each model.
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

def Stacking(model, train, y, test, n_fold):
    # shuffle=True is required when random_state is set in recent scikit-learn versions
    folds = StratifiedKFold(n_splits=n_fold, shuffle=True, random_state=1)
    test_pred = np.empty((test.shape[0], 1), float)
    train_pred = np.empty((0, 1), float)
    for train_indices, val_indices in folds.split(train, y.values):
        x_train, x_val = train.iloc[train_indices], train.iloc[val_indices]
        y_train, y_val = y.iloc[train_indices], y.iloc[val_indices]
        model.fit(X=x_train, y=y_train)
        train_pred = np.append(train_pred, model.predict(x_val))
        test_pred = np.append(test_pred, model.predict(test))
    return test_pred.reshape(-1, 1), train_pred
Create two Base Models – Decision tree and knn:
model1 = tree.DecisionTreeClassifier(random_state=1)
test_pred1, train_pred1 = Stacking(model=model1, n_fold=10, train=x_train, test=x_test, y=y_train)
train_pred1=pd.DataFrame(train_pred1)
test_pred1=pd.DataFrame(test_pred1)
model2 = KNeighborsClassifier()
test_pred2, train_pred2 = Stacking(model=model2, n_fold=10, train=x_train, test=x_test, y=y_train)
train_pred2=pd.DataFrame(train_pred2)
test_pred2=pd.DataFrame(test_pred2)
Create a third model, logistic regression, on the predictions of the decision tree and knn models:
df = pd.concat([train_pred1, train_pred2], axis=1)
df_test = pd.concat([test_pred1, test_pred2], axis=1)
model = LogisticRegression(random_state=1)
model.fit(df, y_train)
model.score(df_test, y_test)
To simplify the explanation, the stacking model we have created has only two levels: the decision tree and knn models are built at level zero, and the logistic regression model is built at level one.
Blending:
Blending follows the same approach as stacking, but uses only a holdout (validation) set from the train set to make predictions.
The holdout set and its predictions are used to build a new model.
This model is used to make the final predictions on the test set, using the test meta features together with the original test features.
Sample Code:
Build two models, decision tree and knn, on the train set in order to make predictions on the validation set (x_val, y_val):
model1 = tree.DecisionTreeClassifier()
model1.fit(x_train, y_train)
val_pred1=model1.predict(x_val)
test_pred1=model1.predict(x_test)
val_pred1=pd.DataFrame(val_pred1)
test_pred1=pd.DataFrame(test_pred1)
model2 = KNeighborsClassifier()
model2.fit(x_train,y_train)
val_pred2=model2.predict(x_val)
test_pred2=model2.predict(x_test)
val_pred2=pd.DataFrame(val_pred2)
test_pred2=pd.DataFrame(test_pred2)
Combining the meta features and the validation set, a logistic regression model is built to make the final predictions on the test set:
df_val=pd.concat([x_val, val_pred1,val_pred2],axis=1)
df_test=pd.concat([x_test, test_pred1,test_pred2],axis=1)
model = LogisticRegression()
model.fit(df_val,y_val)
model.score(df_test,y_test)
Bagging:
The main idea behind bagging is to combine the results of multiple models (for example, decision trees) to get a generalized result.
If all the models are built on the same data, they are likely to give similar results, since they all receive the same input.
One of the techniques used to solve this is bootstrapping.
Bootstrapping is a sampling technique in which we create subsets of observations from the original dataset, with replacement.
Each subset is of the same size as the original set.
Bagging uses these subsets (bags) to get a fair idea of the distribution of the complete set.
Note that the size of the subsets created for bagging may be less than the original set.
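As an illustration of bootstrapping, here is a minimal sketch that draws one bootstrap sample with pandas (df is a hypothetical DataFrame holding the original dataset):
# rows are drawn with replacement, and the sample has the same size as the original set
bootstrap_sample = df.sample(n=len(df), replace=True, random_state=1)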
Steps:
1. Multiple subsets are created from the original dataset, selecting observations with replacement.
2. A base model is created on each of these subsets.
3. The models run in parallel and are independent of each other.
4. The final predictions are determined by combining the predictions from all the models (see the sketch below).
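As a rough sketch of these steps with scikit-learn (BaggingClassifier draws the bootstrapped subsets and fits the base models internally; x_train, y_train, x_test and y_test are the splits used throughout these examples):
from sklearn import tree
from sklearn.ensemble import BaggingClassifier
# each of the n_estimators decision trees is fitted on its own bootstrap sample
model = BaggingClassifier(tree.DecisionTreeClassifier(random_state=1), n_estimators=100, random_state=1)
model.fit(x_train, y_train)
# the final prediction combines the predictions of all the trees
model.score(x_test, y_test)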
Boosting:
Boosting techniques are another family of methods used to build an ensemble model.
Boosting methods build the ensemble model in an incremental way.
The main principle is to build the model incrementally: the base learners are trained sequentially over multiple iterations of the training data, each one correcting its predecessors.
The sklearn.ensemble module provides two boosting methods: AdaBoost and Gradient Tree Boosting.
AdaBoost:
It is one of the most successful boosting ensemble methods.
The key lies in the way weights are given to the instances in the dataset: instances that are already predicted correctly receive lower weights, so the algorithm pays less attention to them while constructing subsequent models and instead focuses on the instances that were misclassified.
Classification With AdaBoost:
The Scikit-learn module provides sklearn.ensemble.AdaBoostClassifier.
While building this classifier, the main parameter the module uses is base_estimator.
base_estimator is the base estimator from which the boosted ensemble is built.
If we set this parameter's value to None, the base estimator is DecisionTreeClassifier(max_depth=1).
Implementation Example:
In the following example, we are building an AdaBoost classifier by using sklearn.ensemble.AdaBoostClassifier and also predicting and checking its score.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples = 1000, n_features = 10, n_informative = 2,
   n_redundant = 0, random_state = 0, shuffle = False)
ADBclf = AdaBoostClassifier(n_estimators = 100, random_state = 0)
ADBclf.fit(X, y)
print(ADBclf.predict([[0, 2, 3, 0, 1, 1, 1, 1, 2, 2]]))
Output:
[1]
Now , we can check the score as follows:
ADBclf.score(X, y)
Output:
0.995
Example:
We can also build an AdaBoost classifier on a real dataset and evaluate it with k-fold cross-validation.
In the example below, we are using the Pima Indians diabetes dataset.
from pandas import read_csv
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import AdaBoostClassifier
path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names = headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]
seed = 5
kfold = KFold(n_splits = 10, shuffle = True, random_state = seed)  # shuffle=True is required with random_state in recent sklearn
num_trees = 100
max_features = 5  # note: AdaBoostClassifier itself has no max_features parameter, so it is not passed below
ADBclf = AdaBoostClassifier(n_estimators = num_trees, random_state = seed)
results = cross_val_score(ADBclf, X, Y, cv = kfold)
print(results.mean())
Output:
0.7851435406698566
Regression With AdaBoost:
For creating a regressor with the AdaBoost method, the Scikit-learn module provides sklearn.ensemble.AdaBoostRegressor.
Implementation Example:
In the following example, we are building an AdaBoostRegressor and also predicting for new values by using the predict() method.
from sklearn.ensemble import AdaBoostRegressor
from sklearn.datasets import make_regression
X, y = make_regression(n_features = 10, n_informative = 2, random_state = 0, shuffle = False)
ADBregr = AdaBoostRegressor(random_state = 0, n_estimators = 100)
ADBregr.fit(X, y)
Output:
AdaBoostRegressor(base_estimator = None, learning_rate = 1.0, loss = 'linear', n_estimators = 100,
random_state = 0)
print(ADBregr.predict([[0, 2, 3, 0, 1, 1, 1, 1, 2, 2]]))
Output:
[85.50955817]
Gradient Tree Boosting:
It is also called Gradient Boosted Regression Trees (GBRT).
It produces a prediction model in the form of an ensemble of weak prediction models.
It can be used for regression as well as classification problems.
Its main advantage lies in the fact that these models naturally handle data of mixed type.
Classification With Gradient Tree Boost:
For creating a Gradient Tree Boost Classifier , the Scikit – Learn module provides
sklearn.ensemble.GradientBoostingClassifier.
While building this classifier, the main parameter the module uses is 'loss'.
If we choose loss = 'deviance', it refers to deviance for classification with probabilistic outputs.
If we set the parameter's value to 'exponential', then it recovers the AdaBoost algorithm.
A hyper-parameter named learning_rate (in the range (0.0, 1.0]) controls overfitting via shrinkage.
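As a small sketch of where this parameter goes (the values are placeholders; note that recent scikit-learn versions name the deviance option 'log_loss'):
from sklearn.ensemble import GradientBoostingClassifier
# exponential loss requires a binary target and recovers an AdaBoost-like algorithm
model = GradientBoostingClassifier(loss = 'exponential', learning_rate = 0.1, n_estimators = 100)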
Implementation Example:
In the following example , we are building a Gradient Boosting Classifier by using
sklearn.ensemble.GradientBoostingClassifier.
This classifier is fitted with 50 weak learners.
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
X, y = make_hastie_10_2(random_state = 0)
GDBclf = GradientBoostingClassifier(n_estimators = 50, learning_rate = 1.0,
   max_depth = 1, random_state = 0).fit(X[:2000], y[:2000])
GDBclf.score(X[2000:], y[2000:])
Example:
We can also build a Gradient Boosting classifier on the Pima Indians diabetes dataset and evaluate it with k-fold cross-validation:
from pandas import read_csv
from sklearn.model_selection import KFold, cross_val_score
path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names = headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]
seed = 5
kfold = KFold(n_splits = 10, shuffle = True, random_state = seed)  # shuffle=True is required with random_state in recent sklearn
num_trees = 100
max_features = 5
GBclf = GradientBoostingClassifier(n_estimators = num_trees, max_features = max_features)
results = cross_val_score(GBclf, X, Y, cv = kfold)
print(results.mean())
Output:
0.7946582356674234
Regression With Gradient Tree Boost:
For creating a regressor with the Gradient Tree Boost method, the Scikit-learn module provides sklearn.ensemble.GradientBoostingRegressor.
While building this regressor, the main parameter the module uses is 'loss'.
The default value for 'loss' is 'ls' (least squares).
Implementation Example:
In the following example, a Gradient Boosting regressor is built by using sklearn.ensemble.GradientBoostingRegressor, and its mean squared error is found by using the mean_squared_error() method.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
X, y = make_friedman1(n_samples = 2000, random_state = 0, noise = 1.0)
X_train, X_test = X[:1000], X[1000:]
y_train, y_test = y[:1000], y[1000:]
# in recent scikit-learn versions use loss = 'squared_error' instead of 'ls'
GDBreg = GradientBoostingRegressor(n_estimators = 80, learning_rate = 0.1,
   max_depth = 1, random_state = 0, loss = 'ls').fit(X_train, y_train)
Once fitted, we can find the mean squared error as follows:
mean_squared_error(y_test, GDBreg.predict(X_test))
Output:
5.391246106657164
Gradient Boosting:
It is an ensemble machine learning algorithm that works for regression as well as classification problems.
Each subsequent tree in the series is built on the errors calculated by the previous tree.
Suppose we need to predict the age of a group of people. The algorithm proceeds as follows:
1. The mean age is assumed to be the predicted value for all the observations in the dataset.
2. The errors are calculated using this mean prediction and the actual values of age.
3. A tree model is created using the errors calculated above as the target variable. The main objective is to find the best split in order to minimize the error.
4. The predictions made by this model are combined with the predictions from step 1.
5. The value calculated above is the new prediction.
6. New errors are calculated using this predicted value and the actual value.
7. Steps 2 to 6 are repeated till the maximum number of iterations is reached (see the sketch below).
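To make these steps concrete, here is a rough from-scratch sketch for a regression target with squared error, with X as the feature matrix and y as the vector of ages (both assumed here; names and values are purely illustrative):
import numpy as np
from sklearn.tree import DecisionTreeRegressor
learning_rate = 0.1
pred = np.full(len(y), y.mean())           # step 1: start from the mean age
trees = []
for _ in range(100):                       # step 7: repeat steps 2 to 6
    errors = y - pred                      # steps 2 and 6: errors of the current prediction
    t = DecisionTreeRegressor(max_depth=1).fit(X, errors)   # step 3: tree fitted on the errors
    pred = pred + learning_rate * t.predict(X)              # steps 4 and 5: new prediction
    trees.append(t)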
Code:
from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(learning_rate=0.01, random_state=1)
model.fit(x_train, y_train)
model.score(x_test,y_test)
0.81621621621621621
Sample Code – Regression Problem
from sklearn.ensemble import GradientBoostingRegressor
model= GradientBoostingRegressor()
model.fit(x_train, y_train)
model.score(x_test,y_test)
Parameters
min_samples_split:
Defines the minimum number of samples required in a node for it to be considered for splitting. It is used to control over-fitting.
min_samples_leaf:
Defines the minimum number of samples required in a terminal node (leaf).
max_depth:
The maximum depth of a tree. Higher depth will allow the model to learn relations specific to a particular sample.
max_leaf_nodes:
The maximum number of terminal nodes (leaves) in a tree.
max_features:
The maximum number of features to be considered while searching for the best split.
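For illustration, a hedged sketch of where these parameters are passed in sklearn's GradientBoostingClassifier (the values below are arbitrary placeholders, not tuned recommendations):
from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(
    min_samples_split=500,    # a node needs at least this many samples to be split
    min_samples_leaf=50,      # every leaf must keep at least this many samples
    max_depth=8,              # limits how sample-specific the learned relations can get
    max_leaf_nodes=None,      # optional alternative way of limiting tree size
    max_features='sqrt',      # number of features considered for the best split
    learning_rate=0.01,
    random_state=1)
model.fit(x_train, y_train)
model.score(x_test, y_test)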
XGBoost (Extreme Gradient Boosting):
It is one of the most powerful algorithms and is used for doing any type of prediction.
The XGBoost algorithm has high predictive power and is about 10 times faster than other gradient boosting techniques.
Parallel Processing:
It implements parallel processing and is therefore faster than GBM.
Tree Pruning:
XGBoost makes splits up to the max_depth specified and then starts pruning the tree backwards.
It removes the splits beyond which there is no positive gain.
Built-in Cross-Validation:
XGBoost allows a user to run a cross-validation at each iteration of the boosting process.
This makes it easy to get the exact optimum number of boosting iterations in a single run.
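For example, a minimal sketch of this built-in cross-validation using xgb.cv (the objective and parameter values are placeholders for a binary classification task):
import xgboost as xgb
dtrain = xgb.DMatrix(x_train, label=y_train)
params = {'objective': 'binary:logistic', 'eta': 0.01, 'max_depth': 3}
# one evaluation per boosting round; early stopping keeps the optimum number of rounds
cv_results = xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
                    early_stopping_rounds=20, seed=1)
print(len(cv_results))    # number of boosting rounds retained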
XGBoost: Sample Code
import xgboost as xgb
model=xgb.XGBClassifier(random_state=1,learning_rate=0.01)
model.fit(x_train, y_train)
model.score(x_test,y_test)
0.82702702702702702
XGBoost: Sample Code
import xgboost as xgb
model=xgb.XGBRegressor()
model.fit(x_train, y_train)
model.score(x_test,y_test)
XGBoost: Parameters
nthread:
This is used for parallel processing; the number of cores in the system should be entered.
If you wish to run on all cores, do not specify a value and the algorithm will detect it automatically.
eta:
Analogous to the learning rate in GBM.
It makes the model more robust by shrinking the weights on each step.
min_child_weight:
Defines the minimum sum of weights of all observations required in a child.
It is used to avoid over-fitting: higher values prevent the model from learning relations that are highly specific to the particular sample selected for a tree.
max_depth:
It is used to define the maximum depth of a tree.
Higher depth will allow the model to learn relations very specific to a particular sample.
max_leaf_nodes:
The maximum number of terminal nodes or leaves in a tree.
subsample:
Denotes the fraction of observations to be randomly sampled for each tree. Lower values make the algorithm more conservative and prevent overfitting, but values that are too small might lead to under-fitting.
colsample_bytree:
Denotes the fraction of columns to be randomly sampled for each tree.
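For illustration, a hedged sketch of where these parameters appear in the XGBoost scikit-learn wrapper (the values are placeholders, not recommendations):
import xgboost as xgb
model = xgb.XGBClassifier(
    n_jobs=4,               # number of cores for parallel processing (nthread in the native API)
    learning_rate=0.01,     # eta: shrinks the weights on each step
    min_child_weight=1,     # minimum sum of instance weights required in a child
    max_depth=3,            # maximum depth of each tree
    subsample=0.8,          # fraction of observations sampled for each tree
    colsample_bytree=0.8,   # fraction of columns sampled for each tree
    random_state=1)
model.fit(x_train, y_train)
model.score(x_test, y_test)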