DS_7
A limitation of bagging is that the same greedy algorithm is used to create each tree, so the same or very similar split points are likely to be chosen in each tree, making the trees very similar (the trees will be correlated). This, in turn, makes their predictions similar, undermining the variance reduction that was originally sought.
We can force the decision trees to be different by limiting the features (columns) that the greedy algorithm can evaluate at each split point when creating the tree. This is called the Random Forest algorithm.
Like bagging, multiple samples of the training dataset are taken and a different tree trained on each. The difference
is that at each point a split is made in the data and added to the tree, only a fixed subset of attributes can be
considered.
For classification problems, the type of problems we will look at in this tutorial, the number of attributes to be
considered for the split is limited to the square root of the number of input features.
num_features_for_split = sqrt(total_input_features)
The result of this one small change is trees that are more different from each other (uncorrelated), resulting in predictions that are more diverse and a combined prediction that often performs better than a single tree or bagging alone.
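As a minimal sketch of the sqrt rule above, the snippet below computes the per-split feature budget for a hypothetical dataset (the feature count 16 is an illustrative assumption, not from the original text):

```python
import math

# Hypothetical number of input features, for illustration only.
total_input_features = 16

# Square-root rule for classification: features considered per split.
num_features_for_split = int(math.sqrt(total_input_features))

print(num_features_for_split)  # -> 4
```

In scikit-learn this same rule is applied by passing max_features="sqrt" to RandomForestClassifier, so the budget does not need to be computed by hand.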
6. Laboratory Exercise
Procedure
i. Use google colab for programming.
ii. Import required packages.
iii. Demonstrate random forest classifier for any given dataset.
iv. Add relevant comments in your programs and execute the code. Test it for various cases.
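The procedure above can be sketched as follows. This is one possible demonstration, assuming scikit-learn is available (it is preinstalled in Google Colab) and using the built-in Iris dataset as an example; any classification dataset would work:

```python
# Step ii: import required packages.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Step iii: load an example dataset (Iris, bundled with scikit-learn).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data to test the trained model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Build the forest: 100 trees, each split limited to sqrt(n_features)
# candidate features, as described in the theory section.
clf = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=42)
clf.fit(X_train, y_train)

# Step iv: evaluate on unseen data to test the classifier.
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Varying test_size, n_estimators, or the dataset itself covers the "test it for various cases" step.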
Post-Experiments Exercise:
A. Extended Theory:
a. Write real life applications of Random Forest Classifier.
B. Conclusion:
1. Write what was performed in the program(s).
2. What is the significance of the program, and what objective is achieved?
C. References:
[1] https://machinelearningmastery.com/implement-random-forest-scratch-python/
[2] https://www.geeksforgeeks.org/random-forest-classifier-using-scikit-learn/
[3] https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/