
St. Francis Institute of Technology, Mumbai-400 103


Department Of Information Technology
A.Y. 2024-2025
Class: BE-ITA/B, Semester: VII
Subject: Data Science Lab
Experiment – 7
1. Aim: To implement Supervised Learning algorithm - Random Forest.
2. Objectives: Students should be familiarized with Learning Architectures and Frameworks.
3. Prerequisite: Python basics
4. Pre-Experiment Exercise:
Theory:
Random Forest Algorithm
Decision trees involve the greedy selection of the best split point from the dataset at each step.
This greedy approach makes decision trees susceptible to high variance if they are not pruned. This variance can be
reduced by training multiple trees on different bootstrap samples of the training dataset (different views of
the problem) and combining their predictions. This approach is called bootstrap aggregation, or bagging for short.
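As a minimal sketch of this bagging idea (an illustrative example, not part of the prescribed procedure), the snippet below uses scikit-learn decision trees; the helper names bagging_fit and bagging_predict are hypothetical and introduced here only for illustration.

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_trees=10, seed=0):
    # Train n_trees decision trees, each on a bootstrap sample (rows drawn with replacement).
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # a different "view" of the training data
        trees.append(DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    # Combine the ensemble's outputs by majority vote for each sample.
    votes = np.array([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])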

A limitation of bagging is that the same greedy algorithm is used to create each tree, so the same or very similar
split points are likely to be chosen in each tree, making the different trees very similar (correlated). This, in
turn, makes their predictions similar and limits the variance reduction that bagging was meant to provide.
We can force the decision trees to be different by limiting the features (columns) that the greedy algorithm can evaluate
at each split point when creating the tree. This is called the Random Forest algorithm.

Like bagging, multiple samples of the training dataset are taken and a different tree is trained on each. The difference
is that at each point where a split is made and added to the tree, only a fixed-size random subset of attributes can be
considered.
For classification problems, the type of problem considered in this experiment, the number of attributes considered
for each split is limited to the square root of the number of input features:

num_features_for_split = sqrt(total_input_features)
The result of this one small change is trees that are more different from each other (less correlated), giving
predictions that are more diverse and a combined prediction that often performs better than a single tree or
bagging alone.
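A minimal sketch of how this differs from plain bagging, assuming scikit-learn's DecisionTreeClassifier: the only change to the bagging loop above is max_features="sqrt", which restricts each split to a random square-root-sized subset of the columns. The helper name random_forest_fit is hypothetical.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_fit(X, y, n_trees=10, seed=0):
    # Same bootstrap loop as bagging, but each tree may only evaluate
    # sqrt(total_input_features) randomly chosen columns at every split.
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))            # bootstrap sample of rows
        tree = DecisionTreeClassifier(max_features="sqrt",    # the sqrt rule shown above
                                      random_state=seed + i)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

Predictions can be combined with the same majority-vote step used for bagging; in practice, scikit-learn's RandomForestClassifier wraps both steps, as used in the laboratory exercise below.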

6. Laboratory Exercise
Procedure
i. Use Google Colab for programming.
ii. Import required packages.
iii. Demonstrate a random forest classifier for any given dataset (a sample sketch follows this procedure).
iv. Add relevant comments in your programs and execute the code. Test it for various cases.
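A minimal sketch of steps i–iv, assuming the built-in Iris dataset as the "given dataset" (any other tabular dataset works the same way); it can be pasted into a single Google Colab cell.

# Random Forest classifier demonstration (runs in a Google Colab cell).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset and hold out 30% of the rows for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# max_features="sqrt" applies the sqrt(total_input_features) rule from the theory section.
model = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test set; rerun with other splits or datasets to test various cases.
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))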
Post-Experiment Exercise:
A. Extended Theory:
a. Write real-life applications of the Random Forest classifier.
B. Conclusion:
1. Write what was performed in the program(s).
2. What is the significance of the program and which objective is achieved?
C. References:
[1] https://machinelearningmastery.com/implement-random-forest-scratch-python/
[2] https://www.geeksforgeeks.org/random-forest-classifier-using-scikit-learn/
[3] https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/
