A) What Is the Motivation Behind Ensemble Methods? Give Your Answer in Probabilistic Terms
When you combine such diverse models probabilistically, through techniques like
bagging (bootstrap aggregating) or boosting, you effectively average over their individual
errors. Because the models make partly independent mistakes, many of those errors cancel
out as the number of models increases. In probabilistic terms, the ensemble therefore gives
a more robust estimate of the underlying distribution P(y | x) of targets given inputs,
reducing the variance of the predictions and improving accuracy.
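A minimal sketch of this cancellation effect, under assumptions that are mine rather than part of the question (n independent classifiers, each correct with probability p = 0.7): the probability that their majority vote is correct is a binomial tail that grows towards 1 as n increases.

from math import comb

def majority_vote_accuracy(n, p):
    # Probability that more than half of n independent votes are correct
    # (binomial tail); n is taken to be odd so there are no ties.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

for n in (1, 5, 25, 101):
    print(n, round(majority_vote_accuracy(n, p=0.7), 4))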
Strengths:
1. Handles Both Classification and Regression: Random Forest can be used for both
classification and regression tasks, making it versatile and applicable to a wide range of
problems.
2. Outlier Robustness: Random Forest is generally robust to outliers in the data. Outliers
do not have a significant impact on the ensemble's performance, as they might with
some other algorithms.
3. Parallelization: The individual decision trees in a Random Forest can be trained in
parallel, making it computationally efficient, especially on large datasets (see the sketch after this list).
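As a small hedged sketch of this strength (the synthetic data and the scikit-learn estimator are illustrative choices, not part of the question), passing n_jobs=-1 lets scikit-learn grow the trees on all available CPU cores:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
clf.fit(X, y)  # each tree is trained independently, so fitting parallelizes well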
Weaknesses:
1. Limited Extrapolation Capability: Random Forest is less suited to extrapolation, i.e.
making predictions outside the range of the training data. It tends to make flat
predictions beyond the observed data (a short demonstration follows after this list).
2. Bias Toward Features with Many Categories: Random Forest may have a bias towards
features with many categories or levels, as they can be more likely to appear in
individual trees. This can affect feature importance scores.
3. Large Memory Footprint: Storing a large Random Forest model can require a
significant amount of memory, especially when dealing with a large number of trees and
features.
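To make weakness 1 concrete, here is a small sketch (the linear toy data and scikit-learn's RandomForestRegressor are my own choices for illustration): the predictions flatten once the inputs leave the training range.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train on x in [0, 5]; the true relationship keeps growing beyond that range.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 5, size=(200, 1))
y_train = 2.0 * X_train.ravel() + rng.normal(scale=0.1, size=200)

reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Inside the range the fit is good; outside it the prediction stays roughly
# constant at the value seen near the boundary (no extrapolation).
print(reg.predict([[2.5], [5.0], [10.0], [50.0]]))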
(c) What are the hyperparameters of the Random Forest model? How do
you find these values?
max_depth
min_samples_split
max_leaf_nodes
min_samples_leaf
n_estimators
max_samples (size of the bootstrap sample drawn for each tree)
max_features
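These values are not learned by the training algorithm itself; they are usually chosen by searching over candidate settings with cross-validation, for example grid search or randomized search. A minimal sketch with scikit-learn's GridSearchCV (the synthetic data and the particular grid below are illustrative assumptions, not recommendations):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)

# Illustrative grid; in practice the candidate values depend on the dataset.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 30],
    "max_features": ["sqrt", 0.5],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)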
The working of the Random Forest algorithm is quite intuitive. It proceeds in two
phases: the first builds the forest by training N decision trees on bootstrap samples of
the data, and the second makes predictions by aggregating the outputs of the trees built
in the first phase.
Step 4: The final output is obtained by majority voting for classification or by averaging
for regression.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_tree(X, y, max_depth):
    # Train one tree on a bootstrap sample (drawn with replacement) of the data.
    n_samples = X.shape[0]
    idx = np.random.choice(n_samples, size=n_samples, replace=True)
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(X[idx], y[idx])
    return tree

def build_forest(X, y, n_trees, max_depth):
    # Phase 1: train n_trees independent trees, each on its own bootstrap sample.
    forest = []
    for _ in range(n_trees):
        forest.append(build_tree(X, y, max_depth))
    return forest

# Main code
if __name__ == "__main__":
    # Define hyperparameters and some toy data for demonstration.
    X = np.random.rand(100, 4)
    y = (X[:, 0] > 0.5).astype(int)
    forest = build_forest(X, y, n_trees=10, max_depth=5)
    # Now, you have a trained Random Forest (forest) ready for making predictions.
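The code above only covers the training phase. A minimal sketch of the prediction phase (Step 4, majority voting for classification), reusing the forest built above:

import numpy as np

def forest_predict(forest, X):
    # Collect one prediction per tree (shape: n_trees x n_samples) ...
    all_preds = np.array([tree.predict(X) for tree in forest])
    # ... and take the majority vote for each sample (classification case).
    majority = []
    for col in all_preds.T:
        values, counts = np.unique(col, return_counts=True)
        majority.append(values[np.argmax(counts)])
    return np.array(majority)

# Usage with the forest built above (X_new is a hypothetical array with the same 4 features):
# y_pred = forest_predict(forest, X_new)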
Q2
space via a mapping function φ(x), where the data becomes linearly separable.
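As a small illustration of this idea (the mapping φ(x) = (x, x²) and the toy 1-D data are my own assumptions, not taken from the question), points that cannot be separated by a single threshold on x become linearly separable after the mapping:

import numpy as np

# 1-D points: class 1 lies between -1 and 1, class 0 lies outside that interval.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])

# No single threshold on x separates the classes, but after the mapping
# phi(x) = (x, x**2) the hyperplane x2 = 1 (i.e. x**2 <= 1) separates them.
phi = np.column_stack([x, x**2])
print(np.all((phi[:, 1] <= 1) == (y == 1)))  # True: linearly separable in feature space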