Hyperparameters of Random Forest Classifier

Last Updated: 22 Jan, 2021

In this article, we are going to learn about the different hyperparameters that exist in a Random Forest Classifier. We have already covered the implementation of a Random Forest Classifier using the scikit-learn library in the article https://www.geeksforgeeks.org/random-forest-classifier-using-scikit-learn/.

Hyperparameters are configurations that cannot be learnt from the data we provide to the algorithm; they are built into the algorithm, and each algorithm has its own predefined set of them. Hyperparameters are often tuned to increase model accuracy, using methods such as GridSearchCV and RandomizedSearchCV, as explained in the article https://www.geeksforgeeks.org/hyperparameter-tuning/. A good understanding of hyperparameters is important because they determine how quickly and how well a model fits the data, and poorly chosen values can reduce accuracy, for example by causing the model to overfit.
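As a quick illustration of such tuning, here is a minimal sketch using scikit-learn's GridSearchCV with a RandomForestClassifier. The parameter grid and the use of the built-in iris dataset are illustrative assumptions, not prescriptions from this article.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative dataset; any labelled classification data would work here.
X, y = load_iris(return_X_y=True)

# A small, hypothetical grid over hyperparameters discussed below.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 4, 6],
}

# Exhaustive 5-fold cross-validated search; RandomizedSearchCV could be
# swapped in to sample combinations instead of trying them all.
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```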
Therefore, we will take a closer look at the built-in hyperparameters of the random forest classifier to understand them better:

- n_estimators: A random forest is nothing but a group of many decision trees, and n_estimators controls the number of trees inside the classifier. We may think that using more trees always yields a more generalized result, but this is not the case; beyond a point, extra trees add little. More trees will not cause overfitting, but they do increase the time complexity of the model. The default number of estimators in scikit-learn is 100.
- max_depth: It governs the maximum height up to which the trees inside the forest can grow. It is one of the most important hyperparameters for model accuracy: as we increase the depth, accuracy rises up to a certain limit and then starts to fall because the model overfits, so it is important to set its value appropriately. The default value is None, which means the nodes keep expanding until all leaves are pure or all leaves contain fewer than min_samples_split samples (another hyperparameter).
- min_samples_split: It specifies the minimum number of samples an internal node must hold in order to split into further nodes. With a very low value the tree keeps growing and starts overfitting; increasing the value reduces the total number of splits, constraining the model and helping to reduce overfitting. However, the value should not be so large that the trees become too shallow and the model underfits. We generally keep min_samples_split between 2 and 6; the default value is 2.
- min_samples_leaf: It specifies the minimum number of samples a node must hold after a split. It also helps to reduce overfitting: a higher value stops the trees from creating tiny leaves that memorize the training data, but raising it too far constrains the trees so much that the model underfits. The default value is 1.
- max_features: A random forest takes a random subset of features at each split and tries to find the best split among them; max_features sets how many features are taken into account. It can take four values: "auto", "sqrt", "log2" and None. "auto" considers max_features = sqrt(n_features); "sqrt" considers max_features = sqrt(n_features), the same as "auto"; "log2" considers max_features = log2(n_features); None considers max_features = n_features. (A short numeric demo of these formulas appears at the end of the article.)
- max_leaf_nodes: It sets a limit on the number of leaf nodes, which restricts the splitting of nodes, reduces the depth of the trees and so helps to reduce overfitting. If the value is None, the number of leaf nodes is unlimited.
- max_samples: It sets the maximum number of samples drawn from the training dataset to train each individual tree.

These are the major hyperparameters present in the random forest classifier that need to be tuned in order to increase the accuracy of our training model; the sketch below shows them set explicitly.
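The following is a minimal sketch, assuming scikit-learn is installed, that sets each hyperparameter from the list explicitly. The specific values are illustrative placeholders, not tuned recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # illustrative dataset

clf = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest (default: 100)
    max_depth=10,          # cap tree height to curb overfitting (default: None)
    min_samples_split=4,   # min samples a node needs before it may split (default: 2)
    min_samples_leaf=2,    # min samples each leaf must keep after a split (default: 1)
    max_features="sqrt",   # features examined per split: sqrt(n_features)
    max_leaf_nodes=50,     # upper bound on leaves per tree (default: None)
    max_samples=0.8,       # fraction of training rows bootstrapped per tree
    random_state=42,
)
clf.fit(X, y)
print(clf.score(X, y))  # accuracy on the training data, to confirm the fit
```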
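Finally, to make the max_features formulas concrete, this short sketch computes how many features each setting would examine at every split for a hypothetical dataset with 100 features (scikit-learn truncates the result to an integer).

```python
import math

n_features = 100  # hypothetical feature count

print(int(math.sqrt(n_features)))  # "auto" / "sqrt" -> 10
print(int(math.log2(n_features)))  # "log2"          -> 6
print(n_features)                  # None            -> 100
```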