Hyperparameters Hyperparameters For Decision Trees: Maximum Depth
Hyperparameters Hyperparameters For Decision Trees: Maximum Depth
16. Hyperparameters
Hyperparameters for Decision Trees
In order to create decision trees that will generalize to new problems well, we can
tune a number of different aspects about the trees. We call the different aspects of a
decision tree "hyperparameters". These are some of the most important
hyperparameters used in decision trees:
Maximum Depth
The maximum depth of a decision tree is simply the largest possible length between
the root to a leaf. A tree of maximum length
kkk
can have at most
2k2^k2k
leaves.
However, min_samples_split doesn't control the minimum size of leaves. As you can
see in the example on the right, above, the parent node had 20 samples, greater than
min_samples_split = 11, so the node was split. But when the node was split, a child
node was created with that had 5 samples, less than min_samples_split = 11.
When splitting a node, one could run into the problem of having 99 samples in one
of them, and 1 on the other. This will not take us too far in our process, and would be
a waste of resources and time. If we want to avoid this, we can set a minimum for the
number of samples we allow on each leaf.
Minimum number of samples per leaf
This number can be specified as an integer or as a float. If it's an integer, it's the
minimum number of samples allowed in a leaf. If it's a float, it's the minimum
percentage of samples allowed in a leaf. For example, 0.1, or 10%, implies that a
particular split will not be allowed if one of the leaves that results contains less than
10% of the samples in the dataset.
QUIZ QUESTION::
Let's test your intuition. Which sizes of features are associated with underfitting and
which with overfitting? Drag the answers to the corresponding boxes.
ANSWER CHOICES:
Feature Underfitting/Overfitting
Overfitting
Underfitting
Underfitting
Overfitting
SOLUTION:
Feature Underfitting/Overfitting
Overfitting Large maximum depth
Overfitting Small minimum samples per split
Underfitting Small maximum depth
Underfitting Large minimum samples per split
Underfitting Small maximum depth
Underfitting Large minimum samples per split
Overfitting Large maximum depth
Overfitting Small minimum samples per split
Next Concept
udacimak v1.4.0