Decision Tree to XGBoost
Mì AI
Decision Tree
• A Decision Tree is a supervised Machine Learning algorithm. It is used for
both classification and regression tasks.
• Methods:
• Information Gain (ID3)
• GINI
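To make the ID3 criterion concrete, here is a minimal sketch of entropy and information gain, assuming plain NumPy; the `play`/`outlook` arrays are a toy weather-style dataset invented for illustration:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(S) = -sum(p_i * log2(p_i)) over class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    """Gain(S, A) = H(S) - sum(|S_v|/|S| * H(S_v)) over the values v of feature A."""
    total = len(labels)
    weighted = 0.0
    for v in np.unique(feature_values):
        subset = labels[feature_values == v]
        weighted += len(subset) / total * entropy(subset)
    return entropy(labels) - weighted

# Toy data (invented): does "outlook" help predict "play"?
play = np.array(["no", "no", "yes", "yes", "yes", "no"])
outlook = np.array(["sunny", "sunny", "rain", "rain", "overcast", "rain"])
print(information_gain(play, outlook))  # higher gain = better split feature
```

ID3 picks, at each node, the feature with the highest information gain.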
GINI
https://www.analyticsvidhya.com/blog/2020/10/all-about-decision-tree-from-scratch-with-python-implementation/
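The linked article walks through the computation in detail; in short, the Gini impurity of a node is 1 − Σ pᵢ², and a candidate split is scored by the weighted impurity of its children (lower is better). A minimal sketch, with toy label arrays invented for illustration:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over class proportions; 0 = pure node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_split(left_labels, right_labels):
    """Weighted Gini impurity of a binary split (lower is better)."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n * gini(left_labels)
            + len(right_labels) / n * gini(right_labels))

# Toy example: a pure split scores 0, a fully mixed split scores 0.5.
print(gini_split(np.array(["a", "a"]), np.array(["b", "b"])))  # 0.0
print(gini_split(np.array(["a", "b"]), np.array(["a", "b"])))  # 0.5
```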
Regression Tree
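To make the idea concrete: a regression tree chooses splits that minimize the squared error within each child and predicts the mean target value of a leaf, giving a piecewise-constant fit. A minimal scikit-learn sketch on synthetic data (the noisy sine wave is invented for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D data: a noisy sine wave (invented for illustration).
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# Splits minimize the squared error within children; each leaf
# predicts the mean target of its training samples.
reg = DecisionTreeRegressor(max_depth=3, random_state=0)
reg.fit(X, y)

print(reg.predict([[1.0], [4.0]]))  # piecewise-constant predictions
```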
Pros and Cons
Advantages of a decision tree
• Easy to visualize and interpret: its graphical representation is intuitive, and no
knowledge of statistics is required to read it.
• Useful in data exploration: a decision tree makes it easy to identify the most
significant variables and the relations between variables, which can help us create
new variables or bucket several features together.
• Less data cleaning required: It is fairly immune to outliers and missing data, hence
less data cleaning is needed.
• The data type is not a constraint: It can handle both categorical and numerical
data.
Pros and Cons
Disadvantages of a decision tree
• Overfitting: a single decision tree tends to overfit the data; this is mitigated by
setting constraints on model parameters (e.g., the height of the tree) and by
pruning. It is also sensitive to noisy data and can overfit the noise.
• Not an exact fit for continuous data: it loses some of the information associated
with numerical variables when it bins them into discrete categories.
• A small variation in the data can result in a completely different tree. This can be
reduced by bagging and boosting algorithms.
• Decision trees are biased on imbalanced datasets, so it is recommended to balance
the dataset before building the tree.
Overfitting
Setting Constraints on tree size
• Minimum samples for a node split
• Minimum samples for a leaf node
• Maximum depth of the tree (vertical depth)
• Maximum number of leaf nodes
• Maximum features to consider for a split
• The number of features to consider while searching for the best split; these are
selected randomly.
• As a rule of thumb, the square root of the total number of features works well, but
it is worth checking up to 30–40% of the total number of features.
• Higher values can lead to overfitting, though this depends on the case (see the
sketch after this list).
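Each constraint above maps directly to a scikit-learn hyperparameter. A minimal sketch; the parameter values are arbitrary, chosen only to show the mapping, and iris is used as a stand-in dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    min_samples_split=10,   # minimum samples required to split a node
    min_samples_leaf=5,     # minimum samples required at a leaf node
    max_depth=4,            # maximum (vertical) depth of the tree
    max_leaf_nodes=20,      # maximum number of leaf nodes
    max_features="sqrt",    # features considered per split (sqrt rule of thumb)
    random_state=0,
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```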
Overfitting
Pruning
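Pruning removes branches that add complexity without improving the fit. As one concrete instance, scikit-learn implements minimal cost-complexity pruning via the ccp_alpha parameter: larger alphas prune more aggressively. A minimal sketch, again using iris as a stand-in dataset and an arbitrary alpha:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unpruned tree vs. a cost-complexity-pruned tree: ccp_alpha > 0 removes
# branches whose complexity cost outweighs their impurity reduction.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

print("unpruned leaves:", full.get_n_leaves())
print("pruned leaves:  ", pruned.get_n_leaves())
```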
Nấu Mì