- Decision trees are a type of supervised machine learning algorithm that can be used for classification or regression problems. They work by splitting the data into subsets based on attribute values, and their structure can be represented graphically.
- The key components of a decision tree are nodes, edges, the root node, and leaf nodes. Methods for selecting the best attribute at each split include information gain and the Gini index.
- Decision trees have advantages such as being intuitive to understand, handling both numerical and categorical data, and requiring little data cleaning. Disadvantages include potential overfitting, loss of some information for continuous variables, and bias with imbalanced data. Techniques like constraining tree size and pruning can help reduce overfitting.

From planting trees and growing forests

to XGBoost
Mì AI
Decision Tree
• A Decision Tree is a supervised machine learning algorithm. It is used for
both classification and regression tasks.

• Decision Trees closely mirror the way humans reason when making a
decision, so they are easy to understand.
Decision Tree

• Nodes: the points where the tree
splits according to the value of some
attribute/feature of the dataset.

• Edges: direct the outcome of a split
to the next node.
Decision Tree

• Root: the node where the
first split takes place.

• Leaves: the terminal
nodes that predict the outcome
of the decision tree.
Decision Tree
Decision Tree Type
Classification Tree
• While building a decision tree, the main task is to select the best
attribute from the dataset's feature list for the root node as well as
for the sub-nodes. This selection is achieved with a technique known
as an Attribute Selection Measure (ASM).

• Methods:
• Information Gain (ID3)
• Gini index (CART)
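Information gain, as used by ID3, is the entropy of the parent node minus the size-weighted entropy of the child nodes. A minimal sketch in plain Python; the toy labels below are illustrative only:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent minus the weighted entropy of the children."""
    total = len(parent_labels)
    weighted = sum(len(g) / total * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

# Toy split: an attribute that separates the classes perfectly has
# maximal gain; one that leaves both children mixed has zero gain.
parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0
```

ID3 evaluates this quantity for every candidate attribute and splits on the one with the highest gain.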
GINI

https://www.analyticsvidhya.com/blog/2020/10/all-about-decision-tree-from-scratch-with-python-implementation/
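The Gini impurity that the linked walkthrough computes can be sketched in a few lines of plain Python (toy labels, no libraries assumed):

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def weighted_gini(child_label_groups):
    """Impurity of a split: child Gini values weighted by child size."""
    total = sum(len(g) for g in child_label_groups)
    return sum(len(g) / total * gini(g) for g in child_label_groups)

# A pure node has impurity 0; a 50/50 binary node has impurity 0.5.
print(gini(["yes", "yes"]))                           # 0.0
print(gini(["yes", "no"]))                            # 0.5
print(weighted_gini([["yes", "yes"], ["no", "no"]]))  # 0.0
```

The tree chooses the split whose weighted Gini is lowest, i.e. the one that produces the purest children.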
Regression Tree
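For regression, the splitting criterion is not impurity but squared error: each leaf predicts the mean of its targets, and the tree picks the threshold that minimizes the total squared error around those means. A minimal single-feature sketch (toy data; all names here are illustrative):

```python
def sse(values):
    """Sum of squared errors around the mean (the leaf's prediction)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Find the threshold on one feature minimizing total SSE."""
    best = (float("inf"), None)
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        cost = sse(left) + sse(right)
        if cost < best[0]:
            best = (cost, threshold)
    return best

# Two clear clusters of targets: the best split separates them.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
cost, threshold = best_split(xs, ys)
print(threshold)  # 10
```

A full regression tree applies this search recursively over all features until a stopping rule fires.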
Pros and Cons
Advantages of a decision tree
• Easy to visualize and interpret: its graphical representation is very intuitive,
and no knowledge of statistics is required to read it.
• Useful in data exploration: a decision tree makes it easy to identify the most
significant variables and the relations between variables, which can help us
create new variables or group several features into one bucket.
• Less data cleaning required: it is fairly robust to outliers and missing data,
so less data cleaning is needed.
• The data type is not a constraint: it can handle both categorical and numerical
data.
Pros and Cons
Disadvantages of a decision tree
• Overfitting: a single decision tree tends to overfit the data; this is addressed
by setting constraints on model parameters (e.g. the height of the tree) and by
pruning. It is also sensitive to noisy data and can overfit the noise.

• Not an exact fit for continuous data: it loses some of the information associated
with numerical variables when it bins them into categories.

• A small variation in the data can result in a very different tree. This can be
reduced by bagging and boosting algorithms.
• Decision trees are biased with imbalanced datasets, so it is recommended to
balance the dataset before building the tree.
Overfit
Setting Constraints on tree size
• Minimum samples for a node split
• Minimum samples for a leaf node
• Maximum depth of the tree (vertical depth)
• Maximum number of leaf nodes
• Maximum features to consider for a split
• The number of features to consider while searching for the best split; these are
selected randomly.
• As a rule of thumb, the square root of the total number of features works well,
but it is worth checking up to 30–40% of the total.
• Higher values can lead to overfitting, but this depends on the case.
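These constraints act as stopping rules checked before each split. A minimal sketch, assuming hypothetical node statistics and illustrative default thresholds (the parameter names echo common library conventions but are not tied to any specific API):

```python
def should_stop(depth, n_samples, n_leaves,
                max_depth=5, min_samples_split=20, max_leaf_nodes=32):
    """Return True when any size constraint forbids splitting this node.

    Thresholds here are illustrative defaults only; in practice they
    are tuned per dataset (e.g. via cross-validation).
    """
    if depth >= max_depth:          # tree already at maximum depth
        return True
    if n_samples < min_samples_split:  # too few samples to split
        return True
    if n_leaves >= max_leaf_nodes:  # leaf budget exhausted
        return True
    return False

print(should_stop(depth=5, n_samples=100, n_leaves=4))  # True: too deep
print(should_stop(depth=2, n_samples=10, n_leaves=4))   # True: too few samples
print(should_stop(depth=2, n_samples=100, n_leaves=4))  # False: keep splitting
```

Each rule caps tree growth from a different direction, which is why they are usually combined rather than used alone.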
Overfit
Pruning
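One simple pruning strategy is reduced-error pruning: grow the tree fully, then collapse a subtree into a leaf whenever the leaf does no worse on held-out validation data. A minimal sketch on a hypothetical dict-based tree (all names and data here are illustrative, not any library's API):

```python
def predict(node, x):
    """Follow thresholds down to a leaf label."""
    if "label" in node:
        return node["label"]
    branch = "left" if x < node["threshold"] else "right"
    return predict(node[branch], x)

def error(node, data):
    """Misclassification count on (x, y) validation pairs."""
    return sum(predict(node, x) != y for x, y in data)

def prune(node, data):
    """Reduced-error pruning: collapse a subtree into a leaf whenever
    the leaf does no worse on the validation data."""
    if "label" in node:
        return node
    node["left"] = prune(node["left"], data)
    node["right"] = prune(node["right"], data)
    as_leaf = {"label": node["majority"]}
    return node if error(node, data) < error(as_leaf, data) else as_leaf

# A tree with a spurious split that memorized training noise...
tree = {"threshold": 5, "majority": "A",
        "left": {"label": "A"},
        "right": {"threshold": 7, "majority": "A",
                  "left": {"label": "B"},   # spurious branch
                  "right": {"label": "A"}}}
# ...collapses once the validation data shows the split does not help.
val = [(1, "A"), (6, "A"), (8, "A")]
pruned = prune(tree, val)
print(pruned)  # {'label': 'A'}
```

Bottom-up pruning like this trades a little training accuracy for much better generalization, which is exactly the overfitting remedy the slides describe.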
Nấu Mì
