Machine Learning-Lecture 05
LECTURE – 05
DECISION TREES (DT)
Decision Tree
The decision tree is one of the most widely used
machine learning algorithms. It is used for
both classification and regression problems.
Two common algorithms for building a decision tree are:
1. CART (Classification and Regression Trees): uses Gini impurity as the
splitting metric.
2. ID3 (Iterative Dichotomiser 3): uses entropy and information gain as the
splitting metric.
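The two metrics named above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used by any particular library; the function names entropy and gini and the sample label lists are assumptions made for this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits: sum over classes of p * log2(1 / p)."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Illustrative label lists (not taken from the lecture).
pure = ["yes", "yes", "yes", "yes"]   # a single class
mixed = ["yes", "yes", "no", "no"]    # a 50/50 split of two classes

print("pure :", entropy(pure), gini(pure))
print("mixed:", entropy(mixed), gini(mixed))
```

Both metrics are zero for a node that contains a single class and reach their maximum for an even split, which is why a tree-building algorithm prefers the split that lowers them the most.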
Example of ID3
Classification using the ID3 algorithm
Consider a dataset from which we will determine whether or not to play
golf.
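The attribute-selection step of ID3 can be sketched as follows. The golf dataset itself is not reproduced in these notes, so the six rows below are hypothetical values invented purely for illustration, and the names information_gain, outlook, windy and play are assumptions of this sketch. ID3 computes the information gain of each attribute and splits on the one with the highest gain.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    before = entropy([row[target] for row in rows])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after

# Hypothetical toy rows, NOT the lecture's dataset.
rows = [
    {"outlook": "sunny",    "windy": False, "play": "no"},
    {"outlook": "sunny",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True,  "play": "yes"},
    {"outlook": "rainy",    "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": True,  "play": "no"},
]

gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
best = max(gains, key=gains.get)  # attribute ID3 would place at the root
print(gains)
print("split on:", best)
```

On these toy rows, outlook separates the classes far better than windy, so it would become the root node. ID3 then repeats the same calculation inside each branch until every branch is pure or no attributes remain.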
Advantages
•Simple to understand and to interpret.
•Requires little data preparation.
•The cost of using the tree (i.e., predicting data) is logarithmic in the number of
data points used to train the tree.
•Able to handle both numerical and categorical data.
•Able to handle multi-output problems.
Disadvantages
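•Prone to overfitting.
•Unstable: small changes in the data can produce a very different tree.
•Sensitive to noise.
•Non-continuous: predictions are piecewise constant, so the tree cannot produce smooth outputs.
•Biased by unbalanced classes: the tree tends to favor the dominant class.
•Greedy algorithm: it makes the locally optimal choice at each step, without considering whether
that choice leads to the best tree globally.
•Computationally expensive on large datasets, where the split calculations become complex.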
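The greedy limitation listed above can be illustrated with the XOR pattern. This is a small sketch under assumed names (gain, xor), not part of the original lecture. On its own, each feature has zero information gain, so a greedy one-step criterion sees no useful split, even though a tree that tests both features in sequence classifies every point correctly.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def gain(points, index):
    """Information gain from splitting on the binary feature at position `index`."""
    labels = [y for _, y in points]
    after = 0.0
    for value in (0, 1):
        subset = [y for x, y in points if x[index] == value]
        if subset:  # skip empty branches
            after += len(subset) / len(points) * entropy(subset)
    return entropy(labels) - after

# XOR: the label is 1 exactly when the two features differ.
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

gains = [gain(xor, 0), gain(xor, 1)]
print(gains)  # both features look useless when judged one step at a time
```

After either split, the second feature separates each branch perfectly, which shows that the best first step by the local criterion is not always informative about the best tree overall.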
Reference
https://medium.com/@ashirbadpradhan8115/decision-tree-id3-algorithm-machine-learning-4120d8ba013b