Decision Trees
September 2, 2024
■ What is it?
▶ Decision trees are a type of supervised machine learning
algorithm.
▶ A Decision Tree is a flowchart-like structure used for
decision-making.
▶ It consists of nodes representing decisions or outcomes,
branches for possible choices, and leaves for final decisions or
classifications.
■ Why is it important?
▶ Decision Trees are intuitive and interpretable, making them
useful for both predictive modeling and understanding decision
processes.
▶ They can handle both categorical and numerical data.
■ Benefits
▶ Easy to visualize and understand, even for non-technical
audiences.
▶ Can be used for both classification and regression tasks.
Definition & Purpose
■ Definition:
▶ A Decision Tree is a model that predicts the value of a target
variable based on input variables.
▶ It is structured as a flowchart with nodes, branches, and leaves.
■ Purpose:
▶ To make decisions that lead to favorable outcomes.
▶ Helps in classification and regression tasks.
■ Example:
▶ Predicting whether a customer will purchase a product based
on attributes like age, income, and browsing history.
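A tree like this can be sketched as plain nested conditionals, one per internal node. The split features match the example above, but every threshold and leaf label here is a hypothetical illustration, not a fitted model:

```python
def predict_purchase(customer):
    """Classify a customer by walking a fixed, hand-built decision tree.

    Each `if` is an internal node; each `return` is a leaf.
    Thresholds (30, 20, 60000) are hypothetical, for illustration only.
    """
    if customer["age"] < 30:
        # Left branch: for younger customers, browsing history decides.
        return "buy" if customer["browsing_minutes"] > 20 else "no buy"
    # Right branch: for older customers, income decides.
    return "buy" if customer["income"] > 60_000 else "no buy"

print(predict_purchase({"age": 25, "income": 40_000, "browsing_minutes": 35}))
```

In a learned tree, an algorithm would choose these features and thresholds from data; the flowchart structure it produces is exactly this kind of nested branching.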
Classification Trees
■ Example:
▶ "If income > $50,000 and credit score > 700, then approve
the loan."
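This rule corresponds to a two-level tree: a root split on income and a second split on credit score. A minimal sketch, assuming hypothetical outcomes ("review", "deny") for the branches the rule above leaves unspecified:

```python
def approve_loan(income, credit_score):
    """Apply the example rule as a two-level decision tree."""
    if income > 50_000:
        if credit_score > 700:
            return "approve"  # the path stated in the rule
        return "review"       # hypothetical outcome, not in the original rule
    return "deny"             # hypothetical outcome, not in the original rule

print(approve_loan(60_000, 750))
```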
Types of Decision Trees
■ Classification Trees
▶ Used for predicting the class or category of an object based on
input features.
▶ Example: Predicting whether an email is spam or not spam
based on keywords.
■ Regression Trees
▶ Used for predicting a continuous value.
▶ Example: Predicting sales of a product based on advertising
spend.
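A regression tree predicts a number rather than a class: each leaf stores a constant (typically the mean of the training targets that reach it), so the tree is a piecewise-constant function of its inputs. A sketch for the sales example, with hypothetical split points and leaf values:

```python
def predict_sales(ad_spend):
    """Piecewise-constant prediction, as a regression tree produces.

    Split points (10,000 and 50,000) and leaf values are hypothetical;
    a learned tree would take both from training data.
    """
    if ad_spend < 10_000:
        return 120.0  # mean sales (units) in this leaf
    if ad_spend < 50_000:
        return 340.0
    return 510.0

print(predict_sales(25_000))
```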
Building a Decision Tree
■ Algorithms
▶ CART: Classification and Regression Trees.
▶ ID3: Iterative Dichotomiser 3.
▶ CHAID: Chi-squared Automatic Interaction Detector.
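These algorithms differ mainly in the statistic they use to score candidate splits. CART commonly uses Gini impurity, which measures how mixed the class labels in a node are (0 for a pure node). A minimal sketch:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions.

    0.0 means the node is pure (one class); higher means more mixed.
    """
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["spam", "spam", "ham", "ham"]))  # 0.5 for an even 50/50 split
```

A splitter evaluates this for every candidate split and keeps the one that reduces impurity the most.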
Splitting Criteria
■ What is it?
▶ A splitting criterion is the measure used to decide which feature
and threshold to split on at each node (e.g., Gini impurity or
information gain).
■ Why is it important?
▶ The choice of splitting criterion impacts:
• Accuracy: Ensures that the tree makes accurate predictions
by creating clear branches.
• Complexity: Affects whether the tree is too complex
(overfitting) or too simple (underfitting).
■ Benefits
▶ A well-chosen splitting criterion can lead to:
• Improved Performance: Ensures the tree generalizes well and
performs better on unseen data.
• Meaningful Splits: Helps create splits that are intuitive and
easy to interpret.
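One widely used criterion is information gain: the drop in entropy (label uncertainty, in bits) from the parent node to the size-weighted average of its children. A minimal sketch in pure Python:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted_child = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted_child

# A 50/50 parent split perfectly into two pure children gains 1 bit.
parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))
```

The split with the highest information gain is chosen at each node; this is the criterion used by ID3 and its successors.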
Information Gain and Entropy