Decision Trees
Decision Trees
Definition
• Decision trees are a popular method used in data mining and machine
learning for both classification and regression tasks. They model
decisions and their possible consequences, including chance event
outcomes.
Differences from Regression
• Model Representation:
• Decision Trees: Represented as a tree structure with nodes and
branches.
• Linear Regression: Represented by an equation that models the
relationship between input features and the continuous output.
Decision Rules:
• Decision Trees: Use a series of if-then-else decision rules derived from
the data features.
• Linear Regression: Uses a linear equation to model the relationship
between the dependent and independent variables.
Types of Datasets Suitable for
Decision Trees
• Categorical Data: Suitable for datasets where the output variable is
categorical (e.g., yes/no, spam/ham, disease/no disease).
Implementation
• To implement a decision tree for predicting whether a patient has a
particular disease, we can use a sample dataset and the
DecisionTreeClassifier from the sklearn library in Python. Here’s a
step-by-step implementation:
• Load and prepare the data
• Train a decision tree classifier
• Make predictions
• Evaluate the model
Accuracy
• The accuracy score of a model ranges from 0 to 1: