ML Unit 3_Questions
Decision Trees are used in machine learning for both classification and regression problems.
Depending on the type of target variable (categorical or continuous), the tree is classified as a
Classification Tree or a Regression Tree.
| Aspect | Classification Tree | Regression Tree |
|---|---|---|
| Splitting Criteria | Gini Index, Entropy (Information Gain), or Gain Ratio | Mean Squared Error (MSE) or Variance Reduction |
| Evaluation Metric | Accuracy, Precision, Recall, F1-score | Mean Squared Error (MSE), R² score |
| Handling Outliers | Less sensitive to outliers | Highly sensitive to outliers |
| Tree Structure | Often deeper with multiple branches | More compact and pruned |
Conclusion:
The choice between the two depends on the target variable: a categorical target calls for a Classification Tree, while a continuous target calls for a Regression Tree, with the splitting criterion and evaluation metric chosen accordingly.
Definition:
Information Gain (IG) is a metric used in Decision Trees to determine the best feature to split the
data. It measures the reduction in uncertainty (entropy) after splitting the data based on an
attribute.
Formula:
Gain(S, A) = Entropy(S) − Σv∈Values(A) (|Sv| / |S|) × Entropy(Sv)
where S is the dataset, A is the attribute being tested, and Sv is the subset of S for which attribute A has value v.
Example:
Suppose we have a dataset of weather conditions where we predict whether a person will play
tennis (Yes/No) based on outlook (Sunny, Rainy, Overcast).
The attribute with the highest Information Gain is chosen for the split.
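The split selection described above can be sketched in Python. The mini play-tennis dataset below is hypothetical (the original slide's full table is not reproduced here), but the entropy and Information Gain computations follow the formulas in this section:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

# Hypothetical mini dataset: (outlook, play tennis?)
data = [
    ("Sunny", "No"), ("Sunny", "No"), ("Sunny", "Yes"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"),
]

labels = [play for _, play in data]
parent = entropy(labels)

# Weighted entropy of the children after splitting on Outlook
weighted = 0.0
for value in {"Sunny", "Overcast", "Rainy"}:
    subset = [play for outlook, play in data if outlook == value]
    weighted += len(subset) / len(data) * entropy(subset)

info_gain = parent - weighted
print(f"Information Gain for Outlook: {info_gain:.3f}")
```

Running the same computation for every candidate attribute and keeping the largest result is exactly how the best split is chosen.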
Importance:
Information Gain lets the tree-building algorithm (e.g., ID3) pick, at every node, the attribute that most reduces uncertainty, producing purer child nodes and shorter trees.
Definition:
Entropy measures the impurity (uncertainty) of a dataset: it is 0 for a pure node and maximal when the classes are evenly mixed.
Formula:
Entropy(S) = − Σi pi log2(pi)
Where:
pi is the proportion of instances in S belonging to class i.
Example:
A node with 5 instances of each of two classes has Entropy = −0.5 log2(0.5) − 0.5 log2(0.5) = 1, while a pure node has Entropy = 0.
Goal: Reduce entropy at each step by splitting the dataset using best attributes.
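A minimal sketch of the entropy formula above, written over class counts rather than raw labels:

```python
from math import log2

def entropy(class_counts):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    total = sum(class_counts)
    return sum(-(c / total) * log2(c / total) for c in class_counts if c > 0)

print(entropy([10, 0]))  # pure node -> 0.0
print(entropy([5, 5]))   # maximally mixed binary node -> 1.0
```

A good split moves the child nodes toward the pure end of this scale.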
Definition:
The Gini Index (also called Gini Impurity) is another metric used for measuring impurity in a dataset.
It determines how mixed the classes are within a node.
Formula:
Gini(S) = 1 − Σi pi²
Where:
pi is the proportion of instances in S belonging to class i.
Example:
A node with 6 instances of Class A and 4 of Class B has Gini = 1 − (0.6² + 0.4²) = 0.48.
Interpretation:
Gini = 0 means the node is pure (a single class); for a binary problem the maximum is 0.5, reached when the classes are evenly mixed. Lower Gini values indicate better splits.
Gain Ratio is an improved version of Information Gain that penalizes attributes with many unique
values to prevent bias.
Formula:
InformationGain
GainRatio=
SplitInformation
Where:
Split Information measures how evenly the data is divided across the attribute's values:
SplitInfo(S, A) = − Σv (|Sv| / |S|) log2(|Sv| / |S|)
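The two quantities above can be sketched as small Python functions (the subset sizes used in the demo line are illustrative):

```python
from math import log2

def split_information(subset_sizes):
    """SplitInfo(S, A) = -sum(|Sv|/|S| * log2(|Sv|/|S|)) over the split's subsets."""
    total = sum(subset_sizes)
    return sum(-(s / total) * log2(s / total) for s in subset_sizes if s > 0)

def gain_ratio(information_gain, subset_sizes):
    """Gain Ratio = Information Gain / Split Information."""
    return information_gain / split_information(subset_sizes)

# An attribute splitting 10 rows into subsets of 4 and 6:
print(round(split_information([4, 6]), 3))
```

An attribute with many distinct values produces many small subsets, inflating Split Information and therefore shrinking the Gain Ratio — exactly the penalty this metric is designed to apply.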
Example:
1. "Color" (Red, Green, Blue) → high Information Gain but many unique values, so its Split Information is large and its Gain Ratio is reduced, countering the bias toward many-valued attributes.
1. Calculating Entropy
Problem:
A dataset contains 6 instances of Class A and 4 instances of Class B. Compute Entropy.
Solution:
p(A) = 6/10 = 0.6, p(B) = 4/10 = 0.4
Entropy = −0.6 log2(0.6) − 0.4 log2(0.4) = 0.442 + 0.529 = 0.971
Final Answer: Entropy = 0.971
Problem:
A dataset has 6 instances of Class A and 4 instances of Class B. Compute Gini Index.
Solution:
Gini = 1 − (0.6² + 0.4²) = 1 − (0.36 + 0.16) = 0.48
Final Answer: Gini Index = 0.48
Problem:
Given a dataset with 10 instances, split into two groups:
Subset 1: 4 instances (3 A, 1 B)
Subset 2: 6 instances (3 A, 3 B)
Calculate Information Gain.
Solution:
Parent entropy (6 A, 4 B): Entropy = 0.971
Subset 1 (3 A, 1 B): Entropy = −0.75 log2(0.75) − 0.25 log2(0.25) = 0.811
Subset 2 (3 A, 3 B): Entropy = 1.0
Weighted entropy = (4/10)(0.811) + (6/10)(1.0) = 0.324 + 0.600 = 0.925
Information Gain = 0.971 − 0.925 = 0.046
Final Answer: Information Gain ≈ 0.046
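This Information Gain problem can be checked numerically with a short Python sketch:

```python
from math import log2

def entropy(counts):
    """Entropy from class counts: -sum(p_i * log2(p_i))."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

parent = entropy([6, 4])  # 6 of Class A, 4 of Class B overall
weighted = (4 / 10) * entropy([3, 1]) + (6 / 10) * entropy([3, 3])
info_gain = parent - weighted
print(f"IG = {info_gain:.3f}")
```

The small gain (≈ 0.046) tells us this split barely improves purity, so the tree-building algorithm would prefer an attribute with a larger gain if one exists.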
Problem:
Compute Gain Ratio given:
• Decision Tree is a supervised learning technique that can be used for both classification and regression problems.
• It is a classification and prediction tool having a tree-like structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node
(terminal node) holds a class label.
• The goal of using a Decision Tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data).
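The node/branch/leaf structure described in these bullets can be illustrated with a tiny hand-built tree. The rules below are hypothetical, chosen only to show the mechanics:

```python
# Internal nodes test an attribute, branches are test outcomes,
# and leaves (plain strings) hold class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "Overcast": "Yes",  # leaf node
        "Sunny": {"attribute": "humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Rainy": {"attribute": "wind",
                  "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}

def predict(node, example):
    """Follow branches until a leaf (a plain class label) is reached."""
    while isinstance(node, dict):
        node = node["branches"][example[node["attribute"]]]
    return node

print(predict(tree, {"outlook": "Sunny", "humidity": "Normal"}))  # -> Yes
```

Training a tree means building such a structure automatically, choosing each internal node's attribute with a splitting criterion like Information Gain or the Gini Index.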