
Unit 3 ML

1. Write the Differences Between Classification Trees and Regression Trees.

Decision Trees are used in machine learning for both classification and regression problems.
Depending on the type of target variable (categorical or continuous), the tree is classified as a
Classification Tree or a Regression Tree.

Differences Between Classification and Regression Trees

| Feature | Classification Tree | Regression Tree |
|---|---|---|
| Target Variable | Categorical (discrete labels) | Continuous (numerical values) |
| Objective | Classifies input data into predefined categories | Predicts a continuous numerical outcome |
| Splitting Criteria | Gini Index, Entropy (Information Gain), or Gain Ratio | Mean Squared Error (MSE) or Variance Reduction |
| Prediction Output | The most common class in a region (majority vote) | The mean or median value in a region |
| Example 1 | Email classification (Spam/Not Spam) | House price prediction |
| Example 2 | Customer segmentation (High/Medium/Low risk) | Predicting sales revenue |
| Evaluation Metric | Accuracy, Precision, Recall, F1-score | Mean Squared Error (MSE), R² score |
| Handling Outliers | Less sensitive to outliers | Highly sensitive to outliers |
| Tree Structure | Often deeper with multiple branches | More compact and pruned |

Conclusion:

 If the problem involves predicting a category, use a classification tree.

 If the problem involves predicting a numerical value, use a regression tree.
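To make the distinction concrete, here is a minimal scikit-learn sketch of the two tree types (the toy features and targets are invented for illustration):

```python
# Minimal sketch: the two tree types share one API in scikit-learn.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0, 0], [1, 1], [2, 2], [3, 3]]  # toy feature matrix

# Classification tree: categorical target, Gini/entropy splitting
clf = DecisionTreeClassifier(criterion="gini")
clf.fit(X, ["spam", "spam", "not spam", "not spam"])
print(clf.predict([[0, 1]]))  # -> a class label, e.g. ['spam']

# Regression tree: continuous target, squared-error (MSE) splitting
reg = DecisionTreeRegressor(criterion="squared_error")
reg.fit(X, [100.0, 150.0, 200.0, 250.0])
print(reg.predict([[0, 1]]))  # -> a numeric value
```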

2. Write a Short Note on Information Gain.

Definition:

Information Gain (IG) is a metric used in Decision Trees to determine the best feature to split the
data. It measures the reduction in uncertainty (entropy) after splitting the data based on an
attribute.

Formula:

Information Gain(S, A) = Entropy(S) − Σᵥ (|Sᵥ| / |S|) × Entropy(Sᵥ)

where the sum runs over the values v of attribute A, and Sᵥ is the subset of S for which A = v.

Example:

Suppose we have a dataset of weather conditions where we predict whether a person will play
tennis (Yes/No) based on outlook (Sunny, Rainy, Overcast).

1. Calculate Entropy (Before Split).

2. Split data based on Outlook.

3. Compute Weighted Entropy after split.

4. Information Gain = Entropy Before - Entropy After

The attribute with the highest Information Gain is chosen for the split.
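A small self-contained sketch of these four steps in Python (the helper functions and the example split are illustrative, not from the PPT):

```python
import math

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes in `labels`."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, subsets):
    """IG = Entropy(before split) - weighted Entropy(after split)."""
    n = len(parent)
    after = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - after

# 10 play-tennis labels split into two groups by a hypothetical Outlook value
parent = ["Yes"] * 6 + ["No"] * 4
subsets = [["Yes", "Yes", "Yes", "No"],
           ["Yes", "Yes", "Yes", "No", "No", "No"]]
print(information_gain(parent, subsets))  # ~0.046: a weak split
```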

Importance:

✔ Helps in selecting the best feature for decision-making.


✔ Ensures that the tree grows efficiently with less depth.

3. Write a Short Note on Entropy.

Definition:

Entropy is a measure of randomness or impurity in a dataset. It determines how homogeneous or heterogeneous the data is.

Formula:

Entropy(S) = − Σᵢ₌₁ⁿ pᵢ log₂(pᵢ)

Where:

 pᵢ = probability of class i in dataset S

 n = number of unique classes

Example:

Consider a dataset with 10 instances:

 6 instances belong to Class A (p_A = 6/10 = 0.6).

 4 instances belong to Class B (p_B = 4/10 = 0.4).

Entropy(S) = −(0.6 log₂ 0.6 + 0.4 log₂ 0.4) = 0.442 + 0.529 = 0.971

Interpretation:

✔ High Entropy (near 1): Data is impure (equal distribution of classes).


✔ Low Entropy (near 0): Data is pure (mostly one class).

Use in Decision Trees:

 Goal: Reduce entropy at each step by splitting the dataset on the best attribute.

 Lower entropy means better classification performance.
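Assuming SciPy is available, the 6-vs-4 example above can be checked in one line:

```python
from scipy.stats import entropy

# Class probabilities for 6 instances of A and 4 of B
print(entropy([0.6, 0.4], base=2))  # ~0.971
```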

4. Write a Short Note on Gini Index.

Definition:

The Gini Index (also called Gini Impurity) is another metric used for measuring impurity in a dataset.
It determines how mixed the classes are within a node.

Formula:

Gini(S) = 1 − Σᵢ₌₁ⁿ pᵢ²

Where:

 pᵢ = proportion of class i in dataset S

Example:

For a dataset with two classes (A and B):

 6 instances belong to Class A (p_A = 0.6).

 4 instances belong to Class B (p_B = 0.4).

Gini(S) = 1 − (0.6² + 0.4²) = 1 − (0.36 + 0.16) = 0.48

Interpretation:

✔ Gini = 0: Perfectly pure node (only one class).


✔ Gini = 0.5: Maximally impure for a two-class problem (equal class distribution); in general the maximum is 1 − 1/n for n classes, approaching 1 only as n grows.

Use in Decision Trees:

 Used in CART (Classification and Regression Trees) algorithm.

 Preferred over entropy for faster calculations.
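A minimal sketch of the computation (the helper function is our own):

```python
def gini(labels):
    """Gini(S) = 1 - sum(p_i^2) over the classes present in `labels`."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(round(gini(["A"] * 6 + ["B"] * 4), 2))  # 0.48, matching the worked example
```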

5. Write a Short Note on Gain Ratio.


Definition:

Gain Ratio is an improved version of Information Gain that penalizes attributes with many unique
values to prevent bias.

Formula:

Gain Ratio(S, A) = Information Gain(S, A) / Split Information(S, A)

Where:

 Information Gain: Measures reduction in entropy.

 Split Information: Measures how evenly the data is divided across the attribute's values; Split Information(S, A) = − Σᵥ (|Sᵥ| / |S|) log₂(|Sᵥ| / |S|).

Example:

Consider two attributes for classification:

1. "Color" (Red, Green, Blue) → High Information Gain but many unique values.

2. "Size" (Small, Large) → Balanced split with fewer categories.

Gain Ratio prefers "Size" as it provides a more meaningful split.

Use in Decision Trees:

✔ Used in C4.5 Decision Tree Algorithm to improve decision-making.


✔ Helps avoid bias toward attributes with more unique values.
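A short sketch of the Split Information penalty (the function and the example branch sizes are illustrative):

```python
import math

def split_information(branch_sizes):
    """SplitInfo = -sum((|Sv|/|S|) * log2(|Sv|/|S|)) over a split's branches."""
    n = sum(branch_sizes)
    return -sum((s / n) * math.log2(s / n) for s in branch_sizes if s > 0)

# A 3-way "Color" split fragments 10 rows more than a 2-way "Size" split,
# so its larger denominator shrinks the Gain Ratio even for similar raw IG.
print(split_information([4, 3, 3]))  # ~1.571 (Color: Red/Green/Blue)
print(split_information([5, 5]))     # 1.0    (Size: Small/Large)
```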

6. Solve All Numerical Problems Given in the PPT with Steps.

The numerical problems in the PPT involve:

1. Calculating Entropy

2. Computing Information Gain

3. Finding Gini Index

4. Applying Gain Ratio Formula

Each problem is solved step by step below using the formulas above.

Numerical 1: Calculating Entropy

Problem:
A dataset contains 6 instances of Class A and 4 instances of Class B. Compute Entropy.

Solution:

p_A = 6/10 = 0.6, p_B = 4/10 = 0.4

Entropy = −(0.6 log₂ 0.6 + 0.4 log₂ 0.4) = 0.442 + 0.529 = 0.971

Final Answer: Entropy = 0.971

Numerical 2: Computing Gini Index

Problem:
A dataset has 6 instances of Class A and 4 instances of Class B. Compute Gini Index.

Solution:

p_A = 0.6, p_B = 0.4

Gini = 1 − (0.6² + 0.4²) = 1 − 0.52 = 0.48

Final Answer: Gini Index = 0.48

Numerical 3: Computing Information Gain

Problem:
Given a dataset with 10 instances, split into two groups:

 Subset 1: 4 instances (3 A, 1 B)

 Subset 2: 6 instances (3 A, 3 B)
Calculate Information Gain.
Solution:

Entropy(parent) = −(0.6 log₂ 0.6 + 0.4 log₂ 0.4) = 0.971 (6 A and 4 B overall)

Entropy(Subset 1) = −(0.75 log₂ 0.75 + 0.25 log₂ 0.25) = 0.811

Entropy(Subset 2) = −(0.5 log₂ 0.5 + 0.5 log₂ 0.5) = 1.000

Weighted Entropy After Split = (4/10)(0.811) + (6/10)(1.000) = 0.324 + 0.600 = 0.924

Information Gain = 0.971 − 0.924 = 0.047

Final Answer: Information Gain = 0.047

Numerical 4: Gain Ratio Calculation

Problem:
Compute Gain Ratio given:

 Information Gain = 0.047

 Split Info = 0.8


Solution:

Gain Ratio = Information Gain / Split Information = 0.047 / 0.8 = 0.05875

Final Answer: Gain Ratio = 0.05875
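As a cross-check, all four answers can be reproduced with a short script (the helper is our own; note that Numerical 3 gives 0.046 before the intermediate rounding used above):

```python
import math

def entropy(probs):
    """Shannon entropy in bits from a list of class probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

e_parent = entropy([0.6, 0.4])
print(round(e_parent, 3))                # Numerical 1: 0.971

print(round(1 - (0.6**2 + 0.4**2), 2))   # Numerical 2: 0.48

e_after = 0.4 * entropy([0.75, 0.25]) + 0.6 * entropy([0.5, 0.5])
print(round(e_parent - e_after, 3))      # Numerical 3: 0.046 (~0.047 stepwise)

print(round(0.047 / 0.8, 5))             # Numerical 4: 0.05875
```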

7. Write a Short Note on Decision Trees.

• Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems.

• It is a classification and prediction tool having a tree-like structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node
(terminal node) holds a class label.

• The goal of using a Decision Tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data).
