ML SP24 Mid Term Exam - Solution (1)
Note: Solve the questions on the separately provided answer sheet. Attempt all questions.
Question No. 1 (CLO-1(SO-1)) (05+05+08 = 18 Marks)
a- Please compute and fill in the table below with the information relevant to Market Basket Analysis.
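The exam's table is not reproduced in this text. Assuming it asks for the usual rule metrics (support, confidence and lift), the sketch below shows how each is computed. The transactions and the rule {milk, diaper} -> {beer} are hypothetical examples, not the exam's data.

```python
# Hypothetical transactions (not the exam's table), each stored as a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Estimate of P(rhs | lhs) for the rule lhs -> rhs."""
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    """Confidence divided by the baseline support of rhs; > 1 suggests a positive association."""
    return confidence(lhs, rhs) / support(rhs)

lhs, rhs = {"milk", "diaper"}, {"beer"}
print("support:", support(lhs | rhs))
print("confidence:", confidence(lhs, rhs))
print("lift:", lift(lhs, rhs))
```

Here {milk, diaper, beer} appears in 2 of 5 transactions (support 0.4), {milk, diaper} in 3 of 5, so confidence is 2/3, and beer alone has support 0.6, giving a lift of 10/9.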
b- Construct the FP-Growth trees (only) for the modified database given below.
Frequency of each item in sorted order
Modified database after eliminating the non-frequent items and reorganizing
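The two preparatory steps above (sorting items by frequency, then dropping non-frequent items and reordering each transaction) feed directly into FP-tree construction. The sketch below assumes a hypothetical database and a minimum support count of 3, and breaks frequency ties alphabetically; it builds only the prefix tree and omits the header table and node-links that FP-Growth also maintains.

```python
from collections import Counter

class FPNode:
    """One FP-tree node: an item, its count, and links to its parent and children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support_count):
    # Pass 1: count each item once per transaction and keep only the frequent ones.
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in freq.items() if c >= min_support_count}
    # Descending frequency; ties broken alphabetically (an assumed convention).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = FPNode(None, None)
    reorganized = []
    # Pass 2: drop infrequent items, reorder, and insert each transaction as a path.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent), key=rank.get)
        reorganized.append(items)
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1  # transactions sharing a prefix share (and count up) its nodes
    return root, order, reorganized

def show(node, depth=0):
    """Print the tree as indented item:count lines."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

# Hypothetical database (not the exam's), minimum support count = 3.
db = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"], ["a", "d", "e"], ["a", "b", "c"]]
root, order, reorganized = build_fp_tree(db, 3)
print("order:", order)
print("reorganized:", reorganized)
show(root)
```

In this toy run, item e (count 2) is eliminated, and the four transactions that start with a share a single a:4 node under the root.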
c- Given a transaction t = {1, 2, 3, 5, 6}, what are the possible subsets of size 3 produced by the Hash Tree subset operation? Show the level-wise hierarchy of the subset operation.
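The level-wise enumeration can be sketched as a recursion that fixes one item per level. At each level, only items that leave enough later items to complete a 3-item subset are tried, so level 1 starts only with 1, 2 or 3. The exam's hash function and tree are not given in this text, so the sketch covers only the subset enumeration, not the hashing of each subset into leaves.

```python
def level_wise_subsets(transaction, k):
    """Enumerate the k-item subsets of a sorted transaction level by level,
    printing the prefix fixed at each level and the items still available."""
    items = sorted(transaction)
    found = []

    def expand(prefix, start):
        level = len(prefix)
        if level == k:
            found.append(tuple(prefix))
            return
        needed = k - level
        # An item can be chosen here only if at least `needed - 1` items follow it.
        for i in range(start, len(items) - needed + 1):
            new_prefix = prefix + [items[i]]
            rest = items[i + 1:]
            print("    " * level + f"Level {level + 1}: {new_prefix} + {rest}")
            expand(new_prefix, i + 1)

    expand([], 0)
    return found

subsets = level_wise_subsets([1, 2, 3, 5, 6], 3)
print(len(subsets), subsets)
```

The enumeration yields C(5, 3) = 10 subsets: {1 2 3}, {1 2 5}, {1 2 6}, {1 3 5}, {1 3 6}, {1 5 6}, {2 3 5}, {2 3 6}, {2 5 6} and {3 5 6}.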
Question No. 2 (CLO-2(SO-2,4)) (05+10+04+07 = 26 Marks)
a- Performance Evaluation Metrics: You are working on a machine learning project to
classify birds into two categories: Sparrow and Not-Sparrow. The table below shows the
results of a test dataset where a machine learning model was used to predict whether a
bird is a "Sparrow" or "Not-Sparrow" based on various features. The table provides the
actual labels and the model's predicted labels for 10 test cases. Construct the Confusion
Matrix according to the above scenario along with:
• Students with an odd registration number compute Accuracy and Recall.
• Students with an even registration number compute the F1-Score.
TID | Actual  | Predicted
1   | Sparrow | Not-Sparrow
2   | Sparrow | Sparrow
3   | Owl     | Sparrow
4   | Sparrow | Sparrow
5   | Panda   | Sparrow
6   | Owl     | Sparrow
7   | Sparrow | Sparrow
8   | Sparrow | Sparrow
9   | Sparrow | Not-Sparrow
10  | Rabbit  | Not-Sparrow
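The confusion matrix and metrics can be checked with a short script over the ten rows above. It assumes Sparrow is the positive class and that every actual label other than Sparrow (Owl, Panda, Rabbit) counts as Not-Sparrow.

```python
# Rows of the table above: (TID, actual, predicted).
rows = [
    (1, "Sparrow", "Not-Sparrow"), (2, "Sparrow", "Sparrow"),
    (3, "Owl", "Sparrow"),         (4, "Sparrow", "Sparrow"),
    (5, "Panda", "Sparrow"),       (6, "Owl", "Sparrow"),
    (7, "Sparrow", "Sparrow"),     (8, "Sparrow", "Sparrow"),
    (9, "Sparrow", "Not-Sparrow"), (10, "Rabbit", "Not-Sparrow"),
]

def is_sparrow(label):
    # Assumption: any label other than "Sparrow" is treated as Not-Sparrow.
    return label == "Sparrow"

tp = fp = fn = tn = 0
for _, actual, predicted in rows:
    a, p = is_sparrow(actual), is_sparrow(predicted)
    if a and p:
        tp += 1      # sparrow correctly predicted as sparrow
    elif not a and p:
        fp += 1      # non-sparrow wrongly predicted as sparrow
    elif a and not p:
        fn += 1      # sparrow missed by the model
    else:
        tn += 1      # non-sparrow correctly rejected

accuracy = (tp + tn) / len(rows)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print("            Pred Sparrow  Pred Not-Sparrow")
print(f"Sparrow     {tp:>12}  {fn:>16}")
print(f"Not-Sparrow {fp:>12}  {tn:>16}")
print(f"accuracy={accuracy:.3f} recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
```

Under these assumptions TP = 4, FN = 2, FP = 3 and TN = 1, so Accuracy = 5/10 = 0.5, Recall = 4/6 ≈ 0.667, Precision = 4/7 ≈ 0.571 and F1 = 8/13 ≈ 0.615.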
b- Decision Tree: You have been given a dataset containing 14 records related to playing
football based on weather conditions.
I. Apply the ID3 algorithm (Entropy and Information Gain) to determine which attribute from the dataset above should be selected as the root node of the decision tree.
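ID3 computes the entropy of the class label, H(S) = -Σ p_i log2 p_i, and for each attribute A the information gain Gain(S, A) = H(S) - Σ_v (|S_v| / |S|) H(S_v), then chooses the attribute with the largest gain as the root. The 14-record dataset is not reproduced in this text, so the sketch below runs on a hypothetical four-row table; the attribute names and values are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Parent entropy minus the weighted entropy of the subsets
    obtained by splitting `rows` on the values of `attribute`."""
    parent = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted

def best_root(rows, attributes, target):
    """ID3's choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical toy table, NOT the exam's 14-record dataset.
rows = [
    {"Outlook": "Sunny", "Wind": "Weak", "Play": "No"},
    {"Outlook": "Sunny", "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Rain", "Wind": "Strong", "Play": "Yes"},
]
for a in ["Outlook", "Wind"]:
    print(a, information_gain(rows, a, "Play"))
print("root:", best_root(rows, ["Outlook", "Wind"], "Play"))
```

On this toy table, splitting on Outlook produces pure subsets (gain 1.0), while Wind leaves each subset evenly mixed (gain 0.0), so Outlook would be chosen as the root.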
c- Underfitting & Overfitting: Compare the two according to the aspects listed below.
Aspects | Underfitting | Overfitting
Performance on Training Data | Low accuracy | High accuracy
Solutions | Increase model complexity, feature engineering, collect more training data, choose a more powerful model | Reduce model complexity, regularization, cross-validation, early stopping
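The training-accuracy row of the comparison can be illustrated on a tiny hypothetical dataset whose true rule is label = 1 if x > 5, with one deliberately noisy training label at x = 3. A majority-class predictor stands in for an underfit model, and a 1-nearest-neighbour lookup, which memorises every training point, stands in for an overfit one; both are illustrative choices, not methods from the exam.

```python
# Hypothetical toy data: true rule is label = 1 if x > 5; x = 3 is mislabelled (noise).
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0),
         (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
test = [(2.9, 0), (3.1, 0), (7.5, 1), (1.5, 0)]

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

# Too simple: always predicts the most common training label (underfits).
majority_label = max((0, 1), key=lambda c: sum(y == c for _, y in train))

def underfit(x):
    return majority_label

# Too flexible: 1-nearest-neighbour memorises every training point,
# including the noisy one (overfits).
def overfit(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Appropriate complexity: a single threshold between the classes.
def balanced(x):
    return 1 if x > 5 else 0

for name, model in [("underfit", underfit), ("overfit", overfit), ("balanced", balanced)]:
    print(f"{name:9} train={accuracy(model, train):.2f} test={accuracy(model, test):.2f}")
```

The majority-class model scores low even on its own training data, while the memorising model is perfect on training data but loses accuracy on test points near the noisy example.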
d- Cross Validation: Justify how the figure below is related to the concept of Cross Validation.
Additionally, name the types of Exhaustive Methods and briefly describe one.
Justification:
The figure illustrates the concept of Cross Validation in machine learning. In Cross Validation,
the data is divided into subsets, or “folds,” and each fold takes a turn as the test set (used to
evaluate the model) while the remaining folds are used to train the model. The process is
repeated once per fold, so every subset is tested exactly once. Because the model is evaluated
on different held-out data each time, the estimate does not hinge on a single lucky or unlucky
train/test split, and a large gap between training and validation scores helps reveal overfitting.
The average result across all folds gives a more reliable estimate of the model's performance
on unseen data.
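The fold rotation described above can be sketched in a few lines. This version assumes contiguous folds without shuffling, and `fit` and `score` are hypothetical caller-supplied callables; the majority-class "model" and the labels in the usage lines are made up for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds; every index is tested exactly once."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread any remainder over the first folds
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_validate(X, y, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(scores) / len(scores)

# Usage with a hypothetical majority-class "model" on made-up labels.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def score_accuracy(model, X_test, y_test):
    return sum(model == label for label in y_test) / len(y_test)

X = list(range(10))
y = [0, 0, 1, 1, 1, 0, 1, 1, 0, 1]
print(cross_validate(X, y, 5, fit_majority, score_accuracy))
```

Shuffling (or stratifying) the indices before splitting is common in practice; it is left out here to keep the fold structure easy to inspect.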
Types of Exhaustive Methods in Cross Validation
• Leave-One-Out Cross Validation (LOOCV): Each sample in the dataset is used once
as the test set, while the remaining samples form the training set. This method is
exhaustive because it creates as many folds as there are samples in the dataset.
• Leave-p-Out Cross Validation: In this method, p data points are left out in each
iteration to serve as the test set, while the remaining points are used for training. Because
every possible combination of p points serves as the test set once, a dataset of n samples
requires C(n, p) training runs, so the method becomes computationally expensive as p
increases, especially for large datasets.
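Both exhaustive methods can be generated by enumerating every size-p combination of sample indices; LOOCV is simply the case p = 1. The sketch below yields the splits and prints how quickly the count C(n, p) grows for a few illustrative (n, p) pairs.

```python
from itertools import combinations
from math import comb

def leave_p_out(n_samples, p):
    """Yield every (train_idx, test_idx) split with exactly p held-out samples."""
    all_idx = range(n_samples)
    for test in combinations(all_idx, p):
        held_out = set(test)
        train = [i for i in all_idx if i not in held_out]
        yield train, list(test)

def leave_one_out(n_samples):
    """LOOCV is the special case p = 1: n splits, one per sample."""
    return leave_p_out(n_samples, 1)

# The number of splits is C(n, p), which grows combinatorially with n and p.
for n, p in [(10, 1), (10, 2), (20, 3), (100, 5)]:
    print(f"n={n}, p={p}: {comb(n, p)} splits")
```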
• What is the name of your semester project? Give a brief description of its main aim
and idea.
• Which dataset do you plan to explore? (mention the main web source of your dataset)
• What algorithms do you intend to apply for your project, and what platform
(language/library/tool) will you use?