ML SP24 Mid Term Exam - Solution (1)

The document outlines the Mid-Term Examination for the Machine Learning course at COMSATS University Islamabad, detailing the exam structure, including questions on Market Basket Analysis, performance evaluation metrics, decision trees, underfitting vs. overfitting, and cross-validation methods. It specifies the maximum marks, time allowed, and instructions for answering. Additionally, it includes a section for students to describe their semester project, including the dataset and algorithms they plan to use.

Uploaded by

Hanzala Shafique

COMSATS University Islamabad, Wah Campus

Mid-Term Examination Fall 2024


Department of Computer Science

Program(s)/Classes: BCS 6 A Date: 29th Oct 2024


Subject: Machine Learning (CSC354) Maximum Marks: 50 Marks
Instructor Name(s): Prof. Dr Sheraz Anjum Total Time Allowed: 1.5 hr

Note: Solve the questions on the separately provided answer sheet. Attempt all questions.
Question No. 1 (CLO-1(SO-1)) (05+05+08 = 18 Marks)
a- Please compute & fill in the table below with information relevant to Market Basket Analysis.

TID   Items
1     Apple, Banana, Carrot
2     Apple, Carrot, Date
3     Banana, Carrot, Date
4     Apple, Date, Eggs
5     Banana, Carrot, Eggs

Rules                          Confidence     Support
{Apple} -> {Date}              2/3 = 0.67     2/5 = 0.4
{Carrot} -> {Apple}            2/4 = 0.5      2/5 = 0.4
{Banana, Carrot} -> {Date}     1/3 = 0.33     1/5 = 0.2
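
The confidence and support values can be checked mechanically. The sketch below recomputes the three rules from the five transactions; the helper names `support` and `confidence` are illustrative and not part of the exam, and support of a rule is taken to be the support of the antecedent and consequent together.

```python
# Transactions copied from the question (TID -> set of items).
transactions = {
    1: {"Apple", "Banana", "Carrot"},
    2: {"Apple", "Carrot", "Date"},
    3: {"Banana", "Carrot", "Date"},
    4: {"Apple", "Date", "Eggs"},
    5: {"Banana", "Carrot", "Eggs"},
}


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)


rules = [
    ({"Apple"}, {"Date"}),
    ({"Carrot"}, {"Apple"}),
    ({"Banana", "Carrot"}, {"Date"}),
]

for lhs, rhs in rules:
    print(sorted(lhs), "->", sorted(rhs),
          "confidence =", round(confidence(lhs, rhs), 2),
          "support =", round(support(lhs | rhs), 2))
```

For the third rule, {Banana, Carrot} appears in three transactions (1, 3, 5) and only transaction 3 also contains Date, so the confidence is 1/3 (about 0.33) and the support is 1/5 = 0.2.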

b- Construct the FP-Growth trees (only) for the modified database given below.
[Tables not recovered: "Frequency of each item in sorted order" and "Modified database after eliminating the non-frequent items and reorganizing"]

c- Given a transaction t = {1, 2, 3, 5, 6}, what are the possible Hash Tree subsets of size 3? Show
the level-wise hierarchy of the subset operation.
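
As a rough check on the enumeration, the sketch below lists all size-3 subsets of the transaction and prints a level-wise expansion in which one item is fixed at each level and the remaining items are carried forward. The function name `expand` and the printed "prefix | remaining" layout are illustrative choices, not the exam's required notation.

```python
from itertools import combinations

t = [1, 2, 3, 5, 6]
k = 3

# All k-item subsets, in the lexicographic order produced by itertools.
subsets = list(combinations(t, k))


def expand(prefix, remaining, depth=0):
    """Print the level-wise enumeration: fix one item, recurse on the rest."""
    if len(prefix) == k:
        print("  " * depth + " ".join(map(str, prefix)))
        return
    needed = k - len(prefix)
    # Only start from positions that leave enough items to complete a subset.
    for i in range(len(remaining) - needed + 1):
        new_prefix = prefix + [remaining[i]]
        rest = remaining[i + 1:]
        if len(new_prefix) < k:
            print("  " * depth + " ".join(map(str, new_prefix))
                  + " | " + " ".join(map(str, rest)))
        expand(new_prefix, rest, depth + 1)


expand([], t)
print(len(subsets), "subsets:", subsets)
```

A 5-item transaction has C(5, 3) = 10 subsets of size 3, so the final line should list ten tuples, starting with (1, 2, 3) and ending with (3, 5, 6).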

Question No. 2 (CLO-2(SO-2,4)) (05+10+04+07 = 26 Marks)
a- Performance Evaluation Metrics: You are working on a machine learning project to
classify birds into two categories: Sparrow and Not-Sparrow. The table below shows the
results of a test dataset where a machine learning model was used to predict whether a
bird is a "Sparrow" or "Not-Sparrow" based on various features. The table provides the
actual labels and the model's predicted labels for 10 test cases. Construct the Confusion
Matrix according to the above scenario along with:
• Odd registration # will compute Accuracy and Recall.
• Even registration # will compute F1-Score.
TID   Actual     Predicted
1     Sparrow    Not-Sparrow
2     Sparrow    Sparrow
3     Owl        Sparrow
4     Sparrow    Sparrow
5     Panda      Sparrow
6     Owl        Sparrow
7     Sparrow    Sparrow
8     Sparrow    Sparrow
9     Sparrow    Not-Sparrow
10    Rabbit     Not-Sparrow
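
One way to tabulate the confusion matrix is sketched below. It assumes "Sparrow" is the positive class and that every other actual label (Owl, Panda, Rabbit) counts as Not-Sparrow; these are assumptions made for this sketch, since the exam does not state them explicitly.

```python
# (actual, predicted) pairs copied from the table, in TID order.
records = [
    ("Sparrow", "Not-Sparrow"),
    ("Sparrow", "Sparrow"),
    ("Owl", "Sparrow"),
    ("Sparrow", "Sparrow"),
    ("Panda", "Sparrow"),
    ("Owl", "Sparrow"),
    ("Sparrow", "Sparrow"),
    ("Sparrow", "Sparrow"),
    ("Sparrow", "Not-Sparrow"),
    ("Rabbit", "Not-Sparrow"),
]

tp = fp = fn = tn = 0
for actual, predicted in records:
    actual_pos = actual == "Sparrow"        # assumption: Sparrow is positive
    predicted_pos = predicted == "Sparrow"
    if actual_pos and predicted_pos:
        tp += 1
    elif actual_pos and not predicted_pos:
        fn += 1
    elif not actual_pos and predicted_pos:
        fp += 1
    else:
        tn += 1

accuracy = (tp + tn) / len(records)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("TP, FN, FP, TN =", tp, fn, fp, tn)
print("accuracy =", round(accuracy, 3))
print("recall =", round(recall, 3))
print("F1 =", round(f1, 3))
```

Under these assumptions the counts are TP = 4, FN = 2, FP = 3, TN = 1, which gives an accuracy of 5/10 = 0.5, a recall of 4/6 (about 0.667), and an F1-score of 8/13 (about 0.615).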

b- Decision Tree: You have been given a dataset containing 14 records related to playing
football based on weather conditions.
I. Apply the ID3 algorithm (Entropy and Information Gain) to determine which attribute
from the above should be selected as the root node of the decision tree.

Day   Weather   Temperature   Humidity   Wind     Play Football?
1     Sunny     Hot           High       Weak     No
2     Sunny     Hot           High       Strong   No
3     Cloudy    Hot           High       Weak     Yes
4     Rain      Mild          High       Weak     Yes
5     Rain      Cool          Normal     Weak     Yes
6     Rain      Cool          Normal     Strong   No
7     Cloudy    Cool          Normal     Strong   Yes
8     Sunny     Mild          High       Weak     No
9     Sunny     Cool          Normal     Weak     Yes
10    Rain      Mild          Normal     Weak     Yes
11    Sunny     Mild          Normal     Strong   Yes
12    Cloudy    Mild          High       Strong   Yes
13    Cloudy    Hot           Normal     Weak     Yes
14    Rain      Mild          High       Strong   No
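
The root-node selection can be reproduced with a short script. This is a minimal sketch of the entropy and information-gain step of ID3 on the 14 records above, not a full tree builder; the names `entropy` and `information_gain` are illustrative.

```python
from collections import Counter
from math import log2

columns = ["Weather", "Temperature", "Humidity", "Wind"]

# Rows copied from the table; the last field is the "Play Football?" label.
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Cloudy", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Cloudy", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Cloudy", "Mild", "High", "Strong", "Yes"),
    ("Cloudy", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(col_index):
    """Entropy of the full label set minus the weighted entropy after the split."""
    labels = [r[-1] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[col_index], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


gains = {name: information_gain(i) for i, name in enumerate(columns)}
root = max(gains, key=gains.get)

for name, gain in gains.items():
    print(name, round(gain, 3))
print("Root attribute:", root)
```

With 9 Yes and 5 No records the overall entropy is about 0.940. Weather gives the largest gain (about 0.247), followed by Humidity (about 0.152), Wind (about 0.048), and Temperature (about 0.029), so Weather is selected as the root.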

c- Underfitting & Overfitting: Compare the two according to the aspects listed below.
Aspects                        Underfitting                    Overfitting
Performance on Training Data   Low accuracy                    High accuracy
Bias - Variance                High - Low                      Low - High
Model Complexity               Too low                         Too high
Solutions                      Increase model complexity,      Reduce model complexity,
                               feature engineering, collect    regularization,
                               more training data, choose a    cross-validation,
                               more powerful model             early stopping

d- Cross Validation: Justify how the figure below is related to the concept of Cross Validation.
Additionally, name the types of Exhaustive Methods and briefly describe one.

Justification:
The image illustrates the concept of Cross Validation in machine learning. In Cross Validation,
data is divided into different subsets or “folds,” and each subset takes a turn being the test
set (used to evaluate the model) while the other subsets are used to train the model. The
process is repeated until every subset has been tested once. This allows for a thorough
evaluation since the model is tested on different data each time, which makes overfitting
easier to detect and provides a more accurate measure of performance. The average result from
all these tests gives a reliable performance estimate for the model.
Types of Exhaustive Methods in Cross Validation
• Leave-One-Out Cross Validation (LOOCV): Each sample in the dataset is used once
as the test set, while the remaining samples form the training set. This method is
exhaustive because it creates as many folds as there are samples in the dataset.
• Leave-P-Out Cross Validation: In this method, P data points are left out in each
iteration to serve as the test set, while the remaining points are used for training. Every
possible subset of P points is used as a test set once, so this method becomes
computationally expensive as P increases, especially for large datasets.
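
The splitting schemes described above can be sketched as index generators. This is an illustrative pure-Python sketch with hypothetical function names, not a library API; k-fold is included only for contrast, since it is not an exhaustive method.

```python
from itertools import combinations


def k_fold_splits(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds (non-exhaustive)."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_p_out_splits(n_samples, p):
    """Yield every split whose test set is a distinct subset of p samples."""
    indices = range(n_samples)
    for test in combinations(indices, p):
        train = [i for i in indices if i not in test]
        yield train, list(test)


def leave_one_out_splits(n_samples):
    """LOOCV is Leave-P-Out with p = 1: one split per sample."""
    return leave_p_out_splits(n_samples, 1)


for train, test in k_fold_splits(6, 3):
    print("k-fold   train:", train, "test:", test)

print("LOOCV splits for 6 samples:", len(list(leave_one_out_splits(6))))
print("Leave-2-Out splits for 6 samples:", len(list(leave_p_out_splits(6, 2))))
```

For n samples, LOOCV produces n splits while Leave-P-Out produces C(n, P) splits (15 for n = 6 and P = 2), which is why the latter grows expensive quickly.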

Question No. 4 (CLO-5(SO-2,3,4,5)) (06 Marks)

• What is the name of your semester project along with a brief description of the main aim
and idea of your proposed semester project?
• Which dataset do you plan to explore? (mention the main web source of your dataset)
• What algorithms do you intend to apply for your project, and what platform
(language/library/tool) will you use?

***Success is not measured by how well you cheat, but by how honestly you strive***
