Mock Exam Paper
Instructions to Candidates
You are advised (but not required) to spend the first ten minutes of the examination
reading the questions and planning how you will answer those you have selected.
2. You’re training an ensemble model and notice that the validation error is significantly
lower than the training error. Name two possible reasons for this to happen.
(5 marks)
3. Give one similarity and one difference between feature selection and principal
component analysis (PCA).
(6 marks)
4. Which of the following tends to work best on small datasets? Explain briefly.
a. Logistic Regression
b. K-nearest neighbor
(9 marks)
QUESTION 2 (Total 25 marks)
1. Regression can be performed with categorical and continuous variables. Briefly explain.
(6 marks)
2. Both PCA and linear regression can be thought of as algorithms for minimizing a sum of
squared errors. Explain which error is being minimized in each algorithm.
(4 marks)
3. Explain what effect each of the following operations will have on the bias and variance
of your model.
a. Regularizing the weights in a linear regression model
b. Regularizing the weights in a logistic regression model
c. Pruning a decision tree
(9 marks)
4. List three potential real-world scenarios where logistic regression has been applied.
(6 marks)
Module Code: CM2604 Exam Period: TBD
Module Title: Machine Learning
1. Suppose we are using a linear SVM with a large value of C and are given the following
dataset. Draw the decision boundary of the linear SVM. Provide a brief explanation.
(6 marks)
2. The following dataset is used to learn a decision tree which predicts if people pass
machine learning (Yes or No), based on their previous GPA (High, Medium, or Low) and
whether or not they studied. Draw the full decision tree that would be learned for this
dataset.
(12 marks)
4. Briefly explain the use of the Apriori function in association mining, using an example.
(4 marks)
1. What is data imbalance in machine learning? What practical strategies can be used to
mitigate data imbalance? Briefly discuss.
(6 marks)
2. Suppose you are going to release a new dataset by applying resampling strategies to an
existing dataset that suffers from a class imbalance issue. Critically discuss the
ethical impact of this process.
(4 marks)
                         Predicted Class
Corpus                   Tree    Bush    Grass
Actual Class    Tree        8       2       6
                Bush        6       7       5
                Grass      10       8       4