Data mining algorithms - exam 23/24
CSI-6-DMA Semester 1
Question 1
Choose the best answer to each of the following questions (1 mark each):
1.1. For a given association rule, moving an item from the consequent of the rule to
the antecedent of the rule __________ the support of the association rule.
(a) never changes
(b) may change
(c) increases
(d) reduces
1.2. For a given association rule, moving an item from the antecedent of the rule to
the consequent of the rule __________ the confidence of the association rule.
(a) never increases
(b) increases
(c) reduces
(d) may increase
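For reference, the standard definitions of support and confidence that questions 1.1 and 1.2 rely on can be sketched as follows. This is a minimal illustration only: the toy transactions and the helper names `support` and `confidence` are assumptions made for the example and are not part of the exam paper.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions, invented purely for illustration.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: Iterable[str], transactions=TRANSACTIONS) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions=TRANSACTIONS) -> float:
    """support(antecedent and consequent together) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    a = frozenset(antecedent)
    c = frozenset(consequent)
    return support(a | c, transactions) / support(a, transactions)


# The support of a rule depends only on the union of its two sides,
# while the confidence also depends on which items form the antecedent.
print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
```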
1.4. In a confusion matrix for a two-class classifier, the sum of all the off-diagonal
elements in the matrix is the total number of the __________ samples that have
been classified __________ by the classifier.
(a) testing, correctly
(b) testing, incorrectly
(c) training, correctly
(d) training, incorrectly
1.5. In k-fold cross-validation, each fold is used for training _____________ and testing
_____________.
(a) k times, k times
(b) once, once
(c) k-1 times, once
(d) once, k-1 times
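The fold rotation in k-fold cross-validation can be sketched in plain Python as below. The function name `k_fold_splits` and the interleaved assignment of samples to folds are illustrative assumptions; real toolkits usually shuffle or stratify the data first.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k rounds."""
    indices = list(range(n_samples))
    # Assign samples to k folds by interleaving (an arbitrary choice here).
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]  # the held-out fold for this round
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for round_no, (train, test) in enumerate(k_fold_splits(10, 5), start=1):
    print(round_no, "train:", train, "test:", test)
```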
Data Mining and Big Data Analytics 2023/24
1.7. Suppose that dataset X has 2 samples of two classes, 1 from each class, and
dataset Y has 10 samples of two classes, 5 from each class. Then the entropy
value of dataset X is __________ the entropy value of dataset Y.
(a) 10% of
(b) 40% of
(c) the same as
(d) 250% of
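The entropy values compared in question 1.7 can be computed with a short helper such as the one below. It is a sketch that assumes Shannon entropy with base-2 logarithms, the usual convention for decision trees; the function name `entropy` is illustrative.

```python
import math
from typing import Sequence


def entropy(class_counts: Sequence[int]) -> float:
    """Shannon entropy (base 2) of a dataset, given its per-class sample counts."""
    total = sum(class_counts)
    # Classes with zero samples contribute nothing to the sum.
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


entropy_x = entropy([1, 1])  # dataset X: 1 sample from each class
entropy_y = entropy([5, 5])  # dataset Y: 5 samples from each class
print(entropy_x, entropy_y)
```

Entropy depends only on the class proportions, not on the absolute number of samples.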
1.8. A boxplot, also known as a box and whisker plot, can be used to show any
outliers for _____________ types of variables.
(a) categorical and continuous
(b) categorical and discrete
(c) numeric and both discrete and continuous
(d) both categorical and numeric
1.9. The k-means clustering algorithm can be used for which of the following tasks?
(a) Outlier and anomaly detection.
(b) Partition a sample space into several non-overlapping segments.
(c) Unsupervised classification.
(d) All of the above.
1.10. Which of the following statements is true in the context of data mining?
(a) An association rule doesn’t represent a causal relationship between items.
(b) The output of a logistic regression model indicates the likelihood
(probability) of a sample to be classified into a class.
(c) A linear regression model-based classifier can be represented in the form of a
decision tree.
(d) All of the above.
Total: 10 Marks
Question 2
(a) Write brief notes to discuss how to choose a proper minimum support threshold in
association rule analysis.
(12 marks)
(b) The scatter plot below is based on a survey of properties in an area of outer
London, where RM denotes the average number of rooms per dwelling, and MEDV
represents the average property price in sterling. Examine the plot carefully and
discuss any patterns you may identify from the plot in terms of the relationship
between the two variables involved.
(13 marks)
Total: 25 Marks
Question 3
Give one variable as an example for each of these data types. Your answer should
include some possible values that each variable can take on.
(12 marks)
(a) Consider a dataset about road accidents in the area of London Borough of
Southwark over a certain period of time. The variables of the dataset are shown
below. Discuss what data pre-processing tasks may need to be undertaken and
explain why, if the k-means clustering algorithm is to be applied for grouping the
accidents into meaningful segments. Your answer should be clearly relevant to
these five variables.
(18 marks)
Total: 30 Marks
Question 4
A binary decision tree for a two-class classification problem has been built in a
two-dimensional space. Let x and y denote the two axes of the space. The decision rules
that the tree represents are as follows:
IF 1 ≤ x ≤ 2, THEN Class 0;
OTHERWISE Class 1
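The decision rule above can be written as a small function. This is a sketch only; the name `classify` is an assumption, and the y argument is accepted because samples are two-dimensional even though the stated rule tests x alone.

```python
def classify(x: float, y: float) -> int:
    """Apply the stated rule: Class 0 if 1 <= x <= 2, otherwise Class 1."""
    # y is part of each sample but is not used by the rule as given.
    return 0 if 1 <= x <= 2 else 1


# Example calls with arbitrary points (not the samples from Figure 1).
print(classify(1.5, 0.2))
print(classify(2.5, 0.2))
```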
Suppose six samples, as shown in Figure 1, have been chosen to test the performance
of the classifier.
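[Figure 1: scatter plot of the six test samples in the x-y plane, with a legend distinguishing Class 0 and Class 1; the x-axis ticks run from 0 to 3 in steps of 0.5. Plot not reproduced.]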
(c) Give the confusion matrix of the classifier with appropriate entries. You must
clearly show how you get your answer. You may assume Class 1 is the positive
class.
(14 marks)
(d) Calculate the accuracy and the TP (True Positive) rate of the classifier. You must
clearly show how you get your answer.
(10 marks)
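The quantities asked for in parts (c) and (d) follow from the four cells of a two-class confusion matrix. The sketch below shows the arithmetic with Class 1 as the positive class. The `actual` and `predicted` lists are hypothetical placeholders, because the sample coordinates in Figure 1 are not reproduced here; the helper names are likewise illustrative.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int],
                     positive: int = 1) -> Dict[str, int]:
    """Count TP, FN, FP and TN for a two-class problem."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts


def accuracy(c: Dict[str, int]) -> float:
    """(TP + TN) divided by the total number of test samples."""
    return (c["TP"] + c["TN"]) / sum(c.values())


def tp_rate(c: Dict[str, int]) -> float:
    """TP divided by the number of actual positives (TP + FN)."""
    return c["TP"] / (c["TP"] + c["FN"])


# Hypothetical labels for six samples; NOT the data from Figure 1.
actual = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 1]
cells = confusion_counts(actual, predicted)
print(cells, accuracy(cells), tp_rate(cells))
```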
Total: 35 Marks
END OF PAPER