Data mining algorithms - exam 23/24


Data Mining and Big Data Analytics 2023/24

CSI-6-DMA Semester 1

Question 1

Choose the best answer to each of the following questions (1 mark each):

1.1. For a given association rule, moving an item from the consequent of the rule to
the antecedent of the rule __________ the support of the association rule.
(a) never changes
(b) may change
(c) increases
(d) reduces

1.2. For a given association rule, moving an item from the antecedent of the rule to
the consequent of the rule __________ the confidence of the association rule.
(a) never increases
(b) increases
(c) reduces
(d) may increase

1.3. Given three itemsets X, Y, and Z, where 𝑋 ⊂ 𝑌 ⊂ 𝑍, if Y is infrequent, then
__________.
(a) both 𝑋 and 𝑍 are infrequent
(b) 𝑋 is either frequent or infrequent
(c) both 𝑋 and 𝑍 are frequent
(d) 𝑍 is either frequent or infrequent

1.4. In a confusion matrix for a two-class classifier, the sum of all the off-diagonal
elements in the matrix is the total number of the __________ samples that have
been classified __________ by the classifier.
(a) testing, correctly
(b) testing, incorrectly
(c) training, correctly
(d) training, incorrectly

1.5. In k-fold cross-validation, each fold is used for training _____________ and testing
_____________.
(a) k times, k times
(b) once, once
(c) k-1 times, once
(d) once, k-1 times

1.6. In modelling, a given dataset is usually divided into __________.
(a) descriptive, validation and test subsets
(b) descriptive, training and testing subsets
(c) predictive, training and testing subsets
(d) predictive, testing and validation subsets


1.7. Suppose that dataset X has 2 samples of two classes, 1 from each class, and
dataset Y has 10 samples of two classes, 5 from each class. Then the entropy
value of dataset X is __________ the entropy value of dataset Y.
(a) 10% of
(b) 40% of
(c) the same as
(d) 250% of

1.8. A boxplot, also known as a box and whisker plot, can be used to show any
outliers for a _____________ type of variables.
(a) categorical and continuous
(b) categorical and discrete
(c) numeric and both discrete and continuous
(d) both categorical and numeric

1.9. The k-means clustering algorithm can be used for which of the following tasks?
(a) Outlier and anomaly detection.
(b) Partition a sample space into several non-overlapping segments.
(c) Unsupervised classification.
(d) All of the above.

1.10. Which of the following statements is true in the context of data mining?
(a) An association rule doesn’t represent a causal relationship between items.
(b) The output of a logistic regression model indicates the likelihood
(probability) of a sample to be classified into a class.
(c) A linear regression model-based classifier can be represented in the form of a
decision tree.
(d) All of the above.

Total: 10 Marks


Question 2

(a) Write brief notes to discuss how to choose a proper minimum support threshold in
association rule analysis.
(12 marks)

(b) The scatter plot below is based on a survey of properties in an area of outer
London, where RM denotes the average number of rooms per dwelling, and MEDV
represents the average property price in sterling. Examine the plot carefully and
discuss any patterns you may identify from the plot in terms of the relationship
between the two variables involved.

(13 marks)

Total: 25 Marks


Question 3

(a) Consider the following data types:

a. Nominal and binary.
b. Ordinal and binary.
c. Interval and continuous.
d. Ratio and continuous.

Give one variable as an example for each of these data types. Your answer should
include some possible values that each variable can take on.
(12 marks)

(b) Consider a dataset about road accidents in the area of London Borough of
Southwark over a certain period of time. The variables of the dataset are shown
below. Discuss what data pre-processing tasks may need to be undertaken and
explain why, if the k-means clustering algorithm is to be applied for grouping the
accidents into meaningful segments. Your answer should be clearly relevant to
these five variables.

Variable Name  Variable Description               Data Type  Value range if numeric variable, or distinct values if categorical variable
ACC_ID         Accident ID                        Nominal    Sequential integer number
S_LEVEL        Level of accident severity         Ordinal    Slight, Serious, Fatal
VECL           Number of vehicles in an accident  Ratio      1 – 16
J_DTL          Junction detail                    Nominal    Roundabout, Slip Road, Crossroads
COST           Cost of an accident (£)            Ratio      100.00 – 200,000.00

(18 marks)
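
For illustration only, the kind of transformation this question is about can be sketched in Python. This is one possible set of choices under stated assumptions, not a model answer: the rank order Slight < Serious < Fatal, the use of min-max scaling over the value ranges listed above, the one-hot encoding of the junction detail (written `J_DTL` here), and the names `SEVERITY_RANK`, `JUNCTIONS`, `min_max` and `preprocess` are all hypothetical.

```python
# Assumed rank order for the ordinal variable S_LEVEL.
SEVERITY_RANK = {"Slight": 0, "Serious": 1, "Fatal": 2}
# Distinct values of the nominal junction-detail variable, as listed in the table.
JUNCTIONS = ["Roundabout", "Slip Road", "Crossroads"]


def min_max(value, low, high):
    """Rescale value from the range [low, high] to [0, 1]."""
    return (value - low) / (high - low)


def preprocess(record):
    """Turn one accident record (a dict) into a numeric feature vector.

    ACC_ID is left out because an identifier carries no distance information.
    """
    features = [
        SEVERITY_RANK[record["S_LEVEL"]] / 2,       # ordinal rank scaled to [0, 1]
        min_max(record["VECL"], 1, 16),             # vehicles: 1 to 16
        min_max(record["COST"], 100.0, 200_000.0),  # cost in pounds
    ]
    # One-hot encode the nominal junction detail.
    features += [1.0 if record["J_DTL"] == j else 0.0 for j in JUNCTIONS]
    return features


# A made-up record, used only to exercise the function.
example = {"ACC_ID": 7, "S_LEVEL": "Serious", "VECL": 16,
           "J_DTL": "Slip Road", "COST": 100.0}
print(preprocess(example))  # [0.5, 1.0, 0.0, 0.0, 1.0, 0.0]
```

Scaling matters here because k-means relies on distances, so an unscaled COST (up to 200,000) would otherwise dominate VECL (at most 16).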

Total: 30 Marks


Question 4

A binary decision tree for a two-class classification problem has been built in a
two-dimensional space. Let 𝑥 and 𝑦 denote the two axes of the space. The decision rules
that the tree represents are as follows:

𝑰𝑭 1 ≤ 𝑥 ≤ 2, 𝑻𝒉𝒆𝒏 𝐶𝑙𝑎𝑠𝑠 0;
𝑶𝒕𝒉𝒆𝒓𝒘𝒊𝒔𝒆 𝐶𝑙𝑎𝑠𝑠 1
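
Read literally, the rule above depends only on 𝑥. A minimal Python sketch of that reading follows; the function name `classify` and the sample points in the comments are hypothetical and are not taken from Figure 1.

```python
def classify(x: float, y: float) -> int:
    """Apply the stated rule: Class 0 if 1 <= x <= 2, otherwise Class 1.

    y is accepted because the space is two-dimensional, but the rule ignores it.
    """
    return 0 if 1 <= x <= 2 else 1


# Made-up points, used only to exercise the function.
print(classify(1.5, 0.3))  # 0 (x lies inside [1, 2])
print(classify(2.5, 0.3))  # 1 (x lies outside [1, 2])
```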

Suppose six samples, as shown in Figure 1, have been chosen to test the performance
of the classifier.

[Figure 1. Test samples with their class labels. Axes: x (0 to 3) and y; legend: Class 0, Class 1.]

(a) What is the entropy value of the dataset used for testing the classifier’s
performance? You must clearly show how you get your answer.
(3 marks)
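
For reference, the entropy of a labelled dataset with class proportions p_i is commonly defined as H = -Σ p_i log2(p_i). A minimal Python sketch, assuming this base-2 definition; the function name `entropy` and the example counts are illustrative and are not read from Figure 1.

```python
from math import log2


def entropy(class_counts):
    """Shannon entropy (in bits) of a dataset, given its per-class sample counts."""
    total = sum(class_counts)
    # Classes with no samples are skipped (0 * log2(0) is taken as 0).
    return sum(-(c / total) * log2(c / total) for c in class_counts if c > 0)


# Hypothetical counts, chosen only to exercise the function.
print(round(entropy([3, 1]), 4))  # 0.8113
```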

(b) Sketch the binary decision tree.

(8 marks)

(c) Give the confusion matrix of the classifier with appropriate entries. You must
clearly show how you get your answer. You may assume Class 1 is the positive
class.
(14 marks)

(d) Calculate the accuracy and the TP (True Positive) rate of the classifier. You must
clearly show how you get your answer.
(10 marks)
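
For reference, with TP, TN, FP and FN denoting the four confusion-matrix entries, the usual definitions are accuracy = (TP + TN) / (TP + TN + FP + FN) and TP rate = TP / (TP + FN). A minimal Python sketch of these definitions; the function names and the counts below are hypothetical and are not derived from Figure 1.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all test samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def tp_rate(tp, fn):
    """Fraction of actual positives that were classified as positive."""
    return tp / (tp + fn)


# Made-up counts, used only to exercise the functions.
print(accuracy(tp=3, tn=2, fp=1, fn=4))  # 0.5
print(tp_rate(tp=3, fn=4))               # 3/7
```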

Total: 35 Marks

END OF PAPER