CA2-Question Bank MCQ (PEC-CSBS601D)
3. Which of the following algorithms is used for frequent pattern mining without candidate generation?
(a) Apriori (b) FP-Growth (c) Both of (a) & (b) (d) None of (a) & (b)
4. The type of learning used to find hidden patterns in unlabeled data is called:
a) Reinforcement learning
b) Supervised learning
c) Unsupervised learning
d) None of these
a) Operational
b) Historical
c) Transactional
d) Optimized
7. The type of learning used to find hidden patterns in labeled data is called:
a) Reinforcement learning
b) Supervised learning
c) Unsupervised learning
d) None of these
8. The Jaccard similarity of two disjoint sets is:
(a) 1 (b) 0 (c) Can be any value (d) Can be any value between 0 and 1
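The Jaccard similarity is |A ∩ B| / |A ∪ B|. A minimal Python sketch, assuming plain Python sets as input; the function name `jaccard_similarity` is illustrative, and returning 0.0 for two empty sets is an assumption (some texts define that case as 1).

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|.

    Assumption: two empty sets are given similarity 0.0.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Disjoint sets share no elements, so the numerator |A ∩ B| is 0.
print(jaccard_similarity({"milk", "bread"}, {"eggs", "butter"}))  # 0.0
```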
9. The left-hand side of an association rule is called:
(a) Antecedent (b) Consequent (c) Former (d) Beginner
10. Which one of the following is known as data about data?
(a) Metadata (b) Microdata (c) Minidata (d) Multidata
11. Apriori follows the downward closure property.
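Downward closure (anti-monotonicity) means every subset of a frequent itemset is also frequent, so any candidate with an infrequent subset can be pruned. A minimal Python sketch of that prune check, assuming itemsets are stored as `frozenset`s; the helper name `has_infrequent_subset` and the sample frequent 2-itemsets are hypothetical.

```python
from itertools import combinations


def has_infrequent_subset(candidate: frozenset, frequent_prev: set) -> bool:
    """Apriori prune check for a candidate k-itemset (k >= 2).

    The candidate can only be frequent if every (k-1)-subset
    appears in the set of frequent (k-1)-itemsets.
    """
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_prev
        for subset in combinations(candidate, k - 1)
    )


# Hypothetical frequent 2-itemsets; {B, C} is assumed to be infrequent.
frequent_2 = {frozenset("AB"), frozenset("AC"), frozenset("BE")}
print(has_infrequent_subset(frozenset("ABC"), frequent_2))  # True, so prune {A, B, C}
```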
13. The Euclidean distance between the objects P (1, 2, 3) and Q (2, 1, 0) is:
a) 3.22
b) 3.32
c) 3.42
d) None of the above
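A minimal Python sketch of the Euclidean distance formula, sqrt(Σ (p_i - q_i)²), applied to the two points in the question; the function name `euclidean` is illustrative.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# (1-2)^2 + (2-1)^2 + (3-0)^2 = 1 + 1 + 9 = 11, and sqrt(11) is about 3.3166
print(round(euclidean((1, 2, 3), (2, 1, 0)), 2))  # 3.32
```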
(a) if-then (b) but-yet (c) true-false (d) All of the mentioned
(a) Association Rule Mining (b) Clustering (c) Classification (d) All of the above
20. What does the acronym ETL stand for?
(c) Neither of (a) & (b) (d) Both (a) & (b)
(a) Occurrence frequency (b) Occurrence time (c) Occurrence weight (d) None
(a) Weakness of a rule (b) Strength of a rule (c) Existence of a rule (d) None
25. An item X appears in 17 of the 277 transactions in a database. What is the support of X?
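Support is the fraction of transactions that contain the itemset: sup(X) = count(X) / N. A minimal Python sketch using the counts from the question; the helper name `support` is illustrative.

```python
def support(count_x: int, total_transactions: int) -> float:
    """Fraction of all transactions that contain the itemset."""
    return count_x / total_transactions


s = support(17, 277)
print(f"{s:.4f} ({s:.2%})")  # 0.0614 (6.14%)
```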
26. A transaction database (TDB) has 500 transactions, of which X appears in 77; among these 77 transactions, another item Y is also found in 35. Find the confidence of the rule X→Y.
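Confidence is conf(X→Y) = sup(X ∪ Y) / sup(X), so with raw counts it reduces to count(X ∪ Y) / count(X) and the total of 500 transactions cancels out. A minimal Python sketch with the numbers from the question; the helper name `confidence` is illustrative.

```python
def confidence(count_xy: int, count_x: int) -> float:
    """conf(X→Y) = sup(X ∪ Y) / sup(X); the transaction total cancels."""
    return count_xy / count_x


total, count_x, count_xy = 500, 77, 35
print(f"conf(X→Y) = {confidence(count_xy, count_x):.2%}")  # 45.45%
print(f"sup(X∪Y)  = {count_xy / total:.2%}")               # 7.00%
```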
28. Let sup(X), sup(Y) and sup(X ∪ Y) be 0.2, 0.3 and 0.1 respectively. What is the relationship between X→Y and Y→X?
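Both rules have the same support, sup(X ∪ Y) = 0.1, but their confidences differ because each divides by a different antecedent support. A minimal Python sketch, assuming the relationship asked about is between the two confidences; `rule_confidences` is a hypothetical helper.

```python
def rule_confidences(sup_x: float, sup_y: float, sup_xy: float):
    """Return (conf(X→Y), conf(Y→X)) from itemset supports."""
    return sup_xy / sup_x, sup_xy / sup_y


conf_x_to_y, conf_y_to_x = rule_confidences(0.2, 0.3, 0.1)
print(f"conf(X→Y) = {conf_x_to_y:.3f}")  # 0.500
print(f"conf(Y→X) = {conf_y_to_x:.3f}")  # 0.333
```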
32. Apriori generates a candidate k-itemset by joining two _________ itemsets.
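In the Apriori join step, two frequent (k-1)-itemsets are merged when they agree on their first k-2 items. A minimal Python sketch, assuming each itemset is a lexicographically sorted tuple; `apriori_join` is a hypothetical name, and the subsequent prune step (discarding candidates that have an infrequent (k-1)-subset) is not shown.

```python
from itertools import combinations


def apriori_join(frequent_prev):
    """Join step: build candidate k-itemsets from frequent (k-1)-itemsets.

    Assumes every itemset is a sorted tuple of items.
    """
    candidates = set()
    for a, b in combinations(sorted(frequent_prev), 2):
        # Merge only when the first k-2 items match; since a < b,
        # a[-1] < b[-1] and the resulting tuple stays sorted.
        if a[:-1] == b[:-1]:
            candidates.add(a + (b[-1],))
    return candidates


frequent_2 = [("A", "B"), ("A", "C"), ("A", "E"), ("B", "C"), ("B", "E")]
print(sorted(apriori_join(frequent_2)))
# [('A', 'B', 'C'), ('A', 'B', 'E'), ('A', 'C', 'E'), ('B', 'C', 'E')]
```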
(a) OLTP (b) OLAP (c) Operational databases (d) All of the above
38. Which one of the following issues must be considered before investing in data mining?
(a) Compatibility (b) Functionality (c) Both (a) and (b) (d) None of the above
39. Which of the following is the right approach to data mining?
40. Which of the following refers to the step of the knowledge discovery process in which several data sources are combined?
41. ______________ is an essential process where intelligent methods are applied to extract data patterns.
(a) Data warehousing (b) Data mining (c) Text mining (d) Data selection
(a) Its contents vary automatically with time (b) Its life-span is very limited
(c) It contains historical data (d) Its content has explicit time-stamp
44. Which of the following is the most important language for Data Science?
a) Java b) Ruby c) R d) None of the mentioned
45. Clustering is used to:
a) Divide the data points into groups b) Classify the data points into different classes
c) Predict the output values of input data points d) All of the above
a) It remains the same even after the system crashes b) Its life-span is very limited
50. The Manhattan distance between the objects P (1, 2, 3) and Q (2, 1, 0) is:
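The Manhattan (city-block) distance sums the absolute coordinate differences: Σ |p_i - q_i|. A minimal Python sketch applied to the points in the question; `manhattan` is an illustrative name.

```python
def manhattan(p, q):
    """City-block distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return sum(abs(a - b) for a, b in zip(p, q))


# |1-2| + |2-1| + |3-0| = 1 + 1 + 3
print(manhattan((1, 2, 3), (2, 1, 0)))  # 5
```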
c) The systematic description of the syntactic structure of a specific database. It describes the structure of the attributes, the tables and the foreign key relationships.
d) None of these
52. Which of the following terms is used as a synonym for data mining?
c) Both A and B
a) A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.
d) All of these
c) It is a relational database
a) 2 b) 4 c) 6 d) 8
57. For the same data as above, what are the numbers of candidate 3-itemsets and frequent 3-itemsets, respectively?
58. Continuing with the same data, how many association rules can be derived from the frequent itemset {A, B, E}? (Note: for a frequent itemset X, consider only rules of the form S → (X − S), where S is a non-empty subset of X.)
a) 3 b) 6 c) 7 d) 8
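Each non-empty subset S of X can serve as an antecedent, with X − S as the consequent. A minimal Python sketch that enumerates these rules for {A, B, E}; `rules_from_itemset` and its `allow_empty_consequent` flag are hypothetical. By default the sketch requires a non-empty consequent (S a proper subset of X), which is the usual convention; setting the flag also admits S = X, which yields the rule X → ∅.

```python
from itertools import combinations


def rules_from_itemset(itemset, allow_empty_consequent=False):
    """Enumerate rules S → (X − S) for non-empty subsets S of X.

    Hypothetical flag: when allow_empty_consequent is True, S = X
    is also allowed, producing a rule with an empty consequent.
    """
    items = sorted(itemset)
    max_size = len(items) if allow_empty_consequent else len(items) - 1
    rules = []
    for size in range(1, max_size + 1):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules


X = {"A", "B", "E"}
for lhs, rhs in rules_from_itemset(X):
    print("{" + ", ".join(lhs) + "} -> {" + ", ".join(rhs) + "}")
print(len(rules_from_itemset(X)))                               # 6
print(len(rules_from_itemset(X, allow_empty_consequent=True)))  # 7
```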
59. For the same frequent itemset as mentioned above, which of the following rules have a minimum confidence of 60%?
a) A ∩ B → E b) A ∩ E → B
61. A data warehouse is built as a separate repository of data, different from the operational data of an enterprise, because:
c) A data warehouse contains summarized data, whereas the operational database contains transactional data
d) None of these
63. Which one of the following can be considered as the correct application of the data mining?
a) Distance metrics
b) Number of clusters
c) Initial guess to the cluster centroids
d) All of the above
67. What is the Euclidean distance between the two data points A (1, 3) and B (2, 3)?
a) 2
b) 4
c) 8
d) 1
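Euclidean and Manhattan distance are both special cases of the Minkowski distance (Σ |p_i - q_i|^r)^(1/r), with r = 2 and r = 1 respectively. A minimal Python sketch applied to A and B from the question; `minkowski` is an illustrative name.

```python
def minkowski(p, q, r):
    """Minkowski distance of order r: r = 1 is Manhattan, r = 2 is Euclidean."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)


A, B = (1, 3), (2, 3)
# Only the x-coordinate differs, by 1, so every order gives the same value.
print(minkowski(A, B, 2))  # 1.0
print(minkowski(A, B, 1))  # 1.0
```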