CA2-Question Bank MCQ (PEC-CSBS601D)

The document contains a question bank with 53 multiple choice questions related to the subject of data mining and analytics. The questions cover topics like data mining algorithms, data warehousing concepts, and data preprocessing techniques. This appears to be a set of practice or sample questions for students.

Uploaded by

niladri47530

ACADEMY OF TECHNOLOGY

Subject (Code): Data Mining & Analytics (PEC-CSBS 601D)


Question Bank
Semester/Branch: 6th Sem/ CSBS

1. The output of KDD is


(a) Data (b) Information (c) Knowledge (d) None of these

2. Support of a frequent pattern is


(a) <minsup (b) ≤ minsup (c) >minsup (d) ≥ minsup

3. Which of the following algorithms is used for pattern mining without candidate generation?

(a) Apriori (b) FP-Growth (c) Both of (a) & (b) (d) None of (a) & (b)

4. The learning which is used to find the hidden pattern in unlabeled data is called?

a) Reinforcement learning
b) Supervised learning
c) Unsupervised learning
d) None of these

5. The _______ data are stored in a data warehouse.

a) Operational
b) Historical
c) Transactional
d) Optimized

6. The Jaccard similarity between 2 sets A and B is given by-

(a) Sizeof (A Union B)/Sizeof (A Intersection B)

(b) Sizeof (A Intersection B)/Sizeof (A Union B)

(c) Sizeof (A Intersection B)

(d) Sizeof (A Union B)

7. The learning which is used to find the hidden pattern in labeled data is called?

a) Reinforcement learning
b) Supervised learning
c) Unsupervised learning
d) None of these
8. The Jaccard Similarity of two disjoint sets is-
(a) 1 (b) 0 (c) Can be any value (d) Can be any value between 0 and 1
9. The left-hand side of an association rule is called:
(a) Antecedent (b) Consequent (c) Former (d) Beginner
10. Which one of the following is known as data about data?
(a) Metadata (b) Microdata (c) Minidata (d) Multidata
11. Apriori follows downward closure property.

(a) True (b) False

12. Apriori was proposed by

(a) J. Dalton, A. Turing (b) A. Turing, R. Agrawal

(c) R. Agrawal, R. Srikant (d) J. McCarthy, R. Srikant

13. Euclidean distance between the objects P (1,2,3) and Q (2,1,0) is:

a) 3.22
b) 3.32
c) 3.42
d) None of the above

14. Association rule expresses relationship in the form of-

(a) if-then (b) but-yet (c) true-false (d) All of the mentioned

15. Data mining is an integral part of-

(a) DBMS (b) RDBMS (c) KDD (d) None of these

16. Which one is the application of association rule mining?

(a) Market basket analysis (b) Credit card fraud detection

(c) Medical diagnosis (d) All of the above

17. Which of the following data structures is used by the FP-growth algorithm?

(a) Trie (b) Graph (c) Linked list (d) Array

18. Which of the following are interestingness measures?

(a) Confidence (b) Lift (c) Support (d) All of these

19. Which one is a Data Mining Technique (DMT)?

(a) Association Rule Mining (b) Clustering (c) Classification (d) All of the above
20. What does the acronym ETL stand for?

a) Explain, transfer and lead


b) Extract, transform and load
c) Extract, transfer and load
d) Effect, transfer and load

21. If Lift (a→b) > 1 then it expresses

(a) Positive correlation (b) Negative correlation

(c) Neither of (a) & (b) (d) Both (a) & (b)

22. The expansion of DSS in DW is __________

a) Decision support system


b) Decision single system
c) Data storable system
d) Data support system

23. For an itemset, support measure expresses-

(a) Occurrence frequency (b) Occurrence time (c) Occurrence weight (d) None

24. Which one is expressed by confidence measure?

(a) Weakness of a rule (b) Strength of a rule (c) Existence of a rule (d) None

25. An item X appears in 17 transactions out of 277 transactions in a database. What is the
support of X?

(a) 0.6 (b) 0.06 (c) 1.6 (d) 1.06

26. A TDB has 500 transactions out of which X appears in 77 transactions and out of these 77
transactions another item Y is found in 35 transactions. Find the confidence of the rule X→Y.

(a) 0.045 (b) 1.45 (c) 0.45 (d) 4.5

27. Let sup(X)=0.2, sup(Y)=0.4 and sup(XUY)=0.1. Find the lift(X,Y).

(a) 1.25 (b) 0.25 (c) 0.025 (d) 0.0025
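
The support, confidence and lift questions above (25 to 27) each reduce to a single ratio. A minimal sketch of those ratios follows; the helper names `support`, `confidence` and `lift` are hypothetical and chosen only for illustration, and the inputs are the figures quoted in the questions.

```python
def support(count, total):
    """Fraction of all transactions that contain the itemset."""
    return count / total


def confidence(sup_xy, sup_x):
    """Confidence of X -> Y: support of X and Y together, divided by support of X."""
    return sup_xy / sup_x


def lift(sup_xy, sup_x, sup_y):
    """Lift of X -> Y: joint support divided by the product of the individual supports."""
    return sup_xy / (sup_x * sup_y)


# Figures taken from questions 25, 26 and 27.
sup_q25 = support(17, 277)
conf_q26 = confidence(support(35, 500), support(77, 500))
lift_q27 = lift(0.1, 0.2, 0.4)

print(round(sup_q25, 2), round(conf_q26, 2), round(lift_q27, 2))
```

A lift above 1 indicates positive correlation, below 1 negative correlation, and exactly 1 no correlation, which is the distinction questions 21, 62 and 70 ask about.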

28. Let sup(X), sup(Y) and sup(XUY) be 0.2, 0.3 and 0.1 respectively. What is the relationship
between X→Y and Y→X?

(a) X→Y=Y→X (b) X→Y≠Y→X (c) X→Y≤Y→X (d) Undetermined


29. A TDB has 6 items: U, V, W, X, Y, Z. Out of 10 transactions, U, V, W, X, Y, Z
appear in 2, 5, 7, 3, 4 and 6 transactions respectively. Let the minimum support threshold be 0.5.
Find the list of rare items.

(a) {U, X, Y, Z} (b) {X, Y, Z} (c) {U, X, Y} (d) {V, W, Z}

30. Rare itemsets occur-

(a) Frequently (b) Infrequently (c) Regularly (d) None

31. Number of items present in a 5-itemset is-

(a) 4 (b) 5 (c) 6 (d) None

32. Apriori generates a k-itemset from the join operation of two _________ itemsets.

(a) (k-1) (b) k (c) (k+1) (d) (2k-1)
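
The join step mentioned in question 32, together with the downward closure property from question 11, can be sketched as below. This is a simplified illustration, not the original Apriori pseudocode: it joins any two frequent (k-1)-itemsets whose union has k items instead of using the sorted-prefix join, and the function name `apriori_gen` and the sample itemsets are assumptions made for the example.

```python
from itertools import combinations


def apriori_gen(frequent_prev, k):
    """Build candidate k-itemsets from frequent (k-1)-itemsets (join, then prune)."""
    prev = {frozenset(s) for s in frequent_prev}
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue  # join only pairs that differ in exactly one item
            # Prune by downward closure: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets.
frequent_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
candidates_3 = apriori_gen(frequent_2, 3)
print(candidates_3)
```

Here {A, B, D} and {B, C, D} are pruned because {A, D} and {C, D} are not frequent, so only {A, B, C} survives as a candidate.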

33. Which type of learning is Association Rule Mining?

(a) Supervised (b) Unsupervised (c) Semi-supervised (d) None

34. What are the functions of data mining?

(a) Association analysis (b) Correlation analysis

(c) Prediction analysis (d) All of the above

35. What does OLTP stand for?

(a) Offline Transaction Processing (b) Online Transaction Processing

(c) Outline Transaction Processing (d) None

36. What do data warehouses support?

(a) OLTP (b) OLAP (c) Operational databases (d) All of the above

37. What does OLAP stand for?

(a) Online Advanced Processing (b) Online Analytical Processing

(c) Online Advanced Preparation (d) Online Analytical Performance

38. Which one of the following issues must be considered before investing in data mining?

(a) Compatibility (b) Functionality (c) Both (a) and (b) (d) None of the above
39. Which of the following is the right approach to Data Mining?

(a) Infrastructure, exploration, analysis, exploitation, interpretation

(b) Infrastructure, exploration, analysis, interpretation, exploitation

(c) Infrastructure, analysis, exploration, interpretation, exploitation

(d) None of these

40. Which of the following refers to the step of the knowledge discovery process in which
several data sources are combined?

a) Data selection b) Data cleaning c) Data transformation d) Data Integration

41. ______________is an essential process where intelligent methods are applied to extract data
patterns.

(a) Data warehousing (b) Data mining (c) Text mining (d) Data selection

42. A data warehouse is said to contain a ‘time-varying’ collection of data because:

(a) Its contents vary automatically with time (b) Its life-span is very limited

(c) It contains historical data (d) Its content has explicit time-stamp

43. Which of the following is performed by Data Scientist?


(a) Define the question (b) Create reproducible code

(c) Challenge results (d) All of the mentioned

44. Which of the following is the most important language for Data Science?
a) Java b) Ruby c) R d) None of the mentioned

45. Clustering is a –

a) Supervised Learning b) Unsupervised Learning c) Reinforcement Learning d) None of these

46. The goal of clustering is to

a) Divide the data points into groups b) Classify the data point into different classes

c) Predict the output values of input data points d) All of the above

47. The content of a data warehouse is said to be ‘non-volatile’, because

a) It remains the same even after the system crashes b) Its life-span is very limited

c) It is a read-only data d) It disappears when the system is switched-off

48. The right-hand side of an association rule is called

a) Consequent b) Onset c) Antecedent d) Precedent


49. Jaccard similarity of two sets: A= {0, 1, 2, 5, 6} and B= {0, 2, 3, 4, 5, 7, 9}

a) 0.22 b) 0.33 c) 0.44 d) 0.55
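
The Jaccard questions (6, 8 and 49) follow directly from the definition: size of the intersection divided by size of the union. A minimal sketch using Python's built-in sets follows; the function name `jaccard` is hypothetical, and returning 1.0 for two empty sets is an assumed convention, not something the question bank states.

```python
def jaccard(a, b):
    """Jaccard similarity: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)


# Sets from question 49.
A = {0, 1, 2, 5, 6}
B = {0, 2, 3, 4, 5, 7, 9}
sim_ab = jaccard(A, B)

# Two disjoint sets share no elements, so the intersection is empty.
sim_disjoint = jaccard({1, 2}, {3, 4})

print(round(sim_ab, 2), sim_disjoint)
```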

50. Manhattan distance between the objects P (1, 2, 3) and Q (2, 1, 0) is:

a) 3 b) 4 c) 5 d) None of the above
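
The distance questions (13, 50 and 67) can be checked with two short functions. A minimal sketch follows, assuming points are given as equal-length tuples; the names `euclidean` and `manhattan` are illustrative only.

```python
import math


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Points from questions 13 and 50.
p, q = (1, 2, 3), (2, 1, 0)
d_euclid = euclidean(p, q)
d_manhattan = manhattan(p, q)

# Points from question 67.
d_q67 = euclidean((1, 3), (2, 3))

print(round(d_euclid, 2), d_manhattan, d_q67)
```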

51. Data cleaning is

a) Large collection of data mostly stored in a computer system

b) The removal of noise, errors and incorrect input from a database

c) The systematic description of the syntactic structure of a specific database. It describes the structure of
the attributes, the tables and foreign key relationships.

d) None of these

52. Which of the following terms is used as a synonym for data mining?

a) Knowledge discovery in databases b) Data warehousing

c) Regression analysis d) Parallel processing in databases

53. When does k-means clustering stop creating or optimizing clusters?

a) After finding no new reassignment of data points

b) After the algorithm reaches the defined number of iterations

c) Both A and B

d) None of the above
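
The two stopping conditions in question 53, and the inputs listed in question 65 (a distance metric, the number of clusters, and initial centroids), appear together in the following minimal k-means sketch. It assumes Euclidean distance via `math.dist` (Python 3.8+), takes the number of clusters from the length of the hypothetical `centroids` argument, and keeps an old centroid when a cluster becomes empty; these are choices made for the example, not requirements of the algorithm.

```python
import math


def kmeans(points, centroids, max_iters=100):
    """Plain k-means: stop when no point is reassigned or after max_iters rounds."""
    centroids = [tuple(c) for c in centroids]
    assignment = [None] * len(points)
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no reassignment, so the clustering has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


# Hypothetical data: two well-separated groups and two initial guesses.
points = [(1, 1), (1, 2), (8, 8), (9, 8)]
labels, centers = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels, centers)
```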

54. Data Warehousing is defined as________

a) A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management.

b) Selecting the right dataset for KDD

c) The real discovery stage of KDD process

d) All of these

55. A data warehouse is an ‘integrated’ collection of data because:

a) It is a collection of data of different types

b) It is a collection of data derived from multiple sources

c) It is a relational database

d) It contains summarized data


56.

a) 2 b) 4 c) 6 d) 8

57. For the same data as above, what are the numbers of candidate 3-itemsets and frequent
3-itemsets respectively?

a) 1, 1 b) 2,2 c) 2,1 d) 3,2

58. Continuing with the same data, how many association rules can be derived from the frequent
itemset {A, B, E}? (Note: for a frequent itemset X, consider only rules of the form S→ (X-S),
where S is a non-empty subset of X.)

a) 3 b) 6 c) 7 d) 8

59. For the same frequent itemset as mentioned above, which among the following rules have a
minimum confidence of 60%?
a) A intersect B imply E b) A intersect E imply B

c) A imply B intersect E d) All of the above


60.

61. A data warehouse is built as a separate repository of data, different from the operational data of
an enterprise because:

a) It is necessary to keep the operational data free of any warehouse operation

b) A data warehouse cannot afford to allow corrupted data within it

c) A data warehouse contains summarized data whereas the operational database contains transactional
data

d) None of these

62. If Lift (a→b) < 1 then it expresses

(a) Positive correlation (b) Negative correlation


(c) No correlation (d) All of these

63. Which one of the following can be considered a correct application of data mining?

a) Fraud detection b) Corporate Analysis & Risk management

c) Management & market analysis d) All of the above

64. The term "DMQL" stands for _____

a) Data Marts Query Language


b) DBMiner Query Language
c) Data Mining Query Language
d) None of the above
65. What is needed by K-means clustering?

a) Distance metrics
b) Number of clusters
c) Initial guess to the cluster centroids
d) All of the above

66. Cluster analysis can be employed to:

a) examine a firm's product offerings relative to competition.


b) group cities into homogeneous clusters for test marketing.
c) identify buyer groups sharing similar choice criteria.
d) All of the above

67. Which of the following is the Euclidean distance between the two data points:
A (1,3) and B (2,3)?
a) 2
b) 4
c) 8
d) 1

68. Which of the following applies to the data warehouse?


a) Write only
b) Read only
c) Both (a) and (b)
d) None of these

69. How many steps are in Association Rule Mining (ARM)?


a) 1 b) 2 c) 3 d) 4

70. If Lift (a→b) = 1 then it expresses


(a) Positive correlation (b) Negative correlation
(c) No correlation (d) All of these
