Data Mining Question Bank
UNIT-1
Q.1 What is data mining?
Q.2 Explain the differences between knowledge discovery and data mining.
Q.5 What is the relation between data warehousing and data mining?
Q.7 What types of benefits might you hope to get from data mining?
Q.12 As a bank manager, how would you decide whether or not to give a loan to an applicant?
Q.13 What steps would you follow to identify fraud for a credit card company?
Q.14 Explain the differences between “Explorative Data Mining” and “Predictive Data
Mining” and give one example of each.
Q.15 State three different applications for which data mining techniques seem appropriate.
Informally explain each application.
Q.16 Explain briefly the differences between “classification” and “clustering” and give an
informal example of an application that would benefit from each technique.
(b) Regression
(c) Clustering
(d) Smoothing
(e) Generalization
(f) Aggregation
Q.24 How is a data warehouse different from a database? How are they similar?
Q.25 Can you briefly describe the four stages of knowledge discovery (KDD)? Can you
describe the multi-tiered data warehouse architecture?
UNIT-2
Q.1 A data set for analysis includes only one attribute X:
X = {7, 12, 5, 8, 5, 9, 13, 12, 19, 7, 12, 12, 13, 3, 4, 5, 13, 8, 7, 6}
Q.3 What do you mean by market basket analysis, and how can it help a supermarket?
Q.4 Explain whether association rule mining is a supervised or an unsupervised type of learning.
Q.7 The heights of the players on a school’s basketball team are 72”, 74”, 70”, 78”, 75” and 70”.
Find the mean height.
Q.8 The batting averages for members of a baseball team are 0.234, 0.256, 0.321, 0.333,
0.290. Find the median batting average.
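A minimal sketch for checking answers to Q.7 and Q.8, assuming Python's standard `statistics` module is acceptable for the purpose. The variable names are illustrative only and do not come from the question bank.

```python
from statistics import mean, median

# Q.7: player heights in inches, as listed in the question.
heights_in = [72, 74, 70, 78, 75, 70]

# Q.8: batting averages, as listed in the question.
batting_averages = [0.234, 0.256, 0.321, 0.333, 0.290]

# Mean = sum of the values divided by their count (439 / 6 here).
mean_height = mean(heights_in)

# Median = middle value after sorting; with five values it is the third one.
median_average = median(batting_averages)

print(f"Mean height: {mean_height:.2f} inches")
print(f"Median batting average: {median_average:.3f}")
```

With an odd number of values the median is a single element of the sorted list. With an even number, `median` averages the two middle values.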
Q.9 Consider the data set D. Given a minimum support of 2, apply the Apriori algorithm to this data set.
Transaction ID Items
100 A,C,D
200 B,C,E
300 A,B,C,E
400 B,E
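One possible way to check a hand-worked answer to Q.9 is the level-wise sketch below. It assumes that minimum support is an absolute count of 2 transactions, and the names `transactions`, `support` and `apriori` are illustrative rather than part of the question. It follows the usual join, prune and count steps but makes no attempt at efficiency.

```python
from itertools import combinations

# Data set D from Q.9: transaction ID -> set of items.
transactions = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}
MIN_SUPPORT = 2  # assumed to be an absolute transaction count


def support(itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def apriori():
    """Return a dict mapping each frequent itemset to its support count."""
    all_items = sorted({item for items in transactions.values() for item in items})
    level = [frozenset([i]) for i in all_items
             if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        previous = set(level)
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous
                             for s in combinations(c, k - 1))}
        # Count step: keep the candidates that meet minimum support.
        level = [c for c in candidates if support(c) >= MIN_SUPPORT]
    return frequent


frequent = apriori()
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is the Apriori check itself: a candidate is discarded without scanning the data when any of its subsets is already known to be infrequent.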
Q.10 Describe an example of a data set for which the Apriori check would actually increase the cost.
Either show an instance of the data set or describe what it would look like.
Q.11 Same question for MaxMiner: when does MaxMiner perform worse than Apriori? How
does MaxMiner generate the frequency counts for every itemset which meets the support
constraints?
Q.12 Describe a data set for which sampling would actually increase the amount of work. In
other words, it would be faster to work on the full data set.
Q.15 Under what conditions would the constraint AVG(Salary) > 100K be downward closed? Upward
closed?
Q.16 Assume that each item in a supermarket is bought by 1% of transactions. Assume that
there are 10 million transactions and that items are statistically independent. Assume min-sup
= 10. What is the expected size of a frequent set? What is the expected number of frequent
sets?
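A rough sketch of the arithmetic behind Q.16, under two assumptions that are not in the question: an itemset is treated as frequent when its expected support reaches the minimum support (a simplification of the full probabilistic argument), and the number of distinct items is a hypothetical parameter `n_items`, since the question does not state it.

```python
from fractions import Fraction
from math import comb

N_TRANSACTIONS = 10_000_000      # 10 million transactions
ITEM_PROB = Fraction(1, 100)     # each item appears in 1% of transactions
MIN_SUP = 10                     # minimum support as a transaction count


def expected_support(k):
    """Expected number of transactions containing k given independent items."""
    return N_TRANSACTIONS * ITEM_PROB ** k


def largest_frequent_size():
    """Largest k whose expected support still reaches MIN_SUP."""
    k = 0
    while expected_support(k + 1) >= MIN_SUP:
        k += 1
    return k


def count_frequent_sets(n_items):
    """Number of itemsets of size 1..largest_frequent_size() over n_items items.

    n_items is a hypothetical input; the question does not give it.
    """
    return sum(comb(n_items, k) for k in range(1, largest_frequent_size() + 1))


for k in range(1, 5):
    print(k, float(expected_support(k)))
print("largest frequent size:", largest_frequent_size())
```

Exact fractions are used so that the comparison at the boundary case, where the expected support equals the minimum support, is not affected by floating-point rounding.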
Q.17 Suppose that you have data describing the closing prices of a stock you own for the
last 1000 days. Suppose you are interested in generating all rules which tell you about the
chances of your stock going up on a given day, provided you know the pattern (up or down)
on the K preceding days, with some minsup and minconf defined. How would you model this
problem as an association rule mining problem? Is there a way to represent this as transactions
with binary attributes, as in the supermarket case?
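One possible encoding for Q.17, offered as a sketch rather than the expected answer: each day becomes a transaction whose items record the direction of the K preceding days plus the direction of the day itself. The function name `to_transactions`, the item labels, and the choice to count an unchanged price as "down" are all assumptions made for illustration.

```python
def to_transactions(prices, k):
    """Turn a list of closing prices into one transaction per day.

    Each transaction holds k items of the form 'lag<j>_up' or 'lag<j>_down'
    (the move j days earlier) and one item 'today_up' or 'today_down'.
    Assumption: a price that does not rise is labelled 'down'.
    """
    # moves[i] describes the change from day i to day i + 1.
    moves = ["up" if later > earlier else "down"
             for earlier, later in zip(prices, prices[1:])]
    transactions = []
    for t in range(k, len(moves)):
        items = {f"lag{j}_{moves[t - j]}" for j in range(1, k + 1)}
        items.add(f"today_{moves[t]}")
        transactions.append(items)
    return transactions


# Tiny made-up price series, used only to show the shape of the output.
example = to_transactions([10, 11, 10, 12, 13], k=2)
for items in example:
    print(sorted(items))
```

Every item is either present or absent in a transaction, which matches the binary attributes of the supermarket setting. Rules of interest are those whose consequent is `today_up`.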
Q.18 (i) With a neat sketch, explain the architecture of a data warehouse.
(ii) Discuss the typical OLAP operations with an example.
Q.19 (i) Discuss how computations can be performed efficiently on data cubes.
(ii) Write short notes on data warehouse metadata.
Q.22 (a) Write and explain the algorithm for mining frequent itemsets without candidate
generation. Give a relevant example.
Q.23 Discuss the approaches for mining multilevel association rules from transactional
databases. Give a relevant example.
Q.24 (i) Explain the algorithm for constructing a decision tree from training samples.
(ii) Explain Bayes’ theorem.
UNIT-3
Q.1 Classification is supervised learning. Justify.
Q.3 Entropy is an important concept in information theory. Explain its significance in the data
mining context.
Q.4 What are overfitted models? Explain their effects on performance.
Q.7 What are the advantages and disadvantages of decision trees over other classification
methods?
UNIT-4
Q.22 How does a snowflake schema differ from a star schema? Name two advantages and
two disadvantages of the snowflake schema.
Q.25 Why is the entity-relationship modelling technique not suitable for the data warehouse?
UNIT-5
Q.1 How is data mining different from OLAP? Explain briefly.
Q.2 Is the data warehouse a prerequisite for data mining? Does the data warehouse help
data mining? If so, in what ways?
Q.3 List a few common provisions found in a good security policy.
Q.4 Give reasons why the data warehouse must be backed up. How is this different from an
OLTP system?
Q.5 How do statistics help in tuning the data warehouse?
Q.7 List five reasons why you think data quality is critical in a data warehouse.
Q.8 Explain how data quality is much more than just data accuracy. Give an example.
Q.11 How does the data warehouse differ from an operational system in uses and value?
Q.12 State Dr. Codd’s guidelines for OLAP systems, giving a brief description of each.
Q.13 Name any three advantages of the STAR schema. Can you think of any disadvantages
of the STAR schema?
Q.16 Describe the composition of the primary keys for the dimension and fact tables.
Q.19 Discuss the major design issues that need to be addressed before proceeding with the
data design.
Q.20 Name four distinguishing characteristics of data warehouse architecture.
Describe each briefly.
Q.22 What are the three major areas in the data warehouse? Is this a logical division? If so, why do
you think so? Relate the architectural components to the three major areas.
Q.23 What are the similarities and differences between a data warehouse and a database?