Data Mining Imp Solutions
Q1) What is the role of data cube computation and exploration in data warehousing?
Ans:- In cube computation, aggregation is performed on the tuples (or cells) that share
the same set of dimension values. Thus it is important to explore sorting, hashing,
and grouping operations to access and group such data together to facilitate
computation of such aggregates.
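As a minimal sketch of this idea, the grouping step can be done with a hash table keyed on the dimension values; the fact table and dimension names below are made-up illustration values, not part of any particular warehouse schema:

```python
from collections import defaultdict

# Hypothetical fact table: (year, category, measure) tuples.
sales = [
    ("2023", "Books", 100),
    ("2023", "Books", 150),
    ("2023", "Music", 80),
    ("2024", "Books", 120),
]

# Hash-based grouping: tuples that share the same set of dimension
# values land in the same bucket, where the aggregate is accumulated.
totals = defaultdict(int)
for year, category, amount in sales:
    totals[(year, category)] += amount

print(totals[("2023", "Books")])  # 250
```

Sorting the tuples on the dimension columns first would achieve the same grouping with a single sequential pass, which is the other strategy the answer mentions.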
Ans:- The main difference between data warehousing and data mining is that data
warehousing is the process of compiling and organizing data into one common
database, whereas data mining is the process of extracting meaningful data from that
database. Data mining can only be done once data warehousing is complete.
Ans:- A support vector machine (SVM) is a machine learning algorithm that analyzes
data for classification and regression analysis. SVM is a supervised learning method
that looks at data and sorts it into one of two categories. An SVM outputs a map of the
sorted data with the margins between the two as far apart as possible. SVMs are used
in text categorization, image classification, handwriting recognition and in the sciences.
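To illustrate the "sorts data into one of two categories" idea: a trained linear SVM reduces to a decision function f(x) = w·x + b, and a point's category is the sign of f(x). The weights below are made-up illustration values, not a fitted model:

```python
# Hypothetical weights and bias of an already-trained linear SVM;
# in a real SVM these come from maximizing the margin on training data.
w = (1.0, -1.0)
b = 0.0

def classify(x):
    # Sign of the decision function w.x + b picks the category.
    score = w[0] * x[0] + w[1] * x[1] + b
    return "A" if score >= 0 else "B"

print(classify((3, 1)))  # "A": the point lies on the positive side
print(classify((1, 3)))  # "B": the point lies on the negative side
```

What SVM training adds on top of this sketch is choosing w and b so that the margin between the two categories is as wide as possible.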
Ans:- Star and Snowflake Schema in Data Warehouse with Model Examples
(guru99.com)
Q7) Define KDD. Identify and describe the phases in the KDD process. Elucidate the key
differences between KDD and data mining.
KDD Process in Data Mining: What You Need To Know? | upGrad blog
Ans:- KDD is the overall process of extracting knowledge from data, while Data Mining is
a step inside the KDD process, which deals with identifying patterns in data. In other
words, Data Mining is only the application of a specific algorithm based on the overall
goal of the KDD process.
Ans:- Classification in data mining is a common technique that separates data points
into different classes. It allows you to organize data sets of all sorts, including complex
and large datasets as well as small and simple ones.
Ans) The Slice operation selects a single value along one dimension of a given cube,
producing a new sub-cube that presents the data from another point of view.
The Dice operation, by contrast, selects values on two or more dimensions of the cube.
Q11) Describe various methods for data cube materialization.
Ans) Data cube operations are used to manipulate data to meet users' needs; they
select the particular data required for analysis. There are five main operations
(roll-up, drill-down, slice, dice, and pivot), some of which are described below-
Roll-up: this operation aggregates cells that share the same dimension values by
climbing up a dimension hierarchy. For example, if the data cube displays the daily
income of a customer, we can use a roll-up operation to find his monthly income.
Dicing: this operation performs a multidimensional cut: rather than cutting along a
single dimension, it can also select a range of values on other dimensions. As a
result, it produces a sub-cube out of the whole cube (as depicted in the figure).
For example, the user wants to see the annual salary of Jharkhand state
employees.
Pivot: this operation concerns how the cube is viewed. It rotates the axes of the
data cube without changing the data present in it. For example, if the user is
comparing year versus branch, the pivot operation lets the user change the
viewpoint and compare branch versus item type instead.
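The roll-up example above (daily income aggregated to monthly income) can be sketched in a few lines; the dates and amounts are made-up illustration values:

```python
from collections import defaultdict

# Hypothetical daily-income cells keyed by ISO date ("YYYY-MM-DD").
daily_income = {
    "2023-01-05": 200,
    "2023-01-20": 300,
    "2023-02-10": 150,
}

# Roll-up climbs the time hierarchy day -> month: cells sharing a
# month are aggregated into one parent cell.
monthly = defaultdict(int)
for day, amount in daily_income.items():
    monthly[day[:7]] += amount  # "YYYY-MM" is the parent level

print(dict(monthly))  # {'2023-01': 500, '2023-02': 150}
```

Drill-down is the inverse step: moving from the monthly cells back to the finer daily cells.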
Q12) What is the difference between agglomerative and divisive approach of clustering?
Ans) Information gain is the amount of information gained by knowing the
value of an attribute: the entropy of the class distribution before the split minus
the weighted entropy of the distribution after it.
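This definition translates directly into code; the toy labels below are made-up illustration values:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a class-label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(parent)
    after = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(parent) - after

# A split that perfectly separates a 50/50 distribution gains one full bit.
parent = ["yes", "yes", "no", "no"]
gain = information_gain(parent, [["yes", "yes"], ["no", "no"]])
print(gain)  # 1.0
```

Decision-tree learners such as ID3 pick, at each node, the attribute whose split yields the highest information gain.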
Q14) What are genetic algorithms?
Ans) Bitmap indexing is a type of database indexing built on a single key using
bitmaps. It is used in large databases for columns whose cardinality is very low
but which are frequently used in queries; it retrieves data quickly for such
low-cardinality columns in massive databases.
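A minimal sketch of the idea, using one Python int as the bitmap per distinct value (the column name and rows are made-up illustration values):

```python
# Low-cardinality column "city": one bitmap per distinct value, where
# bit i is set if row i holds that value.
rows = ["Pune", "Delhi", "Pune", "Pune", "Delhi"]

bitmaps = {}
for i, value in enumerate(rows):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << i)

# A query on the column becomes a cheap bitwise test instead of a scan;
# AND/OR of bitmaps answers multi-predicate queries the same way.
pune_rows = [i for i in range(len(rows)) if bitmaps["Pune"] >> i & 1]
print(pune_rows)  # [0, 2, 3]
```

With only a handful of distinct values, the whole index is a handful of bit vectors, which is why bitmap indexes suit low-cardinality columns.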
Q17) Discuss ROLAP, MOLAP and HOLAP servers involved in data warehouse.
Ans) Data mining is the process of extracting and discovering patterns in large data
sets involving methods at the intersection of machine learning, statistics, and database
systems.[1] Data mining is an interdisciplinary subfield of computer
science and statistics with an overall goal of extracting information (with intelligent
methods) from a data set and transforming the information into a comprehensible
structure for further use.[1][2][3][4] Data mining is the analysis step of the "knowledge
discovery in databases" process, or KDD.[5] Aside from the raw analysis step, it also
involves database and data management aspects, data pre-processing, model and
inference considerations, interestingness metrics, complexity considerations,
post-processing of discovered structures, visualization, and online updating.[1]