Data Mining Imp Solutions

imp soln

Uploaded by

fake banda
Copyright
© All Rights Reserved

Last year's questions: data mining

Q1) What is the role of data cube computation and exploration in data warehousing?

Ans:- In cube computation, aggregation is performed on the tuples (or cells) that share
the same set of dimension values. Thus it is important to explore sorting, hashing,
and grouping operations to access and group such data together to facilitate
computation of such aggregates.
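
As a minimal sketch of the hash-based grouping described above, tuples that share the same dimension values can be collected in a dictionary and summed. The sales rows and the dimension names (city, year) are made up for illustration; a real cube engine would work on far larger data.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, year, sales). Illustrative data only.
rows = [
    ("Delhi", 2021, 100),
    ("Delhi", 2021, 50),
    ("Delhi", 2022, 70),
    ("Mumbai", 2021, 30),
    ("Mumbai", 2022, 90),
]

def aggregate(rows, dims):
    """Sum the sales measure over tuples sharing the same values on `dims`.

    `dims` holds the positions of the grouping dimensions (0 = city, 1 = year).
    The dictionary acts as the hash table that groups matching tuples.
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in dims)
        totals[key] += row[2]
    return dict(totals)

by_city_year = aggregate(rows, (0, 1))  # grouped on both dimensions
by_city = aggregate(rows, (0,))         # grouped on city only
grand_total = aggregate(rows, ())       # no grouping: a single total
```

Each call computes one group-by of the cube; grouping on fewer dimensions gives a more summarized result.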

Q2) Difference between data warehousing and data mining?

Ans:- The main difference between data warehousing and data mining is that data
warehousing is the process of compiling and organizing data into one common
database, whereas data mining is the process of extracting meaningful patterns from that
database. Data mining is usually carried out after the data has been warehoused,
because the warehouse supplies cleaned and integrated data.

Q3) What do you mean by SVM?

Ans:- A support vector machine (SVM) is a machine learning algorithm that analyzes
data for classification and regression analysis. SVM is a supervised learning method
that looks at data and sorts it into one of two categories. An SVM finds the boundary
(hyperplane) that separates the two categories with the margin between them as wide as
possible. SVMs are used in text categorization, image classification, handwriting
recognition and in the sciences.
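
To make the idea of the margin concrete, the sketch below measures the margin of two candidate separating lines on a tiny made-up data set and prefers the wider one. The points and both candidate lines are assumptions chosen for illustration; an actual SVM searches over all possible separators rather than comparing two.

```python
import math

# Hypothetical 2-D training points with labels +1 / -1 (illustrative only).
points = [((1.0, 1.0), -1), ((2.0, 1.0), -1), ((4.0, 4.0), 1), ((5.0, 4.0), 1)]

def margin(w, b, data):
    """Smallest signed distance from any point to the line w.x + b = 0.

    A positive result means every point lies on its correct side.
    """
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm for x, y in data)

# Two candidate separating lines; both classify every point correctly.
candidate_a = ((0.0, 1.0), -2.5)   # horizontal line y = 2.5
candidate_b = ((0.0, 1.0), -1.5)   # horizontal line y = 1.5

margin_a = margin(*candidate_a, points)
margin_b = margin(*candidate_b, points)
best = "a" if margin_a > margin_b else "b"
```

Line a keeps every point at least 1.5 units away, while line b passes only 0.5 units from the lower class, so the maximum-margin criterion favours line a.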

Q4) Explain the three-tier architecture of the data warehouse model.

Ans:- Three-Tier Data Warehouse Architecture - javatpoint

Q5) Differences Between OLAP and OLTP?

Ans:- Difference between OLAP and OLTP in DBMS - GeeksforGeeks


Q6) Describe the star, snowflake and fact constellation schemas for a multidimensional
database. Take an example of any organisation and draw the diagrams of the star,
snowflake and fact constellation schemas.

Ans:- Star and Snowflake Schema in Data Warehouse with Model Examples
(guru99.com)

Data Warehouse | What is Fact Constellation Schema - javatpoint

Q7) Define KDD. Identify and describe the phases in the KDD process. Elucidate the key
differences between KDD and data mining.

Ans:- Knowledge discovery in databases (KDD) is the process of discovering useful
knowledge from a collection of data. This widely used process includes data selection
and preparation, data cleansing, incorporating prior knowledge on data sets, and
interpreting accurate solutions from the observed results. Major KDD application areas
include marketing, fraud detection, telecommunication and manufacturing.

KDD Process in Data Mining: What You Need To Know? | upGrad blog

KDD is the overall process of extracting knowledge from data while Data Mining is
a step inside the KDD process, which deals with identifying patterns in data. In other
words, Data Mining is only the application of a specific algorithm based on the overall
goal of the KDD process.

Q8) What is classification? Describe classification as a two-step process. Also
compare classification with prediction.

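Ans:- Classification in data mining is a common technique that separates data points
into different classes. It allows you to organize data sets of all sorts, including complex
and large datasets as well as small and simple ones. It is a two-step process: a learning
step, in which a model is built from training data with known class labels, and a
classification step, in which the model is used to assign class labels to new data.
Prediction, by comparison, estimates a continuous numeric value rather than a class label.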

Data Mining - Classification & Prediction (tutorialspoint.com)

Q9) Write a note on non-linear regression prediction techniques.

Q10) What is the difference between the Slice and Dice operators?

Ans) The Slice operation selects a single value on one specific dimension of a given
cube and produces a new sub-cube, which provides information from another point of view.
The Dice operation, by contrast, selects values on two or more dimensions of a cube.

Q11) Describe various methods for data cube materialization.

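Ans) A data cube can be materialized in three ways: no materialization (none of the
non-base cuboids is precomputed), full materialization (every cuboid is precomputed), and
partial materialization (only a selected subset of cuboids is precomputed). Once the cube
is available, data cube operations are used to manipulate the data to meet the needs of
users. These operations help to select particular data for analysis. There are
mainly 5 operations, listed below: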
• Roll-up: this operation aggregates data by climbing up a dimension hierarchy, combining
values that fall under the same higher-level value. For example, if the data cube stores
the daily income of a customer, a roll-up along the time dimension gives the monthly income.

• Drill-down: this operation is the reverse of the roll-up operation. It allows us to
take particular information and subdivide it further for finer-granularity
analysis. It zooms into more detail. For example, if India is a value in the country
column and we wish to see villages in India, then the drill-down operation splits
India into states, districts, towns, cities and villages, and then displays the required
information.

• Slicing: this operation filters out the unnecessary portions. Suppose that in a
particular dimension the user doesn't need everything for analysis, only a particular
value. For example, country="Jamaica" will display only the data about Jamaica and
will not display the other countries present in the country list.

• Dicing: this operation does a multidimensional cut: it selects a range not on one
dimension only but on two or more dimensions at once. As a result, it produces a
subcube out of the whole cube (as depicted in the figure).
For example, the user wants to see the annual salary of Jharkhand state
employees.

• Pivot: this operation matters for how the data is viewed. It rotates the axes of the
data cube to give a different view, without changing the data present in the
data cube. For example, if the user is comparing year versus branch, a
pivot lets the user change the viewpoint and compare branch
versus item type instead.
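
The roll-up, slice and dice operations above can be sketched on a tiny in-memory fact table. The rows, the column order (state, year, month, salary) and the function names are hypothetical, chosen only to mirror the examples in the list.

```python
from collections import defaultdict

# Hypothetical fact rows: (state, year, month, salary). Illustrative only.
facts = [
    ("Jharkhand", 2022, "Jan", 10),
    ("Jharkhand", 2022, "Feb", 12),
    ("Jharkhand", 2023, "Jan", 11),
    ("Bihar", 2022, "Jan", 8),
    ("Bihar", 2023, "Feb", 9),
]

def roll_up(rows):
    """Aggregate monthly figures up to (state, year) totals."""
    totals = defaultdict(int)
    for state, year, _month, salary in rows:
        totals[(state, year)] += salary
    return dict(totals)

def slice_(rows, state):
    """Fix one dimension (state) to a single value."""
    return [r for r in rows if r[0] == state]

def dice(rows, states, years):
    """Restrict two dimensions at once to the given sets of values."""
    return [r for r in rows if r[0] in states and r[1] in years]

yearly = roll_up(facts)
jharkhand_only = slice_(facts, "Jharkhand")
jharkhand_2022 = dice(facts, {"Jharkhand"}, {2022})
```

Drill-down is the reverse direction of `roll_up`: going from the yearly totals back to the monthly rows in `facts`.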

Q12) What is the difference between the agglomerative and divisive approaches to clustering?

Ans) Agglomerative hierarchical clustering builds the clusters from bottom to top:
each data point starts as its own cluster, and the closest clusters are merged step by
step, so the sub-components are formed first and then their parent. Divisive clustering
uses the top-down approach: all points start in one parent cluster, which is then split
into child clusters.

Q13) What is information gain?

Ans) Information gain is the amount of information that's gained by knowing the
value of the attribute, which is the entropy of the distribution before the split minus the
weighted average entropy of the partitions after it.
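
The definition above can be worked through on a toy example. This is a minimal sketch; the labels and the two attributes (weather, weekday) are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    after = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - after

# Hypothetical toy data: does a customer buy?
buys = ["yes", "yes", "no", "no"]
weather = ["sunny", "sunny", "rainy", "rainy"]   # separates the classes perfectly
weekday = ["mon", "tue", "mon", "tue"]           # tells us nothing about the class

gain_weather = information_gain(buys, weather)
gain_weekday = information_gain(buys, weekday)
```

Here splitting on weather removes all uncertainty (a gain of 1 bit), while splitting on weekday leaves the entropy unchanged (a gain of 0), so a decision tree would choose weather.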

Q14) What are genetic algorithms?

Ans) A genetic algorithm is an adaptive heuristic search algorithm inspired by
Darwin's theory of evolution in nature. It is used to solve optimization problems in
machine learning. It is one of the important algorithms as it helps solve complex
problems that would otherwise take a long time to solve.
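
A minimal sketch of a genetic algorithm is shown below on the toy "OneMax" problem (maximize the number of 1s in a bit string). The problem, the parameter values and the particular selection, crossover and mutation choices are assumptions made for illustration, not a prescribed design.

```python
import random

def fitness(bits):
    """OneMax: the number of 1s in the bit string (higher is better)."""
    return sum(bits)

def evolve(pop_size=20, length=16, generations=30, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]      # selection: keep the fitter half
        children = [population[0][:]]              # elitism: best survives unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)         # single-point crossover
            child = mother[:cut] + father[cut:]
            # mutation: flip each bit with a small probability
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return initial_best, best

initial_best, best = evolve()
```

Because the best individual is always copied into the next generation, the best fitness can never decrease from one generation to the next.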

Q15) What is bitmap indexing?

Ans) Bitmap indexing is a type of database indexing built on a single key. It stores one
bitmap (bit vector) per distinct key value, with one bit per row that is set when the row
holds that value. Bitmap indexing is used for large databases in which the cardinality of
the indexed columns is very low and those columns are frequently used in queries. It
retrieves data quickly for low-cardinality columns in massive databases.
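
As a rough sketch of the idea, a Python integer can stand in for each bitmap, with bit i representing row i. The two columns and their values are hypothetical; real database systems also compress the bitmaps.

```python
# Hypothetical low-cardinality columns: one value per row (illustrative only).
gender = ["M", "F", "F", "M", "F"]
status = ["single", "married", "single", "single", "married"]

def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is 1 if row i has that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """Decode a bitmap back into the list of row numbers it marks."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

gender_index = build_bitmap_index(gender)
status_index = build_bitmap_index(status)

# A query such as gender = 'F' AND status = 'single' becomes one bitwise AND.
hits = matching_rows(gender_index["F"] & status_index["single"])
```

Combining conditions with bitwise AND/OR on the bitmaps is what makes this kind of index fast for queries over several low-cardinality columns.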

Q16) Discuss the major steps in the KDD process.

Q17) Discuss ROLAP, MOLAP and HOLAP servers involved in data warehouse.

Ans) Data Warehouse | Types of OLAP - javatpoint

Q18) Differentiate between classification and clustering.

Ans) Clustering is an unsupervised learning technique, while classification is a supervised
learning technique. Clustering groups similar instances on the basis of their features,
splitting the dataset into subsets of instances with similar features, whereas
classification assigns predefined labels to instances on the basis of their features.

Q19) What is a decision tree?

Ans) A decision tree is a non-parametric supervised learning
algorithm, which is utilized for both classification and regression tasks. It has a
hierarchical tree structure, which consists of a root node, branches, internal nodes and
leaf nodes.

Q20) What is agglomerative clustering?

Ans) Agglomerative clustering is a type of hierarchical clustering algorithm. It is an
unsupervised machine learning technique that starts with each data point as its own
cluster and repeatedly merges the two closest clusters, so that data points in the same
cluster are more similar to each other and data points in different clusters are
dissimilar. Points in the same cluster are closer to each other.
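
The bottom-up merging can be sketched in a few lines. This version assumes one-dimensional points and single linkage (cluster distance = distance of the closest pair); both are choices made for illustration, and other linkage rules are equally valid.

```python
def single_linkage(a, b):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(abs(x - y) for x in a for y in b)

def agglomerative(points, k):
    """Start with one cluster per point and merge the closest pair until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical 1-D data with two obvious groups.
result = agglomerative([1.0, 1.5, 2.0, 10.0, 10.5, 11.0], k=2)
```

A divisive method would run in the opposite direction, starting from the single cluster of all six points and splitting it.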

Q21) What is a cube?

Ans) A data cube is a multidimensional matrix in which data is grouped or combined
along several dimensions. The data cube method has a few alternative names or
variants, such as "Multidimensional databases," "materialized views," and "OLAP
(On-Line Analytical Processing)."

Q22) Explain classification using backpropagation.

Ans) Backpropagation in Data Mining - GeeksforGeeks

Q23) What is Data Mining? Explain data mining system architecture.

Ans) Data mining is the process of extracting and discovering patterns in large data
sets involving methods at the intersection of machine learning, statistics, and database
systems.[1] Data mining is an interdisciplinary subfield of computer
science and statistics with an overall goal of extracting information (with intelligent
methods) from a data set and transforming the information into a comprehensible
structure for further use.[1][2][3][4] Data mining is the analysis step of the "knowledge
discovery in databases" process, or KDD.[5] Aside from the raw analysis step, it also
involves database and data management aspects, data pre-processing, model and
inference considerations, interestingness metrics, complexity considerations,
post-processing of discovered structures, visualization, and online updating.[1]

Data Mining Architecture - Javatpoint

Q24) Applications of data mining.


Ans)
• Financial Data Analysis. The financial data in the banking and financial industry is
generally reliable and of high quality, which facilitates systematic data analysis and
data mining.
• Retail Industry. Data mining has great application in the retail industry because it
collects large amounts of data on sales, customer purchasing history, goods
transportation, consumption and services.
• Telecommunication Industry. Today the telecommunication industry is one of the
most rapidly emerging industries, providing various services such as fax, pager, cellular
phone, internet messenger, images, e-mail, web data transmission, etc. ...
• Biological Data Analysis. In recent times, we have seen tremendous growth in
fields of biology such as genomics, proteomics, functional genomics and biomedical
research.
• Other Scientific Applications. The applications discussed above tend to handle
relatively small and homogeneous data sets for which the statistical techniques are
appropriate.
• Intrusion Detection. Intrusion refers to any kind of action that threatens the integrity,
confidentiality, or availability of network resources.

Q25) Which clustering is better: k-means or k-medoids? Explain why.

Ans) K-medoids is more robust because it is less sensitive to outliers, whereas k-means
is more efficient: it takes more time to compute the distances between every pair of data
points than to compute a mean. K-medoids is more resistant to outliers because the error
it minimizes is linearly proportional to the distance, whereas the error minimized by
k-means is quadratically proportional to the distance.
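
The effect of an outlier on the two kinds of cluster centre can be seen in a small sketch. The one-dimensional cluster below is made up for illustration, and only the centre computation is shown, not a full clustering loop.

```python
# Hypothetical 1-D cluster with one outlier (100.0). Illustrative only.
cluster = [1.0, 2.0, 3.0, 4.0, 100.0]

def mean_center(points):
    """K-means style centre: the arithmetic mean (minimises squared distances)."""
    return sum(points) / len(points)

def medoid_center(points):
    """K-medoids style centre: the actual data point with the smallest
    total absolute distance to all other points."""
    return min(points, key=lambda c: sum(abs(c - p) for p in points))

mean = mean_center(cluster)      # pulled towards the outlier
medoid = medoid_center(cluster)  # stays inside the main group
```

The mean is dragged to 22.0 by the single outlier, while the medoid remains at 3.0. The medoid search compares every pair of points, which is why it costs more than computing a mean.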
