11 Data Mining

The document discusses several theoretical foundations of data mining, including: 1) data reduction which trades accuracy for speed, 2) data compression which encodes data in terms of bits or patterns, 3) pattern discovery which finds associations, classifications and sequences in data, 4) probability theory which discovers joint distributions based on statistics, 5) a microeconomic view which finds patterns for decision making and optimization, and 6) inductive databases which perform induction on stored data and patterns.

Uploaded by

bharathimanian
Copyright
© Attribution Non-Commercial (BY-NC)

11.3 Additional Themes on Data Mining

Due to the broad scope of data mining and the large variety of data mining methodologies, not all of the themes on data mining can be thoroughly covered.

11.3.1 Theoretical Foundations of Data Mining

A solid and systematic theoretical foundation is important because it can help provide a coherent framework for the development, evaluation, and practice of data mining technology.

1. Data reduction: In this theory, the basis of data mining is to reduce the data representation. Data reduction trades accuracy for speed in response to the need to obtain quick approximate answers to queries on very large databases. Data reduction techniques include singular value decomposition (the driving element behind principal components analysis), wavelets, regression, log-linear models, histograms, clustering, sampling, and the construction of index trees.

2. Data compression: According to this theory, the basis of data mining is to compress the given data by encoding in terms of bits, association rules, decision trees, clusters, and so on. Encoding based on the minimum description length principle states that the best theory to infer from a set of data is the one that minimizes the length of the theory and the length of the data when encoded, using the theory as a predictor for the data. This encoding is typically in bits.

3. Pattern discovery: In this theory, the basis of data mining is to discover patterns occurring in the database, such as associations, classification models, sequential patterns, and so on. Areas such as machine learning, neural networks, association mining, sequential pattern mining, clustering, and several other subfields contribute to this theory.

4. Probability theory: This is based on statistical theory. In this theory, the basis of data mining is to discover joint probability distributions of random variables, for example, Bayesian belief networks or hierarchical Bayesian models.

5. Microeconomic view: The microeconomic view considers data mining as the task of finding patterns that are interesting only to the extent that they can be used in the decision-making process of some enterprise (e.g., regarding marketing strategies and production plans). This view is one of utility, in which patterns are considered interesting if they can be acted on. Enterprises are regarded as facing optimization problems, where the object is to maximize the utility or value of a decision. In this theory, data mining becomes a nonlinear optimization problem.

6. Inductive databases: According to this theory, a database schema consists of data and patterns that are stored in the database. Data mining is therefore the problem of performing induction on databases, where the task is to query the data and the theory (i.e., patterns) of the database. This view is popular among many researchers in database systems.
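As a rough illustration of the data reduction view, the sketch below summarizes a column of values as an equi-width histogram and answers a range-count query from the bucket counts alone. The data set, the bucket layout, and the function names (build_histogram, estimate_range_count, exact_range_count) are assumptions made for this example and do not come from the text. The estimate assumes values are spread uniformly inside each bucket, which is where accuracy is traded for speed.

```python
def build_histogram(values, lo, hi, n_buckets):
    """Summarize values in [lo, hi) as n_buckets equi-width bucket counts."""
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        # Clamp so a value equal to hi would land in the last bucket.
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_range_count(counts, lo, width, a, b):
    """Estimate how many values fall in [a, b) using only the bucket counts.

    Assumes values are uniformly spread inside each bucket.
    """
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
        total += c * overlap / width
    return total


def exact_range_count(values, a, b):
    """Scan every value: exact, but the cost grows with the data size."""
    return sum(1 for v in values if a <= v < b)


# Hypothetical data: 5000 integers in [0, 1000), unevenly distributed.
values = [(i * i) % 1000 for i in range(5000)]
LO, HI, N_BUCKETS = 0, 1000, 10

counts, width = build_histogram(values, LO, HI, N_BUCKETS)

approx = estimate_range_count(counts, LO, width, 250, 500)
exact = exact_range_count(values, 250, 500)
print("approximate:", approx, "exact:", exact)
```

The query touches 10 bucket counts instead of 5000 values. When the query range lines up with bucket boundaries the answer is exact; otherwise the error is confined to the partially covered buckets.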
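The minimum description length idea in the data compression view can be sketched with a toy comparison of two "theories" for a binary sequence: a fair coin, which has no parameters to transmit, and a biased coin, whose estimated probability must be transmitted along with the data. The charge of 0.5 * log2(n) bits for the single parameter is a common approximation and is an assumption of this sketch, as are the function names and the sample sequences.

```python
import math


def data_bits(bits, p):
    """Length in bits of the sequence when encoded under a Bernoulli(p) model."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return -(ones * math.log2(p) + zeros * math.log2(1 - p))


def description_length_fair(bits):
    """Fair-coin theory: nothing to transmit for the model, 1 bit per symbol."""
    return float(len(bits))


def description_length_biased(bits):
    """Biased-coin theory: bits for the parameter plus bits for the data."""
    n = len(bits)
    p = sum(bits) / n
    # Keep p strictly inside (0, 1) so the logarithms are defined.
    p = min(max(p, 1 / (2 * n)), 1 - 1 / (2 * n))
    model_bits = 0.5 * math.log2(n)  # assumed cost of stating one parameter
    return model_bits + data_bits(bits, p)


def best_theory(bits):
    """Pick the theory with the smaller total description length."""
    fair = description_length_fair(bits)
    biased = description_length_biased(bits)
    return "biased" if biased < fair else "fair"


# Hypothetical sequences for illustration.
skewed = [1] * 90 + [0] * 10   # mostly ones: regularity worth paying for
balanced = [0, 1] * 50         # equal counts: the extra parameter buys nothing

print("skewed  ->", best_theory(skewed))
print("balanced ->", best_theory(balanced))
```

For the skewed sequence, the biased-coin theory shortens the encoded data by more than the cost of stating its parameter, so it gives the shorter total. For the balanced sequence, the parameter adds cost without shortening the data, so the simpler theory is preferred.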
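For the probability theory view, a minimal sketch of estimating a joint distribution of two discrete variables by counting co-occurrences may help. This is only the empirical joint distribution, not a Bayesian belief network or a hierarchical model, and the variables (weather, purchase) and records are invented for illustration.

```python
from collections import Counter

# Hypothetical records: (weather, purchase) observations.
records = [
    ("sunny", "yes"), ("sunny", "yes"), ("sunny", "no"),
    ("rainy", "no"), ("rainy", "no"), ("rainy", "yes"),
    ("sunny", "yes"), ("rainy", "no"),
]


def joint_distribution(pairs):
    """Empirical joint distribution P(X, Y) from observed pairs."""
    counts = Counter(pairs)
    n = len(pairs)
    return {outcome: c / n for outcome, c in counts.items()}


def marginal(joint, axis):
    """Marginal distribution over one variable (axis 0 for X, axis 1 for Y)."""
    out = Counter()
    for outcome, p in joint.items():
        out[outcome[axis]] += p
    return dict(out)


def conditional(joint, x_value):
    """P(Y | X = x_value), derived from the joint distribution."""
    px = sum(p for (x, _), p in joint.items() if x == x_value)
    return {y: p / px for (x, y), p in joint.items() if x == x_value}


joint = joint_distribution(records)
print("joint:", joint)
print("P(weather):", marginal(joint, 0))
print("P(purchase | sunny):", conditional(joint, "sunny"))
```

Once the joint distribution is available, marginals and conditionals follow from it by summation and division, which is why this view treats the joint distribution as the object to be discovered.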
