Data Mining: Concepts and Techniques
— Chapter 1 —
— Introduction —
[Figure: knowledge discovery process stages, bottom to top: Databases; Data Cleaning and Data Integration; Data Warehouse; Data Selection; Task-relevant Data]
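The stages named in the figure (cleaning, integration, selection) can be illustrated with a minimal Python sketch. The record layouts, field names, and the three helper functions are hypothetical and chosen only for illustration; real systems would use a database or warehouse rather than in-memory lists.

```python
# Hypothetical toy sources; field names are illustrative assumptions.
customers = [
    {"id": 1, "name": "Ann", "age": 34},
    {"id": 2, "name": "Bob", "age": None},  # record with a missing value
    {"id": 3, "name": "Cy", "age": 51},
]
purchases = [
    {"id": 1, "item": "diaper"},
    {"id": 3, "item": "beer"},
    {"id": 3, "item": "diaper"},
]


def clean(records):
    """Data cleaning: drop records that contain a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(left, right, key):
    """Data integration: join two sources on a shared key."""
    by_key = {r[key]: r for r in left}
    return [{**by_key[r[key]], **r} for r in right if r[key] in by_key]


def select(records, attributes):
    """Data selection: keep only the task-relevant attributes."""
    return [{a: r[a] for a in attributes} for r in records]


warehouse = integrate(clean(customers), clean(purchases), key="id")
task_relevant = select(warehouse, ["age", "item"])
print(task_relevant)
```

Dropping incomplete records is only one possible cleaning policy; imputing missing values would be an equally valid choice for this sketch.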
May 10, 2025 Data Mining: Concepts and Techniq 12
Data Mining and Business Intelligence
[Figure: pyramid of layers, with increasing potential to support business decisions toward the top. From top to bottom: Decision Making (End User); Data Presentation, Visualization Techniques (Business Analyst); Data Mining, Information Discovery (Data Analyst); Data Exploration: Statistical Summary, Querying, and Reporting]
[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, and Other Disciplines]
General functionality
Descriptive data mining
Predictive data mining
Different views lead to different classifications
Data view: Kinds of data to be mined
Knowledge view: Kinds of knowledge to be discovered
Method view: Kinds of techniques utilized
Application view: Kinds of applications adapted
Data Mining: On What Kinds of Data?
Database-oriented data sets and applications
Relational database, data warehouse, transactional database
Advanced data sets and advanced applications
Data streams and sensor data
Time-series data, temporal data, sequence data (incl. bio-sequences)
Structured data, graphs, social networks and multi-linked data
Object-relational databases
Heterogeneous databases and legacy databases
Spatial data and spatiotemporal data
Multimedia databases
Text databases
The World-Wide Web
Data Mining Functionalities
Multidimensional concept description: Characterization and discrimination
Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
Frequent patterns, association, correlation vs. causality
Diaper → Beer [support 0.5%, confidence 75%] (Correlation or causality?)
Classification and prediction
Construct models (functions) that describe and distinguish classes or concepts for future prediction
E.g., classify countries based on (climate), or classify cars based on (gas mileage)
Predict some unknown or missing numerical values
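To make the bracketed pair on the Diaper/Beer rule concrete, here is a minimal Python sketch that computes support and confidence for an association rule, assuming the pair is read as [support, confidence]. The transactions are invented toy data, and the function names `support` and `confidence` are illustrative rather than taken from any particular library.

```python
# Invented toy transactions; each one is the set of items bought together.
transactions = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from transaction counts."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # antecedent never occurs, so the rule has no evidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


s = support({"diaper", "beer"}, transactions)
c = confidence({"diaper"}, {"beer"}, transactions)
print(f"support={s:.2f} confidence={c:.2f}")
```

On this toy data, 2 of 5 transactions contain both items and 3 contain diapers, so the rule has support 0.40 and confidence of about 0.67. A high confidence only shows co-occurrence; it does not by itself answer the correlation-or-causality question.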
Simplicity
e.g., (association) rule length, (decision) tree size
Certainty
e.g., confidence, P(A|B) = #(A and B) / #(B), classification reliability or accuracy, certainty factor, rule strength, rule quality, discriminating weight, etc.
Utility
potential usefulness, e.g., support (association), noise threshold (description)
Novelty
not previously known, surprising (used to remove redundant rules, e.g., Illinois vs. Champaign rule implication support ratio)
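As a rough illustration of how three of these measures could be applied together, the sketch below filters candidate rules by simplicity (rule length), certainty (confidence, computed as #(A and B) / #(B) with B as the rule's antecedent), and utility (support). The `Rule` class, the counts, and the threshold values are all hypothetical; novelty is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    antecedent: tuple   # items on the left-hand side (B)
    consequent: tuple   # items on the right-hand side (A)
    n_both: int         # #(A and B)
    n_antecedent: int   # #(B)
    n_total: int        # total number of transactions

    @property
    def length(self):
        """Simplicity: total number of items in the rule."""
        return len(self.antecedent) + len(self.consequent)

    @property
    def confidence(self):
        """Certainty: P(A|B) = #(A and B) / #(B)."""
        return self.n_both / self.n_antecedent

    @property
    def support(self):
        """Utility: fraction of all transactions covered by the rule."""
        return self.n_both / self.n_total


def interesting(rule, max_length=3, min_confidence=0.6, min_support=0.01):
    """Keep a rule only if it passes all three (assumed) thresholds."""
    return (
        rule.length <= max_length
        and rule.confidence >= min_confidence
        and rule.support >= min_support
    )


# Invented counts for illustration only.
rules = [
    Rule(("diaper",), ("beer",), n_both=75, n_antecedent=100, n_total=1000),
    Rule(("milk",), ("bread",), n_both=30, n_antecedent=100, n_total=1000),
    Rule(("a", "b", "c"), ("d",), n_both=90, n_antecedent=100, n_total=1000),
    Rule(("x",), ("y",), n_both=5, n_antecedent=6, n_total=1000),
]
kept = [r for r in rules if interesting(r)]
print([(r.antecedent, r.consequent) for r in kept])
```

With these assumed thresholds only the first rule survives: the second fails on confidence, the third on length, and the fourth on support.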
Primitive 5: Presentation of Discovered Patterns
Motivation
A data mining query language (DMQL) can provide the ability to support ad-hoc and interactive data mining
By providing a standardized language like SQL
Hope to achieve an effect similar to the one SQL has had on relational databases
Foundation for system development and evolution
Facilitate information exchange, technology transfer, commercialization and wide acceptance
Design
DMQL is designed with the primitives described earlier
[Figure: system architecture components: Pattern Evaluation; Data Mining Engine; Knowledge-Base; Database or Data Warehouse Server]