01Intro (2)
Concepts and Techniques (3rd ed.)
— Chapter 1 —
[Figure: Knowledge discovery process (labels: Databases, Data Cleaning, Data Integration, Task-relevant Data)]
Knowledge Discovery (KDD) Process
The knowledge discovery process is an iterative sequence of the following steps:
Data cleaning (to remove noise and inconsistent data)
Data integration (where multiple data sources may be combined)
Data selection (where data relevant to the analysis task are retrieved from the database)
Data transformation (where data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations)
Data mining (an essential process where intelligent methods are applied to extract data patterns)
Pattern evaluation (to identify the truly interesting patterns representing knowledge based on interestingness measures)
Knowledge presentation (where visualization and knowledge representation techniques are used to present mined knowledge to users)
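The seven steps above can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions, not the book's method: the two purchase sources, the item names, and every function name (`clean`, `integrate`, `select`, `transform`, `mine`, `evaluate`, `present`) are hypothetical, and the "mining" step is simple item-pair counting with a support threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy sources of (customer id, item) records; None marks a noisy value.
STORE_A = [("c1", "camera"), ("c1", "sd_card"), ("c2", "camera"), ("c2", None)]
STORE_B = [("c3", "camera"), ("c3", "sd_card"), ("c4", "tripod"), ("c4", "camera")]


def clean(records):
    """Data cleaning: drop records with missing values (a stand-in for noise removal)."""
    return [r for r in records if all(v is not None for v in r)]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    combined = []
    for src in sources:
        combined.extend(src)
    return combined


def select(records, relevant_items):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r[1] in relevant_items]


def transform(records):
    """Data transformation: aggregate records into one item set (basket) per customer."""
    baskets = {}
    for cust, item in records:
        baskets.setdefault(cust, set()).add(item)
    return list(baskets.values())


def mine(baskets):
    """Data mining: count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support reaches a (hypothetical) threshold."""
    return {p: c / n_baskets for p, c in pair_counts.items() if c / n_baskets >= min_support}


def present(patterns):
    """Knowledge presentation: render the surviving patterns as readable lines."""
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]


data = select(integrate(clean(STORE_A), clean(STORE_B)), {"camera", "sd_card"})
baskets = transform(data)
report = present(evaluate(mine(baskets), len(baskets)))
```

Here selection discards the tripod purchase, four customer baskets remain, and only the camera/SD-card pair (present in two of them) passes the 0.5 support threshold. In practice each stage would be far richer, and the loop back from evaluation to earlier steps is what makes the process iterative.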
Data Mining
Data mining is the process of discovering interesting patterns and knowledge from large amounts of data.
Example: A Web Mining Framework
[Figure: layers with increasing potential to support business decisions, from Data Exploration (Statistical Summary, Querying, and Reporting) through Data Mining (Information Discovery) and Data Presentation (Visualization Techniques) up to Decision Making; roles shown: Data Analyst, Business Analyst, End User]
Techniques utilized
Data-intensive, data warehouse (OLAP), machine learning, …
e.g., first buy a digital camera, then buy large SD memory cards
Periodicity analysis
Approximate and consecutive motifs
Similarity-based analysis
Data streams
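The purchase example above (first a digital camera, then large SD memory cards) is an ordered pattern: which item comes first matters. A minimal sketch, assuming each customer's purchases are available as a time-ordered list, measures the fraction of customers that follow that order. It only counts the support of one given pattern; a full sequential pattern miner (e.g., GSP or PrefixSpan) would also generate the candidate patterns. The item names and helper functions are hypothetical.

```python
from typing import List, Sequence


def contains_in_order(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the match, so each later
    # pattern item must appear after the previous one.
    return all(item in it for item in pattern)


def sequence_support(sequences: List[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of customer sequences that contain `pattern` in order."""
    if not sequences:
        return 0.0
    return sum(contains_in_order(s, pattern) for s in sequences) / len(sequences)


# Hypothetical purchase histories, one time-ordered list per customer.
histories = [
    ["digital_camera", "tripod", "sd_card_64gb"],  # camera, later SD card: counts
    ["sd_card_64gb", "digital_camera"],            # wrong order: does not count
    ["digital_camera", "sd_card_64gb"],            # counts
    ["laptop"],                                    # does not count
]
support = sequence_support(histories, ["digital_camera", "sd_card_64gb"])
```

Two of the four hypothetical customers buy the camera before the card, so the pattern's support is 0.5; the second customer bought both items but in the opposite order, which is exactly what distinguishes a sequential pattern from an ordinary itemset.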
Structure and Network Analysis
Graph mining
Finding frequent subgraphs (e.g., chemical compounds), …
Information networks: objects (nodes) linked by relationships (edges), e.g., author networks in CS, terrorist networks
Multiple heterogeneous networks
A person could be in multiple information networks: friends, family, classmates, …
Links carry a lot of semantic information: Link mining
Web mining
The Web is a big information network: from PageRank to Google
Analysis of Web information networks
Web community discovery, opinion mining, usage mining
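PageRank, mentioned above, scores a page by the rank of the pages that link to it. The following is a minimal power-iteration sketch, assuming the commonly used damping factor of 0.85, uniform teleportation, and dangling pages (no out-links) spreading their rank evenly. The four-page link graph and the function name are hypothetical, and this is the textbook formulation, not Google's production system.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    # Collect every page, including pages that only appear as link targets.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Teleportation: with probability (1 - damping) jump to a random page.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Follow a link: split this page's rank evenly among its out-links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the scores have stabilized
            break
    return rank


# Hypothetical four-page web: C is linked from A, B, and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
```

In this toy graph D has no in-links, so it keeps only the teleportation share (1 - 0.85)/4 = 0.0375, while C, linked from three pages, ends up with the highest score. The scores always sum to 1, so they can be read as the long-run probability of a random surfer being on each page.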
Evaluation of Knowledge
Is all mined knowledge interesting?
One can mine a tremendous amount of “patterns” and knowledge
Some may fit only a certain dimension space (time, location, …)
Some may not be representative, may be transient, …
Evaluation of mined knowledge → directly mine only interesting knowledge?
Descriptive vs. predictive
Coverage
Typicality vs. novelty
Accuracy
Timeliness
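Coverage and accuracy are listed above without formulas. One common way to quantify them, used for IF-THEN rules, is: coverage is the fraction of tuples that satisfy the rule's antecedent, and accuracy is the fraction of those covered tuples for which the consequent also holds. The sketch below assumes that definition; the rule, the attribute names, and the customer tuples are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def coverage_and_accuracy(
    data: List[Record],
    antecedent: Callable[[Record], bool],
    consequent: Callable[[Record], bool],
) -> Tuple[float, float]:
    """Coverage and accuracy of the rule IF antecedent THEN consequent."""
    covered = [r for r in data if antecedent(r)]     # tuples the rule applies to
    correct = [r for r in covered if consequent(r)]  # covered tuples it predicts right
    coverage = len(covered) / len(data) if data else 0.0
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy


# Hypothetical customer tuples.
customers = [
    {"age": "youth", "student": "yes", "buys_computer": "yes"},
    {"age": "youth", "student": "yes", "buys_computer": "no"},
    {"age": "youth", "student": "no", "buys_computer": "no"},
    {"age": "senior", "student": "no", "buys_computer": "yes"},
    {"age": "youth", "student": "yes", "buys_computer": "yes"},
]

# Rule: IF age = youth AND student = yes THEN buys_computer = yes
cov, acc = coverage_and_accuracy(
    customers,
    antecedent=lambda r: r["age"] == "youth" and r["student"] == "yes",
    consequent=lambda r: r["buys_computer"] == "yes",
)
```

The rule covers 3 of the 5 tuples (coverage 0.6) and is right on 2 of those 3 (accuracy about 0.67). A rule can score high on one measure and low on the other, which is why both appear among the evaluation criteria.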
Chapter 1. Introduction
Why Data Mining?
What Is Data Mining?
A Multi-Dimensional View of Data Mining
What Kind of Data Can Be Mined?
What Kinds of Patterns Can Be Mined?
What Technologies Are Used?
What Kinds of Applications Are Targeted?
Major Issues in Data Mining
A Brief History of Data Mining and Data Mining Society
Summary
Data Mining: Confluence of Multiple Disciplines