Lecture 1
Architecture: Typical Data Mining System
[Figure: system architecture diagram. Labeled components: pattern evaluation; knowledge base; database or data warehouse server; data cleaning, data integration, and filtering; databases; data warehouse.]
Data Mining: On What Kinds of Data?
• Relational database
• Data warehouse
• Transactional database
• Advanced database and information repository
– Spatial and temporal data
– Time-series data
– Stream data
– Multimedia database
– Text databases & WWW
Examples of Large Datasets
• Scientific
– NASA, EOS project: 50 GB per hour
– Environmental datasets
Examples of Data Mining Applications
Data Mining Functionalities
• Cluster analysis
– Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
– Maximize intra-class similarity and minimize inter-class similarity
• Outlier analysis
– Outlier: a data object that does not comply with the general behavior of the data
– Useful in fraud detection and rare-event analysis
• Trend and evolution analysis
– Trend and deviation: regression analysis
– Sequential pattern mining, periodicity analysis
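The three functionalities above can be illustrated with a minimal sketch. It is not taken from the lecture: the technique chosen for each bullet is an assumption (k-means on one-dimensional values for cluster analysis, a z-score rule for outlier analysis, and a least-squares line for trend analysis), and all function names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def kmeans_1d(values, centers, iterations=10):
    """Group 1-D values around the given starting centers (basic k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters, centers


def zscore_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


def linear_trend(ys):
    """Least-squares slope and intercept of ys against time steps 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


# Hypothetical house prices (in thousands) with two assumed starting centers.
clusters, centers = kmeans_1d([100, 110, 120, 300, 310, 320], [100, 300])
print(clusters, centers)

# Hypothetical transaction amounts; one is far from the rest.
print(zscore_outliers([20, 22, 19, 21, 20, 23, 500]))

# Hypothetical monthly sales that rise steadily.
print(linear_trend([10, 12, 14, 16]))
```

With the house prices, the two groups separate around 110 and 310. The single large transaction is flagged as an outlier, and the sales series yields a slope of 2 per time step. In practice, the number of clusters and the outlier threshold are tuning choices that depend on the data.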
Data Mining: Confluence of Multiple Disciplines
[Figure: data mining shown at the intersection of database systems, statistics, algorithms, and other disciplines.]