Week1-2
agenda
Major Issues in Data Mining
• Interestingness measures
• A pattern is interesting if it is easily understood by humans,
valid on new or test data with some degree of certainty,
potentially useful, novel, or validates some hypothesis that a
user seeks to confirm
• Data to be mined
• Relational, data warehouse, transactional, stream, object-oriented/relational,
active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW
• Knowledge to be mined
• Characterization, discrimination, association, classification,
clustering, trend/deviation, outlier analysis, etc.
• Multiple/integrated functions and mining at multiple levels
• Techniques utilized
• Database-oriented, data warehouse (OLAP), machine learning,
statistics, visualization, etc.
• Applications adapted
• Retail, telecommunication, banking, fraud analysis, bio-data
mining, stock market analysis, text mining, Web mining, etc.
Data Mining: Classification Scheme
• General functionality
• Descriptive data mining
• Predictive data mining
• Interrelated data
• Software programs to manage & access data
• Data access
• Concurrent
• Shared
• Distributed
• Based on ER model
• Contains a unique key
• SQL vs. data mining
• SQL: Look for customers or sales in a month
• Data mining: determine credit
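The SQL side of this contrast can be sketched as a plain lookup. This is a minimal illustration using Python's built-in sqlite3 module; the `sales` table, its columns, and the sample rows are hypothetical and not taken from the slides.

```python
import sqlite3

# Hypothetical toy schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, "  # the unique key of each row
    "customer TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (customer, sale_date, amount) VALUES (?, ?, ?)",
    [
        ("Ana", "2024-01-05", 20.0),
        ("Ben", "2024-01-17", 35.5),
        ("Ana", "2024-02-02", 12.0),
    ],
)


def sales_in_month(connection, year_month):
    """Return (customer, amount) rows whose sale_date falls in 'YYYY-MM'."""
    cursor = connection.execute(
        "SELECT customer, amount FROM sales "
        "WHERE strftime('%Y-%m', sale_date) = ? ORDER BY sale_id",
        (year_month,),
    )
    return cursor.fetchall()


january_sales = sales_in_month(conn, "2024-01")
print(january_sales)
```

The query only retrieves rows that already exist and match a stated condition; it does not discover any pattern that was not asked for.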
Transactional Database
• User interaction
• Data mining query languages and ad-hoc mining
• Expression and visualization of data mining results
• Interactive mining of knowledge at multiple levels of abstraction
• Applications and social impacts
• Domain-specific data mining & invisible data mining
• Protection of data security, integrity, and privacy
Data Mining Functionalities
• Data mining tasks can be classified into two
categories:
• Descriptive: Characterize the general properties
of data
• Predictive: Perform inference on current data in
order to make predictions
• A measure of certainty may also be associated with
each pattern
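One way to attach a measure of certainty to a pattern is the confidence of an association rule: the fraction of records containing the antecedent that also contain the consequent. The slides do not name a specific measure, so this choice, the function name `rule_confidence`, and the toy baskets below are assumptions made for illustration.

```python
# Toy transactions (made-up data for illustration).
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]


def rule_confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0  # the rule never applies, so report zero certainty
    with_both = [b for b in with_antecedent if consequent <= b]
    return len(with_both) / len(with_antecedent)


confidence = rule_confidence(transactions, {"bread"}, {"butter"})
print(f"bread -> butter holds with confidence {confidence:.2f}")
```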
Data Mining Functionalities
• Multidimensional concept description: Characterization and
discrimination
• Characterization: Generalize or summarize the characteristics of
the target class (the class under study) based on its features
• Discrimination: Compare the target class with a set of
contrasting classes, e.g., dry vs. wet regions
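The two operations can be sketched on a small table of regions. This is a minimal illustration, assuming made-up rainfall and humidity values and using feature means as the summary; the function names `characterize` and `discriminate` are hypothetical.

```python
from statistics import mean

# Made-up records: (region_type, annual_rainfall_mm, avg_humidity_pct)
regions = [
    ("dry", 180, 25), ("dry", 220, 30),
    ("wet", 1500, 80), ("wet", 1700, 85),
]


def characterize(records, target):
    """Summarize the target class by the mean of each numeric feature."""
    rows = [r for r in records if r[0] == target]
    return {
        "rainfall_mm": mean(r[1] for r in rows),
        "humidity_pct": mean(r[2] for r in rows),
    }


def discriminate(records, target, contrast):
    """Compare target and contrasting class: difference of feature means."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {feature: t[feature] - c[feature] for feature in t}


dry_summary = characterize(regions, "dry")
dry_vs_wet = discriminate(regions, "dry", "wet")
print(dry_summary)
print(dry_vs_wet)
```

Characterization looks only at the target class, while discrimination needs at least one contrasting class to compare against.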
[Figure: layered pyramid; potential to support business decisions increases from bottom to top. Layers, top to bottom:]
• Decision Making (End User)
• Data Presentation: Visualization Techniques (Business Analyst)
• Data Mining: Information Discovery (Data Analyst)
• Data Exploration: Statistical Summary, Querying, and Reporting
• Data Preprocessing/Integration, Data Warehouses (DBA)
• Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Architecture: Typical Data Mining System
[Figure: architecture diagram; components recoverable from the slide:]
• Pattern Evaluation
• Data Mining Engine
• Knowledge-Base
• Database or Data Warehouse Server