chap1_DMBI_Jan_April2022 (1)
• Fraud detection
– Find outliers that indicate unusual transactions
• Financial planning
– Summarize and compare the resources and spending
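A minimal sketch of both applications above, under stated assumptions: the transaction amounts are made up, the function names `summarize` and `flag_outliers` are hypothetical, and the outlier rule is the modified z-score based on the median absolute deviation, with the conventional 0.6745 constant and 3.5 cutoff. That rule is one common choice, not a method prescribed by these slides.

```python
from statistics import mean, median


def summarize(amounts):
    """Return basic summary statistics for a list of spending amounts."""
    return {
        "count": len(amounts),
        "total": sum(amounts),
        "mean": mean(amounts),
        "median": median(amounts),
    }


def flag_outliers(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so one extreme value does not distort the scale.
    """
    med = median(amounts)
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        # All values are (almost) identical; nothing can be flagged.
        return []
    return [x for x in amounts if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical transaction amounts; one of them is clearly unusual.
transactions = [42.0, 38.5, 51.0, 47.25, 40.0, 44.5, 2500.0, 39.75]

print(summarize(transactions))
print(flag_outliers(transactions))
```

A real fraud detection system would look at many attributes per transaction (time, location, merchant), not a single amount column.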
Data Exploration
Statistical Summary, Querying, and Reporting
[Figure: data mining as a confluence of multiple disciplines, including information science, machine learning, visualization, and other disciplines]
• Evolution Analysis
– Describes and models regularities or trends for objects whose behavior changes over time.
• E.g., identify stock evolution regularities for overall stocks and for the stocks of particular companies.
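As a rough illustration of modeling a trend over time, the sketch below smooths a price series with a simple moving average and reports the overall direction. The prices are invented and the helper names `moving_average` and `trend_direction` are hypothetical; real evolution analysis would use richer time-series models.

```python
def moving_average(series, window):
    """Return the simple moving average of `series` for the given window."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def trend_direction(series, window=3):
    """Compare the first and last smoothed values to label the trend."""
    smoothed = moving_average(series, window)
    delta = smoothed[-1] - smoothed[0]
    if delta > 0:
        return "up"
    if delta < 0:
        return "down"
    return "flat"


# Hypothetical daily closing prices for one stock.
prices = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0]

print(moving_average(prices, 3))
print(trend_direction(prices))
```

The same functions can be applied to an index series for "overall stocks" and to each company's series separately, then compared.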
• Database
– Relational, data warehouse, transactional, stream, object-oriented/relational,
active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
• Knowledge
– Characterization, discrimination, association, classification,
clustering, trend/deviation, outlier analysis, etc.
– Multiple/integrated functions and mining at multiple levels
• Techniques utilized
– Database-oriented, data warehouse (OLAP), machine learning, statistics,
visualization, etc.
• Applications adapted
– Retail, telecommunication, banking, fraud analysis, bio-data
• How to construct a data mining query?
– The primitives allow the user to interactively communicate with the data mining system
be used
– Database or data warehouse name
performed
– Characterization
– Discrimination
– Association
– Classification/prediction
– Clustering
– Outlier analysis
– Other data mining tasks
knowledge,
confidence, etc.
Data Mining: Concepts and Techniques
• Automated vs. query-driven?
– Finding all the patterns autonomously in a database? Unrealistic, because
there could be too many patterns, most of them uninteresting
• Data mining should be an interactive process
– The user directs what is to be mined
• Users must be provided with a set of primitives to be used to
communicate with the data mining system
• Incorporating these primitives in a data mining query
language
– More flexible user interaction
– Foundation for design of graphical user interface
– Standardization of data mining industry and practice
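One way to picture a set of primitives bundled into a single query is a small data structure that names the data source, the kind of knowledge to mine, and any thresholds. This is only a sketch: the class `MiningQuery`, its fields, and the database name `sales_dw` are hypothetical and do not correspond to the syntax of any actual data mining query language.

```python
from dataclasses import dataclass, field

# Kinds of knowledge, taken from the task list given earlier in the deck.
KNOWLEDGE_KINDS = {
    "characterization",
    "discrimination",
    "association",
    "classification",
    "clustering",
    "outlier_analysis",
}


@dataclass
class MiningQuery:
    """Hypothetical container for the primitives of one mining request."""

    database: str                                    # database or data warehouse name
    kind: str                                        # kind of knowledge to be mined
    attributes: list = field(default_factory=list)   # attributes of interest
    thresholds: dict = field(default_factory=dict)   # e.g. support, confidence

    def __post_init__(self):
        if self.kind not in KNOWLEDGE_KINDS:
            raise ValueError(f"unknown kind of knowledge: {self.kind!r}")

    def describe(self):
        """Render the query as a readable one-line summary."""
        parts = [f"mine {self.kind} from {self.database}"]
        if self.attributes:
            parts.append("on " + ", ".join(self.attributes))
        if self.thresholds:
            limits = ", ".join(
                f"{name} >= {value}"
                for name, value in sorted(self.thresholds.items())
            )
            parts.append("with " + limits)
        return " ".join(parts)


query = MiningQuery(
    database="sales_dw",
    kind="association",
    attributes=["item", "customer_id"],
    thresholds={"support": 0.05, "confidence": 0.7},
)
print(query.describe())
```

Because the request is explicit data rather than free text, the same structure could back either a textual query language or a graphical interface.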
• No coupling
– Flat file processing, no utilization of any functions of a DB/DW system
– Not recommended
• Loose coupling
– Fetching data from DB/DW
– Does not exploit the data structures and query optimization methods
provided by the DB/DW system
– Difficult to achieve high scalability and good performance with
large data sets
• Semi-tight coupling
– Efficient implementations of a few essential data mining
primitives are provided in the DB/DW system, e.g., sorting,
indexing, aggregation, histogram analysis, multiway join, and
precomputation of some statistical functions
– Enhanced DM performance
• Tight coupling
– DM is smoothly integrated into a DB/DW system; the mining query is
optimized based on mining query analysis, data structures,
indexing, and query processing methods of the DB/DW system
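The difference between fetching raw data and letting the database do part of the work can be sketched with Python's built-in `sqlite3` module. The `sales` table, its rows, and the two function names are assumptions made for illustration. The second function only shows aggregation pushed into the database engine, in the spirit of semi-tight coupling; it is not a full data mining system.

```python
import sqlite3
from collections import defaultdict

# Hypothetical in-memory database with a small sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 20.0), ("north", 5.0), ("south", 7.5)],
)


def totals_loose(connection):
    """Loose coupling: fetch every row, then aggregate outside the database."""
    totals = defaultdict(float)
    for region, amount in connection.execute("SELECT region, amount FROM sales"):
        totals[region] += amount
    return dict(totals)


def totals_in_db(connection):
    """Push the aggregation into the database engine with GROUP BY."""
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows)


print(totals_loose(conn))
print(totals_in_db(conn))
```

Both calls return the same totals, but the first moves every row out of the database, which is the scalability concern raised for loose coupling on large data sets.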
• Mining methodology and User interaction
– Mining different kinds of knowledge
• DM should cover a wide spectrum of data analysis and knowledge
discovery tasks
• Enables the database to be used in different ways
• Require the development of numerous data mining techniques
– Interactive mining of knowledge at multiple levels of
abstraction
• Difficult to know exactly what will be discovered
• Allow users to focus the search, refine data mining requests
– Incorporation of background knowledge
• Guide the discovery process
• Allow discovered patterns to be expressed in concise terms and different
levels of abstraction
– Data mining query languages and ad hoc data mining
• High-level query languages need to be developed
– Presentation and visualization of results
• Knowledge should be easily understood and directly usable
• High level languages, visual representations or other expressive forms
• Require the DM system to adopt the above techniques
– Handling noisy or incomplete data
• Require data cleaning methods and data analysis methods that can
handle noise
– Pattern evaluation – the interestingness problem
• How to develop techniques to assess the interestingness of discovered
patterns, especially with subjective measures based on user beliefs or
expectations
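Subjective measures depend on what a particular user believes or expects, so they are hard to show in a few lines. As a simpler stand-in, the sketch below computes two widely used objective measures for an association rule, support and confidence, and lets the user supply the thresholds. The baskets, the threshold values, and the function names are all assumptions for illustration.

```python
def count_containing(transactions, itemset):
    """Count the transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t))


def support(transactions, itemset):
    """Fraction of transactions that contain the itemset."""
    return count_containing(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from transaction counts."""
    base = count_containing(transactions, antecedent)
    if base == 0:
        return 0.0
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / base


def is_interesting(transactions, antecedent, consequent,
                   min_support=0.4, min_confidence=0.6):
    """Keep a rule only if it clears both user-chosen thresholds."""
    rule_items = set(antecedent) | set(consequent)
    return (
        support(transactions, rule_items) >= min_support
        and confidence(transactions, antecedent, consequent) >= min_confidence
    )


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
print(is_interesting(baskets, {"bread"}, {"milk"}))
```

Raising or lowering the thresholds is the simplest way a user can steer which patterns are reported; a subjective measure would additionally compare each rule against what the user already expects.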