Data Mining
— Chapter 1 —
Chapter 1. Introduction
Why Data Mining?
What Is Data Mining?
A Multi-Dimensional View of Data Mining
What Kinds of Data Can Be Mined?
What Kinds of Patterns Can Be Mined?
What Technologies Are Used?
What Kinds of Applications Are Targeted?
Major Issues in Data Mining
A Brief History of Data Mining and Data Mining Society
Summary
Why Data Mining?
Evolution of Database Technology
1960s:
Data collection, database creation, IMS and network DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
Application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s:
Data mining, data warehousing, multimedia databases, and Web databases
2000s:
Stream data management and mining
Data mining and its applications
Web technology (XML, data integration) and global information systems
May 31, 2023 Data Mining: Concepts and Techniques 5
Contd..
This is a data-rich but information-poor situation.
The tremendous volumes of data collected have far exceeded
human ability to comprehend them without powerful tools.
Important decisions are often based not on the data stored in
data repositories but on the decision maker’s intuition.
Efforts have been made to develop expert systems and
knowledge-based technologies, which rely on users to input
knowledge into knowledge bases.
But this procedure is prone to errors and is
time-consuming.
So the systematic development of data mining tools is
required to turn data tombs into golden nuggets of
knowledge.
Knowledge Discovery (KDD) Process
This is a view from typical database systems and data
warehousing communities.
Data mining plays an essential role in the knowledge
discovery process.
[Figure: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Data Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]
Data cleaning: Remove noise and inconsistent data.
Data integration: Multiple data sources may be
combined.
Data selection: Data relevant to the analysis task are
retrieved from the database.
Data transformation: Data are transformed and consolidated
into forms appropriate for mining.
Data mining: An essential process where intelligent methods
are applied to extract data patterns.
Pattern evaluation: Identify the truly interesting
patterns based on interestingness measures.
Knowledge presentation: Visualization techniques
are used to present mined knowledge to users.
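The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not a definitive implementation: the two in-memory sources, the record fields (`customer`, `item`, `amount`), the thresholds, and the choice of "frequently bought item" as the pattern are all hypothetical.

```python
from collections import Counter

# Hypothetical raw sources; None marks a missing (noisy) value.
store_a = [
    {"customer": "c1", "item": "laptop", "amount": 900.0},
    {"customer": "c2", "item": "mouse", "amount": None},
    {"customer": "c3", "item": "laptop", "amount": 950.0},
]
store_b = [
    {"customer": "c4", "item": "laptop", "amount": 870.0},
    {"customer": "c5", "item": "cable", "amount": 5.0},
]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine multiple sources into one list."""
    return [r for source in sources for r in source]

def select(records, min_amount):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r["amount"] >= min_amount]

def transform(records):
    """Data transformation: reduce each record to the item purchased."""
    return [r["item"] for r in records]

def mine(items):
    """Data mining: count how often each item occurs."""
    return Counter(items)

def evaluate(counts, min_count):
    """Pattern evaluation: keep patterns that meet a count threshold."""
    return {item: n for item, n in counts.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: format the patterns for a user."""
    return [f"{item}: bought {n} times" for item, n in sorted(patterns.items())]

def kdd(min_amount=100.0, min_count=2):
    """Run the steps in order and return the presented patterns."""
    data = integrate(clean(store_a), clean(store_b))
    task_relevant = select(data, min_amount)
    patterns = evaluate(mine(transform(task_relevant)), min_count)
    return present(patterns)

print("\n".join(kdd()))
```

Each function stands for one step, so a step can be swapped out (for example, a different mining method) without touching the others.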
Example: A Web Mining Framework
Database Data
A database system consists of a collection of interrelated
data, known as a database, and a set of software programs
to manage and access the data.
Relational database: a collection of tables, each of
which is assigned a unique name.
Each table consists of a set of attributes and a large
set of tuples.
Each tuple is identified by a unique key and
described by a set of attribute values.
An ER (entity-relationship) data model is often constructed
for relational databases.
Example: AllElectronics store
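A relational table with attributes, tuples, and a unique key might look like the sketch below, which uses Python's standard `sqlite3` module. The `customer` table, its columns, and its rows are hypothetical and are not the actual AllElectronics schema.

```python
import sqlite3

# In-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# A table has a unique name and a set of attributes (columns).
conn.execute(
    """
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,  -- unique key identifying each tuple
        name    TEXT NOT NULL,
        age     INTEGER,
        income  REAL
    )
    """
)

# Each tuple (row) is described by a set of attribute values.
conn.executemany(
    "INSERT INTO customer (cust_id, name, age, income) VALUES (?, ?, ?, ?)",
    [(1, "Smith", 31, 78000.0), (2, "Lee", 45, 52000.0), (3, "Patel", 27, 91000.0)],
)

# The software layer lets us access the data with a declarative query.
rows = conn.execute(
    "SELECT name, income FROM customer WHERE income > ? ORDER BY income DESC",
    (60000.0,),
).fetchall()
print(rows)

# The unique key is enforced: a second tuple with cust_id 1 is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Dup', 20, 1.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```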
networks)
Web (a huge, widely distributed information repository
made available by the Internet)
Contd..
Various kinds of knowledge can be mined from these kinds of
data.
Temporal data: We can mine banking data for changing
trends.
Stock exchange data can be mined to uncover trends that
could help you plan investment strategies.
Computer network data streams to detect intrusions based
Data Mining Function: (3) Classification
Contd..
The derived model may be represented as:
Classification rules (IF-THEN)
Decision trees
Mathematical formulae or neural networks
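To make the first two representations concrete, here is a minimal sketch of one model written both as IF-THEN rules and as a decision tree. The attributes (`age`, `income`), their values, and the class labels "A", "B", and "C" are hypothetical; a real model would be derived from labeled training data rather than written by hand.

```python
# IF-THEN rules: (condition on a record, predicted class), tried in order.
RULES = [
    (lambda r: r["age"] == "youth" and r["income"] == "high", "A"),
    (lambda r: r["age"] == "youth" and r["income"] == "low", "B"),
    (lambda r: r["age"] == "middle_aged", "C"),
]
DEFAULT_CLASS = "C"  # used when no rule fires

def classify_with_rules(record):
    """Return the class of the first rule whose IF part matches."""
    for condition, label in RULES:
        if condition(record):
            return label
    return DEFAULT_CLASS

def classify_with_tree(record):
    """The same model as a decision tree: test age first, then income."""
    if record["age"] == "youth":
        if record["income"] == "high":
            return "A"
        if record["income"] == "low":
            return "B"
        return DEFAULT_CLASS
    return "C"

customer = {"age": "youth", "income": "high"}
print(classify_with_rules(customer), classify_with_tree(customer))
```

Both functions give the same answer for every record; they differ only in how the model is written down.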
Contd..
Clusters of objects are formed so that objects within a
cluster have high similarity in comparison to one another,
but are rather dissimilar to objects in other clusters.
Each cluster so formed can be viewed as a class of
objects, from which rules can be derived.
Clustering can also facilitate taxonomy formation, that is,
the organization of observations into a hierarchy of classes
that group similar events together.
Ex: Identifying homogeneous subpopulations of customers.
[Figure: customer data vs. customer location]
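The idea of grouping similar objects can be sketched with k-means, one common clustering method (the slides do not name a specific algorithm, so this choice is an assumption). The customer locations and the starting centroids below are made up for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means. `iterations` must be at least 1."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D customer locations forming two visible groups.
locations = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
             (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(locations, centroids=[locations[0], locations[3]])

for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

Points inside a cluster are close to one another and far from the other cluster, which is the similarity property described above.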
Are All Patterns Interesting?
Only a small fraction of the patterns potentially
generated would actually be of interest to a given
user.
A pattern is interesting if it is
easily understood by humans,
valid on new or test data with some degree of
certainty
potentially useful, and
novel.
A pattern is also interesting if it validates a hypothesis
that the user sought to confirm.
An interesting pattern represents knowledge.
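One way to read "valid on new or test data with some degree of certainty" is to measure how often a pattern holds on records that were not used to find it. The sketch below does this for an IF-THEN pattern; the record fields, the test data, and the 0.6 threshold are hypothetical.

```python
def certainty(rule_if, rule_then, records):
    """Fraction of records matching the IF part that also satisfy the THEN part."""
    matching = [r for r in records if rule_if(r)]
    if not matching:
        return 0.0  # the pattern never applies, so it has no support here
    return sum(1 for r in matching if rule_then(r)) / len(matching)

# Hypothetical held-out test data.
test_data = [
    {"buys_computer": True, "buys_software": True},
    {"buys_computer": True, "buys_software": False},
    {"buys_computer": True, "buys_software": True},
    {"buys_computer": False, "buys_software": True},
]

# Pattern under test: IF buys_computer THEN buys_software.
score = certainty(
    lambda r: r["buys_computer"],
    lambda r: r["buys_software"],
    test_data,
)
MIN_CERTAINTY = 0.6  # hypothetical threshold chosen by the user
is_valid = score >= MIN_CERTAINTY
print(round(score, 3), is_valid)
```

A score like this covers only the validity criterion; understandability, usefulness, and novelty still depend on the user.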
the input examples are not class labeled. We may use
clustering to discover classes within the data.
Semi-supervised: Techniques that make use of both labeled and
learning process.
Mining Methodology
Mining various and new kinds of knowledge
Data mining tasks may use the same database in different ways and require
the development of numerous data mining techniques.
Due to the diversity of applications, new mining tasks continue to
emerge, making data mining a dynamic and fast-growing field.
For example, for effective knowledge discovery in information
networks, integrated clustering and ranking may lead to the
discovery of high-quality clusters and object ranks in large networks.
Mining knowledge in multi-dimensional space
We can search for interesting patterns among combinations of
dimensions (attributes) at varying levels of abstraction.
Mining knowledge in cube space can substantially enhance the power
and flexibility of data mining.
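Searching across combinations of dimensions can be sketched by aggregating a measure over every subset of the dimensions, which is the basic shape of a data cube. The sales records, dimension names, and `units` measure are hypothetical, and the sketch covers only combinations of dimensions, not multiple levels of abstraction within one dimension.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales facts with three dimensions and one measure.
sales = [
    {"city": "Chicago", "item": "laptop", "quarter": "Q1", "units": 3},
    {"city": "Chicago", "item": "mouse", "quarter": "Q1", "units": 5},
    {"city": "Toronto", "item": "laptop", "quarter": "Q1", "units": 2},
    {"city": "Toronto", "item": "laptop", "quarter": "Q2", "units": 4},
]
DIMENSIONS = ("city", "item", "quarter")

def cube(records, dimensions, measure):
    """Sum `measure` for every combination of dimensions (every cuboid)."""
    result = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(int)
            for r in records:
                # Group by the values of the chosen dimensions only.
                totals[tuple(r[d] for d in dims)] += r[measure]
            result[dims] = dict(totals)
    return result

cuboids = cube(sales, DIMENSIONS, "units")
print(cuboids[()])                # grand total over all dimensions
print(cuboids[("city",)])         # totals per city
print(cuboids[("city", "item")])  # totals per (city, item) pair
```

A mining task can then look for patterns in any of these cuboids instead of only in the raw records.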
Contd..
Data mining: An interdisciplinary effort
The power of data mining can be substantially enhanced by
integrating new methods from multiple disciplines.
Ex: Text mining, software bug mining.
Boosting the power of discovery in a networked environment
Semantic links across multiple data objects can be used to advantage
in data mining.
Knowledge derived in one set of objects can be used to boost the
discovery of knowledge in a “related” or semantically linked set of
objects.
Ex: Web, files or documents.
Contd..
Diversity of data types
Handling complex types of data
It is unrealistic to expect one data mining system to mine all kinds of
data, given the diversity of data types and the different goals of data
mining.
Domain- or application-dedicated data mining systems are being
constructed for in-depth mining of specific kinds of data.
Mining dynamic, networked, and global data repositories
The discovery of knowledge from different sources of structured,
semi-structured, or unstructured yet interconnected data with diverse
data semantics poses great challenges to data mining.
Mining such gigantic, interconnected information networks may
help disclose many more patterns and knowledge in
heterogeneous data sets.