
Data Mining

The document provides an overview of data mining concepts and techniques. It discusses how the explosive growth of data has created a need for data mining to extract useful knowledge and patterns. It describes data mining as the process of discovering interesting and non-trivial patterns or knowledge from large amounts of data. This involves data cleaning, integration, transformation, selection, mining, pattern evaluation and knowledge presentation. The document also outlines different types of data that can be mined, including relational databases and data warehouses.

Data Mining: Concepts and Techniques (3rd ed.)

— Chapter 1 —

Chapter 1. Introduction
 Why Data Mining?
 What Is Data Mining?
 A Multi-Dimensional View of Data Mining
 What Kind of Data Can Be Mined?
 What Kinds of Patterns Can Be Mined?
 What Technologies Are Used?
 What Kinds of Applications Are Targeted?
 Major Issues in Data Mining
 A Brief History of Data Mining and Data Mining Society
 Summary
Why Data Mining?

 The Explosive Growth of Data: from terabytes to petabytes


 Data collection and data availability
 Automated data collection tools, database systems, Web,
computerized society
 Major sources of abundant data
 Business: Web, e-commerce, transactions, stocks, …
 Science: Remote sensing, bioinformatics, scientific simulation, …
 Society and everyone: news, digital cameras, YouTube
 We are drowning in data, but starving for knowledge!
 “Necessity is the mother of invention”—Data mining—Automated
analysis of massive data sets

Evolution of Database Technology
 1960s:
 Data collection, database creation, IMS and network DBMS
 1970s:
 Relational data model, relational DBMS implementation
 1980s:
 RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
 Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990s:
 Data mining, data warehousing, multimedia databases, and Web databases
 2000s:
 Stream data management and mining
 Data mining and its applications
 Web technology (XML, data integration) and global information systems

May 31, 2023 Data Mining: Concepts and Techniques 5
Contd..
 This is a data-rich but information-poor situation.
 The volume of collected data has far exceeded human
ability to comprehend it without powerful tools.
 Important decisions are often based not on the data stored in
data repositories but on a decision maker’s intuition.
 Efforts have been made to develop expert systems and
knowledge-based technologies, which rely on users to input
knowledge into knowledge bases.
 But this procedure is prone to errors and is time
consuming.
 So the systematic development of data mining tools is
required to turn data tombs into golden nuggets of
knowledge.


What Is Data Mining?

 Data mining (knowledge discovery from data)
 Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge from
huge amounts of data
 Alternative names
 Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.
 Watch out: Is everything “data mining”?
 Simple search and query processing are not data mining
Knowledge Discovery (KDD) Process
 This is a view from typical database systems and data
warehousing communities
 Data mining plays an essential role in the knowledge
discovery process
[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
 Data cleaning: Remove noise and inconsistent data.
 Data integration: Multiple data sources may be
combined.
 Data selection: Data relevant to the analysis task are
retrieved from the database.
 Data transformation: Data are consolidated into forms
appropriate for mining.
 Data mining: The process where intelligent methods are
applied to extract data patterns.
 Pattern evaluation: Identify the truly interesting
patterns based on interestingness measures.
 Knowledge presentation: Visualization techniques
are used to present mined knowledge to users.
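The steps above can be read as a chain of small functions, each feeding the next. The following is only a minimal sketch under stated assumptions: the two record sources, the field names (`cust`, `item`, `amount`), and the thresholds are all invented for illustration, and "mining" is reduced to simple counting.

```python
from collections import Counter

# Two hypothetical sources with noisy records (assumed data, not from the slides).
store_a = [{"cust": "c1", "item": "laptop", "amount": 900.0},
           {"cust": "c2", "item": "camera", "amount": None}]  # missing amount = noise
store_b = [{"cust": "c3", "item": "laptop", "amount": 1100.0},
           {"cust": "c4", "item": "phone", "amount": 400.0},
           {"cust": "c5", "item": "laptop", "amount": 950.0}]

def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, min_amount):
    """Data selection: keep only records relevant to the task."""
    return [r for r in records if r["amount"] >= min_amount]

def transform(records):
    """Data transformation: reduce each record to the attribute being mined."""
    return [r["item"] for r in records]

def mine(items):
    """Data mining (here just counting): how often does each item occur?"""
    return Counter(items)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns that meet an interestingness threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: render the patterns as readable text."""
    return [f"{item}: bought {n} times" for item, n in sorted(patterns.items())]

data = integrate(clean(store_a), clean(store_b))
patterns = evaluate(mine(transform(select(data, min_amount=500.0))), min_count=2)
report = present(patterns)
print(report)
```

In a real system each stage is far more involved, but the shape stays the same: every step narrows or reshapes the data before the next one runs.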
Example: A Web Mining Framework

 Web mining usually involves


 Data cleaning
 Data integration from multiple sources
 Warehousing the data
 Data cube construction
 Data selection for data mining
 Data mining
 Presentation of the mining results
 Patterns and knowledge to be used or stored into
knowledge-base

Database Data
 A database system consists of a collection of interrelated
data, known as a database, and a set of software programs to
manage and access the data.
 Relational database: a collection of tables, each of
which is assigned a unique name.
 Each table consists of a set of attributes and usually stores
a large set of tuples.
 Each tuple is identified by a unique key and
described by a set of attribute values.
 An entity-relationship (ER) data model is often constructed
for relational databases.
 Example: AllElectronics store


AllElectronics store
 The relational tables include customer, item, employee, and
branch.
Customer (cust_id, address, age, occupation, annual_income,
credit_information, category, …)
Item (item_id, brand, category, type, price, place_made, supplier, …)
Employee (empl_id, name, category, group, salary, commission, …)
Branch (branch_id, name, address, …)
Purchases (trans_id, cust_id, empl_id, date, time, method_paid, amount)
Items_sold (trans_id, item_id, qty)
Works_at (empl_id, branch_id)

 Data can be accessed by database queries written
in a relational query language such as SQL.
 A query is transformed into a set of relational
operations and is optimized for efficient processing.
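As an illustration of such a query, the sketch below builds a tiny in-memory database with Python's standard `sqlite3` module and asks for revenue per item category. It uses only a subset of the Item and Items_sold columns listed above, and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_id TEXT PRIMARY KEY, category TEXT, price REAL);
    CREATE TABLE items_sold (trans_id TEXT, item_id TEXT, qty INTEGER);
    -- Hypothetical rows, invented for this sketch.
    INSERT INTO item VALUES ('I1', 'computer', 1000.0), ('I2', 'printer', 200.0);
    INSERT INTO items_sold VALUES ('T1', 'I1', 1), ('T1', 'I2', 2), ('T2', 'I2', 1);
""")

# Join the two tables and aggregate: total revenue per item category.
rows = conn.execute("""
    SELECT i.category, SUM(s.qty * i.price) AS revenue
    FROM items_sold AS s
    JOIN item AS i ON i.item_id = s.item_id
    GROUP BY i.category
    ORDER BY i.category
""").fetchall()
print(rows)
conn.close()
```

The database engine decides how to execute the join and the grouping; the query only states what result is wanted.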
Contd..
 When mining relational databases, we can search for
trends or data patterns.
 Ex:
 Analyze customer data to predict the risk of new customers
based on income, age, etc.
 Detect deviations
 Relational databases are among the most commonly
available and richest information repositories.


Data Warehouses
 A repository of information collected from multiple sources,
stored under a unified schema, and usually residing at a single site.
 Constructed via a process of data cleaning, integration,
transformation, loading, and periodic data refreshing.
 Data are organized around major subjects and provide
summarized information from a historical perspective.
 Usually modeled by a multidimensional data structure called a
data cube, in which each dimension corresponds to an attribute or
a set of attributes and each cell stores the value of an aggregate
measure.
 Provides a multidimensional view of data and allows the
precomputation and fast access of summarized data.
 Data warehouse tools support data analysis, but additional tools
are needed for in-depth analysis.
 Multidimensional data mining allows exploration of multiple
combinations of dimensions at varying levels of granularity.
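To make the data cube idea concrete, here is a small sketch that aggregates one measure over different subsets of dimensions, which is what a roll-up does. The dimension names (`quarter`, `item_type`, `city`), the `cuboid` helper, and the sales figures are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts: (quarter, item_type, city, dollars_sold).
facts = [
    ("Q1", "computer", "Chicago", 100),
    ("Q1", "computer", "Toronto", 150),
    ("Q1", "phone", "Chicago", 50),
    ("Q2", "computer", "Chicago", 120),
]
DIMENSIONS = ("quarter", "item_type", "city")

def cuboid(facts, keep):
    """Aggregate dollars_sold over the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    cells = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        cells[key] += row[3]  # each cell stores the aggregate measure
    return dict(cells)

base = cuboid(facts, ("quarter", "item_type", "city"))     # finest granularity
by_quarter_item = cuboid(facts, ("quarter", "item_type"))  # roll up over city
by_quarter = cuboid(facts, ("quarter",))                   # roll up further
print(by_quarter_item)
print(by_quarter)
```

A warehouse would typically precompute some of these aggregates so that summarized views can be served quickly.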





Framework of data warehouse



Transactional Data
 Each record in a transactional database captures a
transaction.
 Ex: Customer purchase.
 A transaction typically includes a unique transaction
identity number (trans_ID) and a list of the items making
up the transaction, such as the items purchased in the
transaction.
 A transactional database may have additional tables,
which contain other information related to the
transactions, such as item description, information about
the salesperson or the branch.
 As an analyst of AllElectronics, you may ask, “Which items
sold well together?” This kind of market basket data
analysis would enable you to bundle groups of items
together as a strategy for boosting sales.
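A first pass at the "which items sold well together?" question is simply to count how often each pair of items shares a transaction. The sketch below does that with the standard library; the transaction IDs and item names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: trans_ID -> items purchased.
transactions = {
    "T100": ["computer", "printer", "software"],
    "T200": ["computer", "software"],
    "T300": ["printer", "paper"],
    "T400": ["computer", "software", "paper"],
}

pair_counts = Counter()
for items in transactions.values():
    # Sort so ("a", "b") and ("b", "a") are counted as the same pair.
    for pair in combinations(sorted(set(items)), 2):
        pair_counts[pair] += 1

best_pair, best_count = pair_counts.most_common(1)[0]
print(best_pair, best_count)
```

Counting every pair this way does not scale to large item catalogs; it is meant only to show what the analysis is looking for.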



Other kinds of data
 There are many other kinds of data that have versatile forms
and structures and rather different semantic meanings.
 Such kinds of data can be seen in many applications:
 Time-related or sequence data (e.g., historical records,
stock exchange data, and time-series and biological
sequence data)
 Data streams (e.g., video surveillance and sensor data,
which are continuously transmitted)
 Spatial data (e.g., maps)
 Engineering design data (e.g., the design of buildings)
 Hypertext and multimedia data
 Graph and networked data (e.g., social and information
networks)
 Web (a huge, widely distributed information repository
made available by the Internet)
Contd..
 Various kinds of knowledge can be mined from these kinds of
data.
 Temporal data: we can mine banking data for changing
trends.
 Stock exchange data can be mined to uncover trends that
could help you plan investment strategies.
 Computer network data streams can be mined to detect
intrusions based on the anomaly of message flows.
 With spatial data, we may look for patterns that describe
changes in metropolitan poverty rates based on city
distances from major highways.
 By mining text data, such as literature on data mining from
the past ten years, we can identify the evolution of hot
topics in the field.
 By mining user comments on products, we can assess
customer sentiments and understand how well a product is
embraced by a market.
Contd..
 From multimedia data, we can mine images to identify objects
and classify them by assigning semantic labels or tags.
 By mining video data of a hockey game, we can detect video
sequences corresponding to goals.
 Web mining can help us
 learn about the distribution of information on the WWW
 characterize and classify web pages
 uncover web dynamics and the association and other
relationships among different web pages, users,
communities, and web-based activities.
 Mining multiple data sources of complex data often leads to
fruitful findings due to the mutual enhancement and
consolidation of such multiple sources.
 It is also challenging because of the difficulties in data cleaning
and data integration, as well as the complex interactions among
the multiple sources of such data.
Data Mining Functions
 There are a number of data mining functionalities:
 characterization and discrimination
 the mining of frequent patterns, associations, and
correlations
 classification and regression
 clustering analysis
 outlier analysis
 Data mining functionalities are used to specify the kinds of
patterns to be found in data mining tasks.
 The tasks can be classified into two categories: descriptive and
predictive.
 Descriptive mining tasks characterize properties of the data in a
target data set.
 Predictive mining tasks perform induction on the current data in
order to make predictions.


Class/Concept Description
 Data entries can be associated with classes or concepts.
 Ex: Classes of items for sale include computers
and printers; concepts of customers include bigSpenders
and budgetSpenders.
 Descriptions of a class or concept can be derived using
 Data characterization, by summarizing the data of the
class under study in general terms.
 Data discrimination, by comparison of the target class
with one or a set of comparative classes.
 Both data characterization and discrimination.


Contd..
 Data characterization is a summarization of the general
characteristics or features of a target class of data.
 Data characterization methods
 data summaries based on statistical measures
 data cube-based OLAP roll-up operation
 attribute-oriented induction technique
 The output of data characterization can be presented in
various forms:
 pie charts, bar charts, curves, multidimensional data
cubes, and multidimensional tables, including
crosstabs.
 generalized relations or rules.
 Ex: Summarize the characteristics of customers who spend
more than $5000 a year at AllElectronics.


Contd..
 Data discrimination is a comparison of the general
features of the target class data objects against the
general features of objects from one or multiple
contrasting classes.
 The target and contrasting classes can be specified by a
user, and the corresponding data objects can be retrieved
through database queries.
 Ex: Compare the general features of software products with
sales that increased by 10% last year against those with
sales that decreased by at least 30% during the same
period.
 The forms of output presentation are similar to those for
characteristic descriptions.
 Discrimination descriptions expressed in the form of rules
are referred to as discriminant rules.


Data Mining Function: (2) Association
and Correlation Analysis
 Frequent patterns (or frequent itemsets)
 What items are frequently purchased together in your
Walmart?
 Frequent subsequences and frequent substructures
 Ex: a laptop and a digital camera followed by a memory card.
 Substructures such as graphs and trees.
 Association, correlation
 A typical association rule
 Diaper → Beer [0.5%, 75%] (support, confidence)
 Are strongly associated items also strongly correlated?
 Multidimensional association rules
 Minimum support threshold and minimum confidence threshold.
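The two numbers attached to a rule such as Diaper → Beer can be computed directly from the transactions: support is the fraction of transactions containing both sides, and confidence is that fraction divided by the support of the left-hand side. The sketch below assumes a small made-up set of baskets chosen so the arithmetic is easy to check by hand; the function names are illustrative.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent); assumes the antecedent occurs."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical baskets (each a set of items).
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "beer", "bread"},
    {"diaper", "milk"},
    {"bread", "milk"},
    {"beer"},
    {"milk"},
    {"bread"},
]

rule_support = support(baskets, {"diaper", "beer"})          # 3 of 8 baskets
rule_confidence = confidence(baskets, {"diaper"}, {"beer"})  # 3 of 4 diaper baskets
print(rule_support, rule_confidence)
```

A rule is kept only if both values meet their minimum thresholds, which are set by the user.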

Data Mining Function: (3) Classification

 Classification and label prediction


 Construct models (functions) based on some training examples
 Describe and distinguish classes or concepts for future prediction
 E.g., classify countries based on (climate), or classify cars
based on (gas mileage)
 Predict some unknown class labels
 Typical methods
 Decision trees, naïve Bayesian classification, support vector
machines, neural networks, rule-based classification, pattern-
based classification, logistic regression, …
 Typical applications:
 Credit card fraud detection, direct marketing, classifying stars,
diseases, web-pages, …

Contd..
 The derived model may be represented as
 Classification rules (IF-THEN)
 Decision trees
 Mathematical formulae or neural networks


Contd..
 A neural network is typically a collection of neuron-like processing
units with weighted connections between the units.
 Classification predicts categorical (discrete, unordered) labels,
whereas regression models continuous-valued functions.
 Regression is used to predict missing or unavailable numerical data
values rather than (discrete) class labels.
 Regression analysis is a statistical methodology that is most often
used for numeric prediction, although other methods exist as well.
 Regression also encompasses the identification of distribution trends
based on the available data.
 Relevance analysis attempts to identify attributes that are
significantly relevant to the classification and regression process.
 Ex:
 Classify items in a store based on responses to a sales campaign as
good response, mild response, or no response.
 Predict the amount of revenue that each item will generate during an
upcoming sale based on previous sales data.
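As a sketch of numeric prediction, the example below fits a straight line by ordinary least squares and uses it to predict revenue in the next sale from revenue in a previous one. Linear least squares is only one possible regression method, and the `fit_line` helper and the figures are assumptions chosen so the fitted line is easy to verify.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: revenue per item in a previous sale vs. the next sale.
previous_sale = [10.0, 20.0, 30.0, 40.0]
next_sale = [25.0, 45.0, 65.0, 85.0]  # constructed as 2 * previous + 5

slope, intercept = fit_line(previous_sale, next_sale)
predicted = slope * 50.0 + intercept  # forecast for an item that earned 50.0
print(slope, intercept, predicted)
```

The output is a continuous value, which is what separates regression from classification into discrete labels.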



Data Mining Function: (4) Cluster Analysis

 Unsupervised learning (i.e., Class label is unknown)


 Group data to form new categories (i.e., clusters), e.g.,
cluster houses to find distribution patterns.
 Clustering analyzes data objects without consulting class
labels.
 In many cases, class-labeled data may simply not exist at
the beginning. Clustering can be used to generate class
labels for a group of data.
 The objects are clustered or grouped based on the
principle of maximizing the intraclass similarity and
minimizing the interclass similarity.

Contd..
 Clusters of objects are formed so that objects within a
cluster have high similarity in comparison to one another,
but are rather dissimilar to objects in other clusters.
 Each cluster so formed can be viewed as a class of
objects, from which rules can be derived.
 Clustering can also facilitate taxonomy formation, that is,
the organization of observations into a hierarchy of classes
that group similar events together.
 Ex: Identify homogeneous subpopulations of customers.
[Figure: customer data plotted against customer location]
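One way to group unlabeled objects so that members of a cluster are similar to each other is k-means, shown here on one-dimensional data. The slides do not name a specific algorithm, so k-means is a swapped-in choice for illustration; the spend values and the starting centroids are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on numbers; `centers` gives the starting centroids."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical yearly spend per customer, with two visible groups.
spend = [100, 120, 110, 900, 950, 1000]
centers, clusters = kmeans_1d(spend, centers=[0, 500])
print(centers, clusters)
```

No class labels are consulted at any point; the two groups that emerge could afterwards be treated as classes in their own right.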



Data Mining Function: (5) Outlier Analysis
 Outlier analysis
 Outlier: A data object that does not comply with the general
behavior of the data
 Noise or exception? ― One person’s garbage could be another
person’s treasure
 Methods:
 Statistical tests
 By-product of clustering or regression analysis
 Density-based methods
 Useful in fraud detection, rare events analysis
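A very simple statistical test for outliers flags values that lie many standard deviations from the mean. The sketch below uses a z-score with a threshold of 2.0; the threshold, the `zscore_outliers` helper, and the charge amounts are assumptions, and real fraud detection would need far more robust methods.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing can be an outlier
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card charges; the last one is far from the rest.
charges = [20, 22, 19, 21, 20, 23, 18, 21, 500]
outliers = zscore_outliers(charges)
print(outliers)
```

Whether the flagged value is noise to discard or the interesting exception depends on the application, as the slide points out.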

Are all patterns interesting?
 Only a small fraction of the patterns potentially
generated would actually be of interest to a given
user.
 A pattern is interesting if it is
 easily understood by humans,
 valid on new or test data with some degree of
certainty
 potentially useful, and
 novel.
 A pattern is also interesting if it validates a hypothesis
that the user sought to confirm.
 An interesting pattern represents knowledge.



Objective measures of pattern interestingness
 Support: the percentage of transactions from a transaction database that
the given rule satisfies.
 Confidence: the degree of certainty of the detected association.
 Each interestingness measure is associated with a threshold, which
may be controlled by the user. For example, rules that do not satisfy a
confidence threshold of, say, 50% can be considered uninteresting.
 Accuracy and coverage for classification rules.
 Accuracy tells us the percentage of data that are correctly classified by a rule.
 Coverage is similar to support, in that it tells us the percentage of data to which a
rule applies.
 Objective measures are often insufficient unless combined with subjective measures
that reflect a particular user's needs and interests.
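Accuracy and coverage of a single classification rule can be computed by checking which records the rule applies to and how many of those it labels correctly. In the sketch below the rule, the records, and the 50% threshold value are hypothetical; accuracy is measured over the records the rule covers.

```python
# Hypothetical labelled records: (age, income, buys_computer).
records = [
    (25, "high", True),
    (30, "high", True),
    (45, "high", False),
    (22, "low", False),
    (50, "low", False),
]

def rule_applies(record):
    """Antecedent of the rule: IF income = high THEN buys_computer = True."""
    return record[1] == "high"

covered = [r for r in records if rule_applies(r)]
correct = [r for r in covered if r[2] is True]

coverage = len(covered) / len(records)  # share of data the rule applies to
accuracy = len(correct) / len(covered)  # share of covered data classified correctly
MIN_ACCURACY = 0.5                      # user-controlled threshold (assumed value)
interesting = accuracy >= MIN_ACCURACY
print(coverage, accuracy, interesting)
```

Raising or lowering the threshold changes which rules survive, which is why the choice is left to the user.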



Subjective interestingness measures
 Based on user beliefs in the data.
 These measures find patterns interesting if the
patterns are unexpected (contradicting a user's
belief) or offer strategic information on which the
user can act.
 In the latter case, such patterns are referred to
as actionable. An example is a pattern like “a large
earthquake often follows a cluster of small
quakes”.
 Patterns that are expected can be interesting if
they confirm a hypothesis that the user wishes to
validate or they resemble a user's hunch.


Contd..
 Can a data mining system generate all of the interesting patterns?
 It is often unrealistic and inefficient for data mining systems to
generate all possible patterns.
 Instead, user-provided constraints and interestingness measures
should be used to focus the search.
 Can a data mining system generate only interesting patterns?
 Progress has been made in this direction; however, such
optimization remains a challenging issue in data mining.
 Measures of pattern interestingness are essential for the efficient
discovery of patterns by target users.
 Such measures can be used after the data mining step to rank the
discovered patterns according to their interestingness, filtering out the
uninteresting ones.
 More important, such measures can be used to guide and constrain
the discovery process, improving the search efficiency by pruning
away subsets of the pattern space that do not satisfy pre-specified
interestingness constraints.


Data Mining: Technologies used



Statistics
 Statistics studies the collection, analysis, interpretation or
explanation, and presentation of data. Data mining has an
inherent connection with statistics.
 Statistical models are widely used to model data and data
classes.
 Alternatively, data mining tasks can be built on top of statistical
models.
 Statistical methods can be used to summarize or describe a
collection of data.
 Statistical methods can also be used to verify data mining
results.
 Inferential statistics (or predictive statistics) is used to draw
inferences about the process or population under investigation.
 Many statistical methods have high complexity in computation.
 When such methods are applied on large data sets, algorithms
should be carefully designed and tuned to reduce the
computational cost.
Machine Learning
 Machine learning investigates how computers can learn (or
improve their performance) based on data.
 Classic problems in machine learning that are highly related
to data mining:
 Supervised learning: The supervision in the learning
comes from the labeled examples in the training data set.
 Unsupervised learning: The learning process is unsupervised since
the input examples are not class labeled. We may use
clustering to discover classes within the data.
 Semi-supervised learning: Makes use of both labeled and
unlabeled examples when learning a model.
 Active learning: Lets users play an active role in the
learning process.




Database systems and data warehouse systems
 Data mining can make good use of scalable database
technologies to achieve high efficiency and scalability on
large data sets.
 Data mining tasks can be used to extend the capability of
existing database systems to satisfy advanced users'
sophisticated data analysis requirements.
 A data warehouse integrates data originating from
multiple sources and various timeframes. It consolidates
data in multidimensional space to form partially
materialized data cubes.
 The data cube model not only facilitates OLAP in
multidimensional databases but also promotes
multidimensional data mining.


Information Retrieval
 Information retrieval (IR) is the science of
searching for documents or information in
documents.
 Information retrieval assumes that
 the data under search are unstructured
 the queries are formed mainly by keywords, which do
not have complex structures.
 Increasingly large amounts of text and multimedia
data have been accumulated and made available
online due to the fast growth of the Web.
 Their effective search and analysis have raised
many challenging issues in data mining.


Major Issues in Data Mining (1)

 Mining Methodology
 Mining various and new kinds of knowledge
 DM tasks may use the same database in different ways and require
the development of numerous data mining techniques.
 Due to the diversity of applications, new mining tasks continue to
emerge, making data mining a dynamic and fast-growing field.
 For example, for effective knowledge discovery in information
networks, integrated clustering and ranking may lead to the
discovery of high-quality clusters and object ranks in large networks.
 Mining knowledge in multi-dimensional space
 We can search for interesting patterns among combinations of
dimensions (attributes) at varying levels of abstraction.
 Mining knowledge in cube space can substantially enhance the power
and flexibility of data mining.

Contd..
 Data mining: An interdisciplinary effort
 The power of data mining can be substantially enhanced by
integrating new methods from multiple disciplines.
 Ex: Text mining, Bug mining.
 Boosting the power of discovery in a networked environment
 Semantic links across multiple data objects can be used to advantage
in data mining.
 Knowledge derived in one set of objects can be used to boost the
discovery of knowledge in a “related” or semantically linked set of
objects.
 Ex: Web, files or documents.
 Handling noise, uncertainty, and incompleteness of data
 Errors and noise may confuse the data mining process, leading to the
derivation of erroneous patterns.


Contd..
 Data cleaning, data preprocessing, outlier detection and
removal, and uncertainty reasoning are examples of techniques
that need to be integrated with the data mining process.
 Pattern evaluation and pattern- or constraint-guided mining
 Not all the patterns generated by data mining processes are interesting.
 By using interestingness measures or user-specified constraints to guide
the discovery process, we may generate more interesting patterns and
reduce the search space.
 User Interaction
 Interactive mining
 Build flexible user interfaces that facilitate user interaction with the
system.
 Allow users to dynamically change the focus of a search.


Contd..
 Incorporation of background knowledge
 Information regarding the domain under study should be
incorporated into the discovery process to guide the search.
 Presentation and visualization of data mining results
 It requires the system to adopt expressive knowledge
representations, user-friendly interfaces, and visualization
techniques.
 Ad hoc data mining and data mining query languages (DMQL)
 Query languages (e.g., SQL) have played an important role in
flexible searching because they allow users to pose ad hoc
queries.
 High-level data mining query languages will give users the
freedom to define ad hoc data mining tasks.
 They should facilitate specification of the relevant sets of data for analysis, the
domain knowledge, the kinds of knowledge to be mined, and the
conditions and constraints to be enforced on the discovered patterns.


Major Issues in Data Mining (2)

 Efficiency and Scalability
 Efficiency and scalability of data mining algorithms
 The running time of a data mining algorithm must be predictable,
short, and acceptable by applications.
 Efficiency, scalability, performance, optimization, and the ability to
execute in real time are key criteria that drive the development of
many new data mining algorithms.
 Parallel, distributed, stream, and incremental mining methods
 The wide distribution of data and the computational complexity of
some data mining methods are factors that motivate the
development of parallel and distributed data-intensive mining
algorithms.
 Partition the data into pieces, search each piece for patterns in
parallel, and then merge the patterns.
 Incremental mining incorporates new data updates without having to
mine the entire data set from scratch.
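The partition, search, and merge idea, together with incremental updates, can be sketched with item counting as the "pattern" being mined. This is an illustration under assumptions: the data are invented, and a thread pool stands in for what would normally be separate processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Mine one partition: count in how many transactions each item occurs."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

def mine_partitioned(partitions):
    """Search each partition in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_items, partitions))
    merged = Counter()
    for partial in partials:
        merged += partial
    return merged

# Hypothetical data already split into two pieces.
partitions = [
    [["milk", "bread"], ["milk"]],
    [["bread", "beer"], ["milk", "beer"]],
]
counts = mine_partitioned(partitions)

# Incremental mining: fold in a new batch without rescanning the old data.
new_batch = [["beer"], ["milk", "bread"]]
counts += count_items(new_batch)
print(counts)
```

Counts merge cleanly because they are additive; patterns that are not additive across partitions need a more careful merge step.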

Contd..
 Diversity of data types
 Handling complex types of data
 It is unrealistic to expect one data mining system to mine all kinds of
data, given the diversity of data types and the different goals of data
mining.
 Domain- or application-dedicated data mining systems are being
constructed for in-depth mining of specific kinds of data.
 Mining dynamic, networked, and global data repositories
 The discovery of knowledge from different sources of structured,
semi-structured, or unstructured yet interconnected data with diverse
data semantics poses great challenges to data mining.
 Mining such gigantic, interconnected information networks may
help disclose many more patterns and knowledge in
heterogeneous data sets



Contd..
 Data mining and society
 Social impacts of data mining
 How can we use data mining technology to benefit society?
 How can we guard against its misuse?
 The improper disclosure or use of data and the potential violation of
individual privacy and data protection rights are areas of concern that
need to be addressed.
 Privacy-preserving data mining
 Data mining poses the risk of disclosing an individual's personal information.
 Studies on privacy-preserving data publishing and data mining are
ongoing.
 The philosophy is to observe data sensitivity and preserve people's
privacy while performing successful data mining.
 Invisible data mining
 Systems can incorporate data mining into their components to improve
their functionality and performance. This is often done
unbeknownst to the user. Ex: Mining the buying patterns of customers.