Data Mining

The document provides an overview of data mining concepts and techniques. It discusses how the explosive growth of data has created a need for data mining to extract useful knowledge and patterns. It describes data mining as the process of discovering interesting and non-trivial patterns or knowledge from large amounts of data. This involves data cleaning, integration, transformation, selection, mining, pattern evaluation and knowledge presentation. The document also outlines different types of data that can be mined, including relational databases and data warehouses.

Uploaded by

Rama Krishna Badiguntla

Data Mining:

Concepts and Techniques

(3rd ed.)

— Chapter 1 —

Chapter 1. Introduction
 Why Data Mining?
 What Is Data Mining?
 A Multi-Dimensional View of Data Mining
 What Kind of Data Can Be Mined?
 What Kinds of Patterns Can Be Mined?
 What Technologies Are Used?
 What Kind of Applications Are Targeted?
 Major Issues in Data Mining
 A Brief History of Data Mining and Data Mining Society
 Summary
Why Data Mining?

 The Explosive Growth of Data: from terabytes to petabytes

 Data collection and data availability
 Automated data collection tools, database systems, Web,
computerized society
 Major sources of abundant data
 Business: Web, e-commerce, transactions, stocks, …
 Science: Remote sensing, bioinformatics, scientific simulation, …
 Society and everyone: news, digital cameras, YouTube
 We are drowning in data, but starving for knowledge!
 “Necessity is the mother of invention”—Data mining—Automated
analysis of massive data sets

Evolution of Database Technology
 1960s:
 Data collection, database creation, IMS and network DBMS
 1970s:
 Relational data model, relational DBMS implementation
 1980s:
 RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
 Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990s:
 Data mining, data warehousing, multimedia databases, and Web databases
 2000s:
 Stream data management and mining
 Data mining and its applications
 Web technology (XML, data integration) and global information systems

May 31, 2023 Data Mining: Concepts and Techniques 5
Contd..
 This is a data-rich but information-poor situation.
 Data repositories have grown so large that they far exceed the human
ability to comprehend them without powerful tools.
 Important decisions are often based not on the data stored in
data repositories but on the decision maker's intuition.
 Efforts have been made to develop expert systems and
knowledge-based technologies, which rely on users to input
knowledge into knowledge bases.
 But this procedure is prone to errors and is time
consuming.
 So the systematic development of data mining tools is
required to turn data tombs into golden nuggets of
knowledge.

What Is Data Mining?

 Data mining (knowledge discovery from data)
 Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge from
huge amounts of data
 Alternative names
 Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.
 Watch out: Is everything “data mining”?
 No: simple search and query processing, for example, are not data mining

Knowledge Discovery (KDD) Process
 This is a view from typical database systems and data
warehousing communities
 Data mining plays an essential role in the knowledge discovery process
[Figure: the KDD process as a sequence of stages: Databases, Data Cleaning and Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation]
 Data cleaning: Remove noise and inconsistent data.
 Data integration: Multiple data sources may be
combined.
 Data selection: Data relevant to the analysis task are
retrieved from the database.
 Data transformation: Data are consolidated into forms
appropriate for mining.
 Data mining: The process in which intelligent methods are
applied to extract data patterns.
 Pattern evaluation: Identify the truly interesting
patterns based on interestingness measures.
 Knowledge presentation: Visualization techniques
are used to present the mined knowledge to users.
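The steps listed above can be sketched as a chain of small functions. This is only an illustrative toy, not the textbook's implementation: the two record sources, the field names, the age bands, and the 200.0 threshold are all assumptions made up for the example.

```python
# Hypothetical records from two sources; None marks a missing value.
store_a = [{"cust": "c1", "age": 25, "spend": 120.0},
           {"cust": "c2", "age": None, "spend": 80.0}]
store_b = [{"cust": "c3", "age": 41, "spend": 560.0},
           {"cust": "c4", "age": 38, "spend": 610.0}]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine records from multiple sources."""
    return [rec for src in sources for rec in src]

def select(records, fields):
    """Data selection: keep only the task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: discretize age into two coarse bands."""
    return [{"age_band": "under_30" if r["age"] < 30 else "30_plus",
             "spend": r["spend"]} for r in records]

def mine(records):
    """Data mining: average spend per age band (a very simple pattern)."""
    totals = {}
    for r in records:
        s, n = totals.get(r["age_band"], (0.0, 0))
        totals[r["age_band"]] = (s + r["spend"], n + 1)
    return {band: s / n for band, (s, n) in totals.items()}

def evaluate(patterns, min_avg_spend):
    """Pattern evaluation: keep patterns above an interestingness threshold."""
    return {b: v for b, v in patterns.items() if v >= min_avg_spend}

integrated = integrate(clean(store_a), clean(store_b))
patterns = evaluate(mine(transform(select(integrated, ["age", "spend"]))),
                    min_avg_spend=200.0)
print(patterns)  # knowledge presentation, reduced here to a print
```

Each function corresponds to one stage, so a stage can be replaced (for example, a different cleaning policy) without touching the others.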
Example: A Web Mining Framework

 Web mining usually involves

 Data cleaning
 Data integration from multiple sources
 Warehousing the data
 Data cube construction
 Data selection for data mining
 Data mining
 Presentation of the mining results
 Patterns and knowledge to be used or stored in a
knowledge base

Database Data
 A database system consists of a collection of interrelated data, known as a
database, and a set of software programs to
manage and access the data.
 Relational database: a collection of tables, each of
which is assigned a unique name.
 Each table consists of a set of attributes and usually a large
set of tuples.
 Each tuple is identified by a unique key and
described by a set of attribute values.
 An entity-relationship (ER) data model is often constructed for relational
databases.
Example: AllElectronics store
 The relational tables are customer, item, employee, and
branch; purchases, items_sold, and works_at record the relationships among them.
Customer (cust_id, address, age, occupation, annual_income,
credit_information, category, …)
Item (item_id, brand, category, type, price, place_made, supplier, …)
Employee (empl_id, name, category, group, salary, commission, …)
Branch (branch_id, name, address, …)
Purchases (trans_id, cust_id, empl_id, date, time, method_paid, amount)
Items_sold (trans_id, item_id, qty)
Works_at (empl_id, branch_id)

 Data can be accessed by database queries written
in a relational query language such as SQL.
 A query is transformed into a set of relational
operations and is then optimized for efficient processing.
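As a rough illustration of querying such tables, the sketch below builds cut-down versions of the customer and purchases tables in an in-memory SQLite database and asks for each customer's total purchases. The rows and the reduced column lists are invented for the example; only the table and key names follow the schema above.

```python
import sqlite3

# In-memory database with simplified, hypothetical versions of two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id TEXT PRIMARY KEY, age INTEGER, annual_income REAL);
CREATE TABLE purchases (trans_id TEXT PRIMARY KEY, cust_id TEXT, amount REAL);
INSERT INTO customer VALUES ('C1', 27, 30000), ('C2', 45, 82000);
INSERT INTO purchases VALUES ('T100', 'C1', 250.0),
                             ('T200', 'C2', 1200.0),
                             ('T300', 'C2', 300.0);
""")

# A join followed by an aggregation: total amount spent per customer.
rows = conn.execute("""
    SELECT c.cust_id, SUM(p.amount) AS total
    FROM customer AS c
    JOIN purchases AS p ON p.cust_id = c.cust_id
    GROUP BY c.cust_id
    ORDER BY total DESC
""").fetchall()
conn.close()

print(rows)
```

The database engine decides how to execute the join and the grouping; the query only states what result is wanted.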
Contd..
 When mining relational databases, we can search for
trends or data patterns.
 Ex:
 Analyze customer data to predict the credit risk of new customers
based on income, age, etc.
 Detect deviations, such as items whose sales are far from those expected.
 Relational databases are among the most commonly available and
richest information repositories.


Data Warehouses
 A repository of information collected from multiple sources,
stored under a unified schema, and usually residing at a single site.
 Constructed via a process of data cleaning, integration,
transformation, loading, and periodic data refreshing.
 Data are organized around major subjects and
summarized to provide information from a historical perspective.
 A data warehouse is usually modeled by a multidimensional data structure called a data cube,
in which each dimension corresponds to an attribute or a set of attributes and
each cell stores the value of some aggregate measure.
 A data cube provides a multidimensional view of data and allows the
precomputation and fast access of summarized data.
 Data warehouse tools support data analysis, but additional data mining tools
are needed for in-depth analysis.
 Multidimensional data mining allows the exploration of multiple
combinations of dimensions at varying levels of granularity.
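
To make the data cube idea concrete, here is a minimal sketch that stores a sum-of-sales measure per cell and rolls it up by dropping dimensions. The fact rows, the three dimension names, and the `aggregate` helper are assumptions for illustration only; a real warehouse would precompute and index these aggregates.

```python
from collections import defaultdict

# Hypothetical fact rows: (item_type, city, quarter, sales amount).
facts = [
    ("computer", "Chicago", "Q1", 400.0),
    ("computer", "Chicago", "Q2", 450.0),
    ("computer", "Toronto", "Q1", 300.0),
    ("phone", "Chicago", "Q1", 150.0),
    ("phone", "Toronto", "Q2", 200.0),
]
DIMENSIONS = ("item_type", "city", "quarter")

def aggregate(rows, keep):
    """Sum the sales measure, grouping only by the dimensions in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    cube = defaultdict(float)
    for row in rows:
        cell = tuple(row[i] for i in idx)
        cube[cell] += row[3]
    return dict(cube)

base_cuboid = aggregate(facts, DIMENSIONS)             # finest granularity
by_item_city = aggregate(facts, ("item_type", "city"))  # roll up over quarter
by_item = aggregate(facts, ("item_type",))              # roll up over city too

print(by_item_city)
print(by_item)
```

Each call corresponds to one combination of dimensions, which is the kind of exploration at varying granularity that the last bullet describes.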

[Figure: Framework of a data warehouse]

Transactional Data
 Each record in a transactional database captures a
transaction.
 Ex: a customer's purchase.
 A transaction typically includes a unique transaction
identity number (trans_ID) and a list of the items making
up the transaction, such as the items purchased in the
transaction.
 A transactional database may have additional tables,
which contain other information related to the
transactions, such as item descriptions or information about
the salesperson or the branch.
 As an analyst of AllElectronics, you may ask, “Which items
sold well together?” This kind of market basket data
analysis would enable you to bundle groups of items
together as a strategy for boosting sales.

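The "which items sold well together?" question can be approximated by counting how often each pair of items appears in the same transaction. The sketch below assumes a tiny made-up transaction table keyed by trans_ID; it is a simple co-occurrence count, not a full frequent-itemset miner.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional data: trans_ID -> list of purchased items.
transactions = {
    "T100": ["computer", "printer", "software"],
    "T200": ["computer", "software"],
    "T300": ["printer", "paper"],
    "T400": ["computer", "printer", "software"],
}

pair_counts = Counter()
for items in transactions.values():
    # Sort so that each unordered pair has a single canonical form.
    for pair in combinations(sorted(set(items)), 2):
        pair_counts[pair] += 1

best_pair, best_count = pair_counts.most_common(1)[0]
print(best_pair, best_count)
```

Pairs with high counts are candidates for bundling; longer itemsets would need a more careful search.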

Other kinds of data
 There are many other kinds of data that have versatile forms
and structures and rather different semantic meanings.
 Such kinds of data can be seen in many applications
 time-related or sequence data (e.g., historical records,
stock exchange data, and time-series and biological
sequence data)
 data streams (e.g., video surveillance and sensor data,
which are continuously transmitted)
 spatial data (e.g., maps)
 Engineering design data (e.g., the design of buildings)
 Hypertext and multimedia data
 Graph and networked data (e.g., social and information
networks)
 Web (a huge, widely distributed information repository
made available by the Internet)
Contd..
 Various kinds of knowledge can be mined from these kinds of
data.
 Temporal data: we can mine banking data for changing
trends.
 Stock exchange data can be mined to uncover trends that
could help you plan investment strategies.
 Computer network data streams can be mined to detect intrusions based
on anomalies in message flows.
 With spatial data, we may look for patterns that describe
changes in metropolitan poverty rates based on city
distances from major highways.
 By mining text data, such as literature on data mining from
the past ten years, we can identify the evolution of hot
topics in the field.
 By mining user comments on products, we can assess
customer sentiments and understand how well a product is
embraced by a market.
Contd..
 From multimedia data, we can mine images to identify objects
and classify them by assigning semantic labels or tags.
 By mining video data of a hockey game, we can detect video
sequences corresponding to goals.
 Web mining can help us
 learn about the distribution of information on the WWW,
 characterize and classify web pages, and
 uncover web dynamics and the association and other
relationships among different web pages, users,
communities, and web-based activities.
 Mining multiple data sources of complex data often leads to
fruitful findings due to the mutual enhancement and
consolidation of such multiple sources.
 It is also challenging because of the difficulties in data cleaning
and data integration, as well as the complex interactions among
the multiple sources of such data.
Data Mining Functions
 There are a number of data mining functionalities.
 characterization and discrimination
 the mining of frequent patterns, associations, and
correlations
 classification and regression
 clustering analysis
 outlier analysis
 Data mining functionalities are used to specify the kinds of
patterns to be found in data mining tasks.
 The tasks can be classified into two categories: descriptive and
predictive.
 Descriptive mining tasks characterize properties of the data in a
target data set.
 Predictive mining tasks perform induction on the current data in
order to make predictions.


Class/Concept Description
 Data entries can be associated with classes or concepts.
 Ex: Classes of items for sale include computers
and printers.
 Ex: Concepts of customers include bigSpenders
and budgetSpenders.
 Descriptions of a class or concept can be derived using
 Data characterization, by summarizing the data of the
class under study in general terms.
 Data discrimination, by comparison of the target class
with one or a set of comparative classes.
 Both data characterization and discrimination.


Contd..
 Data characterization is a summarization of the general
characteristics or features of a target class of data.
 Data characterization methods
 data summaries based on statistical measures
 data cube-based OLAP roll-up operation
 attribute-oriented induction technique
 The output of data characterization can be presented in
various forms:
 pie charts, bar charts, curves, multidimensional data
cubes, and multidimensional tables, including
crosstabs
 generalized relations or rule form
 Ex: Summarize the characteristics of customers who spend
more than $5000 a year at AllElectronics.


Contd..
 Data discrimination is a comparison of the general
features of the target class data objects against the
general features of objects from one or multiple
contrasting classes.
 The target and contrasting classes can be specified by a
user, and the corresponding data objects can be retrieved
through database queries.
 Ex: Compare the general features of software products with
sales that increased by 10% last year against those with
sales that decreased by at least 30% during the same
period.
 The forms of output presentation are similar to those for
characteristic descriptions.
 Discrimination descriptions expressed in the form of rules
are referred to as discriminant rules.

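A minimal sketch of characterization and discrimination, assuming a made-up customer table: the target class is customers who spend more than $5000 a year, the contrasting class is everyone else, and each class is summarized by simple statistical measures. The field names and values are hypothetical.

```python
from statistics import mean

# Hypothetical customer records.
customers = [
    {"age": 45, "income": 90000, "yearly_spend": 7200},
    {"age": 41, "income": 85000, "yearly_spend": 5600},
    {"age": 29, "income": 40000, "yearly_spend": 900},
    {"age": 33, "income": 52000, "yearly_spend": 1500},
]

def characterize(rows):
    """Characterization: summarize a class with simple statistics."""
    return {
        "count": len(rows),
        "mean_age": mean(r["age"] for r in rows),
        "mean_income": mean(r["income"] for r in rows),
    }

# Target class and contrasting class, selected by a query-like predicate.
target = [r for r in customers if r["yearly_spend"] > 5000]
contrast = [r for r in customers if r["yearly_spend"] <= 5000]

target_summary = characterize(target)
contrast_summary = characterize(contrast)

# Discrimination: compare the general features of the two classes.
difference = {k: target_summary[k] - contrast_summary[k]
              for k in ("mean_age", "mean_income")}
print(target_summary, contrast_summary, difference)
```

The `difference` dictionary is a crude numeric stand-in for a discriminant description; rule-form output would need a further generalization step.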

Data Mining Function: (2) Association
and Correlation Analysis
 Frequent patterns (or frequent itemsets)
 What items are frequently purchased together in your
Walmart?
 Frequent subsequences and frequent substructures
 Ex: a laptop and a digital camera, followed by a memory card (a sequential pattern).
 Ex: graphs and trees (substructures).
 Association, correlation
 A typical association rule
 Diaper → Beer [0.5%, 75%] (support, confidence)
 Are strongly associated items also strongly correlated?
 Multidimensional association rules
 Minimum support threshold, minimum confidence threshold.

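Support and confidence for a rule such as Diaper → Beer can be computed directly from transaction counts. The sketch below uses five invented transactions and assumed thresholds of 50% support and 70% confidence; the numbers do not correspond to the [0.5%, 75%] figures on the slide.

```python
# Hypothetical transaction database; each transaction is a set of items.
transactions = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"diaper", "beer", "bread"},
]

def count(itemset, db):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t)

def support(itemset, db):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, db) / len(db)

def confidence(antecedent, consequent, db):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    return count(antecedent | consequent, db) / count(antecedent, db)

s = support({"diaper", "beer"}, transactions)
c = confidence({"diaper"}, {"beer"}, transactions)

# Assumed user-specified thresholds.
MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.7
is_strong = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE
print(f"support={s:.2f} confidence={c:.2f} strong={is_strong}")
```

A rule that clears both thresholds is usually called strong, although a strong rule is not necessarily a correlated one, which is the question the slide raises.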
Data Mining Function: (3) Classification

 Classification and label prediction

 Construct models (functions) based on some training examples
 Describe and distinguish classes or concepts for future prediction
 E.g., classify countries based on (climate), or classify cars
based on (gas mileage)
 Predict some unknown class labels
 Typical methods
 Decision trees, naïve Bayesian classification, support vector
machines, neural networks, rule-based classification,
pattern-based classification, logistic regression, …
 Typical applications:
 Credit card fraud detection, direct marketing, classifying stars,
diseases, web-pages, …

Contd..
 The derived model may be represented as
 Classification rules (IF-THEN)
 Decision trees
 Mathematical formulae or neural networks

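As a small illustration of the IF-THEN representation, the sketch below encodes a classification model as an ordered rule list. The rules, attribute values, and class labels are hand-written assumptions for the example; in practice they would be derived from training examples rather than typed in.

```python
# Hypothetical ordered rule list: (condition, class label). First match wins.
RULES = [
    # IF age = youth AND income = high THEN class = A
    (lambda x: x["age"] == "youth" and x["income"] == "high", "A"),
    # IF age = youth AND income = low THEN class = B
    (lambda x: x["age"] == "youth" and x["income"] == "low", "B"),
    # IF age = middle_aged THEN class = C
    (lambda x: x["age"] == "middle_aged", "C"),
]
DEFAULT_CLASS = "C"  # assumed fallback when no rule fires

def classify(x):
    """Return the label of the first rule whose condition holds for x."""
    for condition, label in RULES:
        if condition(x):
            return label
    return DEFAULT_CLASS

print(classify({"age": "youth", "income": "high"}))
print(classify({"age": "senior", "income": "low"}))
```

The same model could be drawn as a decision tree by testing age at the root and income on the youth branch.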

Contd..
 A neural network is typically a collection of neuron-like processing
units with weighted connections between the units.
 Whereas classification predicts categorical (discrete, unordered) labels,
regression models continuous-valued functions.
 Regression is used to predict missing or unavailable numerical data
values rather than (discrete) class labels.
 Regression analysis is a statistical methodology that is most often
used for numeric prediction, although other methods exist as well.
 Regression also encompasses the identification of distribution trends
based on the available data.
 Classification and regression may need to be preceded by relevance analysis,
which attempts to identify attributes that are
significantly relevant to the classification and regression process.
 Ex:
 Classify items in a store, based on responses to a sales campaign, as good
response, mild response, or no response.
 Predict the amount of revenue that each item will generate during an upcoming
sale based on previous sales data.

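For the revenue example, a least-squares line is one simple way to do numeric prediction. The sketch below fits revenue in an upcoming sale against revenue in a previous sale; the four data points are invented (and lie exactly on a line so the result is easy to check), and ordinary least squares is only one of many possible regression methods.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical per-item revenue (in thousands) for two past sales.
previous_sale = [10.0, 20.0, 30.0, 40.0]
following_sale = [25.0, 45.0, 65.0, 85.0]

slope, intercept = fit_line(previous_sale, following_sale)

# Predict revenue for an item that earned 50 in the previous sale.
prediction = slope * 50.0 + intercept
print(slope, intercept, prediction)
```

The output is a continuous value, in contrast to the discrete labels produced by a classifier.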

Data Mining Function: (4) Cluster Analysis

 Unsupervised learning (i.e., the class label is unknown)

 Group data to form new categories (i.e., clusters), e.g.,
cluster houses to find distribution patterns.
 Clustering analyzes data objects without consulting class
labels.
 In many cases, class-labeled data may simply not exist at
the beginning. Clustering can be used to generate class
labels for a group of data.
 The objects are clustered or grouped based on the
principle of maximizing the intraclass similarity and
minimizing the interclass similarity.

Contd..
 Clusters of objects are formed so that objects within a
cluster have high similarity in comparison to one another,
but are rather dissimilar to objects in other clusters.
 Each cluster so formed can be viewed as a class of
objects, from which rules can be derived.
 Clustering can also facilitate taxonomy formation, that is,
the organization of observations into a hierarchy of classes
that group similar events together.
 Ex: To identify homogeneous subpopulations of customers,
cluster customer data with respect to customer location.

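One common way to group objects by similarity is k-means, shown here as a bare-bones sketch. The slides do not prescribe a particular algorithm, so k-means is a stand-in; the 2-D points (imagined customer locations), the choice of two clusters, and the initial centroids are all assumptions.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points with squared Euclidean distance."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customer locations forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

# Fixed initial centroids keep the run deterministic.
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

No class labels are used anywhere; the two groups that emerge could then be treated as classes and described with rules.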

Data Mining Function: (5) Outlier Analysis
 Outlier analysis
 Outlier: A data object that does not comply with the general
behavior of the data
 Noise or exception? One person's garbage could be another
person's treasure
 Methods:
 Statistical tests
 By-product of clustering or regression analysis
 Density-based methods
 Useful in fraud detection and rare-event analysis

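A very simple statistical test for outliers flags values that lie many standard deviations from the mean. The sketch below applies this to invented purchase amounts; the cutoff of 2.0 is an assumption chosen for this tiny sample, where z-scores cannot grow very large.

```python
from statistics import mean, pstdev

# Hypothetical purchase amounts charged to one account.
amounts = [52.0, 48.0, 50.0, 51.0, 49.0, 50.0, 500.0]

mu = mean(amounts)
sigma = pstdev(amounts)  # population standard deviation

# Flag values whose z-score exceeds the assumed threshold.
Z_THRESHOLD = 2.0
outliers = [x for x in amounts if abs(x - mu) / sigma > Z_THRESHOLD]
print(outliers)
```

Whether the flagged amount is noise or a genuine exception, such as a fraudulent charge, still has to be judged in context.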
Are All Patterns Interesting?
 Only a small fraction of the patterns potentially
generated would actually be of interest to a given
user.
 A pattern is interesting if it is
 easily understood by humans,
 valid on new or test data with some degree of
certainty,
 potentially useful, and
 novel.
 A pattern is also interesting if it validates a hypothesis
that the user sought to confirm.
 An interesting pattern represents knowledge.


Objective measures of pattern
interestingness
 Support: The percentage of transactions from a transaction database that
the given rule satisfies.
 Confidence: The degree of certainty of the detected association.
 Each interestingness measure is associated with a threshold, which
may be controlled by the user. For example, rules that do not satisfy a
confidence threshold of, say, 50% can be considered uninteresting.
 Accuracy and coverage for classification rules.
 Accuracy tells us the percentage of data that are correctly classified by a rule.
 Coverage is similar to support, in that it tells us the percentage of data to which a
rule applies.
 Objective measures are often insufficient unless combined with subjective measures
that reflect a particular user's needs and interests.

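Coverage and accuracy of a single classification rule can be computed by counting. In this sketch the rule, the five labeled records, and the attribute names are invented, and accuracy is taken over the records the rule covers, which is one common reading of the definition above.

```python
# Hypothetical labeled records.
data = [
    {"income": "high", "response": "good"},
    {"income": "high", "response": "good"},
    {"income": "high", "response": "none"},
    {"income": "low", "response": "none"},
    {"income": "low", "response": "good"},
]

# Rule under evaluation: IF income = high THEN response = good.
def rule_applies(record):
    return record["income"] == "high"

RULE_LABEL = "good"

covered = [r for r in data if rule_applies(r)]
correct = [r for r in covered if r["response"] == RULE_LABEL]

coverage = len(covered) / len(data)     # share of data the rule applies to
accuracy = len(correct) / len(covered)  # share of covered data it labels correctly
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

A user-set threshold on either number could then be used to discard the rule as uninteresting.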

Subjective interestingness
measures
 Based on user beliefs in the data.
 These measures find patterns interesting if the
patterns are unexpected (contradicting a user's
belief) or offer strategic information on which the
user can act.
 In the latter case, such patterns are referred to
as actionable. For example, a pattern like “a large
earthquake often follows a cluster of small
quakes” may be highly actionable if users can act on it to save lives.
 Patterns that are expected can be interesting if
they confirm a hypothesis that the user wishes to
validate or they resemble a user's hunch.


Contd..
 Can a data mining system generate all of the interesting patterns?
 It is often unrealistic and inefficient for data mining systems to
generate all possible patterns.
 Instead, user-provided constraints and interestingness measures
should be used to focus the search.
 Can a data mining system generate only the interesting patterns?
 Progress has been made in this direction; however, such
optimization remains a challenging issue in data mining.
 Measures of pattern interestingness are essential for the efficient
discovery of patterns by target users.
 Such measures can be used after the data mining step to rank the
discovered patterns according to their interestingness, filtering out the
uninteresting ones.
 More important, such measures can be used to guide and constrain
the discovery process, improving the search efficiency by pruning
away subsets of the pattern space that do not satisfy pre-specified
interestingness constraints.


Statistics
 Statistics studies the collection, analysis, interpretation or
explanation, and presentation of data. Data mining has an
inherent connection with statistics.
 Statistical models are widely used to model data and data
classes.
 Alternatively, data mining tasks can be built on top of statistical
models.
 Statistical methods can be used to summarize or describe a
collection of data.
 Statistical methods can also be used to verify data mining
results.
 Inferential statistics (or predictive statistics) is used to draw
inferences about the process or population under investigation.
 Many statistical methods have high complexity in computation.
 When such methods are applied on large data sets, algorithms
should be carefully designed and tuned to reduce the
computational cost.
Machine Learning
 Machine learning investigates how computers can learn (or
improve their performance) based on data.
 Classic problems in machine learning that are highly related
to data mining:
 Supervised learning: The supervision in the learning
comes from the labeled examples in the training data set.
 Unsupervised learning: The learning process is unsupervised since
the input examples are not class labeled. We may use
clustering to discover classes within the data.
 Semi-supervised learning: Makes use of both labeled and
unlabeled examples when learning a model.
 Active learning: Lets users play an active role in the
learning process.


Database systems and data
warehouse systems
 Data mining can make good use of scalable database
technologies to achieve high efficiency and scalability on
large data sets.
 Data mining tasks can be used to extend the capability of
existing database systems to satisfy advanced users'
sophisticated data analysis requirements.
 A data warehouse integrates data originating from
multiple sources and various timeframes. It consolidates
data in multidimensional space to form partially
materialized data cubes.
 The data cube model not only facilitates OLAP in
multidimensional databases but also promotes
multidimensional data mining .


Information Retrieval
 Information retrieval (IR) is the science of
searching for documents or information in
documents.
 Information retrieval assumes that
 the data under search are unstructured
 the queries are formed mainly by keywords, which do
not have complex structures.
 Increasingly large amounts of text and multimedia
data have been accumulated and made available
online due to the fast growth of the Web.
 Their effective search and analysis have raised
many challenging issues in data mining.


 Mining Methodology
 Mining various and new kinds of knowledge
 Data mining tasks may use the same database in different ways and require
the development of numerous data mining techniques.
 Due to the diversity of applications, new mining tasks continue to
emerge, making data mining a dynamic and fast-growing field.
 For example, for effective knowledge discovery in information
networks, integrated clustering and ranking may lead to the
discovery of high-quality clusters and object ranks in large networks.
 Mining knowledge in multi-dimensional space
 We can search for interesting patterns among combinations of
dimensions (attributes) at varying levels of abstraction.
 Mining knowledge in cube space can substantially enhance the power
and flexibility of data mining.

Contd..
 Data mining: An interdisciplinary effort
 The power of data mining can be substantially enhanced by
integrating new methods from multiple disciplines.
 Ex: Text mining, Bug mining.
 Boosting the power of discovery in a networked environment
 Semantic links across multiple data objects can be used to advantage
in data mining.
 Knowledge derived in one set of objects can be used to boost the
discovery of knowledge in a “related” or semantically linked set of
objects.
 Ex: Web, files or documents.

 Handling noise, uncertainty, and incompleteness of data
 Errors and noise may confuse the data mining process, leading to the
derivation of erroneous patterns.
 Data cleaning, data preprocessing, outlier detection and
removal, and uncertainty reasoning are examples of techniques
that need to be integrated with the data mining process.
 Pattern evaluation and pattern- or constraint-guided mining
 Not all the patterns generated by data mining processes are interesting.
 By using interestingness measures or user-specified constraints to guide
the discovery process, we may generate more interesting patterns and
reduce the search space.
 User Interaction
 Interactive mining
 Build flexible user interfaces that facilitate user interaction with the
system.
 Allow users to dynamically change the focus of the search.


Contd..
 Incorporation of background knowledge
 Information regarding the domain under study should be
incorporated into the discovery process to guide the search.
 Presentation and visualization of data mining results
 It requires the system to adopt expressive knowledge
representations, user-friendly interfaces, and visualization
techniques.
 Ad hoc data mining and data mining query languages (DMQL)
 Query languages (e.g., SQL) have played an important role in
flexible searching because they allow users to pose ad hoc
queries.
 High-level data mining query languages will give users the
freedom to define ad hoc data mining tasks.
 Such languages should facilitate specification of the relevant sets of data for analysis, the
domain knowledge, the kinds of knowledge to be mined, and the
conditions and constraints to be enforced on the discovered patterns.


Major Issues in Data Mining (2)

 Efficiency and Scalability

 Efficiency and scalability of data mining algorithms
 The running time of a data mining algorithm must be predictable,
short, and acceptable by applications.
 Efficiency, scalability, performance, optimization, and the ability to
execute in real time are key criteria that drive the development of
many new data mining algorithms.
 Parallel, distributed, stream, and incremental mining methods
 The wide distribution of data and the computational complexity of
some data mining methods are factors that motivate the
development of parallel and distributed data-intensive mining
algorithms.
 Such algorithms partition the data into pieces, search each piece for patterns in parallel, and then
merge the patterns found.
 Incremental mining incorporates new data updates without having to
mine the entire data set again from scratch.

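The partition, mine-in-parallel, merge idea and the incremental update can be sketched with item counting as the "mining" step. Everything here is an assumption for illustration: the six transactions, the three-way split, and the use of a thread pool, which shows the structure but does not give true CPU parallelism in standard Python.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Mine one partition: count how many transactions contain each item."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

# Hypothetical transactions, split into three partitions.
transactions = [["a", "b"], ["a", "c"], ["b", "c"],
                ["a", "b", "c"], ["a"], ["b"]]
partitions = [transactions[i::3] for i in range(3)]

# Search each partition independently, then merge the partial results.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_items, partitions))
merged = sum(partial_counts, Counter())
counts_before_update = dict(merged)

# Incremental mining: fold in a new batch without rescanning old data.
new_batch = [["a", "d"]]
merged.update(count_items(new_batch))
print(merged)
```

Merging works here because counts are additive across partitions; patterns that are not additive need a more careful merge step.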
Contd..
 Diversity of data types
 Handling complex types of data
 It is unrealistic to expect one data mining system to mine all kinds of
data, given the diversity of data types and the different goals of data
mining.
 Domain- or application-dedicated data mining systems are being
constructed for in-depth mining of specific kinds of data.
 Mining dynamic, networked, and global data repositories
 The discovery of knowledge from different sources of structured,
semi-structured, or unstructured yet interconnected data with diverse
data semantics poses great challenges to data mining.
 Mining such gigantic, interconnected information networks may
help disclose many more patterns and knowledge in
heterogeneous data sets.


Contd..
 Data mining and society
 Social impacts of data mining
 How can we use data mining technology to benefit society?
 How can we guard against its misuse?
 The improper disclosure or use of data and the potential violation of
individual privacy and data protection rights are areas of concern that
need to be addressed.
 Privacy-preserving data mining
 Data mining poses the risk of disclosing an individual's personal information.
 Studies on privacy-preserving data publishing and data mining are
ongoing.
 The philosophy is to observe data sensitivity and preserve people's
privacy while performing successful data mining.
 Invisible data mining
 Many systems incorporate data mining into their components to improve
their functionality and performance. This is often done
unbeknownst to the user. Ex: an online store collecting the buying patterns of its customers.

