Ch-4 Data Mining Knowledge Representation Primitives

Data mining involves extracting knowledge from large amounts of data. A data mining query specifies the task, including the relevant data, type of knowledge to be mined, background knowledge, and interestingness measures. The relevant data can be selected from databases or data warehouses using conditions. Background knowledge includes concept hierarchies that allow discovering patterns at different levels of abstraction. Interestingness measures estimate the simplicity, certainty, utility, and novelty of patterns to filter uninteresting ones. Discovered patterns are presented using various visualizations like rules, tables, charts, and trees.

Uploaded by

Satyam Shaw


Ch-4: DATA MINING PRIMITIVES
• Data Mining:
Data mining refers to extracting or “mining” knowledge from large amounts of data.
• Data Mining Primitives:
A data mining task can be specified in the form of a data mining query, which is input to the data mining system.
• A mining query is defined in terms of the following primitives:
 Task-Relevant Data
 The Kind of Knowledge to be Mined
 Background Knowledge: Concept Hierarchies
 Interestingness Measures
 Presentation and Visualization of Discovered Patterns
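The five primitives can be pictured as the fields of a single query object. The sketch below is only an illustration: the class name `MiningQuery`, its field names, and the sample values are assumptions made here, not part of any real data mining system or of DMQL.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container for the five primitives of a data mining query."""
    task_relevant_data: str                                   # e.g. a relational query
    knowledge_kind: str                                       # e.g. "association"
    concept_hierarchies: dict = field(default_factory=dict)   # background knowledge
    interestingness: dict = field(default_factory=dict)       # measure -> threshold
    presentation: list = field(default_factory=lambda: ["rules"])


# Illustrative values only; the table and column names are made up.
query = MiningQuery(
    task_relevant_data="SELECT age, income, item FROM purchases",
    knowledge_kind="association",
    concept_hierarchies={"item": {"TV": "home entertainment",
                                  "VCR": "home entertainment"}},
    interestingness={"support": 0.02, "confidence": 0.60},
    presentation=["rules", "tables"],
)
print(query.knowledge_kind, query.interestingness)
```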

TASK-RELEVANT DATA

• The set of task-relevant data can be collected via a relational query (SQL or DMQL) involving operations like selection, projection, join, and aggregation.
• The data collection process results in a new data relation called the initial data relation.
• The initial relation may or may not correspond to a physical relation in the database.
• Since virtual relations are called views in the field of databases, the set of task-relevant data for data mining is called a minable view.
• The task-relevant data can be specified by providing the following information:
 The name of the database or data warehouse to be used
 The names of the tables or data cubes containing the relevant data
 Conditions for selecting the relevant data
 The relevant attributes or dimensions
 Data grouping criteria: the retrieved data may be grouped by certain attributes, such as “group by date”
• The set of task-relevant data can also be specified by condition-based data filtering, or by slicing or dicing of the data cube.
• For example, a concept hierarchy on item that specifies that “home entertainment” is at a higher concept level, composed of the lower-level concepts {“TV”, “CD player”, “VCR”}, can be used in the collection of the task-relevant data.
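As a rough illustration of a minable view, the sketch below builds one with Python's standard `sqlite3` module. The tables `customer` and `purchase`, their columns, and all rows are hypothetical; the view combines a join, a selection on the “home entertainment” items, a projection, and an aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, age INTEGER, income INTEGER);
    CREATE TABLE purchase (cust_id INTEGER, item TEXT, price REAL);
    INSERT INTO customer VALUES (1, 34, 45000), (2, 52, 80000), (3, 31, 42000);
    INSERT INTO purchase VALUES
        (1, 'TV', 400.0), (1, 'VCR', 150.0),
        (2, 'CD player', 90.0), (3, 'VCR', 160.0), (3, 'printer', 120.0);

    -- The minable view is virtual: it is defined by a query, not stored as a table.
    CREATE VIEW minable_view AS
    SELECT c.age, c.income, p.item,
           COUNT(*) AS n_purchases, SUM(p.price) AS total_spent
    FROM customer AS c
    JOIN purchase AS p ON p.cust_id = c.cust_id
    WHERE p.item IN ('TV', 'CD player', 'VCR')   -- selection condition
    GROUP BY c.age, c.income, p.item;            -- grouping attributes
""")

rows = conn.execute("SELECT * FROM minable_view ORDER BY age, item").fetchall()
for row in rows:
    print(row)
```

The printer purchase is filtered out by the selection condition, so only home entertainment items reach the mining step.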

THE KIND OF KNOWLEDGE TO BE MINED

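• The kinds of knowledge include concept description (characterization, discrimination), association, classification, prediction, clustering, and evolution analysis.
• The user can also provide pattern templates that all discovered patterns must match. These templates, or metapatterns, can be used to guide the discovery process.
• For example, the following association rule is annotated with its support and confidence:
age(X, “30…39”) ^ income(X, “40K…49K”) => buys(X, “VCR”) [2.2%, 60%]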
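One way to picture how a metapattern guides discovery is as a shape filter over candidate rules. In the sketch below, the `Rule` class and the `matches_template` helper are hypothetical names. Only the first candidate comes from the example above; the other two, and their numbers, are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: tuple   # predicate names on the left-hand side
    consequent: str     # predicate name on the right-hand side
    support: float
    confidence: float


def matches_template(rule, n_antecedents, consequent):
    """True if the rule has the shape P1(X, ..) ^ ... ^ Pn(X, ..) => consequent(X, ..)."""
    return len(rule.antecedent) == n_antecedents and rule.consequent == consequent


candidates = [
    Rule(("age", "income"), "buys", 0.022, 0.60),        # the rule shown above
    Rule(("age",), "buys", 0.050, 0.40),                 # made-up candidate
    Rule(("age", "income"), "occupation", 0.030, 0.70),  # made-up candidate
]

# Metapattern: P(X, Y) ^ Q(X, W) => buys(X, Z)
guided = [r for r in candidates if matches_template(r, 2, "buys")]
print(guided)
```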
BACKGROUND KNOWLEDGE: CONCEPT HIERARCHIES

• Background knowledge is information about the domain to be mined that can be useful in the discovery process.
• One form of background knowledge is known as concept hierarchies. Concept hierarchies allow the discovery of knowledge at multiple levels of abstraction.
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.
Concept hierarchy
• A concept hierarchy is represented as a set of nodes organized in a tree, where each node, in itself, represents a concept.
• There are four types of concept hierarchies:
 Schema hierarchies
 Set-grouping hierarchies
 Operation-derived hierarchies
 Rule-based hierarchies

• Schema hierarchies: a schema hierarchy is a total or partial order among attributes in the database schema.
street < city < state < country
• Set-grouping hierarchies: a set-grouping hierarchy organizes values for a given attribute or dimension into groups of constants or range values.
{young, middle-aged} ⊂ all(age)
{20…39} ⊂ young
{40…59} ⊂ middle-aged
• Operation-derived hierarchies: operations can include the decoding of information-encoded strings and information extraction from complex data objects.
login-name < department < university < country, the levels decoded from an e-mail address.
• Rule-based hierarchies: a rule-based hierarchy is defined by a set of rules and is evaluated dynamically based on the current database data and the rule definition.
low_profit_margin(X) <= price(X, P1) ^ cost(X, P2) ^ ((P1 - P2) < $50)
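Three of the hierarchy types above can be sketched as small functions. This is a minimal illustration with some assumptions: ages outside both ranges are mapped straight to the root `all(age)`, the e-mail address is hypothetical, and its domain is assumed to have exactly three dot-separated parts.

```python
def age_group(age):
    """Set-grouping hierarchy: {20..39} -> young, {40..59} -> middle-aged."""
    if 20 <= age <= 39:
        return "young"
    if 40 <= age <= 59:
        return "middle-aged"
    return "all(age)"   # assumption: other ages generalize directly to the root


def email_levels(address):
    """Operation-derived hierarchy: login-name < department < university < country."""
    login, domain = address.split("@")
    department, university, country = domain.split(".")   # assumes three parts
    return {"login-name": login, "department": department,
            "university": university, "country": country}


def low_profit_margin(price, cost):
    """Rule-based hierarchy: evaluated on current data, (P1 - P2) < $50."""
    return (price - cost) < 50


print(age_group(25), age_group(45))
print(email_levels("jdoe@cs.example-u.ca"))   # hypothetical address
print(low_profit_margin(price=120, cost=90))
```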
INTERESTINGNESS MEASURES
• Interestingness measures are used to reduce the number of uninteresting patterns returned by the process. This can be achieved by specifying interestingness measures that estimate the following properties of patterns:
 simplicity,
 certainty,
 utility, and
 novelty.
• Each measure is associated with a threshold that can be controlled by the user.
• SIMPLICITY:
Simplicity can be viewed as a function of the pattern structure, defined in terms of the pattern size in bits or the number of attributes or operators appearing in the pattern. For example: rule length.
• CERTAINTY:
Each discovered pattern should have a measure of certainty associated with it that assesses the validity or trustworthiness of the pattern. A certainty measure for association rules of the form “A=>B”, where A and B are sets of items, is confidence.
confidence(A=>B) = #_tuples_containing_both_A_and_B / #_tuples_containing_A
• UTILITY:
The potential usefulness of a pattern can be estimated by a utility function such as support. The support of an association pattern refers to the percentage of task-relevant data tuples for which the pattern is true. For association rules of the form “A=>B”, where A and B are sets of items,
support(A=>B) = #_tuples_containing_both_A_and_B / total_#_of_tuples
• NOVELTY:
Novel patterns are those that contribute new information or increased performance to the given pattern set. Another strategy for detecting novelty is to remove redundant patterns. For example, a data exception may be considered novel in that it differs from what is expected based on a statistical model or user beliefs.
location(X, “CANADA”) => buys(X, “SONY_TV”) [8%, 70%]
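The support and confidence formulas above can be checked on a toy data set. The transactions and threshold values below are made up for illustration; the two functions follow the tuple-counting definitions given in the CERTAINTY and UTILITY bullets, and rule length stands in for the simplicity measure.

```python
def support(transactions, a, b):
    """Fraction of all tuples that contain every item of both A and B."""
    both = a | b
    return sum(1 for t in transactions if both <= t) / len(transactions)


def confidence(transactions, a, b):
    """Fraction of the tuples containing A that also contain B."""
    with_a = [t for t in transactions if a <= t]
    if not with_a:
        return 0.0
    return sum(1 for t in with_a if b <= t) / len(with_a)


# Hypothetical task-relevant tuples, each a set of purchased items.
transactions = [
    {"TV", "VCR"},
    {"TV", "CD player"},
    {"TV", "VCR", "CD player"},
    {"CD player"},
    {"VCR"},
]
a, b = {"TV"}, {"VCR"}
s = support(transactions, a, b)      # 2 of the 5 tuples contain both TV and VCR
c = confidence(transactions, a, b)   # 2 of the 3 tuples with TV also contain VCR

# User-controlled thresholds (illustrative values).
min_support, min_confidence, max_rule_length = 0.3, 0.6, 3
interesting = (s >= min_support and c >= min_confidence
               and len(a | b) <= max_rule_length)
print(s, c, interesting)
```

Raising `min_support` above 0.4 would filter this rule out, which is how a user-controlled threshold trims the pattern set.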
PRESENTATION AND VISUALIZATION OF DISCOVERED PATTERNS

• A data mining system should be able to display the discovered patterns in multiple forms, such as rules, tables, crosstabs, pie charts, decision trees, cubes, or other visual representations.
• A data mining system should employ concept hierarchies to implement drill-down and roll-up operations, so that users may inspect discovered patterns at multiple levels of abstraction.
• In addition, pivoting, slicing, and dicing operations aid the user in viewing generalized data and knowledge from different perspectives.
Various forms of presenting and visualizing the discovered patterns
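Roll-up and drill-down over a concept hierarchy can be sketched with plain dictionaries. The sales counts, the “computer accessories” category, and the helper names `roll_up` and `drill_down` are all assumptions made for this illustration; only the “home entertainment” grouping of TV, CD player, and VCR comes from the slides.

```python
from collections import Counter

# Hypothetical item-level sales counts (the most detailed level).
sales = Counter({"TV": 120, "CD player": 45, "VCR": 60, "printer": 30, "scanner": 10})

# Concept hierarchy on item: lower-level concept -> higher-level concept.
item_hierarchy = {
    "TV": "home entertainment",
    "CD player": "home entertainment",
    "VCR": "home entertainment",
    "printer": "computer accessories",
    "scanner": "computer accessories",
}


def roll_up(counts, hierarchy):
    """Aggregate counts one level up the concept hierarchy."""
    out = Counter()
    for item, n in counts.items():
        out[hierarchy[item]] += n
    return out


def drill_down(category, counts, hierarchy):
    """Show the lower-level counts that make up one higher-level concept."""
    return {item: n for item, n in counts.items() if hierarchy[item] == category}


by_category = roll_up(sales, item_hierarchy)
detail = drill_down("home entertainment", sales, item_hierarchy)
print(dict(by_category))
print(detail)
```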
