Data Mining Query Languages
Kristen LeFevre
April 19, 2004
With Thanks to Zheng Huang and Lei Chen
Outline
• Introduce the problem of querying data mining models
• Overview of three different solutions and their contributions
• Topic for Discussion: What would an ideal solution support?

Problem Description
• You guys are armed with two powerful tools
  • Database management systems
  • Efficient and effective data mining algorithms and frameworks
• Generally, this work asks:
  • “How can we merge the two?”
  • “How can we integrate data mining more closely with traditional database systems, particularly querying?”
Three Different Answers
• DMQL: A Data Mining Query Language for Relational Databases (Han et al., Simon Fraser University)
• Integrating Data Mining with SQL Databases: OLE DB for Data Mining (Netz et al., Microsoft)
• MSQL: A Query Language for Database Mining (Imielinski & Virmani, Rutgers University)
Some Common Ground
• Create and manipulate data mining models through a SQL-based interface (“command-driven” data mining)
• Abstract away the data mining particulars
• Data mining should be performed on data in the database (should not need to export to a special-purpose environment)
• Approaches differ on what kinds of models should be created, and what operations we should be able to perform
DMQL
• Commands specify the following:
  • The set of data relevant to the data mining task (the training set)
  • The kinds of knowledge to be discovered
    • Generalized relation
    • Characteristic rules
    • Discriminant rules
    • Classification rules
    • Association rules
  • Background knowledge
    • Concept hierarchies based on attribute relationships, etc.
  • Various thresholds
    • Minimum support, confidence, etc.
DMQL
• Syntax

use database <database_name>
{use hierarchy <hierarchy_name> for <attribute>}
<rule_spec>
related to <attr_or_agg_list>
from <relation(s)>
[where <conditions>]
[order by <order list>]
{with [<kinds of>] threshold = <threshold_value> [for <attribute(s)>]}

• What each clause does:
  • use hierarchy: specify background knowledge
  • <rule_spec>: specify rules to be discovered
  • related to: relevant attributes or aggregations
  • from, where, order by: collect the set of relevant data to mine
  • with ... threshold: specify threshold parameters
DMQL
• Syntax of <rule_spec>

find classification rules [as <rule_name>] [according to <attributes>]
find association rules [as <rule_name>]
generalize data [into <relation_name>]
others
DMQL
use database Hospital
find association rules as Heart_Health
related to Salary, Age, Smoker,
Heart_Disease
from Patient_Financial f, Patient_Medical m
where f.ID = m.ID and m.age >= 18
with support threshold = .05
with confidence threshold = .7
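To make the two thresholds in the query above concrete, here is a minimal Python sketch of how support and confidence are conventionally computed for one candidate rule. The patient rows, the attribute values, the helper names, and the use of ">=" for the threshold comparison are invented for illustration; they are not part of DMQL or of the Hospital example.

```python
# Toy rows standing in for the joined patient data (hypothetical values).
rows = [
    {"Smoker": "yes", "Heart_Disease": "yes"},
    {"Smoker": "yes", "Heart_Disease": "yes"},
    {"Smoker": "yes", "Heart_Disease": "no"},
    {"Smoker": "no", "Heart_Disease": "no"},
    {"Smoker": "no", "Heart_Disease": "no"},
]


def matches(row, descriptors):
    """True if the row satisfies every attribute = value descriptor."""
    return all(row.get(attr) == value for attr, value in descriptors.items())


def support_confidence(rows, body, consequent):
    """Support: fraction of all rows matching body and consequent.
    Confidence: fraction of body-matching rows that also match the consequent."""
    n_body = sum(matches(r, body) for r in rows)
    n_both = sum(matches(r, body) and matches(r, consequent) for r in rows)
    support = n_both / len(rows)
    confidence = n_both / n_body if n_body else 0.0
    return support, confidence


support, confidence = support_confidence(
    rows, body={"Smoker": "yes"}, consequent={"Heart_Disease": "yes"}
)
# Assumption: a rule is kept only if it meets both thresholds from the query.
keep = support >= 0.05 and confidence >= 0.7
```

On this toy data the rule "Smoker = yes implies Heart_Disease = yes" has support 0.4 and confidence 2/3, so it clears the support threshold but not the confidence threshold.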
DMQL
• DMQL provides a “display in” command to view resulting rules, but no advanced way to query them
• Suggests that a GUI might aid in the presentation of these results in different forms (charts, graphs, etc.)
MSQL
• Focus on Association Rules
• Seeks to provide a language both to selectively generate rules, and separately to query the rule base
• Expressive rule generation language, and techniques for optimizing some commands
MSQL
• Get-Rules and Select-Rules Queries
  • The Get-Rules operator generates rules over elements of argument class C that satisfy the conditions described in the “where” clause

[Project Body, Consequent, confidence, support]
GetRules(C) [as R1]
[into <rulebase_name>]
[where <conds>]
[sql-group-by clause]
[using-clause]
MSQL
• <conds> may contain a number of conditions, including:
  • Restrictions on the attributes in the body or consequent (IN, HAS, and IS test subset, superset, and equality, respectively)
    • “rule.body HAS {(Job = ‘Doctor’)}”
    • “rule1.consequent IN rule2.body”
    • “rule.consequent IS {Age = *}”
  • Pruning conditions (restrict by support, confidence, or size)
  • Stratified or correlated subqueries
MSQL
GetRules(Patients) R1
where Body has {Age = *}
and Support > .05 and Confidence > .7
and not exists ( GetRules(Patients) R2
                 where Support > .05 and
                 Confidence > .7
                 and R2.Body HAS R1.Body)

Retrieve all rules with a descriptor of the form “Age = x” in the body, except those for which another rule meeting the same support and confidence thresholds has a body containing a superset of their descriptors.
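The query can be read as a filter over a rule set. The sketch below models a rule body as a Python frozenset of (attribute, value) descriptors, so that HAS, IN, and IS become superset, subset, and equality tests. The `Rule` class, the helper names, the sample rules, and the choice to treat the inner HAS as a proper superset (so that a rule does not exclude itself) are assumptions made for illustration, not MSQL's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    body: frozenset        # set of (attribute, value) descriptors
    consequent: frozenset
    support: float
    confidence: float


def has(a, b):
    """a HAS b: a is a superset of b."""
    return a >= b


def is_in(a, b):
    """a IN b: a is a subset of b. (IS is plain equality, a == b.)"""
    return a <= b


def mentions(body, attribute):
    """Wildcard test for 'Body has {attribute = *}'."""
    return any(attr == attribute for attr, _ in body)


def d(**kv):
    """Build a descriptor set from keyword arguments (helper for the example)."""
    return frozenset(kv.items())


# Hypothetical rule base for Patients.
rules = [
    Rule(d(Age=60), d(Heart_Disease="yes"), 0.10, 0.80),
    Rule(d(Age=60, Smoker="yes"), d(Heart_Disease="yes"), 0.06, 0.90),
    Rule(d(Age=30), d(Heart_Disease="no"), 0.20, 0.75),
    Rule(d(Smoker="yes"), d(Heart_Disease="yes"), 0.15, 0.72),
    Rule(d(Age=45), d(Heart_Disease="yes"), 0.02, 0.95),  # fails support
]


def passes(r):
    """Pruning conditions shared by the outer and inner query."""
    return r.support > 0.05 and r.confidence > 0.7


result = [
    r1 for r1 in rules
    if mentions(r1.body, "Age") and passes(r1)
    and not any(
        # Assumption: proper superset, so r1 never excludes itself.
        passes(r2) and has(r2.body, r1.body) and r2.body != r1.body
        for r2 in rules
    )
]
```

With these sample rules, the rule on Age = 60 alone is dropped because the rule on Age = 60 and Smoker = yes has a larger body and also meets the thresholds.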
MSQL
Correlated:
GetRules(C) R1
where <pruning-conds>
and not exists ( GetRules(C) R2
                 where <same pruning-conds>
                 and R2.Body HAS R1.Body)

Stratified:
GetRules(C) R1
where <pruning-conds>
and consequent is {(X=*)}
and consequent in (SelectRules(R2)
                   where consequent is {(X=*)})
MSQL
• Nested Get-Rules Queries and their optimization
  • Stratified (non-correlated) queries are evaluated “bottom-up.” The subquery is evaluated first, and replaced with its results in the outer query.
  • Correlated queries are evaluated either top-down or bottom-up (like “loop-unfolding”), and there are rules for choosing between the two options
MSQL
GetRules(Patients) R1
where Body has {Age = *}
and Support > .05 and Confidence > .7
and not exists ( GetRules(Patients) R2
                 where Support > .05 and
                 Confidence > .7
                 and R2.Body HAS R1.Body)
MSQL
Top-Down Evaluation
GetRules(Patients) R1
where Body has {Age = *}
and Support > .05 and Confidence > .7

For each rule produced by the outer, evaluate the inner

not exists ( GetRules(Patients) R2
             where Support > .05 and Confidence > .7
             and R2.Body HAS R1.Body)
MSQL
Bottom-Up Evaluation
not exists ( GetRules(Patients) R2
             where Support > .05 and Confidence > .7
             and R2.Body HAS R1.Body)

For each rule produced by the inner, evaluate the outer

GetRules(Patients) R1
where Body has {Age = *}
and Support > .05 and Confidence > .7
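A rough Python sketch of the two evaluation orders, under the assumption that the candidate rules are already available as an in-memory list. Real MSQL evaluation generates rules from the data; the function names, the sample bodies, and the use of a proper-superset test for HAS are illustrative assumptions. Both orders return the same answer; they differ in which side drives the loop.

```python
# Each rule is (body, support, confidence); bodies are frozensets of descriptors.
# The sample values are hypothetical.
RULES = [
    (frozenset({("Age", 60)}), 0.10, 0.80),
    (frozenset({("Age", 60), ("Smoker", "yes")}), 0.06, 0.90),
    (frozenset({("Age", 30)}), 0.20, 0.75),
    (frozenset({("Smoker", "yes")}), 0.15, 0.72),
]


def pruned(rules):
    """Rules meeting Support > .05 and Confidence > .7."""
    return [r for r in rules if r[1] > 0.05 and r[2] > 0.7]


def outer(rules):
    """Outer query: pruned rules whose body mentions Age."""
    return [r for r in pruned(rules) if any(a == "Age" for a, _ in r[0])]


def top_down(rules):
    """For each outer rule, run the inner query and test 'not exists'."""
    answer = []
    for r1 in outer(rules):
        # Assumption: '>' on frozensets is a proper-superset test.
        inner = [r2 for r2 in pruned(rules) if r2[0] > r1[0]]
        if not inner:
            answer.append(r1)
    return answer


def bottom_up(rules):
    """For each inner rule, strike the outer rules that it rules out."""
    candidates = outer(rules)
    for r2 in pruned(rules):
        candidates = [r1 for r1 in candidates if not r2[0] > r1[0]]
    return candidates
```

Which order is cheaper depends on how many rules each side produces, which is what the heuristics for choosing between the two are meant to estimate.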
MSQL
• Choosing between the two
  • In general, evaluate the expression with more restrictive conditions first
  • Heuristic rules (meant to prevent unconstrained queries from being evaluated first)
    • Evaluate the query with higher support threshold first
    • Next consider confidence threshold
    • A (length = x) expression is in general more restrictive than (length > x), which is more restrictive than (length < x)
    • “Body IS (constant expression)” is more restrictive than “Body HAS”, which is more restrictive than “Body IN”
    • Next consider “Consequent IN” expressions
    • Descriptors of the form (A = a) are more restrictive than wildcards such as (A = *)
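The first two heuristics can be sketched as a simple comparison. The `Query` record and the `evaluate_first` function below are hypothetical names; the sketch covers only the support and confidence rules and ignores the remaining tie-breakers.

```python
from typing import NamedTuple


class Query(NamedTuple):
    name: str
    min_support: float
    min_confidence: float


def evaluate_first(a: Query, b: Query) -> Query:
    """Pick the query to evaluate first: the higher support threshold wins,
    then the higher confidence threshold; ties fall back to the first argument."""
    key_a = (a.min_support, a.min_confidence)
    key_b = (b.min_support, b.min_confidence)
    return b if key_b > key_a else a


# Hypothetical thresholds for an outer query and its subquery.
outer_q = Query("outer", min_support=0.05, min_confidence=0.7)
inner_q = Query("inner", min_support=0.10, min_confidence=0.5)
first = evaluate_first(outer_q, inner_q)
```

Here the inner query is chosen because its support threshold is higher, even though its confidence threshold is lower.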
OLE DB for DM
• An extension to the OLE DB interface for Microsoft SQL Server
• Seeks to support the following ideas:
  • Define a model by specifying the set of attributes to be predicted, the attributes used for the prediction, and the algorithm
  • Populate the model using the training data
  • Predict attributes for new data using the populated model (none of the others seemed to support this)
  • Browse the mining model (not fully addressed because it varies a lot by model type)
OLE DB for DM
• Defining a Mining Model
  • Identify the set of data attributes to be predicted, the set of attributes to be used for prediction, and the algorithm to be used for building the model
• Populating the Model
  • Pull the information into a single rowset using views, and train the model using the data and algorithm specified
  • Supports complex objects, so rowset may be hierarchical (see paper for more complex examples)
OLE DB for DM
• Using the mining model to predict
  • Defines a new operator, prediction join. A model may be used to make predictions on datasets by taking the prediction join of the mining model and the data set.
OLE DB for DM
CREATE MINING MODEL [Heart_Health Prediction]
(
[ID] Int Key,
[Age] Int,
[Smoker] Int,
[Salary] Double discretized,
[HeartAttack] Int PREDICT %Prediction column
)
USING [Decision_Trees_101]

Identifies the source columns for the training data, the column to be predicted, and the data mining algorithm.
OLE DB for DM
INSERT INTO [Heart_Health Prediction]
([ID], [Age], [Smoker], [Salary], [HeartAttack])
SELECT M.[ID], [Age], [Smoker], [Salary], [HeartAttack] FROM
Patient_Medical M, Patient_Financial F
WHERE M.ID = F.ID

The INSERT represents using a tuple for training the model (not actually inserting it into the rowset).
OLE DB for DM
SELECT t.[ID],
[Heart_Health Prediction].[HeartAttack]
FROM [Heart_Health Prediction]
PREDICTION JOIN (
SELECT M.[ID], [Age], [Smoker], [Salary]
FROM Patient_Medical M, Patient_Financial F
WHERE M.ID = F.ID) as t
ON [Heart_Health Prediction].Age = t.Age AND
[Heart_Health Prediction].Smoker = t.Smoker
AND [Heart_Health Prediction].Salary = t.Salary

Prediction join connects the model and an actual data table to make predictions.
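The CREATE MINING MODEL, INSERT INTO, and PREDICTION JOIN statements above follow a define, populate, predict lifecycle. The Python sketch below mimics that lifecycle with a deliberately trivial model (a lookup of the majority label per input combination) in place of a decision tree. The `MiningModel` class, its method names, and the sample rows are assumptions for illustration and do not correspond to the OLE DB for DM API.

```python
from collections import Counter, defaultdict


class MiningModel:
    """Toy stand-in for a mining model: majority label per input tuple."""

    def __init__(self, inputs, predict):
        # "CREATE MINING MODEL": name the input columns and the predicted column.
        self.inputs = inputs
        self.predict = predict
        self.counts = defaultdict(Counter)

    def insert(self, rows):
        # "INSERT INTO": rows are consumed for training, not stored.
        for row in rows:
            key = tuple(row[c] for c in self.inputs)
            self.counts[key][row[self.predict]] += 1

    def prediction_join(self, rows):
        # "PREDICTION JOIN": match new rows to the model on the input columns.
        for row in rows:
            key = tuple(row[c] for c in self.inputs)
            seen = self.counts.get(key)
            label = seen.most_common(1)[0][0] if seen else None
            yield row["ID"], label


# Hypothetical training and scoring data.
model = MiningModel(inputs=["Age", "Smoker"], predict="HeartAttack")
model.insert([
    {"ID": 1, "Age": 60, "Smoker": 1, "HeartAttack": 1},
    {"ID": 2, "Age": 60, "Smoker": 1, "HeartAttack": 1},
    {"ID": 3, "Age": 60, "Smoker": 1, "HeartAttack": 0},
    {"ID": 4, "Age": 30, "Smoker": 0, "HeartAttack": 0},
])
predictions = list(model.prediction_join([
    {"ID": 10, "Age": 60, "Smoker": 1},
    {"ID": 11, "Age": 30, "Smoker": 0},
    {"ID": 12, "Age": 45, "Smoker": 1},
]))
```

The new rows carry no HeartAttack value; the join fills it in from the trained model, and a row with an unseen input combination gets no prediction in this toy version.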
Key Ideas
• Important to have an API for creating and manipulating data mining models
• The data is already in the DBMS, so it makes sense to do the data mining where the data is
• Applications already use SQL, so a SQL extension seems logical

Key Ideas
• Need a method for defining data mining models, including algorithm specification, specification of various parameters, and training set specification (DMQL, MSQL, OLE DB for DM)
• Need a method of querying the models (MSQL)
• Need a way of using the data mining model to interact with other data in the database, for purposes such as prediction (OLE DB for DM)
Discussion Topic: What Functionality Would an Ideal Solution Support?
