
Data Mining Query Language

The document discusses the design of a Data Mining Query Language (DMQL) that allows users to mine different types of knowledge from relational databases and data warehouses. DMQL adopts an SQL-like syntax and can be integrated with relational query languages. The document outlines the syntax of DMQL, including how to specify data, the types of knowledge that can be mined (characterization, discrimination, association, classification), concept hierarchy definitions, pattern visualization, and provides an example DMQL query. It also briefly discusses other data mining languages and standardization efforts.

Uploaded by

Mukesh
Copyright
© Attribution Non-Commercial (BY-NC)

DATA MINING QUERY LANGUAGES


DMQL: A Data Mining Query Language

A data mining language must be designed to facilitate flexible and effective knowledge discovery.

Having a query language for data mining may help standardize the development of platforms for data mining systems. But designing such a language is challenging, because data mining covers a wide spectrum of tasks and each task has different requirements. Hence, the design of a language requires a deep understanding of the limitations and underlying mechanisms of the various kinds of tasks.

So how would you design an efficient query language? Base it on the data mining primitives discussed earlier.

- DMQL allows mining of different kinds of knowledge from relational databases and data warehouses at multiple levels of abstraction
- Adopts SQL-like syntax
  - Hence, it can be easily integrated with relational query languages
- Defined in BNF grammar
  - [ ] represents 0 or one occurrence
  - { } represents 0 or more occurrences
  - Words in sans serif represent keywords


Design

- A DMQL can provide the ability to support ad hoc and interactive data mining
- By providing a standardized language like SQL, we hope to achieve an effect similar to the one SQL has had on relational databases:
  - a foundation for system development and evolution
  - facilitating information exchange, technology transfer, commercialization and wide acceptance

DMQL Syntax

DMQL is designed with the primitives described as follows:
- Syntax for specification of task-relevant data
- The kind of knowledge to be mined
- Concept hierarchy specification
- Pattern presentation and visualization
- Putting it all together: a DMQL query

Syntax of DMQL

<DMQL> ::= <DMQL_Statement>; {<DMQL_Statement>}

<DMQL_Statement> ::= <Data_Mining_Statement>
    | <Concept_Hierarchy_Definition_Statement>
    | <Visualization_and_Presentation>

<Data_Mining_Statement> ::=
    use database <database_name> | use data warehouse <data_warehouse_name>
    {use hierarchy <hierarchy_name> for <attribute_or_dimension>}
    <Mine_Knowledge_Specification>
    in relevance to <attribute_or_dimension_list>
    from <relation(s)/cube(s)>
    [where <condition>]
    [order by <order_list>]
    [group by <grouping_list>]
    [having <condition>]
    {with [<interest_measure_name>] threshold = <threshold_value> [for <attribute(s)>]}

<Mine_Knowledge_Specification> ::= <Mine_Char> | <Mine_Desc> | <Mine_Assoc> | <Mine_Class>

<Mine_Char> ::= mine characteristics [as <pattern_name>]
    analyze <measure(s)>

<Mine_Desc> ::= mine comparison [as <pattern_name>]
    for <target_class> where <target_condition>
    {versus <contrast_class_i> where <contrast_condition_i>}
    analyze <measure(s)>

<Mine_Assoc> ::= mine associations [as <pattern_name>]
    [matching <metapattern>]

<Mine_Class> ::= mine classification [as <pattern_name>]
    analyze <classifying_attribute_or_dimension>

<Concept_Hierarchy_Definition_Statement> ::=
    define hierarchy <hierarchy_name>
    [for <attribute_or_dimension>]
    on <relation_or_cube_or_hierarchy>
    as <hierarchy_description>
    [where <condition>]

<Visualization_and_Presentation> ::=
    display as <result_form> | {<Multilevel_Manipulation>}

<Multilevel_Manipulation> ::= roll up on <attribute_or_dimension>
    | drill down on <attribute_or_dimension>
    | add <attribute_or_dimension>
    | drop <attribute_or_dimension>
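To make the grammar a little more concrete, here is a minimal Python sketch that recognizes only the head of a <Mine_Knowledge_Specification>: the keyword selecting the kind of knowledge and the optional "as <pattern_name>". It is a hypothetical illustration, not part of any DMQL implementation; the regular expression, the NONTERMINAL table and the function name parse_mine_spec are our own, and everything after the pattern name (analyze, matching, for/versus clauses) is ignored.

```python
import re

# Hypothetical recognizer for the head of a <Mine_Knowledge_Specification>:
# the keyword selecting the kind of knowledge, then an optional "as <pattern_name>".
MINE_SPEC = re.compile(
    r"\s*mine\s+(?P<kind>characteristics|comparison|associations?|classifications?)\b"
    r"(?:\s+as\s+(?P<name>\w+))?",
    re.IGNORECASE,
)

# Map the matched keyword to the grammar's non-terminal.
NONTERMINAL = {
    "characteristics": "Mine_Char",
    "comparison": "Mine_Desc",
    "association": "Mine_Assoc",
    "associations": "Mine_Assoc",
    "classification": "Mine_Class",
    "classifications": "Mine_Class",
}


def parse_mine_spec(text):
    """Return (non-terminal, pattern_name or None) for the head of a mine clause."""
    m = MINE_SPEC.match(text)
    if m is None:
        raise ValueError("not a <Mine_Knowledge_Specification>")
    return NONTERMINAL[m.group("kind").lower()], m.group("name")


print(parse_mine_spec("mine comparison as purchaseGroups for bigSpenders"))
# ('Mine_Desc', 'purchaseGroups')
```

A full DMQL parser would need a real grammar-driven parser; the regular expression only shows how the keyword after "mine" selects one of the four alternatives.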

DMQL syntax for task-relevant data specification

Names of the relevant database or data warehouse, conditions, and relevant attributes or dimensions must be specified:
- use database <database_name> or use data warehouse <data_warehouse_name>
- from <relation(s)/cube(s)> [where <condition>]
- in relevance to <attribute_or_dimension_list>
- order by <order_list>
- group by <grouping_list>
- having <condition>
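Because DMQL adopts SQL-like syntax, the task-relevant data part of a query corresponds closely to an SQL SELECT. The following is a minimal sketch under that assumption, using Python's built-in sqlite3 module: the "in relevance to" list becomes the SELECT list, "from" the FROM list and "where" the WHERE clause. The tables, their rows and the helper name task_relevant_data are hypothetical, and a real DMQL engine may collect the data differently.

```python
import sqlite3

# Hypothetical toy relations; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_ID INTEGER PRIMARY KEY, age INTEGER, income REAL);
CREATE TABLE purchases (trans_ID INTEGER PRIMARY KEY, cust_ID INTEGER, method_paid TEXT);
INSERT INTO customer VALUES (1, 25, 40000), (2, 47, 72000), (3, 63, 51000);
INSERT INTO purchases VALUES (10, 1, 'AmEx'), (11, 2, 'Visa'), (12, 3, 'AmEx');
""")


def task_relevant_data(conn, relevant, relations, condition, order_by=None):
    """Collect task-relevant data through the equivalent SQL SELECT.

    relevant  -> the 'in relevance to' attribute list
    relations -> the 'from' list
    condition -> the 'where' clause
    Clauses are pasted in as trusted strings, so never pass untrusted input.
    """
    sql = f"SELECT {', '.join(relevant)} FROM {', '.join(relations)} WHERE {condition}"
    if order_by:
        sql += f" ORDER BY {', '.join(order_by)}"
    return conn.execute(sql).fetchall()


rows = task_relevant_data(
    conn,
    relevant=["C.age", "C.income"],
    relations=["customer C", "purchases P"],
    condition="P.cust_ID = C.cust_ID AND P.method_paid = 'AmEx'",
    order_by=["C.age"],
)
print(rows)  # [(25, 40000.0), (63, 51000.0)]
```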

Syntax for specifying the kind of knowledge to be mined

Characterization

Mine_Knowledge_Specification ::= mine characteristics [as pattern_name] analyze measure(s)
- Specifies that characteristic descriptions are to be mined
- Analyze specifies aggregate measures
- Example: mine characteristics as customerPurchasing analyze count%
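As a rough illustration of what "analyze count%" could report, the sketch below groups already-generalized tuples on the chosen attributes and returns, for each group, its count and its percentage of all tuples. Interpreting count% as that percentage is an assumption here, and the tuples and the helper name characterize are hypothetical.

```python
from collections import Counter


def characterize(tuples, attrs):
    """Group tuples on attrs; return {group: (count, count%)}."""
    counts = Counter(tuple(t[a] for a in attrs) for t in tuples)
    total = sum(counts.values())
    return {key: (n, 100.0 * n / total) for key, n in counts.items()}


# Hypothetical generalized purchase tuples (age already rolled up to a range).
purchases = [
    {"age": "20-39", "type": "laptop"},
    {"age": "20-39", "type": "laptop"},
    {"age": "40-59", "type": "printer"},
    {"age": "20-39", "type": "camera"},
]
result = characterize(purchases, ["age", "type"])
for key, (n, pct) in result.items():
    print(key, n, f"{pct:.1f}%")
```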


Discrimination
Mine_Knowledge_Specification ::= mine comparison [as pattern_name] for target_class where target_condition {versus contrast_class_i where contrast_condition_i} analyze measure(s)
- Specifies that discriminant descriptions are to be mined: a given target class of objects is compared with one or more contrasting classes (hence referred to as comparison)
- Analyze specifies aggregate measures
- Example: mine comparison as purchaseGroups for bigSpenders where avg(I.price) >= $100 versus budgetSpenders where avg(I.price) < $100 analyze count
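The comparison example splits customers into a target class (bigSpenders, average item price at least $100) and a contrasting class (budgetSpenders, below $100), then reports counts. The minimal sketch below assumes the count is taken per value of one descriptive attribute within each class, so the two distributions can be compared side by side. The purchase pairs, the customer profiles and the helper name compare are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def compare(purchases, profile, attr, threshold=100):
    """Split customers by average item price, then count attr values per class."""
    prices = defaultdict(list)
    for cust, price in purchases:
        prices[cust].append(price)
    classes = {"bigSpenders": Counter(), "budgetSpenders": Counter()}
    for cust, ps in prices.items():
        # target_condition: avg(price) >= threshold; contrast: avg(price) < threshold
        label = "bigSpenders" if mean(ps) >= threshold else "budgetSpenders"
        classes[label][profile[cust][attr]] += 1
    return classes


# Hypothetical (customer, item price) pairs and customer attributes.
purchases = [("c1", 250), ("c1", 90), ("c2", 40), ("c3", 60), ("c3", 30), ("c4", 500)]
profile = {"c1": {"age": "40-59"}, "c2": {"age": "20-39"},
           "c3": {"age": "20-39"}, "c4": {"age": "40-59"}}
result = compare(purchases, profile, "age")
print({label: dict(counts) for label, counts in result.items()})
```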


Association
Mine_Knowledge_Specification ::= mine associations [as pattern_name] [matching metapattern]
- Specifies the mining of patterns of association
- Templates (metapatterns) can be provided with the matching clause
- Example: mine associations as buyingHabits matching P(X: customer, W) ∧ Q(X, Y) ⇒ buys(X, Z)
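To show what matching a metapattern means, here is a brute-force sketch that instantiates P(X, W) ∧ Q(X, Y) ⇒ buys(X, Z) by letting P and Q range over pairs of customer attributes and W, Y, Z over observed values. It keeps instantiations whose support (fraction of all customers) and confidence (fraction of customers satisfying the antecedent) meet two thresholds. The customer records, the thresholds and the helper name match_metapattern are hypothetical; a real miner would prune candidates (for example Apriori-style) instead of enumerating them all. The code writes ∧ as "^" and ⇒ as "=>".

```python
from itertools import combinations

# Hypothetical customer records: attribute values plus the set of items bought.
customers = [
    {"age": "20-29", "income": "40K-49K", "occupation": "student", "buys": {"laptop"}},
    {"age": "20-29", "income": "40K-49K", "occupation": "engineer", "buys": {"laptop", "printer"}},
    {"age": "30-39", "income": "50K-59K", "occupation": "engineer", "buys": {"printer"}},
    {"age": "20-29", "income": "30K-39K", "occupation": "student", "buys": {"camera"}},
]


def match_metapattern(customers, attrs, min_sup=0.25, min_conf=0.6):
    """Instantiate P(X, W) ^ Q(X, Y) => buys(X, Z), with P and Q drawn from attrs.

    Brute-force enumeration over observed values; no Apriori-style pruning.
    """
    n = len(customers)
    items = set().union(*(c["buys"] for c in customers))
    rules = []
    for p, q in combinations(attrs, 2):
        # Candidate antecedents: (W, Y) value pairs that actually occur.
        for w, y in {(c[p], c[q]) for c in customers}:
            body = [c for c in customers if c[p] == w and c[q] == y]
            for z in items:
                hits = sum(1 for c in body if z in c["buys"])
                support, confidence = hits / n, hits / len(body)
                if support >= min_sup and confidence >= min_conf:
                    rule = f"{p}(X, {w!r}) ^ {q}(X, {y!r}) => buys(X, {z!r})"
                    rules.append((rule, support, confidence))
    return sorted(rules)


for rule in match_metapattern(customers, ["age", "income"]):
    print(rule)
```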


Classification
Mine_Knowledge_Specification ::= mine classification [as pattern_name] analyze classifying_attribute_or_dimension
- Specifies that patterns for data classification are to be mined
- The analyze clause specifies that classification is performed according to the values of classifying_attribute_or_dimension
  - For categorical attributes or dimensions, each value represents a class (such as low-risk, medium-risk, high-risk)
  - For numeric attributes, each class is defined by a range (such as 20-39, 40-59, 60-89 for age)
- Example: mine classifications as classifyCustomerCreditRating analyze credit_rating
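The two cases of the analyze clause can be sketched as a single class-labelling function: a categorical value is its own class, while a numeric value is assigned the range that contains it. The sketch then groups training tuples by class. The tuples and the names class_of, partition and AGE_RANGES are hypothetical; AGE_RANGES simply reuses the example age ranges.

```python
def class_of(value, ranges=None):
    """Class label for one value of the classifying attribute.

    Categorical attribute (ranges is None): the value itself is the class.
    Numeric attribute: the class is the (lo, hi) range containing the value.
    """
    if ranges is None:
        return value
    for lo, hi in ranges:
        if lo <= value <= hi:
            return f"{lo}-{hi}"
    raise ValueError(f"{value} falls outside every defined range")


def partition(tuples, attr, ranges=None):
    """Group training tuples by the class of their classifying attribute."""
    groups = {}
    for t in tuples:
        groups.setdefault(class_of(t[attr], ranges), []).append(t)
    return groups


# Hypothetical training tuples; AGE_RANGES follows the example ranges above.
AGE_RANGES = [(20, 39), (40, 59), (60, 89)]
customers = [
    {"id": 1, "age": 25, "credit_rating": "low-risk"},
    {"id": 2, "age": 47, "credit_rating": "high-risk"},
    {"id": 3, "age": 33, "credit_rating": "low-risk"},
]
by_rating = partition(customers, "credit_rating")
by_age = partition(customers, "age", AGE_RANGES)
print(sorted(by_rating))  # ['high-risk', 'low-risk']
print(sorted(by_age))     # ['20-39', '40-59']
```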

Syntax for concept hierarchy specification

To specify what concept hierarchies to use:
use hierarchy <hierarchy> for <attribute_or_dimension>

Different syntax is used to define different types of hierarchies.

Schema hierarchies:
    define hierarchy time_hierarchy on date as [date, month, quarter, year]
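A schema hierarchy orders attributes of the schema, so climbing it is a deterministic computation. This minimal sketch maps a Python date to its ancestor at each level of date < month < quarter < year; the function name roll_up_date and the string formats are hypothetical choices.

```python
from datetime import date


def roll_up_date(d, level):
    """Ancestor of d at the given level of time_hierarchy (date < month < quarter < year)."""
    if level == "date":
        return d.isoformat()
    if level == "month":
        return f"{d.year}-{d.month:02d}"
    if level == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if level == "year":
        return str(d.year)
    raise ValueError(f"unknown level: {level}")


d = date(2024, 8, 17)
print([roll_up_date(d, lvl) for lvl in ["date", "month", "quarter", "year"]])
# ['2024-08-17', '2024-08', '2024-Q3', '2024']
```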

Set-grouping hierarchies:
    define hierarchy age_hierarchy for age on customer as
        level1: {young, middle_aged, senior} < level0: all
        level2: {20, ..., 39} < level1: young
        level2: {40, ..., 59} < level1: middle_aged
        level2: {60, ..., 89} < level1: senior
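One way to hold the set-grouping age_hierarchy in memory is a table from level-2 value sets to level-1 groups, with every group rolling up to "all" at level 0. The sketch below is a hypothetical encoding; the names LEVEL2_TO_LEVEL1 and generalize are ours.

```python
# Hypothetical encoding of the set-grouping age_hierarchy.
LEVEL2_TO_LEVEL1 = [
    (range(20, 40), "young"),        # level2 {20, ..., 39}
    (range(40, 60), "middle_aged"),  # level2 {40, ..., 59}
    (range(60, 90), "senior"),       # level2 {60, ..., 89}
]


def generalize(age, level):
    """Climb age_hierarchy: level 2 = raw age, level 1 = group, level 0 = all."""
    for ages, group in LEVEL2_TO_LEVEL1:
        if age in ages:
            return {2: age, 1: group, 0: "all"}[level]
    raise ValueError(f"age {age} is not covered by age_hierarchy")


print([generalize(45, lvl) for lvl in (2, 1, 0)])  # [45, 'middle_aged', 'all']
```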

Operation-derived hierarchies:
    define hierarchy age_hierarchy for age on customer as
        {age_category(1), ..., age_category(5)} := cluster(default, age, 5) < all(age)
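The definition above does not say which clustering method "default" selects. Purely as an assumption, the sketch below uses plain one-dimensional k-means (Lloyd's algorithm) to split ages into 5 groups and names them age_category(1) to age_category(5). The function name cluster_1d, its deterministic initialization and the sample ages are hypothetical.

```python
def cluster_1d(values, k, iterations=100):
    """1-D k-means (Lloyd's algorithm); returns (min, max) bounds of each cluster."""
    xs = sorted(values)
    if not 2 <= k <= len(xs):
        raise ValueError("need 2 <= k <= number of values")
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [xs[(i * (len(xs) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # Recompute each center as its group's mean (keep it if the group is empty).
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return [(min(g), max(g)) for g in groups if g]


ages = [21, 22, 23, 35, 36, 37, 50, 51, 52, 66, 67, 68, 81, 82, 83]
bounds = cluster_1d(ages, 5)
categories = {f"age_category({i + 1})": b for i, b in enumerate(bounds)}
print(categories)
```

Any other clustering method could stand in for k-means here; what the hierarchy needs is only that each age falls into exactly one of the 5 derived categories, all of which roll up to all(age).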

Rule-based hierarchies:
    define hierarchy profit_margin_hierarchy on item as
        level_1: low_profit_margin < level_0: all if (price - cost) < $50
        level_1: medium_profit_margin < level_0: all if ((price - cost) > $50) and ((price - cost) <= $250)
        level_1: high_profit_margin < level_0: all if (price - cost) > $250
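The rule-based profit_margin_hierarchy translates directly into conditionals. Note that, read literally, the three rules leave a margin of exactly $50 unassigned (it is neither below $50 nor above it); the sketch keeps that behaviour visible by returning None rather than silently picking a level. The function name profit_margin_level is hypothetical.

```python
def profit_margin_level(price, cost):
    """Return the level_1 node of profit_margin_hierarchy for one item."""
    margin = price - cost
    if margin < 50:
        return "low_profit_margin"
    if 50 < margin <= 250:
        return "medium_profit_margin"
    if margin > 250:
        return "high_profit_margin"
    # For numeric inputs only a margin of exactly 50 gets here: no rule covers it.
    return None


print(profit_margin_level(400, 200))  # medium_profit_margin
```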

Syntax for pattern presentation and visualization specification

DMQL provides syntax that allows users to specify the display of discovered patterns in one or more forms:
    display as <result_form>
where <result_form> can be rules, tables, crosstabs, pie or bar charts, decision trees, cubes, curves, or surfaces.
To facilitate interactive viewing at different concept levels, the following syntax is defined:
    <Multilevel_Manipulation> ::= roll up on <attribute_or_dimension>
        | drill down on <attribute_or_dimension>
        | add <attribute_or_dimension>
        | drop <attribute_or_dimension>
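Roll-up and drill-down move the displayed result along a concept hierarchy. The minimal sketch below assumes a tiny city-to-country slice of a location hierarchy: rolling up aggregates city-level counts into country-level counts, and drilling down shows the city-level counts inside one country. The mapping CITY_TO_COUNTRY, the sales figures and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical slice of a location hierarchy: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def roll_up_location(counts_by_city):
    """roll up on location: aggregate city-level counts to country level."""
    out = Counter()
    for city, n in counts_by_city.items():
        out[CITY_TO_COUNTRY[city]] += n
    return dict(out)


def drill_down_location(counts_by_city, country):
    """drill down on location: show the city-level counts inside one country."""
    return {c: n for c, n in counts_by_city.items() if CITY_TO_COUNTRY[c] == country}


sales = {"Vancouver": 120, "Toronto": 80, "Chicago": 50}
print(roll_up_location(sales))               # {'Canada': 200, 'USA': 50}
print(drill_down_location(sales, "Canada"))  # {'Vancouver': 120, 'Toronto': 80}
```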

Putting it all together: a DMQL query

use database AllElectronics_db
use hierarchy location_hierarchy for B.address
mine characteristics as customerPurchasing
analyze count%
in relevance to C.age, I.type, I.place_made
from customer C, item I, purchases P, items_sold S, works_at W, branch B
where I.item_ID = S.item_ID and S.trans_ID = P.trans_ID and P.cust_ID = C.cust_ID
    and P.method_paid = "AmEx" and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID
    and B.address = "Canada" and I.price >= 100
with noise threshold = 0.05
display as table


Other Data Mining Languages & Standardization Efforts

- Association rule language specifications
  - MSQL (Imielinski & Virmani '99)
  - MineRule (Meo, Psaila and Ceri '96)
  - Query flocks based on Datalog syntax (Tsur et al. '98)
- OLE DB for DM (Microsoft 2000) and, more recently, DMX (Microsoft SQL Server 2005)
  - Based on OLE, OLE DB, OLE DB for OLAP, C#
  - Integrating DBMS, data warehouse and data mining
- DMML (Data Mining Mark-up Language) by DMG (www.dmg.org)
- CRISP-DM (CRoss-Industry Standard Process for Data Mining)
  - Providing a platform and process structure for effective data mining
  - Emphasizing the deployment of data mining technology to solve business problems

Hierarchy Specification
A hierarchy is a root member of an alternate hierarchy, which is always at generation 2 of a dimension. Member value expressions are not allowed as hierarchy arguments.
Alternate hierarchies are applicable to aggregate storage databases only.
The dimension of the hierarchy argument passed to a function must match the dimension of the other arguments passed to the function. If they do not match, an error is returned and the query is aborted.
