Data Mining Query Languages
Kristen LeFevre
April 19, 2004
With Thanks to Zheng Huang and Lei Chen
Outline
others
DMQL
use database Hospital
find association rules as Heart_Health
related to Salary, Age, Smoker,
Heart_Disease
from Patient_Financial f, Patient_Medical m
where f.ID = m.ID and m.age >= 18
with support threshold = .05
with confidence threshold = .7
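The two thresholds in the query above can be illustrated with a small sketch. It assumes the usual definitions (support of a rule is the fraction of records containing both body and head; confidence is that support divided by the support of the body), which the slides do not spell out. The `patients` records, the rule, and the function names are made up for illustration only.

```python
def support(records, items):
    """Fraction of records that contain every descriptor in `items`."""
    items = set(items)
    hits = sum(1 for record in records if items <= record)
    return hits / len(records)


def confidence(records, body, head):
    """support(body and head) / support(body); 0.0 if the body never occurs."""
    body_support = support(records, body)
    if body_support == 0:
        return 0.0
    return support(records, set(body) | set(head)) / body_support


# Hypothetical toy data: each record is a set of "attribute=value" descriptors.
patients = [
    {"Smoker=yes", "Heart_Disease=yes"},
    {"Smoker=yes", "Heart_Disease=yes"},
    {"Smoker=yes", "Heart_Disease=no"},
    {"Smoker=no", "Heart_Disease=no"},
]

rule_body = {"Smoker=yes"}
rule_head = {"Heart_Disease=yes"}

s = support(patients, rule_body | rule_head)
c = confidence(patients, rule_body, rule_head)

# Same thresholds as the DMQL query: support .05, confidence .7
passes = s >= 0.05 and c >= 0.7
print(s, c, passes)
```

On this toy data the rule clears the support threshold but not the confidence threshold, so it would not be reported.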
DMQL
Retrieve all rules with descriptors of the form “Age = x” in the body,
except when there is another rule with equal or greater support and
confidence whose body contains a superset of those descriptors
MSQL
Correlated:
GetRules(C) R1
where <pruning-conds>
and not exists ( GetRules(C) R2
where <same pruning-conds>
and R2.Body HAS R1.Body)
Stratified:
GetRules(C) R1
where <pruning-conds>
and consequent is {(X=*)}
and consequent in (SelectRules(R2)
where consequent is {(X=*)})
MSQL
Nested Get-Rules Queries and Their Optimization
Stratified (non-correlated) queries are evaluated “bottom-up”: the
subquery is evaluated first and replaced with its results in the
outer query.
Correlated queries are evaluated either top-down or bottom-up (like
“loop-unfolding”), and there are rules for choosing between the two
options.
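A minimal sketch of the stratified case, assuming rules are already mined and held in memory: the subquery never refers to the outer rule, so it can be evaluated once and its result substituted into the outer filter. The `Rule` type, the sample rules, and the specific conditions are hypothetical stand-ins for the pruning conditions; this is not MSQL's actual implementation.

```python
from typing import FrozenSet, NamedTuple, Tuple

Descriptor = Tuple[str, str]  # (attribute, value)


class Rule(NamedTuple):
    body: FrozenSet[Descriptor]
    consequent: Descriptor
    support: float
    confidence: float


# Hypothetical, already-mined rules.
r1 = Rule(frozenset({("Smoker", "yes")}), ("Heart_Disease", "yes"), 0.20, 0.80)
r2 = Rule(frozenset({("Age", "60+")}), ("Heart_Disease", "yes"), 0.03, 0.90)
r3 = Rule(frozenset({("Smoker", "no")}), ("Heart_Disease", "no"), 0.30, 0.60)
RULES = [r1, r2, r3]


def inner_consequents(rules):
    """Subquery: consequents on Heart_Disease from high-confidence rules.

    It does not mention the outer rule, so it is evaluated only once.
    """
    return {
        r.consequent
        for r in rules
        if r.consequent[0] == "Heart_Disease" and r.confidence > 0.7
    }


def stratified_query(rules):
    # Bottom-up: evaluate the subquery first ...
    allowed = inner_consequents(rules)
    # ... then use its result as a constant set in the outer query.
    return [r for r in rules if r.support > 0.05 and r.consequent in allowed]


result = stratified_query(RULES)
print(result)
```

Here only `r1` survives: `r2` fails the outer support condition, and the consequent of `r3` is not in the set produced by the subquery.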
MSQL
GetRules(Patients) R1
where Body has {Age = *}
and Support > .05 and Confidence > .7
and not exists ( GetRules(Patients) R2
where Support > .05 and
Confidence > .7
and R2.Body HAS R1.Body)
MSQL
Top-Down Evaluation
GetRules(Patients) R1
where Body has {Age = *}
and Support > .05 and Confidence > .7
Bottom-Up Evaluation
not exists ( GetRules(Patients) R2
where Support > .05 and Confidence > .7
and R2.Body HAS R1.Body)
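The two evaluation orders for this correlated query can be sketched as follows. This is one possible reading, not the optimizer described in the slides: top-down filters the outer rules first and then probes the subquery once per surviving rule, while bottom-up first materializes the part of the subquery that does not depend on R1 and then filters the outer rules against it. The `Rule` type and sample rules are hypothetical, and `HAS` is assumed here to mean a proper superset (otherwise every rule would exclude itself).

```python
from typing import FrozenSet, NamedTuple, Tuple

Descriptor = Tuple[str, str]  # (attribute, value)


class Rule(NamedTuple):
    body: FrozenSet[Descriptor]
    consequent: Descriptor
    support: float
    confidence: float


def passes_thresholds(r):
    return r.support > 0.05 and r.confidence > 0.7


def has_age(r):
    """Body has {Age = *}: some descriptor on the Age attribute."""
    return any(attr == "Age" for attr, _ in r.body)


def top_down(rules):
    """Evaluate the outer query, then run the subquery per outer rule."""
    outer = [r1 for r1 in rules if has_age(r1) and passes_thresholds(r1)]
    result = []
    for r1 in outer:
        # Subquery re-evaluated for this particular R1.
        exists = any(
            passes_thresholds(r2) and r2.body > r1.body for r2 in rules
        )
        if not exists:
            result.append(r1)
    return result


def bottom_up(rules):
    """Materialize the R1-independent part of the subquery once, then filter."""
    inner = [r2 for r2 in rules if passes_thresholds(r2)]
    return [
        r1
        for r1 in rules
        if has_age(r1)
        and passes_thresholds(r1)
        and not any(r2.body > r1.body for r2 in inner)
    ]


# Hypothetical, already-mined rules.
a = Rule(frozenset({("Age", "60+")}), ("Heart_Disease", "yes"), 0.10, 0.75)
b = Rule(frozenset({("Age", "60+"), ("Smoker", "yes")}),
         ("Heart_Disease", "yes"), 0.06, 0.90)
c = Rule(frozenset({("Age", "<30")}), ("Heart_Disease", "no"), 0.20, 0.80)
d = Rule(frozenset({("Age", "<30"), ("Smoker", "yes")}),
         ("Heart_Disease", "no"), 0.02, 0.85)
RULES = [a, b, c, d]

print(top_down(RULES))
print(bottom_up(RULES))
```

Both orders return the same rules; they differ only in how often the subquery is evaluated, which is presumably what the rules for choosing between them weigh. In this data `a` is dropped because `b` extends its body and passes the thresholds, while `c` is kept because its only extension, `d`, fails the support threshold.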