Mining Class Comparisons and Mining Descriptive Statistical Measures
Neha Sharma
ME (I.T.), 3rd Sem
Roll No. 463
MINING CLASS COMPARISONS: DISCRIMINATING BETWEEN DIFFERENT CLASSES
use Big_University_DB
mine comparison as "grad_vs_undergrad_students"
in relevance to name, gender, major, birth_place, birth_date, residence, phone#, gpa
for "graduate_students"
where status in "graduate"
versus "undergraduate_students"
where status in "undergraduate"
analyze count%
from student
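The query can be read as: split the task-relevant tuples into a target class (graduate students) and a contrasting class (undergraduate students), then report each generalized tuple's count as a percentage of its own class (analyze count%). The Python sketch below illustrates that reading under stated assumptions: the tuples are hypothetical and already generalized to (birth_country, age_range, gpa), and the helper count_percent is illustrative, not part of DMQL.

```python
from collections import Counter

# Hypothetical student tuples, already generalized to
# (status, birth_country, age_range, gpa); values are illustrative only.
students = [
    ("graduate", "Canada", "25-30", "good"),
    ("graduate", "Canada", "25-30", "good"),
    ("graduate", "foreign", "25-30", "excellent"),
    ("undergraduate", "Canada", "15-20", "fair"),
    ("undergraduate", "Canada", "25-30", "good"),
    ("undergraduate", "Canada", "15-20", "fair"),
    ("undergraduate", "foreign", "15-20", "good"),
]


def count_percent(rows, status):
    """Map each generalized tuple in one class to its share (%) of that class."""
    in_class = [row[1:] for row in rows if row[0] == status]
    counts = Counter(in_class)
    return {t: 100.0 * c / len(in_class) for t, c in counts.items()}


# Target class (status = graduate) vs. contrasting class (status = undergraduate).
target = count_percent(students, "graduate")
contrasting = count_percent(students, "undergraduate")

# Print the two classes side by side, one generalized tuple per line.
for tup in sorted(set(target) | set(contrasting)):
    print(tup, f"grad {target.get(tup, 0):.1f}%", f"undergrad {contrasting.get(tup, 0):.1f}%")
```

Percentages are computed within each class, so each class's column sums to 100% and classes of different sizes can be compared directly.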
Name | Gender | Major | Birth_place | Birth_date | Residence | Phone # | GPA
Jim Woodman | M | CS | Vancouver, BC, Canada | 8-12-76 | 3511 Main St., Richmond | 687-4598 | 3.67
Scott Lachance | M | CS | Montreal, Que, Canada | 28-7-75 | 345 1st Ave., Richmond | 253-9106 | 3.70
Laura Lee | F | Physics | Seattle, WA, USA | 25-8-70 | 125 Austin Ave., Burnaby | 420-5232 | 3.83
… | … | … | … | … | … | … | …
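d-weight of a generalized tuple $q_a$ for class $C_j$, among the compared classes $C_1, \dots, C_m$:
$$d\text{-weight} = \frac{\mathrm{count}(q_a \in C_j)}{\sum_{i=1}^{m} \mathrm{count}(q_a \in C_i)}$$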
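The d-weight is the fraction of all tuples matching a generalized tuple that fall in the target class. A minimal Python sketch, assuming the per-class counts are already available in a dictionary; the class labels used as keys are illustrative, while the counts 90 and 210 are those of the graduate-student example on this slide.

```python
def d_weight(counts_by_class, target_class):
    """d-weight of one generalized tuple q_a for class C_j:
    count(q_a in C_j) / sum over all classes C_i of count(q_a in C_i)."""
    return counts_by_class[target_class] / sum(counts_by_class.values())


# Counts of the tuple (Canada, 25...30, good) per class.
counts = {"graduate_student": 90, "undergraduate_student": 210}
print(f"d-weight = {d_weight(counts, 'graduate_student'):.0%}")
```

The d-weights of one tuple across all compared classes always sum to 1, so the undergraduate d-weight for the same tuple is 70%.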
Count distribution between graduate and undergraduate students for a generalized tuple
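$$\forall X,\ \mathrm{graduate\_student}(X) \Leftarrow \mathrm{birth\_country}(X) = \text{``Canada''} \wedge \mathrm{age\_range}(X) = \text{``25...30''} \wedge \mathrm{gpa}(X) = \text{``good''}\ [d: 30\%]$$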
– where d = 90/(90 + 210) = 30%: 90 graduate and 210 undergraduate students share this generalized tuple
– the rule states a sufficient condition for membership in the target class
• Quantitative description rule
$$\forall X,\ \mathrm{target\_class}(X) \Leftrightarrow \mathrm{condition}_1(X)\,[t: w_1, d: w'_1] \vee \cdots \vee \mathrm{condition}_n(X)\,[t: w_n, d: w'_n]$$
– the rule states a necessary and sufficient condition
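The t-weight $w_i$ attached to each condition measures how typical the matching tuples are within the target class. Under the standard definition (added here as a supplement, not shown on the original slide), for a generalized tuple $q_a$ among the target class's generalized tuples $q_1, \dots, q_N$:

```latex
t\text{-weight}(q_a) = \frac{\mathrm{count}(q_a)}{\sum_{i=1}^{N} \mathrm{count}(q_i)}
```

The d-weight $w'_i$ divides the same count by the tuple's count across all compared classes instead, so the t-weight reflects typicality within a class and the d-weight reflects discrimination between classes.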
12/7/21
Example: Quantitative Description Rule
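Location/item | TV (count, t-weight, d-weight) | Computer (count, t-weight, d-weight) | Both_items (count, t-weight, d-weight)
Both_regions | 200, 20%, 100% | 800, 80%, 100% | 1000, 100%, 100%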
Crosstab showing the associated t-weight and d-weight values and the total number (in thousands) of TVs and computers sold at AllElectronics in 1998
$$\forall X,\ \mathrm{Europe}(X) \Leftrightarrow (\mathrm{item}(X) = \text{``TV''})\,[t: 25\%, d: 40\%] \vee (\mathrm{item}(X) = \text{``computer''})\,[t: 75\%, d: 30\%]$$
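The Europe rule is consistent with the Both_regions row: its d-weights say 40% of the 200 thousand TVs and 30% of the 800 thousand computers were sold in Europe, which implies Europe counts of 80 and 240 and hence t-weights of 80/320 = 25% and 240/320 = 75%. A Python sketch of that derivation; note that the Europe counts are derived here from the rule, not read from the table.

```python
# Both_regions totals from the crosstab (thousands of units sold in 1998).
total_sold = {"TV": 200, "computer": 800}

# d-weights from the Europe rule, in percent: Europe's share of each item's total.
europe_d_pct = {"TV": 40, "computer": 30}

# Europe counts implied by those d-weights (derived, not read from the table).
europe = {item: total_sold[item] * europe_d_pct[item] // 100 for item in total_sold}
europe_total = sum(europe.values())

# t-weight: each item's share of Europe's own sales.
europe_t = {item: count / europe_total for item, count in europe.items()}

# The same t-weight calculation on the Both_regions row reproduces its 20% / 80%.
both_t = {item: count / sum(total_sold.values()) for item, count in total_sold.items()}

for item in total_sold:
    print(f"{item}: Europe t = {europe_t[item]:.0%}, Both_regions t = {both_t[item]:.0%}")
```

Integer percentages keep the implied counts exact, so the recovered t-weights match the rule's 25% and 75% without rounding noise.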
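• Weighted arithmetic mean: $\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
• Median: A holistic measure; it is the middle value of the sorted data (or the average of the two middle values when n is even) and cannot be computed from partial results on subsets of the data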
• Variance
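$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$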
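The second form of the variance needs only n, the sum of the values, and the sum of their squares; these aggregates add up across partitions, so the mean and variance can be computed from partial results, while the median cannot. A Python sketch of this contrast, assuming hypothetical data split into two partitions; weighted_mean and variance_two_pass are illustrative helpers, and statistics.median / statistics.variance are from the Python standard library.

```python
import statistics


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def variance_two_pass(xs):
    """s^2 = sum((x_i - mean)^2) / (n - 1); computes the mean first."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical data split into two partitions (illustrative values only).
part_a = [2, 4, 4, 4]
part_b = [5, 5, 7, 9]
data = part_a + part_b

# Per-partition aggregates (n, sum, sum of squares) combine by addition,
# so the mean and the one-pass variance formula work from them alone.
aggs = [(len(p), sum(p), sum(x * x for x in p)) for p in (part_a, part_b)]
n = sum(a[0] for a in aggs)
total = sum(a[1] for a in aggs)
total_sq = sum(a[2] for a in aggs)
mean_from_parts = total / n
var_from_parts = (total_sq - total * total / n) / (n - 1)

# The median is holistic: the median of the partition medians
# is not, in general, the median of the whole data set.
median_all = statistics.median(data)
median_of_medians = statistics.median([statistics.median(p) for p in (part_a, part_b)])

print(mean_from_parts, var_from_parts, variance_two_pass(data))
print(median_all, median_of_medians)
```

Here the overall median is 4.5 while the median of the partition medians is 5.0. In floating point the one-pass form can lose precision through cancellation when the values are large relative to their spread; the two-pass form is numerically steadier.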