
Zatona

The document discusses data mining and the knowledge discovery process. It includes questions about data mining and its purpose, the steps in the knowledge discovery process, the differences between classification and clustering with examples, decision trees including information gain calculations, and single linkage clustering.

Uploaded by

Ahmed Yousry
Copyright
© All Rights Reserved

Data mining revision:

Q1: What is data mining, and why data mining?


Q2: What is the knowledge discovery process?

An Outline of the Steps of the KDD Process


Q3: Difference between classification and clustering, with examples

Examples
Classification Task
Q4: Decision tree

Attribute          Value    No   yes
age                <=30     2    3
age                31…40    4    0
age                >40      3    2
Blood hemoglobin   High     3    1
Blood hemoglobin   Medium   4    2
Blood hemoglobin   Low      2    2

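$$
\begin{aligned}
\mathit{Info}_{age}(D) &= \frac{2+3}{14}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) \\
&+ \frac{4+0}{14}\left(-\frac{0}{4}\log_2\frac{0}{4} - \frac{4}{4}\log_2\frac{4}{4}\right) \\
&+ \frac{3+2}{14}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) = 0.694 \text{ bits}
\end{aligned}
$$

$$
\begin{aligned}
\mathit{Info}_{blood\,hem}(D) &= \frac{4}{14}\left(-\frac{3}{4}\log_2\frac{3}{4} - \frac{1}{4}\log_2\frac{1}{4}\right) \\
&+ \frac{6}{14}\left(-\frac{4}{6}\log_2\frac{4}{6} - \frac{2}{6}\log_2\frac{2}{6}\right) \\
&+ \frac{4}{14}\left(-\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4}\right) \approx 0.911 \text{ bits}
\end{aligned}
$$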

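Attribute   Value       No   yes
Gender      M           3    4
Gender      F           6    1
CBCR        Fair        6    2
CBCR        Excellent   3    3

$$
\begin{aligned}
\mathit{Info}_{Gender}(D) &= \frac{7}{14}\left(-\frac{4}{7}\log_2\frac{4}{7} - \frac{3}{7}\log_2\frac{3}{7}\right) \\
&+ \frac{7}{14}\left(-\frac{6}{7}\log_2\frac{6}{7} - \frac{1}{7}\log_2\frac{1}{7}\right) = 0.7884 \text{ bits}
\end{aligned}
$$

$$
\begin{aligned}
\mathit{Info}_{CBCR}(D) &= \frac{8}{14}\left(-\frac{6}{8}\log_2\frac{6}{8} - \frac{2}{8}\log_2\frac{2}{8}\right) \\
&+ \frac{6}{14}\left(-\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6}\right) = 0.892 \text{ bits}
\end{aligned}
$$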
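The four Info calculations can be checked numerically. The following is a minimal sketch, assuming the class counts per attribute value are the ones used in the formulas above; the function names `entropy` and `expected_info` are illustrative and not part of the original notes. Because entropy is symmetric, the order of the two counts in each pair does not affect the result.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a list of class counts; 0 * log2(0) is taken as 0."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def expected_info(partitions):
    """Info_A(D): entropy of each partition, weighted by its share of D."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * entropy(p) for p in partitions)

# Class-count pairs per attribute value (assumed from the tables and formulas).
partitions = {
    "age": [(2, 3), (4, 0), (3, 2)],
    "blood hemoglobin": [(3, 1), (4, 2), (2, 2)],
    "gender": [(3, 4), (6, 1)],
    "CBCR": [(6, 2), (3, 3)],
}

# Overall class totals, summed column-wise over any one attribute.
class_totals = [sum(col) for col in zip(*partitions["age"])]
info_d = entropy(class_totals)

info = {a: expected_info(p) for a, p in partitions.items()}
gain = {a: info_d - v for a, v in info.items()}  # Gain(A) = Info(D) - Info_A(D)

for a in partitions:
    print(f"{a}: Info = {info[a]:.3f} bits, Gain = {gain[a]:.3f} bits")
```

Under these counts the printed Info values should agree with the figures above, and the gains with the Gain row of the table that follows, up to rounding in the last digit.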
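$$
\mathit{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right)
$$

SplitInfo age

$$
\mathit{SplitInfo}_{age}(D) = -\frac{5}{14}\log_2\frac{5}{14} - \frac{4}{14}\log_2\frac{4}{14} - \frac{5}{14}\log_2\frac{5}{14} = 1.5774
$$

SplitInfo blood hem

$$
\mathit{SplitInfo}_{blood\,hem}(D) = -\frac{4}{14}\log_2\frac{4}{14} - \frac{6}{14}\log_2\frac{6}{14} - \frac{4}{14}\log_2\frac{4}{14} = 1.557
$$

SplitInfo Gender

$$
\mathit{SplitInfo}_{Gender}(D) = -\frac{7}{14}\log_2\frac{7}{14} - \frac{7}{14}\log_2\frac{7}{14} = 1
$$

SplitInfo CBCR

$$
\mathit{SplitInfo}_{CBCR}(D) = -\frac{8}{14}\log_2\frac{8}{14} - \frac{6}{14}\log_2\frac{6}{14} = 0.9852
$$

             Age      Blood h   Gender   CBCR
Gain         0.246    0.029     0.151    0.048
Split Info   1.5774   1.557     1        0.9852

GainRatio(Age)     = 0.246 / 1.5774 ≈ 0.156
GainRatio(Blood h) = 0.029 / 1.557  ≈ 0.019
GainRatio(Gender)  = 0.151 / 1      = 0.151
GainRatio(CBCR)    = 0.048 / 0.9852 ≈ 0.049

Age has the highest gain ratio, so it becomes the root of the tree.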
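The split information and gain ratio can be recomputed from the partition sizes alone. This is a minimal sketch, assuming the sizes |D_j| are the ones implied by the Info formulas (5, 4, 5 for age; 4, 6, 4 for blood hemoglobin; 7, 7 for gender; 8, 6 for CBCR) and taking the Gain values from the table as given. The names `split_info`, `sizes`, and `gain_ratio` are illustrative.

```python
from math import log2

def split_info(sizes):
    """SplitInfo_A(D) = -sum(|Dj|/|D| * log2(|Dj|/|D|)) over the partitions."""
    n = sum(sizes)
    return -sum(s / n * log2(s / n) for s in sizes if s > 0)

# Partition sizes per attribute (assumed from the Info formulas).
sizes = {
    "age": [5, 4, 5],
    "blood hemoglobin": [4, 6, 4],
    "gender": [7, 7],
    "CBCR": [8, 6],
}

# Gain values copied from the table.
gain = {"age": 0.246, "blood hemoglobin": 0.029, "gender": 0.151, "CBCR": 0.048}

# GainRatio(A) = Gain(A) / SplitInfo_A(D)
gain_ratio = {a: gain[a] / split_info(sizes[a]) for a in sizes}
best = max(gain_ratio, key=gain_ratio.get)

for a in sizes:
    print(f"{a}: SplitInfo = {split_info(sizes[a]):.4f}, GainRatio = {gain_ratio[a]:.3f}")
print("split on:", best)
```

The attribute with the largest gain ratio is the one chosen for the split at this node.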
age
  <=30:  ??
  31…40: Yes
  >40:   ??

Age <= 30    Blood h   Gender   CBCR
Gain         0.057     0.97     0.02
Split Info   0.993     0.97     0.97
GainRatio    0.0574    1        0.0206

Age
  <=30:  Gender
    male:   yes
    female: no
  31…40: Yes
  >40:   ??
Age
  <=30:  Gender
    male:   Yes
    female: No
  31…40: Yes
  >40:   CBCR
    excellent: No
    fair:      Yes
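The finished tree can be read as a chain of lookups from the root down to a leaf. This is a minimal sketch, assuming the tree is stored as nested tuples and dictionaries. That representation, the `classify` helper, and the sample record are illustrative choices rather than anything given in the notes.

```python
# Internal node: (attribute name, {attribute value: subtree}); leaf: class label.
tree = ("age", {
    "<=30": ("gender", {"male": "yes", "female": "no"}),
    "31…40": "yes",
    ">40": ("CBCR", {"excellent": "no", "fair": "yes"}),
})

def classify(node, record):
    """Follow the branch matching the record's value until a leaf is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[record[attribute]]
    return node

# Hypothetical record, used only to exercise the tree.
sample = {"age": ">40", "gender": "female", "CBCR": "fair"}
print(classify(tree, sample))
```

Only the attributes on the path from the root are consulted, so blood hemoglobin never needs to be looked up.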
Q5: Single linkage

[Figure: single linkage clustering diagram; labelled values 295, 268, 255, 219, 138]
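Single linkage merges, at each step, the two clusters whose closest members are nearest to each other. This is a minimal sketch of that rule, assuming a small symmetric distance matrix. The four points A to D and their distances are hypothetical and are not the data behind the figure, and `single_linkage` is an illustrative name.

```python
def single_linkage(dist, labels):
    """Naive agglomerative clustering; returns merges as (left, right, distance)."""
    clusters = [frozenset([i]) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        # Cluster-to-cluster distance is the minimum pairwise member distance.
        candidates = (
            (min(dist[i][j] for i in a for j in b), a, b)
            for k, a in enumerate(clusters)
            for b in clusters[k + 1:]
        )
        d, a, b = min(candidates, key=lambda c: c[0])
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(labels[i] for i in a), sorted(labels[i] for i in b), d))
    return merges

# Hypothetical toy data, for illustration only.
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]

merges = single_linkage(dist, labels)
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```

The sequence of merge distances is what a dendrogram plots on its height axis.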
