
UNIT-3

General Approach to Classification


“How does classification work?” Data classification is a two-step process:
• a learning step, where a classification model is constructed, and
• a classification step, where the model is used to predict class labels for given data.
The process is illustrated below for loan application data.
The data classification process:
• (a) Learning: Training data are analyzed by a
classification algorithm. Here, the class label
attribute is loan decision, and the learned
model or classifier is represented in the form of
classification rules.

• (b) Classification: Test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples (a code sketch of both steps follows).
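As a concrete illustration of the two steps, here is a minimal sketch using scikit-learn; the library choice and the loan-style numbers are assumptions for illustration, not taken from the slides.

```python
# Minimal sketch of the two-step classification process (assumed
# library: scikit-learn; the loan-style feature values are made up).
from sklearn.tree import DecisionTreeClassifier

# (a) Learning step: training tuples with known class labels.
# Features: [age_in_years, income_in_thousands]; label: loan_decision.
X_train = [[25, 30], [40, 80], [35, 20], [50, 95], [28, 45], [60, 70]]
y_train = ["risky", "safe", "risky", "safe", "risky", "safe"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)  # construct the classification model

# (b) Classification step: estimate accuracy on test tuples first...
X_test, y_test = [[30, 25], [55, 90]], ["risky", "safe"]
print("test accuracy:", model.score(X_test, y_test))

# ...and, if acceptable, predict class labels for new data tuples.
print("new tuple ->", model.predict([[45, 85]])[0])
```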
• Supervised learning: the class label of each training tuple is provided.
• Unsupervised learning: the class label of each training tuple is not known, and the number or set of classes to be learned may not be known in advance. In that case we could use clustering to try to determine “groups of like tuples.”
Decision Tree Induction
• A decision tree is a flowchart-like tree
structure, where each internal node (nonleaf
node) denotes a test on an attribute, each
branch represents an outcome of the test, and
each leaf node (or terminal node) holds a class
label. The topmost node in a tree is the root
node.
• Example: a decision tree for the concept buys computer indicates whether an AllElectronics customer is likely to purchase a computer. Each internal (nonleaf) node represents a test on an attribute. Each leaf node represents a class (either buys computer = yes or buys computer = no). A sketch of such a tree follows.
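The flowchart structure can also be read as nested attribute tests. Below is a sketch of such a tree in code; the specific tests (age, then student or credit_rating) follow the classic textbook version of the buys_computer tree and are an assumption, since the slide's figure is not reproduced here.

```python
# Sketch: the buys_computer decision tree as nested attribute tests.
# Each if-test is an internal node; each returned string is a leaf
# class label. The tests assume the classic textbook tree.
def buys_computer(age: str, student: str, credit_rating: str) -> str:
    if age == "youth":                 # root test on age
        return "yes" if student == "yes" else "no"
    elif age == "middle_aged":
        return "yes"                   # pure leaf: always buys
    else:                              # age == "senior"
        return "yes" if credit_rating == "fair" else "no"

print(buys_computer("youth", "yes", "fair"))        # -> yes
print(buys_computer("senior", "no", "excellent"))   # -> no
```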
Attribute Selection Measures
• Information Gain:
ID3 uses information gain as its attribute selection measure.
The expected information (entropy) needed to classify a tuple in D is given by
Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)
where p_i is the probability that a tuple in D belongs to class C_i.
• How much more information would we still need (after partitioning D on attribute A) to arrive at an exact classification? This amount is measured by
Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)
where D is partitioned into subsets D_1, ..., D_v by the v values of A.
• Information gain is defined as the difference between the original information requirement (i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after partitioning on A). That is,
Gain(A) = Info(D) - Info_A(D)
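The three formulas translate directly into code. Here is a sketch in Python; the function names are my own, not from the slides.

```python
# Info(D), Info_A(D), and Gain(A) as plain Python functions.
from math import log2
from collections import Counter

def info(labels):
    """Info(D) = -sum_i p_i log2(p_i): expected information
    (entropy) needed to classify a tuple in D."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_after_split(values, labels):
    """Info_A(D) = sum_j |D_j|/|D| * Info(D_j): information still
    needed after partitioning D on attribute A."""
    n = len(labels)
    parts = {}
    for v, y in zip(values, labels):
        parts.setdefault(v, []).append(y)
    return sum(len(p) / n * info(p) for p in parts.values())

def gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(labels) - info_after_split(values, labels)
```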
Step-by-Step Decision Tree Induction on the
"Buy Computer" Example
• The "Buy Computer" dataset is given as:
Step 1: Entropy of the Entire Dataset S
• Formula for entropy (two classes):
Entropy(S) = -p_{Yes} \log_2(p_{Yes}) - p_{No} \log_2(p_{No})
In the dataset:
• There are 10 instances.
• 5 instances are "Yes" (i.e., customers buy a computer).
• 5 instances are "No" (i.e., customers do not buy a computer).
• The probability for each class is therefore p_{Yes} = p_{No} = 5/10 = 0.5.
Using these values in the entropy formula:
Entropy(S) = -(0.5)\log_2(0.5) - (0.5)\log_2(0.5) = 1.0
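The Step 1 arithmetic can be checked in a couple of lines of Python:

```python
# A 5/5 class split gives entropy exactly 1.0 (log2(0.5) = -1).
from math import log2
p_yes = p_no = 5 / 10
print(-(p_yes * log2(p_yes)) - (p_no * log2(p_no)))  # 1.0
```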
Step 2: Information Gain for Each Attribute
• Formula for information gain:
Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)
Step 3: Information Gain for Attribute: Age
• The possible values for Age are "Young," "Middle-aged," and "Senior."
• We will:
– Split the dataset into three subsets based on the values of Age.
– Calculate the entropy for each subset.
– Use the information gain formula to calculate the gain.
• Subset 1: Age = Young
• Subset of instances where Age = "Young" (5 instances):
• 4 instances are "No."
• 1 instance is "Yes."
• The entropy of this subset is:
Entropy(S_{Young}) = -(1/5)\log_2(1/5) - (4/5)\log_2(4/5) ≈ 0.722
• Subset 2: Age = Middle-aged
Subset of instances where Age = "Middle-aged" (2 instances):
• 2 instances are "Yes."
• 0 instances are "No" (this subset is pure).
The entropy of this subset is:
Entropy(S_{Middle-aged}) = 0 (a pure subset requires no further information)
• Subset 3: Age = Senior
Subset of instances where Age = "Senior" (4 instances):
• 3 instances are "Yes."
• 1 instance is "No."
The entropy of this subset is:
Entropy(S_{Senior}) = -(3/4)\log_2(3/4) - (1/4)\log_2(1/4) ≈ 0.811
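The three subset entropies can be verified with a small helper; the (Yes, No) counts are exactly those listed above.

```python
# Verify the Step 3 subset entropies from their (yes, no) counts.
from math import log2

def entropy(yes: int, no: int) -> float:
    total = yes + no
    e = 0.0
    for c in (yes, no):
        if c:  # treat 0*log2(0) as 0, so pure subsets give 0.0
            e -= (c / total) * log2(c / total)
    return e

print(round(entropy(1, 4), 3))  # Young       -> 0.722
print(round(entropy(2, 0), 3))  # Middle-aged -> 0.0
print(round(entropy(3, 1), 3))  # Senior      -> 0.811
```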
Step 4: Calculate Weighted Average Entropy for Age
• Now we calculate the weighted average entropy for the attribute Age. The formula is:
Entropy_{Age}(S) = \sum_{v \in \{Young, Middle-aged, Senior\}} \frac{|S_v|}{|S|} Entropy(S_v)
• Substituting the subset sizes and the three entropies computed above yields the weighted average entropy for Age.
Step 5: Information Gain for Age
• Finally, we calculate the information gain for the attribute Age:
Gain(S, Age) = Entropy(S) - Entropy_{Age}(S)
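Here is a sketch completing Steps 4 and 5 from the subset counts above. One caveat: the subset sizes listed in Step 3 (5 + 2 + 4) sum to 11 rather than the 10 instances stated in Step 1, so this sketch derives both the parent class distribution and the split weights from the subset counts themselves to keep the arithmetic self-consistent.

```python
# Steps 4 and 5: weighted average entropy for Age, then the gain.
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

# (yes, no) counts per Age value, as listed in Step 3.
subsets = {"Young": (1, 4), "Middle-aged": (2, 0), "Senior": (3, 1)}

n = sum(sum(c) for c in subsets.values())   # total tuples (here 11)
yes = sum(c[0] for c in subsets.values())   # total "Yes" labels
no = sum(c[1] for c in subsets.values())    # total "No" labels

# Step 4: Entropy_Age(S) = sum_v |S_v|/|S| * Entropy(S_v)
weighted = sum(sum(c) / n * entropy(c) for c in subsets.values())

# Step 5: Gain(S, Age) = Entropy(S) - Entropy_Age(S)
print("Entropy(S)     ~", round(entropy((yes, no)), 3))  # ~0.994
print("Entropy_Age(S) ~", round(weighted, 3))            # ~0.623
print("Gain(S, Age)   ~", round(entropy((yes, no)) - weighted, 3))  # ~0.371
```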
