UNIT-3
General Approach to Classification
"How does classification work?" Data classification is a two-step process:
• Learning step: a classification model is constructed from training data.
• Classification step: the model is used to predict class labels for given data.
The process is illustrated for loan application data (figure: "The data classification process"):
• (a) Learning: Training data are analyzed by a classification algorithm. Here, the class label attribute is loan decision, and the learned model (the classifier) is represented in the form of classification rules.
• (b) Classification: Test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples (a minimal code sketch of these two steps follows this list).
• Supervised learning: the class label of each training tuple is provided.
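The sketch below illustrates the two steps, assuming scikit-learn is available; the loan tuples, feature encoding, and split are hypothetical placeholders, not data from these notes.

```python
# Hypothetical loan-application data: [income_in_thousands, good_credit] -> loan decision.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = [[40, 0], [75, 1], [28, 0], [90, 1], [55, 1], [33, 0], [68, 0], [82, 1]]
y = ["risky", "safe", "risky", "safe", "safe", "risky", "risky", "safe"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# (a) Learning step: construct a classifier from the training data.
model = DecisionTreeClassifier().fit(X_train, y_train)

# (b) Classification step: estimate accuracy on test data; if acceptable,
# apply the model to new tuples.
print("estimated accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("new applicant:", model.predict([[60, 1]]))
```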
• Unsupervised learning: the class label of each training tuple is not known, and the number or set of classes to be learned may not be known in advance. In this case we could use clustering to try to determine "groups of like tuples."

Decision Tree Induction
• A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node.
• A decision tree for the concept buys_computer indicates whether an AllElectronics customer is likely to purchase a computer. Each internal (nonleaf) node represents a test on an attribute. Each leaf node represents a class (either buys_computer = yes or buys_computer = no).

Attribute Selection Measures
• Information Gain: ID3 uses information gain as its attribute selection measure.
• The expected information needed to classify a tuple in D is given by
  Info(D) = - Σ_i p_i log2(p_i)
  where p_i is the probability that a tuple in D belongs to class C_i.
• How much more information would we still need (after partitioning D on an attribute A) to arrive at an exact classification? This amount is measured by
  Info_A(D) = Σ_j (|D_j| / |D|) × Info(D_j)
  where D_1, ..., D_v are the partitions of D induced by the values of A.
• Information gain is defined as the difference between the original information requirement (i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after partitioning on A). That is,
  Gain(A) = Info(D) - Info_A(D)

Step-by-Step Decision Tree Induction on the "Buy Computer" Example
• The "Buy Computer" dataset is a small table of customer tuples, each described by attributes such as Age and labeled with the class "Yes" (buys a computer) or "No".

Step 1: Entropy of the Entire Dataset S
• Formula for Entropy:
  Entropy(S) = - Σ_i p_i log2(p_i)
• In the dataset there are 10 instances:
  – 5 instances are "Yes" (i.e., customers buy a computer).
  – 5 instances are "No" (i.e., customers do not buy a computer).
• The probability for each class is:
  p(Yes) = 5/10 = 0.5 and p(No) = 5/10 = 0.5
• Using these values in the entropy formula:
  Entropy(S) = -(0.5 × log2 0.5) - (0.5 × log2 0.5) = 0.5 + 0.5 = 1.0
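As a quick check of the Step 1 arithmetic, here is a small sketch; the helper name `entropy` is ours, not from the notes.

```python
import math

def entropy(counts):
    """Entropy of a class distribution, given as a list of class counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# Step 1: the whole dataset S has 5 "Yes" and 5 "No" instances.
print(entropy([5, 5]))  # -> 1.0
```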
Step 2: Information Gain for Each Attribute
• Formula for Information Gain:
  Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) × Entropy(S_v)
  where S_v is the subset of S for which attribute A has value v.
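The same formula can be written as a small helper function. This is only a sketch: `information_gain` and its argument names are our own, and it reuses the `entropy` helper from the previous snippet.

```python
import math

def entropy(counts):
    # Same helper as in the previous sketch.
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, subset_counts):
    """Gain(S, A) = Entropy(S) - sum over v of (|S_v| / |S|) * Entropy(S_v)."""
    size_s = sum(parent_counts)
    remainder = sum(sum(sub) / size_s * entropy(sub) for sub in subset_counts)
    return entropy(parent_counts) - remainder

# Usage shape: information_gain(class_counts_of_S, [class_counts_of_each_subset, ...])
```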
Step 3: Information Gain for Attribute: Age
• The possible values for Age are "Young," "Middle-aged," and "Senior."
• We will:
  – Split the dataset into three subsets based on the values of Age.
  – Calculate the entropy for each subset.
  – Use the information gain formula to calculate the gain.
• Subset 1: Age = Young
  – 4 instances are "No".
  – 1 instance is "Yes".
  – The entropy of this subset is:
    Entropy(S_Young) = -(1/5) log2(1/5) - (4/5) log2(4/5) ≈ 0.722
• Subset 2: Age = Middle-aged
  – 2 instances are "Yes".
  – 0 instances are "No" (this subset is pure).
  – The entropy of this subset is:
    Entropy(S_Middle-aged) = 0
• Subset 3: Age = Senior
  – 3 instances are "Yes".
  – 1 instance is "No".
  – The entropy of this subset is:
    Entropy(S_Senior) = -(3/4) log2(3/4) - (1/4) log2(1/4) ≈ 0.811

Step 4: Calculate Weighted Average Entropy for Age
• Now we calculate the weighted average entropy for the attribute Age. The formula is:
  Entropy_Age(S) = Σ_v (|S_v| / |S|) × Entropy(S_v)
  = (|S_Young| / |S|) × 0.722 + (|S_Middle-aged| / |S|) × 0 + (|S_Senior| / |S|) × 0.811

Step 5: Information Gain for Age
• Finally, we calculate the information gain for the attribute Age:
  Gain(S, Age) = Entropy(S) - Entropy_Age(S)
  (the code sketch below carries out Steps 3-5 with the subset counts listed above).
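Putting Steps 3-5 together in code, as a sketch: the [Yes, No] counts per Age value are taken exactly as listed above, and the listed subset sizes are used as the weights in Step 4; the exact final numbers depend on the full data table, which is not reproduced in these notes.

```python
import math

def entropy(counts):
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# Step 3: class counts [Yes, No] for each Age value, as listed above.
subsets = {"Young": [1, 4], "Middle-aged": [2, 0], "Senior": [3, 1]}
for value, counts in subsets.items():
    print(f"Entropy(Age={value}) = {entropy(counts):.3f}")
# -> Young 0.722, Middle-aged 0.000, Senior 0.811

# Step 4: weighted average entropy, each subset weighted by its share of the tuples.
total = sum(sum(c) for c in subsets.values())
weighted = sum(sum(c) / total * entropy(c) for c in subsets.values())
print(f"Weighted entropy for Age = {weighted:.3f}")

# Step 5: information gain = whole-dataset entropy (1.0 from Step 1) minus the Step 4 value.
print(f"Gain(S, Age) = {1.0 - weighted:.3f}")
```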