Entropy
• A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous).
• The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided between classes the entropy is one.

Entropy - example
To build a decision tree, we need to calculate two types of entropy using frequency tables:
• Entropy using the frequency table of one attribute: E(S) = -Σ p(i) log2 p(i), summed over the classes i
• Entropy using the frequency table of two attributes: E(T, X) = Σ P(c) E(c), summed over the values c of attribute X

Information Gain
• Information gain is based on the decrease in entropy after a dataset is split on an attribute.
• Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
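As a rough illustration only (the function and variable names below are assumptions, not from these slides), the two entropy calculations can be sketched in Python as follows:

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy using the frequency table of one attribute:
    # E(S) = -sum over classes i of p(i) * log2(p(i))
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def conditional_entropy(values, labels):
    # Entropy using the frequency table of two attributes:
    # E(T, X) = sum over values c of X of P(c) * E(subset where X = c)
    total = len(labels)
    result = 0.0
    for v in set(values):
        subset = [lbl for val, lbl in zip(values, labels) if val == v]
        result += (len(subset) / total) * entropy(subset)
    return result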
Information Gain - example
• Step 1: Calculate entropy of the target.
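For example, assuming a hypothetical target attribute with 9 instances of one class and 5 of the other (counts chosen for illustration only), Step 1 gives:
E(target) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940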
• Step 2: The dataset is then split on the different attributes. The entropy for each branch is calculated and added proportionally to get the total entropy for the split. The resulting entropy is subtracted from the entropy before the split. The result is the information gain, or decrease in entropy.
• Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches, and repeat the same process on every branch.
• Step 4a: A branch with entropy of 0 is a leaf node.
• Step 4b: A branch with entropy of more than 0 needs further splitting (a sketch tying Steps 2 to 4 together appears at the end of this section).

Q.1 Define entropy.
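Tying Steps 2 to 4 together, the following is a minimal sketch, assuming the dataset is a list of Python dictionaries with a 'target' key and reusing the entropy and conditional_entropy helpers sketched earlier. The names and structure are illustrative, not a definitive ID3 implementation:

def information_gain(rows, attribute, target="target"):
    # Step 2: entropy before the split minus the weighted entropy after the split.
    labels = [row[target] for row in rows]
    values = [row[attribute] for row in rows]
    return entropy(labels) - conditional_entropy(values, labels)

def build_tree(rows, attributes, target="target"):
    labels = [row[target] for row in rows]
    # Step 4a: a branch with entropy of 0 is a leaf node (also stop when no attributes remain).
    if entropy(labels) == 0 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: choose the attribute with the largest information gain as the decision node.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        branch = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        # Step 4b: a branch with entropy of more than 0 is split further by recursion.
        tree[best][value] = build_tree(branch, remaining, target)
    return tree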