Department of Electronics & Telecommunications Engineering: ETEL71A-Machine Learning and AI

The document describes implementing the ID3 decision tree algorithm to classify a dataset on social media purchases, including preprocessing the data, fitting a decision tree classifier to the training data, using the trained model to predict test data labels, and evaluating the model's accuracy. Code is provided to split data, scale features, train a decision tree, predict test labels, and calculate accuracy and a confusion matrix. The results show high classification accuracy from the decision tree model on this dataset.

Uploaded by

Shrey Dixit
Copyright
© All Rights Reserved

Department of Electronics & Telecommunications Engineering
ETEL71A-Machine Learning and AI
Class: BE
Name : Adya Kastwar
UID: 2016120024
Sem: VII
Experiment: Decision Tree (ID3) algorithm

Objective: Write a Python program to demonstrate the working of the decision-tree-based ID3
algorithm, using an appropriate data set to build the decision tree, and apply this
knowledge to classify a new sample.
Outcomes:
1. Find entropy of data and follow steps of the algorithm to construct a tree.
2. Representation of hypothesis using decision tree.
3. Apply Decision Tree algorithm to classify the given data.
4. Interpret the output of Decision Tree.

System Requirements:
Linux OS with Python and libraries, or R, or Windows with MATLAB

Task 1: Describe the ID3 algorithm:


a. Note down the different decision tree algorithms and understand the steps of the ID3 algorithm.
b. Work through the algorithm and form a hypothesis in the notebook for the following ‘Family Dogs
characteristics’ training set. Verify the ‘Characteristic’ for the testing set and state whether the
target concept has been learnt successfully.
Decision trees are supervised learning algorithms used for both classification and regression tasks.
They belong to the family of information-based learning algorithms, which use different
measures of information gain for learning.
The main idea of decision trees is to find the descriptive features that contain the most
"information" about the target feature and then split the dataset along the values of these
features so that the target feature values in the resulting sub-datasets are as pure as possible. The
descriptive feature that leaves the target feature purest is said to be the most informative
one. This search for the "most informative" feature is repeated until a stopping
criterion is met, at which point we end up in so-called leaf nodes.
Assumptions we make while using a decision tree:
• At the beginning, we consider the whole training set as the root.
• For information gain, attributes are assumed to be categorical; for the Gini index, attributes
are assumed to be continuous.
• Records are distributed recursively on the basis of attribute values.
• We use statistical methods to order attributes as the root or as internal nodes.
Pseudocode:
1. Find the best attribute and place it at the root node of the tree.
2. Split the training set into subsets such that every record in a subset has the
same value for that attribute.
3. Repeat steps 1 and 2 on each subset until leaf nodes are found in all branches.
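The three steps above can be sketched as a small recursive function. This is a minimal illustration and not the code used in the experiment: it assumes purely categorical attributes, takes information gain (the reduction in entropy) as the measure of the "best" attribute, and the helper names (entropy, information_gain, id3, classify) as well as the toy records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the records on `attr`."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Build a tree as nested dicts; leaves are class labels."""
    if len(set(labels)) == 1:      # stopping criterion: pure subset
        return labels[0]
    if not attrs:                  # no attribute left: majority vote
        return Counter(labels).most_common(1)[0][0]
    # Step 1: pick the attribute with the highest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    # Step 2: one subset per value of the chosen attribute.
    for value in sorted(set(row[best] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # Step 3: repeat on each subset with the remaining attributes.
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree

def classify(tree, sample):
    """Follow the branches until a leaf is reached.

    Raises KeyError for an attribute value never seen in training.
    """
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][sample[attr]]
    return tree

# Hypothetical toy records (not the data set used in this experiment).
rows = [
    {"age": "young", "salary": "low"},
    {"age": "young", "salary": "high"},
    {"age": "old", "salary": "low"},
    {"age": "old", "salary": "high"},
    {"age": "young", "salary": "low"},
    {"age": "old", "salary": "low"},
]
labels = ["no", "yes", "yes", "yes", "no", "yes"]

tree = id3(rows, labels, ["age", "salary"])
print(tree)
print(classify(tree, {"age": "young", "salary": "high"}))
```

On these toy records "age" has the larger information gain, so it becomes the root; the "young" branch is still impure and is split again on "salary".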
Entropy is a measure of the uncertainty of a random variable; it characterizes the impurity of an
arbitrary collection of examples. The higher the entropy, the greater the information content.
• The entropy typically changes when we use a node in a decision tree to partition the training
instances into smaller subsets. Information gain is a measure of this change in entropy.
• Sklearn supports the “entropy” criterion for information gain. Because the default criterion in
sklearn is “gini”, information gain has to be requested explicitly with criterion = 'entropy'.
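For reference, the two quantities described above can be written out as follows. The notation is an assumption made here for illustration (S is a set of examples, p_c the fraction of S belonging to class c, and S_v the subset of S in which attribute A takes the value v), and the counts in the worked line are illustrative, not taken from the experiment's data.

```latex
% Entropy of a set S of examples (in bits)
H(S) = -\sum_{c} p_c \log_2 p_c

% Information gain of splitting S on attribute A
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)

% Worked example: 14 examples, 9 positive and 5 negative
H(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940
```

A pure subset (all examples in one class) has entropy 0, and a two-class subset split evenly has entropy 1 bit, so a split with high gain is one whose subsets are close to pure.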

Task 2: Implement the algorithm in python/R/Matlab to classify the dataset.

Dataset: online database of purchases made based on social media ads


Code:
#Decision Tree Classification
#Adya Kastwar
#Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
#Creating separate matrices for features (columns 2 and 3) and outputs (column 4),
#using only the first 80 rows of the dataset
X = dataset.iloc[:80, [2, 3]].values
y = dataset.iloc[:80, 4].values

#Splitting the dataset into the training set (75%) and test set (25%)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

#Feature scaling to bring the values of the attributes to the same scale
#(a decision tree does not strictly need this, since each split looks at one feature at a time)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

#Fitting decision tree classification to the training set, with entropy as the split criterion
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

#Using the trained model to predict the test set results
y_pred = classifier.predict(X_test)

#Results as accuracy and confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print("The confusion matrix is : ", cm)
print("The prediction accuracy is: ",classifier.score(X_test,y_test)*100,"%")
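The two printed results are related: the accuracy is the sum of the diagonal of the confusion matrix divided by the total number of test samples. The sketch below shows that relation. The function name accuracy_from_confusion is hypothetical, and the matrix values are made-up numbers for illustration, not the output of this experiment.

```python
def accuracy_from_confusion(cm):
    """Accuracy = correctly classified samples (diagonal) / all samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Hypothetical 2x2 matrix, NOT the result of this experiment.
# Rows are the true classes, columns the predicted classes,
# which is the layout returned by sklearn's confusion_matrix.
cm_example = [[12, 1],
              [2, 5]]

acc = accuracy_from_confusion(cm_example)
print(f"accuracy = {acc * 100:.1f} %")
```

The off-diagonal entries separate the two kinds of error (false positives and false negatives), which a single accuracy figure hides.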

Output :
Conclusion:
• High classification accuracy was obtained using a decision tree based on entropy.
• As the number of examples increased from 80 to 400, the prediction accuracy increased
greatly.
