ML Exp8 C36
Prathmesh Gaikwad
TUS3F202128 C36
PART A
(PART A: TO BE REFERRED BY STUDENTS)
Experiment No. 8
A.1 Aim:
To implement the CART algorithm.
A.2 Prerequisite:
Python Basic Concepts
A.3 Outcome:
Students will be able to implement a decision tree using CART.
A.4 Theory:
CART (Classification and Regression Tree) is a variation of the decision tree algorithm that
can handle both classification and regression tasks. Scikit-Learn uses the CART algorithm to
train Decision Trees (also called “growing” trees). CART was first introduced by Leo Breiman,
Jerome Friedman, Richard Olshen, and Charles Stone in 1984.
CART Algorithm
CART is a predictive algorithm used in machine learning: it models how the values of a target
variable can be predicted from the other variables. The result is a decision tree in which
each internal node splits on a predictor variable and each leaf node holds a prediction for
the target variable.
In the decision tree, nodes are split into sub-nodes based on a threshold value of an
attribute. The root node corresponds to the full training set and is split into two by
choosing the best attribute and threshold value. The resulting subsets are then split
recursively using the same logic, and this continues until the sub-sets are pure or the
maximum allowed number of leaves in the growing tree is reached.
The CART algorithm works via the following process:
• The best split point for each input variable is obtained.
• Based on the best split points of each input from Step 1, the overall “best” split point
is identified.
• The chosen input is split according to that “best” split point.
• Splitting continues until a stopping rule is satisfied or no further desirable split is
available.
The CART algorithm uses Gini impurity to decide how to split the dataset while growing the
tree. The Gini impurity of a node is 1 − Σ pᵢ², where pᵢ is the proportion of class i in that
node; CART searches for the split whose sub-nodes are as homogeneous (low-impurity) as
possible under this Gini index criterion, as the sketch below illustrates.
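Below is a minimal sketch (standalone Python; the helper names gini_impurity and split_gini
are hypothetical, not part of scikit-learn) of how the Gini impurity of a node and the
weighted impurity of a candidate threshold split can be computed:

import numpy as np

def gini_impurity(labels):
    # Gini = 1 - sum(p_i^2) over the class proportions p_i in the node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(feature, labels, threshold):
    # Weighted average impurity of the two sub-nodes the threshold produces
    left, right = labels[feature <= threshold], labels[feature > threshold]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# A pure node has impurity 0.0; a 50/50 node has impurity 0.5
print(gini_impurity(np.array([0, 0, 1, 1])))       # 0.5
# A threshold of 3.5 separates these classes perfectly, so the split impurity is 0.0
feature = np.array([1, 2, 3, 4, 5, 6])
labels = np.array([0, 0, 0, 1, 1, 1])
print(split_gini(feature, labels, threshold=3.5))  # 0.0

CART evaluates candidate thresholds like this and keeps the one with the lowest weighted
impurity.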
Program:
# Importing the required packages
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Import the dataset (file name assumed; replace with your own numeric dataset)
def importdata():
    return pd.read_csv("dataset.csv")

# Split into features, target, and train/test sets (class label assumed in first column)
def splitdataset(data):
    X, Y = data.values[:, 1:], data.values[:, 0]
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)
    return X, Y, X_train, X_test, y_train, y_test

# Train a decision tree using the Gini index (the CART splitting criterion)
def train_using_gini(X_train, X_test, y_train):
    clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100, max_depth=3, min_samples_leaf=5)
    clf_gini.fit(X_train, y_train)
    return clf_gini

# Train a decision tree using entropy, for comparison with Gini
def train_using_entropy(X_train, X_test, y_train):
    clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100, max_depth=3, min_samples_leaf=5)
    clf_entropy.fit(X_train, y_train)
    return clf_entropy

# Predict labels for the test set
def prediction(X_test, clf_object):
    y_pred = clf_object.predict(X_test)
    print("Predicted values:\n", y_pred)
    return y_pred

# Report confusion matrix, accuracy, and per-class metrics
def cal_accuracy(y_test, y_pred):
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
    print("Accuracy:", accuracy_score(y_test, y_pred) * 100)
    print("Report:\n", classification_report(y_test, y_pred))

# Driver code
def main():
    # Building Phase
    data = importdata()
    X, Y, X_train, X_test, y_train, y_test = splitdataset(data)
    clf_gini = train_using_gini(X_train, X_test, y_train)
    clf_entropy = train_using_entropy(X_train, X_test, y_train)
    # Operational Phase
    print("Results Using Gini Index:")
    y_pred_gini = prediction(X_test, clf_gini)
    cal_accuracy(y_test, y_pred_gini)
    print("Results Using Entropy:")
    y_pred_entropy = prediction(X_test, clf_entropy)
    cal_accuracy(y_test, y_pred_entropy)

if __name__ == "__main__":
    main()
Output:
PART B
(PART B: TO BE COMPLETED BY STUDENTS)
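The snippets below evaluate a fitted CART model; the earlier build-up steps are not
reproduced here, so the following is a minimal, assumed setup (the dataset file name, target
column, and the cart_final hyperparameters are placeholders, not the original values):

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import cross_validate, validation_curve
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed setup: load the data and separate the (binary) target from the features
df = pd.read_csv("dataset.csv")   # placeholder file name
y = df["target"]                  # placeholder target column
X = df.drop("target", axis=1)

# Assumed final model; substitute the hyperparameters found during tuning
cart_final = DecisionTreeClassifier(max_depth=5, min_samples_split=4, random_state=17).fit(X, y)

# Cross-validated performance over several metrics (10 folds)
cv_results = cross_validate(cart_final, X, y, cv=10,
                            scoring=["accuracy", "f1", "roc_auc"])
cv_results['test_accuracy'].mean()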
cv_results['test_f1'].mean()
cv_results['test_roc_auc'].mean()
# 6. Feature Importance
def plot_importance(model, features, num=len(X), save=False):
    feature_imp = pd.DataFrame({'Value': model.feature_importances_,
                                'Feature': features.columns})
    plt.figure(figsize=(10, 10))
    sns.set(font_scale=1)
    sns.barplot(x="Value", y="Feature",
                data=feature_imp.sort_values(by="Value", ascending=False)[0:num])
    plt.title('Features')
    plt.tight_layout()
    if save:
        plt.savefig('importances.png')  # save before show, or the saved figure is blank
    plt.show()

plot_importance(cart_final, X, 15)
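In scikit-learn, feature_importances_ reports each feature's total (normalized) impurity
decrease over all the splits that use it, so the bars rank features by how much they reduce
Gini impurity in the tree.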
# 7. Analyzing Model Complexity with Learning Curves
train_score, test_score = validation_curve(
cart_final, X=X, y=y,
param_name='max_depth',
param_range=range(1, 11),
scoring="roc_auc",
cv=10)
mean_train_score = np.mean(train_score, axis=1)
mean_test_score = np.mean(test_score, axis=1)
plt.plot(range(1, 11), mean_train_score,
label="Training Score", color='b')
plt.plot(range(1, 11), mean_test_score,
label="Validation Score", color='g')
plt.tight_layout()
plt.legend(loc='best')
plt.show()
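Reading the curves: the training score typically keeps increasing with max_depth, while the
validation score peaks and then declines as the tree starts to overfit; the depth where the
validation ROC AUC peaks is the complexity to prefer.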
# 8. Extracting Decision Rules
tree_rules = export_text(cart_final, feature_names=list(X.columns))  # cart_final is the fitted model above
print(tree_rules)
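print(tree_rules) writes the fitted tree as indented, textual split rules (lines such as
|--- feature <= threshold), which is a convenient way to inspect the decision logic the
model has learned.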
B.4 Conclusion:
Hence, we successfully studied the CART algorithm and implemented a decision tree
classifier using it.