
Prathmesh Gaikwad

TUS3F202128 C36

PART A
(PART A: TO BE REFERRED BY STUDENTS)

Experiment No. 8

A.1 Aim:
To implement CART.

A.2 Prerequisite:
Python Basic Concepts

A.3 Outcome:
Students will be able to implement a decision tree using CART.

A.4 Theory:

CART (Classification and Regression Tree) is a variation of the decision tree algorithm. It
can handle both classification and regression tasks. Scikit-Learn uses the Classification and
Regression Tree (CART) algorithm to train Decision Trees (training a tree is also called
"growing" it). CART was first introduced by Leo Breiman, Jerome Friedman, Richard Olshen,
and Charles Stone in 1984.
CART Algorithm
CART is a predictive algorithm used in machine learning that describes how the values of the
target variable can be predicted from the other variables. It is a decision tree in which each
fork is a split on a predictor variable and each leaf node holds a prediction for the target
variable.
In the decision tree, nodes are split into sub-nodes on the basis of a threshold value of an
attribute. The root node holds the full training set and is split in two by choosing the best
attribute and threshold value. The resulting subsets are then split using the same logic. This
continues until pure sub-sets are reached or until the maximum number of leaves allowed in
the growing tree is hit.
The CART algorithm works via the following process:
• The best split point of each input variable is obtained.
• From these per-variable candidates, the overall "best" split point is identified.
• The chosen input is split at that "best" split point.
• Splitting continues until a stopping rule is satisfied or no further desirable split is
available.

The CART algorithm uses Gini impurity to split the dataset into a decision tree. It does so by
searching for the most homogeneous sub-nodes, with the help of the Gini index criterion. A
minimal sketch of this search is shown below.
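
The sketch below is illustrative only and is not part of the lab program that follows; the helper names gini and best_split are hypothetical. Under these assumptions, it scans every feature and every candidate threshold and returns the split whose two sub-nodes have the lowest weighted Gini impurity, which is the per-input search described above.

import numpy as np

def gini(labels):
    # Gini impurity of a set of class labels: 1 - sum(p_i^2)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Scan every feature and threshold; return the split whose two
    # sub-nodes have the lowest weighted Gini impurity.
    n_samples, n_features = X.shape
    best = {"impurity": np.inf, "feature": None, "threshold": None}
    for feature in range(n_features):
        for threshold in np.unique(X[:, feature]):
            left = y[X[:, feature] <= threshold]
            right = y[X[:, feature] > threshold]
            if len(left) == 0 or len(right) == 0:
                continue  # not a real split
            weighted = (len(left) * gini(left) +
                        len(right) * gini(right)) / n_samples
            if weighted < best["impurity"]:
                best = {"impurity": weighted,
                        "feature": feature, "threshold": threshold}
    return best

# Toy data (assumed values): feature 0 separates the classes, feature 1 is noise.
X_toy = np.array([[1.0, 7.0], [2.0, 3.0], [8.0, 5.0], [9.0, 1.0]])
y_toy = np.array([0, 0, 1, 1])
print(best_split(X_toy, y_toy))  # -> feature 0, threshold 2.0, impurity 0.0

In a full CART implementation this search would be applied recursively to each resulting sub-node until a stopping rule is met.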

Gini index/Gini impurity


The Gini index is the metric CART uses for classification tasks. It is based on the sum of
squared class probabilities and measures how likely a randomly chosen element is to be
misclassified if it were labelled at random according to the class distribution; it is a
variation of the Gini coefficient. It works on categorical targets, produces outcomes of
either "success" or "failure", and hence performs binary splits only.

The degree of the Gini index varies from 0 to 1,

• A value of 0 indicates that all the elements belong to a single class, i.e. the node is
pure.
• A value of 1 signifies that the elements are randomly distributed across many classes,
and
• A value of 0.5 denotes that the elements are equally distributed between two classes.

Mathematically, we can write the Gini impurity as:

Gini = 1 − Σ (pi)²

where pi is the probability of an object being classified to a particular class.
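
As a quick worked check of this formula with assumed class counts (8 samples of one class and 2 of the other, purely illustrative numbers):

# Gini impurity for class probabilities p = (0.8, 0.2)
p = [8 / 10, 2 / 10]
print(1 - sum(pi ** 2 for pi in p))       # 1 - (0.64 + 0.04) = 0.32
# A pure node (all samples in one class) gives 0
print(1 - sum(pi ** 2 for pi in [1.0]))   # 0.0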



Program:
# Importing the required packages
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report


# Function importing the dataset
def importdata():
    balance_data = pd.read_csv(
        'https://archive.ics.uci.edu/ml/machine-learning-' +
        'databases/balance-scale/balance-scale.data',
        sep=',', header=None)
    # Printing the dataset shape
    print("Dataset Length: ", len(balance_data))
    print("Dataset Shape: ", balance_data.shape)
    # Printing the first few dataset observations
    print(balance_data.head())
    return balance_data


# Function to split the dataset
def splitdataset(balance_data):
    # Separating the target variable
    X = balance_data.values[:, 1:5]
    Y = balance_data.values[:, 0]

    # Splitting the dataset into train and test
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=100)

    return X, Y, X_train, X_test, y_train, y_test


# Function to perform training with the Gini index
def train_using_gini(X_train, X_test, y_train):
    # Creating the classifier object
    clf_gini = DecisionTreeClassifier(criterion="gini",
                                      random_state=100, max_depth=3,
                                      min_samples_leaf=5)
    # Performing training
    clf_gini.fit(X_train, y_train)
    return clf_gini


# Function to perform training with entropy
def train_using_entropy(X_train, X_test, y_train):
    # Decision tree with entropy
    clf_entropy = DecisionTreeClassifier(
        criterion="entropy", random_state=100, max_depth=3,
        min_samples_leaf=5)
    # Performing training
    clf_entropy.fit(X_train, y_train)
    return clf_entropy


# Function to make predictions
def prediction(X_test, clf_object):
    # Prediction on the test set
    y_pred = clf_object.predict(X_test)
    print("Predicted values:")
    print(y_pred)
    return y_pred


# Function to calculate accuracy
def cal_accuracy(y_test, y_pred):
    print("Confusion Matrix: ", confusion_matrix(y_test, y_pred))
    print("Accuracy : ", accuracy_score(y_test, y_pred) * 100)
    print("Report : ", classification_report(y_test, y_pred))


# Driver code
def main():
    # Building Phase
    data = importdata()
    X, Y, X_train, X_test, y_train, y_test = splitdataset(data)
    clf_gini = train_using_gini(X_train, X_test, y_train)
    clf_entropy = train_using_entropy(X_train, X_test, y_train)

    # Operational Phase
    print("Results Using Gini Index:")
    # Prediction using Gini
    y_pred_gini = prediction(X_test, clf_gini)
    cal_accuracy(y_test, y_pred_gini)

    print("Results Using Entropy:")
    # Prediction using entropy
    y_pred_entropy = prediction(X_test, clf_entropy)
    cal_accuracy(y_test, y_pred_entropy)


# Calling main function
if __name__ == "__main__":
    main()

Output:

PART B
(PART B: TO BE COMPLETED BY STUDENTS)

Roll No: BE-C36 Name: Prathmesh Krishna Gaikwad


Class: BE-Comps Batch: C2
Date of Experiment: 22/08/2023 Date of Submission: 22/08/2023
Grade:

B.1 Software Code written by student:


# Modeling using CART
import warnings
import joblib
# import pydotplus
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import (train_test_split, GridSearchCV,
                                     cross_validate, validation_curve)
# from skompiler import skompile

# Reading the dataset
df = pd.read_csv("diabetes.csv")
# The first 5 observation units of the data set were accessed.
df.head()

y = df["Outcome"]
X = df.drop(["Outcome"], axis=1)

# Model
cart_model = DecisionTreeClassifier(random_state=17).fit(X, y)

# y_pred for the confusion matrix:
y_pred = cart_model.predict(X)
# y_prob for AUC:
y_prob = cart_model.predict_proba(X)[:, 1]
# Confusion matrix / classification report
print(classification_report(y, y_pred))
# AUC
roc_auc_score(y, y_prob)

# Evaluation of success with the holdout method
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=85)
cart_model = DecisionTreeClassifier(random_state=17).fit(X_train, y_train)

# Train error
y_pred = cart_model.predict(X_train)
y_prob = cart_model.predict_proba(X_train)[:, 1]
print(classification_report(y_train, y_pred))
roc_auc_score(y_train, y_prob)

# Test error
y_pred = cart_model.predict(X_test)
y_prob = cart_model.predict_proba(X_test)[:, 1]
print(classification_report(y_test, y_pred))
roc_auc_score(y_test, y_prob)

# Evaluation of success with cross-validation
cart_model = DecisionTreeClassifier(random_state=17).fit(X, y)
cv_results = cross_validate(cart_model,
                            X, y,
                            cv=10,
                            scoring=["accuracy", "f1", "roc_auc"])
cv_results['test_accuracy'].mean()
cv_results['test_f1'].mean()
cv_results['test_roc_auc'].mean()

# Hyperparameter optimization with GridSearchCV
cart_model.get_params()

# Hyperparameter set to search:
cart_params = {'max_depth': range(1, 11),
               "min_samples_split": range(2, 20)}

# GridSearchCV
cart_best_grid = GridSearchCV(cart_model,
                              cart_params,
                              cv=5,
                              n_jobs=-1,
                              verbose=True).fit(X, y)
# Best hyperparameter values:
cart_best_grid.best_params_
# Best score:
cart_best_grid.best_score_

random = X.sample(1, random_state=45)
print(random)
cart_best_grid.predict(random)

# 5. Final model
cart_final = DecisionTreeClassifier(**cart_best_grid.best_params_,
                                    random_state=17).fit(X, y)
cart_final.get_params()
# Another way to assign the best parameters to the model:
cart_final = cart_model.set_params(**cart_best_grid.best_params_).fit(X, y)

# CV error of the final model:
cv_results = cross_validate(cart_final,
                            X, y,
                            cv=10,
                            scoring=["accuracy", "f1", "roc_auc"])
cv_results['test_accuracy'].mean()
cv_results['test_f1'].mean()
cv_results['test_roc_auc'].mean()

# 6. Feature importance
def plot_importance(model, features, num=len(X), save=False):
    feature_imp = pd.DataFrame({'Value': model.feature_importances_,
                                'Feature': features.columns})
    plt.figure(figsize=(10, 10))
    sns.set(font_scale=1)
    sns.barplot(x="Value", y="Feature",
                data=feature_imp.sort_values(by="Value", ascending=False)[0:num])
    plt.title('Features')
    plt.tight_layout()
    plt.show()
    if save:
        plt.savefig('importances.png')

plot_importance(cart_final, X, 15)

# 7. Analyzing model complexity with learning curves
train_score, test_score = validation_curve(
    cart_final, X=X, y=y,
    param_name='max_depth',
    param_range=range(1, 11),
    scoring="roc_auc",
    cv=10)

mean_train_score = np.mean(train_score, axis=1)
mean_test_score = np.mean(test_score, axis=1)

plt.plot(range(1, 11), mean_train_score,
         label="Training Score", color='b')
plt.plot(range(1, 11), mean_test_score,
         label="Validation Score", color='g')
plt.title("Validation Curve for CART")
plt.xlabel("Number of max_depth")
plt.ylabel("AUC")
plt.tight_layout()
plt.legend(loc='best')
plt.show()

# 8. Extracting decision rules
tree_rules = export_text(cart_model, feature_names=list(X.columns))
print(tree_rules)

B.2 Input and Output:



B.3 Observations and learning:


The CART algorithm is widely used to build decision trees for both classification and
regression. Decision trees are widely used in data mining to create a model that predicts the
value of a target based on the values of many input (independent) variables.

B.4 Conclusion:
Hence, we successfully studied and implemented CART.

B.5 Questions of Curiosity (Handwritten, any 3)


1. Explain and write CART algorithm for drawing decision trees.
2. How does CART differ from the other decision tree algorithms?
3. What are the main advantages of using CART for classification and regression techniques?
4. What are the key steps involved in implementing CART in Python?
5. How does CART handle categorical and numerical features in the data?
6. What are the techniques or approaches for visualizing and interpreting CART trees?
