Decision tree classifier
Dishant Kumar Yadav 2021BCS0136
Implementation:
General Terms: Let us first discuss a few statistical concepts used in this post.
Entropy: The entropy of a dataset is a measure of its impurity. Entropy can also be thought of
as a measure of uncertainty, and we should try to minimize it. The goal of machine learning
models is to reduce uncertainty, or entropy, as far as possible.
Information Gain: Information gain is a measure of how much information a feature gives us
about the classes. The decision tree algorithm always tries to maximize information gain.
A feature that perfectly partitions the data gives the maximum information, and the feature
with the highest information gain is used for the first split. A small worked illustration follows below.
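To make the definitions concrete: for class proportions p_i, entropy is -sum(p_i * log2(p_i)), and information gain is the dataset entropy minus the weighted entropy of the subsets a feature produces. The small sketch below (illustrative only, not part of the assignment code; the toy label array is made up) computes the entropy of a label column that is split 50/50 between two classes.
# Illustrative sketch: entropy of a perfectly balanced two-class label
import numpy as np

toy_labels = np.array([0, 0, 1, 1])                        # hypothetical toy labels
counts = np.unique(toy_labels, return_counts=True)[1]      # samples per class
probabilities = counts / counts.sum()                      # class proportions p_i
entropy = -np.sum(probabilities * np.log2(probabilities))
print(entropy)  # 1.0 -- maximum uncertainty for a two-class label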
Import Libraries:
We are going to import NumPy and pandas, and then upload the dataset from the local machine using the Colab files helper.
# Import the required libraries
import pandas as pd
import numpy as np
from google.colab import files
uploaded = files.upload()
diabetes11.csv (text/csv) - 7491 bytes, last modified: 17/1/2024 - 100% done
Saving diabetes11.csv to diabetes11.csv
import shutil
# Move the uploaded file 'diabetes11.csv' into the /content directory
shutil.move('diabetes11.csv', '/content/diabetes11.csv')
'/content/diabetes11.csv'
import os
# List files in the /content directory
os.listdir('/content')
['.config',
'diabetes (1).csv',
'diabetes.csv',
'diabetes11.csv',
'sample_data']
import pandas as pd
# Read the CSV file into a DataFrame
df = pd.read_csv('/content/diabetes11.csv')
# Display the first few rows of the DataFrame
df.head()
index Glucose BloodPressure diabetes
0 148 72 1
1 85 66 0
2 183 64 1
3 89 66 0
4 137 40 1
# Define the calculate entropy function
def calculate_entropy(df_label):
    # Count how many samples fall into each class
    classes, class_counts = np.unique(df_label, return_counts=True)
    # Entropy = -sum(p_i * log2(p_i)) over the class proportions p_i
    entropy_value = np.sum([(-class_counts[i] / np.sum(class_counts)) *
                            np.log2(class_counts[i] / np.sum(class_counts))
                            for i in range(len(classes))])
    return entropy_value
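As a quick sanity check (this call is a sketch and not part of the original notebook), the function can be applied directly to the label column of the loaded DataFrame:
# Hypothetical usage: entropy of the 'diabetes' label column
print(calculate_entropy(df['diabetes']))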
# Define the calculate information gain function
def calculate_information_gain(dataset, feature, label):
    # Calculate the entropy of the whole dataset
    dataset_entropy = calculate_entropy(dataset[label])
    values, feat_counts = np.unique(dataset[feature], return_counts=True)
    # Calculate the weighted feature entropy: call calculate_entropy on the subset
    # of rows that take each value of the feature
    weighted_feature_entropy = np.sum([(feat_counts[i] / np.sum(feat_counts)) *
                                       calculate_entropy(dataset.where(dataset[feature] == values[i]).dropna()[label])
                                       for i in range(len(values))])
    feature_info_gain = dataset_entropy - weighted_feature_entropy
    return feature_info_gain
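To see which attribute the tree would split on first, the function can be called for every feature; the short loop below is a hypothetical illustration, not part of the original notebook, and simply prints the gain of each column against the 'diabetes' label.
# Hypothetical usage: information gain of each feature with respect to the label
for feature in ['Glucose', 'BloodPressure']:
    print(feature, calculate_information_gain(df, feature, 'diabetes'))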
# Set the features and label
features = df.columns[:-1]
label = 'diabetes'
parent=None
features
Index(['Glucose', 'BloodPressure'], dtype='object')
import numpy as np

def create_decision_tree(dataset, df, features, label, parent=None):
    # Class counts over the full dataframe, used for the majority-class fallback
    datum = np.unique(df[label], return_counts=True)
    unique_data = np.unique(dataset[label])
    if len(unique_data) <= 1:
        # All remaining samples belong to one class: return that class as a leaf
        return unique_data[0]
    elif len(dataset) == 0:
        # No samples left: return the majority class
        return unique_data[np.argmax(datum[1])]
    elif len(features) == 0:
        # No features left to split on: return the parent node's class
        return parent
    else:
        parent = unique_data[np.argmax(datum[1])]
        # Call the calculate_information_gain function for each candidate feature
        item_values = [calculate_information_gain(dataset, feature, label) for feature in features]
        # Split on the feature with the highest information gain
        optimum_feature = features[np.argmax(item_values)]