Decision Tree

The document outlines a Python implementation of a Decision Tree classifier that uses the scikit-learn library to predict whether to play tennis based on weather conditions. It covers data preprocessing, model training, evaluation with accuracy, a confusion matrix, and a classification report, and a plot of the fitted tree. The model reaches an accuracy of 1.0 on the three-row test set.

Uploaded by

angelin272004
Copyright
© All Rights Reserved

DECISION TREE

# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
import matplotlib.pyplot as plt

# Load the dataset
data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast', 'Sunny', 'Sunny',
                'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild', 'Mild',
                    'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal',
                 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak',
             'Strong', 'Strong', 'Weak', 'Strong'],
    'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}

# Convert the dictionary to a DataFrame
df = pd.DataFrame(data)

# Convert categorical variables to numerical using one-hot encoding
df = pd.get_dummies(df, columns=['Outlook', 'Temperature', 'Humidity', 'Wind'])

# Separate features and target variable
X = df.drop('PlayTennis', axis=1)
y = df['PlayTennis']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Decision Tree classifier
decision_tree = DecisionTreeClassifier()

# Train the model
decision_tree.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = decision_tree.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Print confusion matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

# Print classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Convert feature names Index to a list
feature_names = X.columns.tolist()

# Plot the decision tree
plt.figure(figsize=(12, 8))
plot_tree(decision_tree, feature_names=feature_names, class_names=['No', 'Yes'], filled=True)
plt.show()

Output

Accuracy: 1.0

Confusion Matrix:
[[1 0]
 [0 2]]

Classification Report:
              precision    recall  f1-score   support

          No       1.00      1.00      1.00         1
         Yes       1.00      1.00      1.00         2

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3

Step by Step Explanation
1. Import Necessary Libraries:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
import matplotlib.pyplot as plt

Explanation:

• pandas: Library for data manipulation and analysis.
• train_test_split: Function to split the dataset into training and testing sets.
• DecisionTreeClassifier: Class for the decision tree classification model.
• plot_tree: Function to visualize the decision tree.
• confusion_matrix, accuracy_score, classification_report: Functions to evaluate the model's performance.
• matplotlib.pyplot: Library for plotting graphs.

2. Load the Dataset:

data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast', 'Sunny', 'Sunny',
                'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild', 'Mild',
                    'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal',
                 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak',
             'Strong', 'Strong', 'Weak', 'Strong'],
    'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}

df = pd.DataFrame(data)

Explanation:

• We define a dictionary containing the "Play Tennis" dataset: 14 observations with four categorical weather features and the PlayTennis label.
• Then we convert this dictionary to a pandas DataFrame named df.

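Before encoding, it can help to confirm the table's size and class balance. The following is a minimal sketch that uses only the PlayTennis column copied from the dictionary above; the names `play_tennis`, `labels`, `n_rows`, and `class_counts` are illustrative and do not appear in the original code.

```python
import pandas as pd

# Target column copied from the dataset above; the feature columns are
# omitted here for brevity.
play_tennis = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
               'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
labels = pd.Series(play_tennis, name='PlayTennis')

n_rows = len(labels)                            # number of observations
class_counts = labels.value_counts().to_dict()  # rows per class

print(n_rows)        # 14
print(class_counts)  # {'Yes': 9, 'No': 5}
```

The classes are moderately imbalanced (9 "Yes" against 5 "No"), which is worth keeping in mind when reading the accuracy figure.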
3. Data Preprocessing:

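df = pd.get_dummies(df, columns=['Outlook', 'Temperature', 'Humidity', 'Wind'])

X = df.drop('PlayTennis', axis=1)
y = df['PlayTennis']

Explanation:

• We use one-hot encoding to convert the categorical variables into numerical format: pd.get_dummies replaces each listed column with one indicator column per category (for example, Outlook_Sunny).
• X contains the ten encoded feature columns, and y contains the target variable, PlayTennis.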
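To see what the encoding step produces, here is a small sketch on four made-up rows with two of the categorical columns. The frame `toy` and the names `encoded` and `row_sums` are assumptions for illustration only, not part of the original script.

```python
import pandas as pd

# Four illustrative rows (not the full dataset) with two categorical columns.
toy = pd.DataFrame({
    'Outlook': ['Sunny', 'Overcast', 'Rainy', 'Sunny'],
    'Wind': ['Weak', 'Strong', 'Weak', 'Strong'],
})

encoded = pd.get_dummies(toy, columns=['Outlook', 'Wind'])

# One new column per category, named <column>_<category>.
print(encoded.columns.tolist())
# ['Outlook_Overcast', 'Outlook_Rainy', 'Outlook_Sunny', 'Wind_Strong', 'Wind_Weak']

# Exactly one indicator is set per original column, so each row sums to 2.
row_sums = encoded.astype(int).sum(axis=1).tolist()
print(row_sums)  # [2, 2, 2, 2]
```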

4. Split Data into Training and Testing Sets:

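X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Explanation:

• We split the dataset into training and testing sets using the train_test_split function.
• test_size=0.2 reserves 20% of the data for testing. With 14 rows, the test set is rounded up to 3 rows, which leaves 11 rows for training.
• random_state=42 fixes the shuffle, so the same rows land in each set on every run.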
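The resulting set sizes can be checked directly. This is a minimal sketch that uses placeholder integers in place of the real rows; the names `rows`, `train_rows`, and `test_rows` are illustrative.

```python
from sklearn.model_selection import train_test_split

# 14 placeholder rows standing in for the 14 observations in the dataset.
rows = list(range(14))

train_rows, test_rows = train_test_split(rows, test_size=0.2, random_state=42)

print(len(train_rows), len(test_rows))  # 11 3
```

With only three test rows, a single misclassification would change the accuracy by a third, so the score is a coarse estimate of how the model generalizes.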

5. Initialize and Train Decision Tree Model:

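decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, y_train)

Explanation:

• We initialize a DecisionTreeClassifier object with its default settings.
• Then we train the model on the training data with the fit method.
• Because no random_state is passed to the classifier, ties between equally good splits may be broken differently from run to run, so the fitted tree can vary.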
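By default, DecisionTreeClassifier scores candidate splits with Gini impurity (criterion='gini'). The sketch below computes that measure by hand for the full 14-row label column. It illustrates the quantity being minimized and is not scikit-learn's actual implementation; the helper `gini` is a hypothetical name.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# PlayTennis column from the dataset: 9 'Yes' and 5 'No'.
play_tennis = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
               'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
root_impurity = gini(play_tennis)
print(round(root_impurity, 3))  # 0.459

# The four rows with Outlook == 'Overcast' are all 'Yes', so that subset is pure.
overcast = ['Yes', 'Yes', 'Yes', 'Yes']
print(gini(overcast))  # 0.0
```

A split is attractive when the subsets it creates have low impurity, which is why a pure group such as the Overcast rows is a natural candidate for a leaf.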

6. Make Predictions and Evaluate Model:

y_pred = decision_tree.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

Explanation:

• We make predictions on the testing data using the predict method.
• Then we calculate accuracy using accuracy_score.
• We also compute the confusion matrix and classification report.
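To make the link between these metrics concrete, the sketch below rebuilds a confusion matrix and the accuracy in plain Python. The label lists are hypothetical, chosen only to match the class counts in the reported output (one "No", two "Yes"); they are not taken from the actual split, and the names `y_true`, `y_hat`, `matrix`, and `acc` are illustrative.

```python
# Hypothetical true and predicted labels for a three-row test set.
y_true = ['No', 'Yes', 'Yes']
y_hat = ['No', 'Yes', 'Yes']

classes = sorted(set(y_true) | set(y_hat))  # ['No', 'Yes']
index = {label: i for i, label in enumerate(classes)}

# matrix[i][j] counts rows whose true class is i and predicted class is j.
matrix = [[0] * len(classes) for _ in classes]
for truth, guess in zip(y_true, y_hat):
    matrix[index[truth]][index[guess]] += 1

# Accuracy is the diagonal (correct predictions) divided by all predictions.
correct = sum(matrix[i][i] for i in range(len(classes)))
acc = correct / len(y_true)

print(matrix)  # [[1, 0], [0, 2]]
print(acc)     # 1.0
```

Rows correspond to true classes and columns to predicted classes, in sorted label order, which is the layout confusion_matrix uses.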

7. Print Model Evaluation Metrics:

print("Accuracy:", accuracy)
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)

Explanation:

• We print the accuracy, confusion matrix, and classification report to evaluate the model's performance.

8. Plot the Decision Tree:

plt.figure(figsize=(12, 8))
plot_tree(decision_tree, feature_names=X.columns.tolist(), class_names=['No', 'Yes'], filled=True)
plt.show()

Explanation:

• Finally, we plot the decision tree using the plot_tree function to visualize the model's decision-making process.
• We pass the feature names as a plain list, as in the full listing, and specify the class names for better interpretation of the tree.
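When a figure window is not available, the same tree can be read as text. The sketch below uses scikit-learn's export_text on a tiny made-up dataset; `X_toy`, `y_toy`, `toy_tree`, and `rules` are assumptions for illustration, and the single feature is a stand-in for one of the encoded columns.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny illustrative dataset: one binary feature that fully determines the class.
X_toy = [[0], [0], [1], [1]]
y_toy = ['No', 'No', 'Yes', 'Yes']

toy_tree = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)

# export_text renders the fitted tree as indented threshold rules.
rules = export_text(toy_tree, feature_names=['Outlook_Overcast'])
print(rules)
```

Each indented line is either a threshold test on a feature or a leaf that names the predicted class, mirroring the boxes in the plotted figure.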

**************************
