How to Extract Information from the Decision Rules in rpart Package?
Last Updated: 11 Jun, 2024
The rpart package in R is widely used for creating decision tree models. Decision trees are valuable because they provide a clear and interpretable set of rules for making predictions. Extracting and understanding these rules can offer insights into how the model makes decisions and which features are most important. This article will guide you through extracting information from the decision rules created by the rpart package in R.
Creating a Decision Tree
Before we can extract information from decision rules, we need to create a decision tree. For this example, we will use the iris dataset.
Install and Load Necessary Libraries
Ensure you have the rpart and rpart.plot packages installed and loaded.
R
install.packages("rpart")
install.packages("rpart.plot")
library(rpart)
library(rpart.plot)
Load the Dataset
Load the built-in iris dataset in R Programming Language.
R
# Load the built-in iris dataset
data(iris)
Create a Decision Tree Model
Create a decision tree model using the rpart function.
R
set.seed(123) # Set seed for reproducibility
tree_model <- rpart(Species ~ ., data = iris, method = "class")
Plot the Decision Tree
Visualize the decision tree using the rpart.plot function.
R
# Visualize the fitted decision tree
rpart.plot(tree_model)
Output:
[Plot of the fitted decision tree]
Extracting Information from Decision Rules
To understand and extract the decision rules from the tree model, we can use various functions and methods.
Print the Detailed Summary of the Tree
The printcp function provides a detailed summary of the decision tree, including the complexity parameter and error rates.
R
# Display the complexity parameter (cp) table and model summary
printcp(tree_model)
Output:
Classification tree:
rpart(formula = Species ~ ., data = iris, method = "class")
Variables actually used in tree construction:
[1] Petal.Length Petal.Width
Root node error: 100/150 = 0.66667
n= 150
    CP nsplit rel error xerror     xstd
1 0.50      0      1.00   1.20 0.048990
2 0.44      1      0.50   0.76 0.061232
3 0.01      2      0.06   0.07 0.025833
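The same table is also stored programmatically in tree_model$cptable, so you can work with it directly rather than reading the printed output. As a small optional sketch (not part of the original workflow; the variable names are only for illustration), you could pick the cp value with the lowest cross-validated error and prune the tree with it:
R
# The printed cp table is stored in tree_model$cptable
# Pick the cp with the lowest cross-validated error (xerror) and prune the tree
best_cp <- tree_model$cptable[which.min(tree_model$cptable[, "xerror"]), "CP"]
pruned_model <- prune(tree_model, cp = best_cp)
printcp(pruned_model)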
Extract the Rules
The rpart package allows you to extract decision rules using the path.rpart function or by directly parsing the model.
R
# Node numbers in an rpart tree are the row names of tree_model$frame
node_ids <- as.numeric(rownames(tree_model$frame))

# Extract the rule (path from the root) leading to each node
rules <- path.rpart(tree_model, nodes = node_ids, print.it = FALSE)

# Print the extracted rules
for (i in seq_along(rules)) {
  cat(paste("Rule for Node", i, ":\n"))
  cat(paste(rules[[i]], collapse = "\n"), "\n\n")
}
Output:
Rule for Node 1 :
root
Rule for Node 2 :
root
Petal.Length< 2.45
Rule for Node 3 :
root
Petal.Length>=2.45
Rule for Node 4 :
root
Petal.Length>=2.45
Petal.Width< 1.75
Rule for Node 5 :
root
Petal.Length>=2.45
Petal.Width>=1.75
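Note that the list returned by path.rpart is named by the tree's internal node numbers (the row names of tree_model$frame), which for this tree are 1, 2, 3, 6 and 7 rather than 1 to 5. If you only care about the rules that end in a prediction, a short sketch along these lines restricts the extraction to terminal nodes:
R
# Keep only terminal (leaf) nodes and extract the path leading to each one
leaf_ids <- as.numeric(rownames(tree_model$frame))[tree_model$frame$var == "<leaf>"]
leaf_rules <- path.rpart(tree_model, nodes = leaf_ids, print.it = FALSE)
leaf_rules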
Detailed Node Information
You can also extract detailed information about each node, including the split condition, number of observations, and predicted class.
R
# Extract detailed node information
tree_details <- as.data.frame(tree_model$frame)
# Display node details
print(tree_details)
Output:
var n wt dev yval complexity ncompete nsurrogate yval2.V1
1 Petal.Length 150 150 100 1 0.50 3 3 1.00000000
2 <leaf> 50 50 0 1 0.01 0 0 1.00000000
3 Petal.Width 100 100 50 2 0.44 3 3 2.00000000
6 <leaf> 54 54 5 2 0.00 0 0 2.00000000
7 <leaf> 46 46 1 3 0.01 0 0 3.00000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.V6 yval2.V7
1 50.00000000 50.00000000 50.00000000 0.33333333 0.33333333 0.33333333
2 50.00000000 0.00000000 0.00000000 1.00000000 0.00000000 0.00000000
3 0.00000000 50.00000000 50.00000000 0.00000000 0.50000000 0.50000000
6 0.00000000 49.00000000 5.00000000 0.00000000 0.90740741 0.09259259
7 0.00000000 1.00000000 45.00000000 0.00000000 0.02173913 0.97826087
yval2.nodeprob
1 1.00000000
2 0.33333333
3 0.66666667
6 0.36000000
7 0.30666667
tree_model$frame contains detailed information about each node of the decision tree: the splitting variable (var), the number of observations reaching the node (n), the deviance (dev), the predicted class code (yval), and the class counts and proportions (yval2).
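The yval column stores the predicted class as an integer code; the matching factor labels are kept in attr(tree_model, "ylevels"). As an illustrative sketch, you can combine the two to list every leaf together with the species it predicts:
R
# Map the integer class codes in yval back to the factor labels
frame_df <- tree_model$frame
leaves <- frame_df[frame_df$var == "<leaf>", c("n", "yval")]
leaves$predicted_class <- attr(tree_model, "ylevels")[leaves$yval]
leaves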
Visualize Important Splits
Plotting the variable importance can help you understand which variables are most influential in the decision-making process.
R
# Extract and plot variable importance
importance <- tree_model$variable.importance
barplot(importance, main = "Variable Importance", col = "lightblue", las = 2)
Output:
[Bar plot of variable importance]
Convert the Tree to Rules
The rattle package can convert the decision tree into readable rules.
R
install.packages("rattle")
library(rattle)
# Convert the decision tree to rules
asRules(tree_model)
Output:
Rule number: 2 [Species=setosa cover=50 (33%) prob=1.00]
Petal.Length< 2.45
Rule number: 7 [Species=virginica cover=46 (31%) prob=0.00]
Petal.Length>=2.45
Petal.Width>=1.75
Rule number: 6 [Species=versicolor cover=54 (36%) prob=0.00]
Petal.Length>=2.45
Petal.Width< 1.75
The rattle package simplifies the decision tree into readable rules, facilitating easier interpretation.
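If you would rather avoid an extra dependency, the rpart.plot package loaded earlier also provides rpart.rules(), which prints one readable rule per leaf together with its fitted class; a minimal sketch:
R
# rpart.rules() from rpart.plot prints one rule per leaf with its fitted class
library(rpart.plot)
rpart.rules(tree_model)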
Conclusion
Extracting and understanding decision rules from the rpart package in R is a valuable skill for interpreting decision tree models. By following the steps outlined in this article, you can create a decision tree, extract detailed decision rules, and gain insights into the model's decision-making process. This enhances the transparency and interpretability of your machine learning models, providing clearer insights for decision-making.