How to Extract Information from the Decision Rules in rpart Package?
Last Updated: 11 Jun, 2024
The rpart package in R is widely used for creating decision tree models. Decision trees are valuable because they provide a clear and interpretable set of rules for making predictions. Extracting and understanding these rules can offer insights into how the model makes decisions and which features are most important. This article will guide you through extracting information from the decision rules created by the rpart package in R.
Creating a Decision Tree
Before we can extract information from decision rules, we need to create a decision tree. For this example, we will use the iris dataset.
Install and Load Necessary Libraries
Ensure you have the rpart and rpart.plot packages installed and loaded.
R
install.packages("rpart")
install.packages("rpart.plot")
library(rpart)
library(rpart.plot)
Load the Dataset
Load the built-in iris dataset in R Programming Language.
R
# Load the built-in iris dataset
data(iris)
Create a Decision Tree Model
Create a decision tree model using the rpart function.
R
set.seed(123) # Set seed for reproducibility
tree_model <- rpart(Species ~ ., data = iris, method = "class")
Plot the Decision Tree
Visualize the decision tree using the rpart.plot function.
R
# Visualize the fitted decision tree
rpart.plot(tree_model)
Output:
[Plot of the fitted decision tree]
Extracting Information from Decision Rules
To understand and extract the decision rules from the tree model, we can use various functions and methods.
Print the Detailed Summary of the Tree
The printcp function provides a detailed summary of the decision tree, including the complexity parameter and error rates.
R
# Display the complexity parameter (cp) table and model summary
printcp(tree_model)
Output:
Classification tree:
rpart(formula = Species ~ ., data = iris, method = "class")
Variables actually used in tree construction:
[1] Petal.Length Petal.Width
Root node error: 100/150 = 0.66667
n= 150
    CP nsplit rel error xerror     xstd
1 0.50      0      1.00   1.20 0.048990
2 0.44      1      0.50   0.76 0.061232
3 0.01      2      0.06   0.07 0.025833
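The same table is also stored programmatically in tree_model$cptable, so you can work with it directly rather than reading the printed output. As a small optional sketch (not part of the original workflow; the variable names are only for illustration), you could pick the cp value with the lowest cross-validated error and prune the tree with it:
R
# The printed cp table is stored in tree_model$cptable
# Pick the cp with the lowest cross-validated error (xerror) and prune the tree
best_cp <- tree_model$cptable[which.min(tree_model$cptable[, "xerror"]), "CP"]
pruned_model <- prune(tree_model, cp = best_cp)
printcp(pruned_model)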
Extract the Rules
The rpart package allows you to extract decision rules using the path.rpart function or by directly parsing the model.
R
# Node numbers in an rpart tree are the row names of tree_model$frame
node_ids <- as.numeric(rownames(tree_model$frame))

# Extract the rule (path from the root) leading to each node
rules <- path.rpart(tree_model, nodes = node_ids, print.it = FALSE)

# Print the extracted rules
for (i in seq_along(rules)) {
  cat(paste("Rule for Node", i, ":\n"))
  cat(paste(rules[[i]], collapse = "\n"), "\n\n")
}
Output:
Rule for Node 1 :
root
Rule for Node 2 :
root
Petal.Length< 2.45
Rule for Node 3 :
root
Petal.Length>=2.45
Rule for Node 4 :
root
Petal.Length>=2.45
Petal.Width< 1.75
Rule for Node 5 :
root
Petal.Length>=2.45
Petal.Width>=1.75
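Note that the list returned by path.rpart is named by the tree's internal node numbers (the row names of tree_model$frame), which for this tree are 1, 2, 3, 6 and 7 rather than 1 to 5. If you only care about the rules that end in a prediction, a short sketch along these lines restricts the extraction to terminal nodes:
R
# Keep only terminal (leaf) nodes and extract the path leading to each one
leaf_ids <- as.numeric(rownames(tree_model$frame))[tree_model$frame$var == "<leaf>"]
leaf_rules <- path.rpart(tree_model, nodes = leaf_ids, print.it = FALSE)
leaf_rules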
Detailed Node Information
You can also extract detailed information about each node, including the split condition, number of observations, and predicted class.
R
# Extract detailed node information
tree_details <- as.data.frame(tree_model$frame)
# Display node details
print(tree_details)
Output:
var n wt dev yval complexity ncompete nsurrogate yval2.V1
1 Petal.Length 150 150 100 1 0.50 3 3 1.00000000
2 <leaf> 50 50 0 1 0.01 0 0 1.00000000
3 Petal.Width 100 100 50 2 0.44 3 3 2.00000000
6 <leaf> 54 54 5 2 0.00 0 0 2.00000000
7 <leaf> 46 46 1 3 0.01 0 0 3.00000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.V6 yval2.V7
1 50.00000000 50.00000000 50.00000000 0.33333333 0.33333333 0.33333333
2 50.00000000 0.00000000 0.00000000 1.00000000 0.00000000 0.00000000
3 0.00000000 50.00000000 50.00000000 0.00000000 0.50000000 0.50000000
6 0.00000000 49.00000000 5.00000000 0.00000000 0.90740741 0.09259259
7 0.00000000 1.00000000 45.00000000 0.00000000 0.02173913 0.97826087
yval2.nodeprob
1 1.00000000
2 0.33333333
3 0.66666667
6 0.36000000
7 0.30666667
tree_model$frame contains detailed information about each node of the decision tree: the splitting variable (var), the number of observations reaching the node (n), the deviance (dev), the predicted class code (yval), and the class counts and proportions (yval2).
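The yval column stores the predicted class as an integer code; the matching factor labels are kept in attr(tree_model, "ylevels"). As an illustrative sketch, you can combine the two to list every leaf together with the species it predicts:
R
# Map the integer class codes in yval back to the factor labels
frame_df <- tree_model$frame
leaves <- frame_df[frame_df$var == "<leaf>", c("n", "yval")]
leaves$predicted_class <- attr(tree_model, "ylevels")[leaves$yval]
leaves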
Visualize Important Splits
Plotting the variable importance can help you understand which variables are most influential in the decision-making process.
R
# Extract and plot variable importance
importance <- tree_model$variable.importance
barplot(importance, main = "Variable Importance", col = "lightblue", las = 2)
Output:
[Bar plot of variable importance]
Convert the Tree to Rules
The rattle package can convert the decision tree into readable rules.
R
install.packages("rattle")
library(rattle)
# Convert the decision tree to rules
asRules(tree_model)
Output:
Rule number: 2 [Species=setosa cover=50 (33%) prob=1.00]
Petal.Length< 2.45
Rule number: 7 [Species=virginica cover=46 (31%) prob=0.00]
Petal.Length>=2.45
Petal.Width>=1.75
Rule number: 6 [Species=versicolor cover=54 (36%) prob=0.00]
Petal.Length>=2.45
Petal.Width< 1.75
The rattle package simplifies the decision tree into readable rules, facilitating easier interpretation.
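If you would rather avoid an extra dependency, the rpart.plot package loaded earlier also provides rpart.rules(), which prints one readable rule per leaf together with its fitted class; a minimal sketch:
R
# rpart.rules() from rpart.plot prints one rule per leaf with its fitted class
library(rpart.plot)
rpart.rules(tree_model)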
Conclusion
Extracting and understanding decision rules from the rpart package in R is a valuable skill for interpreting decision tree models. By following the steps outlined in this article, you can create a decision tree, extract detailed decision rules, and gain insights into the model's decision-making process. This enhances the transparency and interpretability of your machine learning models, providing clearer insights for decision-making.