
Decision Tree Regression using sklearn - Python

Last Updated : 09 May, 2025

Decision Tree Regression predicts continuous values, such as prices or scores, using a tree-like structure. It splits the data into smaller groups using simple rules on the input features, choosing each split so that it reduces the prediction error. At the end of each branch, called a leaf node, the model outputs a prediction, usually the average target value of the training samples in that group. In the tree:

  • Decision Nodes (shown as diamonds) ask yes/no questions about the data, like “Is age greater than 50?”
  • Leaf Nodes (shown as rectangles) give the final predicted number based on the data that reached that point.
Workflow of Decision Tree Regression

Branches connect nodes and represent the outcome of a decision. For example, if the answer to a condition is "Yes," you follow one branch; if "No," you follow the other. The same idea applies to any chain of yes/no questions: a decision tree can, for instance, find the smallest of three numbers by comparing them two at a time.
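To make the splitting rule concrete, here is a minimal sketch of how a single split on one feature could be chosen: try every threshold between neighbouring values and keep the one with the lowest total squared error when each side predicts its own mean. The helper `best_split` and the toy data are hypothetical and written for illustration only; scikit-learn's own implementation is more elaborate.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, sse) for the single split on x that minimizes
    the total squared error when each side predicts its own mean."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # identical feature values cannot be separated
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            # Place the threshold halfway between the two neighbouring values
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2
            best_sse = sse
    return best_threshold, best_sse

# Toy data (made up for illustration): two clearly separated groups
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 7.0, 20.0, 21.0, 22.0])

threshold, sse = best_split(x, y)
left_mean = y[x <= threshold].mean()    # prediction of the left leaf
right_mean = y[x > threshold].mean()    # prediction of the right leaf
print(threshold, left_mean, right_mean)
```

On this toy data the chosen threshold is 6.5, and the two leaves predict 6.0 and 21.0. A full tree repeats the same search inside each resulting group until a stopping rule, such as a maximum depth, is reached.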

Implementation of Decision Tree Regression

For example, suppose we want to predict house prices from factors such as size, location and age. A Decision Tree Regressor can split the data on these features, for instance checking the location first, then the size and finally the age. Because the splits that reduce the error most are made first, the most impactful factors are considered early, which makes the model useful and easy to interpret.
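The house price idea can be sketched with scikit-learn's `feature_importances_` attribute, which reports how much each feature contributed to reducing the error. The data below is entirely synthetic: the feature ranges, the price formula and the names `X_house` and `house_tree` are assumptions made up for this sketch, not real housing data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 300

# Hypothetical features; ranges and units are invented for illustration
size = rng.uniform(50, 250, n)        # floor area
location = rng.integers(0, 3, n)      # coded neighbourhood tier: 0, 1 or 2
age = rng.uniform(0, 50, n)           # years since construction

# Invented price formula in which size has by far the largest effect
price = 3000 * size + 30000 * location - 1000 * age + rng.normal(0, 10000, n)

X_house = np.column_stack([size, location, age])
house_tree = DecisionTreeRegressor(max_depth=3, random_state=0)
house_tree.fit(X_house, price)

# Importances sum to 1; larger values mean the feature drove more of the splits
for name, importance in zip(["size", "location", "age"], house_tree.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

In this made-up dataset size dominates the price, so the tree splits on it first. With different data the order could just as well start with location or age.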

Let's walk through a step-by-step implementation using the scikit-learn library in Python.

Step 1: Import the required libraries. 

Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

Step 2: Generate and visualize the dataset.

Here we use NumPy to create a synthetic non-linear dataset: 100 noisy samples of a sine curve.

Python
# Generate synthetic dataset
np.random.seed(42)
X = np.sort(5 * np.random.rand(100, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Visualize the dataset
plt.scatter(X, y, color='red', label='Data')
plt.title("Synthetic Dataset")
plt.xlabel("Feature")
plt.ylabel("Target")
plt.legend()
plt.show()

Output:

Non-linear Data

Step 3: Split the Dataset

Split the dataset into training and test sets, holding out 30% of the samples for testing.

Python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Step 4: Initialize the Decision Tree Regressor

Here we use the DecisionTreeRegressor class from scikit-learn to implement Decision Tree Regression. Setting max_depth=4 limits how deep the tree can grow, and random_state=42 makes the result reproducible.

Python
regressor = DecisionTreeRegressor(max_depth=4, random_state=42)
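The choice of `max_depth=4` is a trade-off between underfitting and overfitting. As a rough sketch, the loop below refits the tree at several depths on the same synthetic data and compares training and test error. The list of depths and the names `results` and `model` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Same synthetic data and split as in the tutorial steps
np.random.seed(42)
X = np.sort(5 * np.random.rand(100, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

results = {}
for depth in [1, 2, 4, 8, None]:  # None lets the tree grow until it cannot split further
    model = DecisionTreeRegressor(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[depth] = (train_mse, test_mse)
    print(f"max_depth={depth}: train MSE={train_mse:.4f}, test MSE={test_mse:.4f}")
```

Training error keeps falling as the tree gets deeper, reaching zero for the unrestricted tree, while test error usually stops improving at some point. The depth at which that happens is a reasonable candidate for max_depth.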

Step 5: Fit the Decision Tree Regressor to the training data.

Python
regressor.fit(X_train, y_train)

Output:

DecisionTreeRegressor(max_depth=4, random_state=42)

Step 6: Predict on the test set and evaluate the model.

Python
y_pred = regressor.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")

Output:

Mean Squared Error: 0.0151
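Mean squared error is expressed in squared target units, which can be hard to read on its own. As an optional addition, the sketch below also computes the root mean squared error and the R² score with scikit-learn's `r2_score`, rebuilding the same data and model so that it runs standalone.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Same synthetic data, split and model as in the tutorial steps
np.random.seed(42)
X = np.sort(5 * np.random.rand(100, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

regressor = DecisionTreeRegressor(max_depth=4, random_state=42)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# RMSE is in the same units as the target, so it is easier to compare with y
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
# R^2 is 1.0 for a perfect fit and 0.0 for a model that always predicts the mean
r2 = r2_score(y_test, y_pred)

print(f"RMSE: {rmse:.4f}")
print(f"R^2:  {r2:.4f}")
```

The same R² value is returned by `regressor.score(X_test, y_test)`, which is the default scoring method for scikit-learn regressors.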

Step 7: Visualizing the result.

Python
# Dense grid over the feature range, shaped (n_samples, 1) as predict() expects
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)
y_grid_pred = regressor.predict(X_grid)
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='red', label='Data')
plt.plot(X_grid, y_grid_pred, color='blue', label='Model Prediction')
plt.title("Decision Tree Regression")
plt.xlabel("Feature")
plt.ylabel("Target")
plt.legend()
plt.show()

Output:

Decision Tree Regression
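The fitted curve is a step function because every input that lands in the same leaf receives the same prediction. The following sketch, which refits the same model so that it runs on its own, checks this by counting the distinct predicted values on the grid and comparing them with the number of leaves. The variable names `n_levels` and `n_leaves` are chosen here for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Same synthetic data, split and model as in the tutorial steps
np.random.seed(42)
X = np.sort(5 * np.random.rand(100, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

regressor = DecisionTreeRegressor(max_depth=4, random_state=42)
regressor.fit(X_train, y_train)

X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)
y_grid_pred = regressor.predict(X_grid)

n_levels = np.unique(y_grid_pred).size   # distinct heights of the step function
n_leaves = regressor.get_n_leaves()      # a depth-4 tree has at most 2**4 = 16 leaves

print(f"Grid points: {len(X_grid)}")
print(f"Distinct predicted values: {n_levels}")
print(f"Leaves in the tree: {n_leaves}")
```

Each flat segment of the prediction line in the plot corresponds to one leaf, so a deeper tree produces more, narrower steps.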

Step 8: Visualize the tree structure.

To better understand how the model makes its decisions, we use plot_tree to draw the fitted tree and inspect its splits.

Python
from sklearn.tree import plot_tree

# Visualizing decision tree
plt.figure(figsize=(20, 10))
plot_tree(
    regressor,
    feature_names=["Feature"],
    filled=True,
    rounded=True,
    fontsize=10
)
plt.title("Decision Tree Structure")
plt.show()

Output: 

Visualized Decision Tree Regression
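When a plot is inconvenient, for example in a terminal or a log file, the same tree can be printed as text with `export_text` from `sklearn.tree`. This is a small sketch that refits the tutorial's model so it runs standalone; the `decimals=3` setting is just a display choice.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.model_selection import train_test_split

# Same synthetic data, split and model as in the tutorial steps
np.random.seed(42)
X = np.sort(5 * np.random.rand(100, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

regressor = DecisionTreeRegressor(max_depth=4, random_state=42)
regressor.fit(X_train, y_train)

# One line per node: decision nodes show a threshold test, leaves show the predicted value
tree_rules = export_text(regressor, feature_names=["Feature"], decimals=3)
print(tree_rules)
```

Each indented line of the form `Feature <= ...` is a decision node, and each `value: [...]` line is a leaf with its predicted number.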

Decision Tree Regression predicts continuous values and can capture non-linear patterns in data. Its tree structure makes the model easy to interpret, since we can trace the sequence of decisions that leads to any specific output. This insight can then be used to fine-tune the model, for example by adjusting how deep the tree is allowed to grow.
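One possible way to do this fine-tuning is a cross-validated grid search with scikit-learn's `GridSearchCV`. The sketch below is only an illustration: the parameter grid, the 5-fold setting and the name `search` are assumptions, not values taken from the steps above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error

# Same synthetic data and split as in the tutorial steps
np.random.seed(42)
X = np.sort(5 * np.random.rand(100, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Candidate settings (an arbitrary illustrative grid)
param_grid = {
    "max_depth": [2, 3, 4, 5, 6],
    "min_samples_leaf": [1, 3, 5],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    cv=5,                              # 5-fold cross-validation on the training set
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, hence the negation
)
search.fit(X_train, y_train)

test_mse = mean_squared_error(y_test, search.predict(X_test))
print("Best parameters:", search.best_params_)
print(f"Test MSE with best parameters: {test_mse:.4f}")
```

Because the search only sees the training set, the test set remains an unbiased check of the selected settings.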

