The organization of hierarchical data is a unique challenge in the area of database management DBMS. Hierarchical structures are common in many fields, from organizations in charts to storage systems and categories of products.
Careful consideration of the database schema and the chosen storage model is required to effectively store and query hierarchical data in RDBMS relational databases.
In this article, we will go through the options available for storing hierarchical data in a relational database, exploring their advantages, disadvantages, and use cases. The options mainly are:
- Adjacency List Model
- Path Enumeration
- Nested Set Model
- Materialized Path Model
Before diving in, let's acknowledge these concepts:
What is a Relational Database?
A relational database is a type of database that organizes data into rows and columns which collectively form a table where the data points are related to each other RDBMS.
SQL queries aggregate data, aiding firms in business performance analysis, process optimization, and insight generation. They organize data by linking tables through primary and foreign keys, revealing interconnections.
Storing Hierarchical Data in a Database
Managing hierarchical data in relational databases presents a challenge due to the mismatch between the hierarchical structures and the tabular nature of relational databases. The Strategies are explained below:
1. Adjacency List Model
The Adjacency List Model is a simple and intuitive way to represent hierarchical data in relational databases. In this model, each record contains a reference to its parent record, forming a tree-like structure. For instance, an employee table might have a field referencing the manager's ID for each employee.
Example:
Here's a basic schema for an employee table using the Adjacency List Model:
CREATE TABLE Employee (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES Employee(employee_id)
);
- employee_id: Unique identifier for each employee.
- name: Name of the employee.
- manager_id: Reference to the manager's employee_id.
Let's populate the table with some sample data:
INSERT INTO Employee (employee_id, name, manager_id) VALUES
(1, 'John Doe', NULL), -- John Doe is the CEO, so he has no manager.
(2, 'Jane Smith', 1), -- Jane Smith reports to John Doe.
(3, 'Alice Johnson', 1), -- Alice Johnson also reports to John Doe.
(4, 'Bob Williams', 2), -- Bob Williams reports to Jane Smith.
(5, 'Emily Brown', 2); -- Emily Brown also reports to Jane Smith.
Now, let's query the data to see the organizational hierarchy in SQL:
SELECT
e.employee_id,
e.name,
COALESCE(m.name, 'CEO') AS manager
FROM
Employee e
LEFT JOIN
Employee m ON e.manager_id = m.employee_id;
This query retrieves each employee's ID, name, and the name of their manager. We use a LEFT JOIN to ensure that even employees without a manager (such as the CEO) are included in the results. The result of the query might look like this:
employee_id
| name
| manager
|
---|
1
| John Doe
| CEO
|
---|
2
| Jane Smith
| John Doe
|
---|
3
| Alice Johnson
| John Doe
|
---|
4
| Bob Williams
| Jane Smith
|
---|
5
| Emily Brown
| Jane Smith
|
---|
Pros:
- Simple to grasp and put into practice.
- Flexibility is the representation of asymmetrical hierarchies.
Cons:
- Inefficient querying and traversing of hierarchies might occur, particularly for deeply nested structures.
- Recursive queries, which can be intricate and resource-intensive, are frequently required.
2. Path Enumeration
In a relational database, path enumeration is a technique for storing hierarchical data in which the complete path of each node from the root node is recorded. With a specified path, this method makes it easier to get parent or child nodes; but, it may result in slower queries, particularly for large datasets.
Example:
Let us consider the scenario where we wish to depict a filesystem hierarchy in which every file or folder has a name, a unique identifier, and its whole path.
CREATE TABLE FileSystem (
node_id INT PRIMARY KEY,
name VARCHAR(100),
full_path VARCHAR(255)
);
- node_id: Unique identifier for each node.
- name: Name of the file or folder.
- full_path: Full path of the node in the filesystem.
Let's populate the table with some sample data:
INSERT INTO FileSystem (node_id, name, full_path) VALUES
(1, 'Root', '/'),
(2, 'Documents', '/Documents'),
(3, 'Images', '/Documents/Images'),
(4, 'File1.txt', '/Documents/File1.txt'),
(5, 'File2.txt', '/Documents/File2.txt'),
(6, 'File3.jpg', '/Documents/Images/File3.jpg');
Now, let's query the data to see the filesystem hierarchy:
SELECT * FROM FileSystem;
Output:
node_id
| name
| full_path
|
---|
1
| Root
|
/
|
---|
2
| Documents
| /Documents
|
---|
3
| Images
| /Documents/Images
|
---|
4
| File1.txt
| /Documents/File1.txt
|
---|
5
| File2.txt
| /Documents/File2.txt
|
---|
6
| File3.jpg
| /Documents/Images/File3.jpg
|
---|
As you can see, each node in the filesystem hierarchy has a unique identifier, a name, and its full path.
Querying for a specific node or its children becomes easier using the full_path column. For example, to retrieve all children of the "/Documents" folder, you can use:
SELECT * FROM FileSystem WHERE full_path LIKE '/Documents/%';
In essence, all children of the '/Documents' folder will be retrieved by this query, which will return any nodes whose complete path begins with '/Documents/'.
Path Enumeration makes retrieving hierarchical data easier, but it can make searches take longer, especially for structures with several levels of nesting. Furthermore, it could be necessary to update several rows to update the hierarchy, which could affect performance. Because of this, it's critical to carefully weigh the trade-offs when selecting a relational database storage format for hierarchical data.
Pros:
- Straightforward to implement.
- Simple to retrieve parent or child nodes given a specific path.
Cons:
- Retrieval and traversal of hierarchies can be slow and resource-intensive, especially for large datasets.
- Limited support for operations like subtree queries or reordering nodes.
3. Nested Set Model
With the Nested Set Model, two numbers—a left value and a right value—represent each node in the tree for storing hierarchical data in a relational database. The way these values are allocated makes it possible to query the hierarchical structure effectively, get subtrees, and perform operations like counting descendants.
Example:
Let us say we wish to depict a hierarchical organizational structure in which every employee has a name, a position within the hierarchy, and a unique identity.
Here's how we can create a table to represent this structure in SQLite:
CREATE TABLE Employee (
employee_id INTEGER PRIMARY KEY,
name TEXT,
left_value INTEGER,
right_value INTEGER
);
- employee_id: Unique identifier for each employee.
- name: Name of the employee.
- left_value: Left boundary value for the node in the nested set.
- right_value: Right boundary value for the node in the nested set.
Let's populate the table with some sample data:
INSERT INTO Employee (employee_id, name, left_value, right_value) VALUES
(1, 'John Doe', 1, 10),
(2, 'Jane Smith', 2, 5),
(3, 'Alice Johnson', 6, 9),
(4, 'Bob Williams', 3, 4),
(5, 'Emily Brown', 7, 8);
Now, let's query the data to see the organizational hierarchy:
SELECT * FROM Employee;
Output:
employee_id
| name
| left_value
| right_value
|
---|
1
| John Doe
|
1
|
10
|
---|
2
| Jane Smith
|
2
|
5
|
---|
3
| Alice Johnson
|
6
|
9
|
---|
4
| Bob Williams
|
3
|
4
|
---|
5
| Emily Brown
|
7
|
8
|
---|
Pros:
- Efficient for subtree retrieval and operations like counting descendants.
- Well-suited for hierarchies with frequent read operations.
Cons:
- Complex to maintain, especially when nodes are inserted or deleted.
- Queries involving updates to the hierarchy can be challenging and computationally expensive.
Materialized Path Model
Similar to path enumeration, the materialized path model stores the full path of each node, along with additional optimizations such as storing the depth of each node.
Pros:
- Simplifies querying and traversal of hierarchies.
- Supports operations like subtree retrieval and path-based queries efficiently.
Cons:
- May require additional storage space.
- Updates to the hierarchy can be complex, especially when nodes are moved or reorganized.
Conclusion
In conclusion, choosing the appropriate storage model for hierarchical data in a relational database depends on various factors, including the size and complexity of the hierarchy, the frequency of updates, and the types of queries that will be performed. While each storage model has its advantages and disadvantages, understanding the nuances of each approach is crucial for designing efficient and scalable database schemas that effectively manage hierarchical data.
Similar Reads
SQL Interview Questions
Are you preparing for a SQL interview? SQL is a standard database language used for accessing and manipulating data in databases. It stands for Structured Query Language and was developed by IBM in the 1970s, SQL allows us to create, read, update, and delete data with simple yet effective commands.
15+ min read
Non-linear Components
In electrical circuits, Non-linear Components are electronic devices that need an external power source to operate actively. Non-Linear Components are those that are changed with respect to the voltage and current. Elements that do not follow ohm's law are called Non-linear Components. Non-linear Co
11 min read
SQL Tutorial
SQL is a Structured query language used to access and manipulate data in databases. SQL stands for Structured Query Language. We can create, update, delete, and retrieve data in databases like MySQL, Oracle, PostgreSQL, etc. Overall, SQL is a query language that communicates with databases.In this S
11 min read
SQL Commands | DDL, DQL, DML, DCL and TCL Commands
SQL commands are crucial for managing databases effectively. These commands are divided into categories such as Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL), Data Query Language (DQL), and Transaction Control Language (TCL). In this article, we will e
7 min read
SQL Joins (Inner, Left, Right and Full Join)
SQL joins are fundamental tools for combining data from multiple tables in relational databases. Joins allow efficient data retrieval, which is essential for generating meaningful observations and solving complex business queries. Understanding SQL join types, such as INNER JOIN, LEFT JOIN, RIGHT JO
6 min read
Normal Forms in DBMS
In the world of database management, Normal Forms are important for ensuring that data is structured logically, reducing redundancy, and maintaining data integrity. When working with databases, especially relational databases, it is critical to follow normalization techniques that help to eliminate
8 min read
Class Diagram | Unified Modeling Language (UML)
A UML class diagram is a visual tool that represents the structure of a system by showing its classes, attributes, methods, and the relationships between them. It helps everyone involved in a projectâlike developers and designersâunderstand how the system is organized and how its components interact
12 min read
ACID Properties in DBMS
In the world of Database Management Systems (DBMS), transactions are fundamental operations that allow us to modify and retrieve data. However, to ensure the integrity of a database, it is important that these transactions are executed in a way that maintains consistency, correctness, and reliabilit
8 min read
Spring Boot Tutorial
Spring Boot is a Java framework that makes it easier to create and run Java applications. It simplifies the configuration and setup process, allowing developers to focus more on writing code for their applications. This Spring Boot Tutorial is a comprehensive guide that covers both basic and advance
10 min read
Backpropagation in Neural Network
Back Propagation is also known as "Backward Propagation of Errors" is a method used to train neural network . Its goal is to reduce the difference between the modelâs predicted output and the actual output by adjusting the weights and biases in the network.It works iteratively to adjust weights and
9 min read