Indexing in Databases - Set 1

Last Updated : 06 May, 2025

Indexing is a crucial technique used in databases to optimize data retrieval operations. It improves query performance by minimizing disk I/O operations, thus reducing the time it takes to locate and access data. Essentially, indexing allows the database management system (DBMS) to locate data more efficiently without having to scan the entire dataset.

Indexes are organized data structures that allow quick searching based on key values. An ordered index, for example, stores the key values of a table column in sorted order along with pointers to the corresponding data rows, so the DBMS can find a key without reading every row. This significantly reduces the number of disk accesses required to fulfill a query.
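To make the idea concrete, here is a minimal Python sketch. It assumes an in-memory list stands in for the table and a sorted list of (key, row position) pairs stands in for the index; the names `full_scan`, `build_index`, and `index_lookup` are hypothetical and not part of any DBMS. It counts how many rows a full scan examines versus how many index entries a binary search probes.

```python
# Hypothetical table: (id, name) rows stored in arrival order, not sorted by id.
rows = [(i * 7 % 1000, f"user{i}") for i in range(1000)]  # ids 0..999 in scrambled order


def full_scan(rows, key):
    """Examine rows one by one until the key is found; return (row, rows_examined)."""
    examined = 0
    for row in rows:
        examined += 1
        if row[0] == key:
            return row, examined
    return None, examined


def build_index(rows):
    """The 'index': (key, row_position) pairs kept in sorted key order."""
    return sorted((row[0], pos) for pos, row in enumerate(rows))


def index_lookup(index, rows, key):
    """Binary-search the sorted index, then follow the stored pointer to the row."""
    lo, hi, probes = 0, len(index), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if index[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(index) and index[lo][0] == key:
        return rows[index[lo][1]], probes
    return None, probes


index = build_index(rows)
print(full_scan(rows, 999))            # scans until it reaches the matching row
print(index_lookup(index, rows, 999))  # about log2(1000), i.e. roughly 10 index probes
```

Here "probes" only approximates disk accesses; in a real DBMS the index lives in disk blocks, but the gap between scanning every row and searching a sorted structure is the same idea.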

[Figure: Structure of Index in Database]

Attributes of Indexing

Several important attributes of an index determine how it affects the performance and efficiency of database operations:

1. Access Types: This refers to the kinds of access the index supports efficiently, such as searching for a specific value or retrieving all records whose key falls within a range.

2. Access Time: It refers to the time needed to find a particular data element or set of elements.

3. Insertion Time: It refers to the time taken to find the appropriate space and insert new data.

4. Deletion Time: It refers to the time taken to find an item, delete it, and update the index structure.

5. Space Overhead: It refers to the additional space required by the index.


File Organization in Indexing

File organization refers to how data and indexes are physically stored in memory or on disk. The following are the common types of file organizations used in indexing:

1. Sequential File Organization (Ordered Index File)

In this type of organization, the index entries are kept sorted by search-key value. This is a fast and more traditional storage mechanism. An ordered (sequential) index can be either dense or sparse.

  • Dense Index: Every search key value in the data file corresponds to an index record. This method ensures that each key value has a reference to its data location.

    Example: If a table contains several records with the same key value, a dense index still has exactly one index record for that value, pointing to the first of those records; the others follow it in the sorted data file.
[Figure: Dense Index]
  • Sparse Index: Index records appear for only some of the search-key values, typically one per data block. Each index record points to a block, so the data file must be sorted on the search key.

    Access Method: To locate a record, we find the index record with the largest key value less than or equal to the search key, go to the block it points to, and then scan the data file sequentially from there.

    Access Cost: $\text{Accesses} = \lceil \log_2 n \rceil + 1$, where $n$ is the number of blocks in the index file: a binary search over the index blocks, plus one access to the data block.
[Figure: Sparse Index]
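The dense and sparse variants can be compared in a small Python sketch. It assumes an in-memory data file that is already sorted on the search key and split into blocks of a hypothetical `BLOCK_SIZE` records; `dense_lookup`, `sparse_lookup`, and `sparse_accesses` are illustrative names. The dense index has one entry per distinct key, the sparse index has one entry per block, and `sparse_accesses` computes the block reads for a sparse index spread over n index blocks, rounding log₂ n up to a whole number of reads.

```python
from bisect import bisect_left, bisect_right
import math

BLOCK_SIZE = 4  # hypothetical number of records per data block

# Data file sorted on the search key (duplicates allowed); each record is (key, payload).
keys = [1, 1, 2, 3, 5, 5, 5, 8, 9, 12, 13, 13, 20]
records = [(k, f"rec{i}") for i, k in enumerate(keys)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per distinct key, pointing to the first record with that key.
dense = []
for pos, (k, _) in enumerate(records):
    if not dense or dense[-1][0] != k:
        dense.append((k, pos))

# Sparse index: one entry per block, holding the first key stored in that block.
sparse = [(blk[0][0], b) for b, blk in enumerate(blocks)]


def dense_lookup(key):
    """Find the key's entry, then read the run of matching records that follows it."""
    i = bisect_left([k for k, _ in dense], key)
    if i == len(dense) or dense[i][0] != key:
        return []
    pos, out = dense[i][1], []
    while pos < len(records) and records[pos][0] == key:
        out.append(records[pos])
        pos += 1
    return out


def sparse_lookup(key):
    """Find the last entry whose key <= search key, then scan forward from its block."""
    i = bisect_right([k for k, _ in sparse], key) - 1
    if i < 0:
        return []
    # A run of duplicates may begin in an earlier block; step back to include it.
    while i > 0 and blocks[i - 1][-1][0] == key:
        i -= 1
    out = []
    for rec in (r for blk in blocks[sparse[i][1]:] for r in blk):
        if rec[0] > key:
            break
        if rec[0] == key:
            out.append(rec)
    return out


def sparse_accesses(n_index_blocks):
    """Block reads: binary search over the index blocks, plus one data block."""
    return math.ceil(math.log2(n_index_blocks)) + 1


print(dense_lookup(5), sparse_lookup(5))
print(len(dense), len(sparse))
print(sparse_accesses(100))
```

Both lookups return the same records; the sparse index is smaller (one entry per block instead of one per distinct key) at the cost of a short sequential scan inside the data file.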

2. Hash File Organization

In hash file organization, data is distributed across a range of buckets based on a hash function applied to the key values. The hash function maps each key to a particular bucket, where the corresponding data can be located.
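A minimal sketch of hash file organization in Python, assuming integer keys, a hypothetical `NUM_BUCKETS` of 4, and a simple modulo hash function chosen purely for illustration. The employee data is made up, and bucket overflow handling is omitted.

```python
NUM_BUCKETS = 4  # hypothetical number of buckets


def h(key: int) -> int:
    """A simple modulo hash function (an illustrative choice)."""
    return key % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(key, record):
    """Store the record in the bucket its key hashes to."""
    buckets[h(key)].append((key, record))


def lookup(key):
    """Only the single bucket the key hashes to is searched."""
    return [rec for k, rec in buckets[h(key)] if k == key]


for emp_id, name in [(101, "Asha"), (205, "Ben"), (310, "Chen"), (413, "Dana")]:
    insert(emp_id, name)

print(lookup(205))
```

Because keys are scattered across buckets rather than kept in order, this organization suits equality lookups; it does not help with range queries.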

Types of Indexing Methods

There are different types of indexing techniques, each optimized for specific use cases.

1. Clustered Indexing

Clustered indexing is a technique where multiple related records are stored together in the same file. This helps reduce the cost of searching because related data is kept close to each other. Clustered indexing is especially useful when multiple tables or records need to be frequently joined. Storing related records together makes this process faster and more efficient.

In clustered indexing, the data file is physically ordered on the indexed field. That field is usually a non-key field whose values can repeat; when the ordering field is the primary key, the index is a primary index, described next. When an index is created on non-primary-key columns, which may not be unique, one approach is to combine two or more columns to form a unique value and build the index on that combination.

Clustered indexing works by grouping records with similar properties together. For example, students can be grouped by their semester, such as first-semester, second-semester, and so on. Because all records for a given semester sit in adjacent blocks, the index only needs to point to the first of them, and the rest can be read sequentially, which makes retrieval faster.

[Figure: Clustered Indexing]
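The students example can be sketched in Python, assuming the textbook form of a clustering index: the data file is sorted on the non-key field `semester`, and the index holds one entry per distinct semester pointing to the first block that contains it. The student records, `BLOCK_SIZE`, and `students_in` are hypothetical.

```python
BLOCK_SIZE = 3  # hypothetical number of records per block

# Student records (roll_no, name, semester), physically ordered by semester (a non-key field).
students = sorted([
    (7, "Ira", 2), (1, "Ana", 1), (4, "Dev", 1), (9, "Kim", 3), (2, "Bo", 2),
    (5, "Eli", 1), (8, "Jo", 2), (3, "Cy", 3), (6, "Fay", 1),
], key=lambda s: s[2])
blocks = [students[i:i + BLOCK_SIZE] for i in range(0, len(students), BLOCK_SIZE)]

# Clustering index: one entry per distinct semester -> first block holding that semester.
cluster_index = {}
for b, blk in enumerate(blocks):
    for _, _, sem in blk:
        cluster_index.setdefault(sem, b)


def students_in(semester):
    """Jump to the first block for this semester, then read forward while it matches."""
    start = cluster_index.get(semester)
    if start is None:
        return []
    found = []
    for blk in blocks[start:]:
        for rec in blk:
            if rec[2] > semester:  # data is sorted, so no later match is possible
                return found
            if rec[2] == semester:
                found.append(rec)
    return found


print(cluster_index)
print(students_in(2))
```

Semester 2 spans two blocks here, yet the index needs only one entry for it; the adjacent layout does the rest.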

2. Primary Indexing

This is a type of clustered indexing in which the data is sorted on the search key and the primary key of the table is used to create the index. It is often the default form of indexing and implies a sequential file organization. Because primary keys are unique and stored in sorted order, searching is efficient: a binary search over the index finds the entry for a key directly.

  • Key Features: The data is stored in sequential order, making searches faster and more efficient.

3. Non-clustered or Secondary Indexing

A non-clustered index only tells us where the data lies: it stores pointers (references) to the locations where the rows are actually stored. The data is not physically stored in the order of the index; instead, the leaf nodes of the index hold the key values together with pointers to the rows.

Example: The contents page of a book. Each entry gives the page number where a piece of information is stored. The actual data (the information on each page of the book) is not organized by the entries we look up, but the ordered contents page tells us where each item lies. A non-clustered index must be dense: since the data is not physically sorted on the index key, a sparse index, which lists only some key values, could not locate the records between its entries.

A lookup through a non-clustered index generally takes longer than one through a clustered index because of an extra step: after finding the key in the index, the DBMS must follow the pointer to fetch the row. With a clustered index, the data is stored in index order, so the index leads directly to the data.

[Figure: Non Clustered Indexing]
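A minimal Python sketch of a secondary (non-clustered) index, assuming the rows sit in an unsorted heap file (a plain list) and the index is a sorted list of (city, row position) pairs. The data and the name `find_by_city` are hypothetical. The index is dense, with one entry per row, and each match costs an extra step to follow its pointer.

```python
from bisect import bisect_left

# Heap file: rows stored in arrival order, not sorted by city (hypothetical data).
rows = [
    ("Ravi", "Pune"), ("Meena", "Delhi"), ("Arun", "Pune"),
    ("Sara", "Agra"), ("Tom", "Delhi"), ("Uma", "Pune"),
]

# Secondary index on city: sorted and dense (one entry per row);
# each entry is (city, row_position), a pointer into the heap file.
city_index = sorted((city, pos) for pos, (_, city) in enumerate(rows))
index_keys = [city for city, _ in city_index]


def find_by_city(city):
    """Step 1: search the index. Step 2: follow each pointer to fetch the row."""
    i = bisect_left(index_keys, city)
    result = []
    while i < len(city_index) and city_index[i][0] == city:
        pos = city_index[i][1]    # pointer stored in the index entry
        result.append(rows[pos])  # extra step: fetch the actual row
        i += 1
    return result


print(find_by_city("Pune"))
```

The matching rows are scattered across the heap file (positions 0, 2, and 5 for "Pune"), which is why each one has to be fetched through its own pointer.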

4. Multilevel Indexing

As a database grows, its indices grow too. An index is most effective when it can be kept in main memory, but a single-level index may become too large to fit there, so searching it requires multiple disk accesses. Multilevel indexing addresses this by splitting the index into smaller blocks and building an index on top of them, repeating until the outermost index fits in a single block.

Entries in the outer index point to blocks of the inner index, which in turn point to the data blocks. The small outer index can easily be kept in main memory, so a lookup needs only a few disk accesses. This hierarchical approach reduces memory overhead and speeds up query execution.

[Figure: Multilevel Indexing]
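A two-level index can be sketched in Python, assuming a hypothetical `FANOUT` of 4 entries per index block, 64 data blocks that each hold 10 consecutive integer keys, and an outer index small enough to stay in memory. `find_data_block` is an illustrative name; it counts the disk reads a lookup would need under those assumptions.

```python
from bisect import bisect_right

FANOUT = 4  # hypothetical number of index entries per index block

# Inner (first-level) index: one (first_key, data_block) entry per data block, sorted.
# Data block b is assumed to hold keys 10*b .. 10*b + 9, giving 64 data blocks.
inner_entries = [(k, k // 10) for k in range(0, 640, 10)]
inner_blocks = [inner_entries[i:i + FANOUT] for i in range(0, len(inner_entries), FANOUT)]

# Outer (second-level) index: the first key of each inner index block.
outer = [blk[0][0] for blk in inner_blocks]


def find_data_block(key):
    """Return (data block number, disk reads) for a key."""
    reads = 0
    i = bisect_right(outer, key) - 1  # outer index is in memory: no disk read
    if i < 0:
        return None, reads
    inner = inner_blocks[i]
    reads += 1                        # read one inner index block
    j = bisect_right([k for k, _ in inner], key) - 1
    reads += 1                        # read the data block it points to
    return inner[j][1], reads


print(len(outer), find_data_block(123))
```

With 16 inner index blocks, a single-level binary search would need about ⌈log₂ 16⌉ + 1 = 5 block reads, while the two-level lookup needs 2 when the outer index stays in memory.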

Advantages of Indexing

  • Improved Query Performance: Indexing enables faster data retrieval from the database. With an index on a column, the database can quickly find the rows that match a specific value or set of values, reducing the time it takes to run a query.
  • Efficient Data Access: Indexing reduces the amount of disk I/O required to retrieve data. Because an index is much smaller than the table, the index pages for frequently queried columns are likely to stay cached in memory, reducing reads from disk.
  • Optimized Data Sorting: Indexing can also improve the performance of sorting operations. An ordered index on the columns used for sorting already keeps their values in sorted order, so the database can often read the rows in that order instead of sorting the entire table.
  • Consistent Data Performance: Indexing helps the database perform consistently as the amount of data grows. Without an index, a lookup must scan more rows as the table grows; with an ordered index such as a B+ tree, lookup cost grows only logarithmically with the number of rows.
  • Data Integrity: A unique index rejects any insert that would duplicate a value already present in the indexed columns, so indexing can also be used to enforce data integrity. This prevents duplicate data from being stored, which could otherwise cause problems in queries or reports.
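The data-integrity point can be sketched in Python as a unique index that refuses a duplicate key before the row is stored. `UniqueIndex` and `insert_user` are hypothetical in-memory names, not any DBMS's API, and the user data is made up.

```python
from bisect import bisect_left


class UniqueIndex:
    """A sorted index that rejects duplicate keys (hypothetical, in-memory)."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            raise ValueError(f"duplicate key: {key!r}")
        self.keys.insert(i, key)  # keep the index sorted


table = []
email_index = UniqueIndex()


def insert_user(name, email):
    email_index.insert(email)  # raises before the row is stored
    table.append((name, email))


insert_user("Asha", "asha@example.com")
insert_user("Ben", "ben@example.com")
try:
    insert_user("Asha2", "asha@example.com")
except ValueError as e:
    print("rejected:", e)
```

Checking the index first means a rejected insert leaves the table unchanged.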

Disadvantages of Indexing

While indexing offers many advantages, it also comes with certain trade-offs:

  • Increased Storage Space: Indexes require additional storage. Depending on the size of the data, this can significantly increase the overall storage requirements.
  • Increased Maintenance Overhead: Indexes must be updated whenever data is inserted, deleted, or modified, which can slow down these operations.
  • Slower Insert/Update Operations: Since indexes must be maintained and updated, inserting or updating data takes longer than in a non-indexed database.
  • Complexity in Choosing the Right Index: Determining the appropriate indexing strategy for a particular dataset can be challenging and requires an understanding of query patterns and access behaviors.
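To illustrate the maintenance overhead, the sketch below assumes a hypothetical `IndexedTable` that keeps one sorted Python list per indexed column. Every insert and delete must update each index as well as the table, and `index_writes` counts that extra work. The class and data are illustrative only.

```python
from bisect import insort


class IndexedTable:
    """Hypothetical heap table that keeps one sorted index per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # work done purely to keep the indexes up to date

    def insert(self, row: dict):
        pos = len(self.rows)
        self.rows.append(row)
        for col, index in self.indexes.items():
            insort(index, (row[col], pos))  # each index must stay sorted
            self.index_writes += 1

    def delete(self, pos):
        row = self.rows[pos]
        self.rows[pos] = None  # leave a hole so other row positions stay valid
        for col, index in self.indexes.items():
            index.remove((row[col], pos))  # every index must drop its entry too
            self.index_writes += 1


t = IndexedTable(["name", "city"])
for r in [{"name": "Ravi", "city": "Pune"},
          {"name": "Meena", "city": "Delhi"},
          {"name": "Uma", "city": "Pune"}]:
    t.insert(r)
t.delete(1)
print(t.index_writes)  # 8: two index updates per insert and per delete
```

With two indexed columns, each write performs two extra index updates; a table with more indexes pays proportionally more on every insert, update, or delete.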

Features of Indexing

Several key features define the indexing process in databases:

  • Efficient Data Structures: Indexes use efficient data structures like B-trees, B+ trees, and hash tables to enable fast data retrieval.
  • Periodic Index Maintenance: Indexes need to be periodically maintained, especially when the underlying data changes frequently. Maintenance tasks include updating, rebuilding, or removing obsolete indexes.
  • Query Optimization: Indexes play a critical role in query optimization. The DBMS query optimizer uses indexes to determine the most efficient execution plan for a query.
  • Handling Fragmentation: Index fragmentation can reduce the effectiveness of an index. Regular defragmentation can help maintain optimal performance.

Conclusion

Indexing is a very useful technique for reducing search time in database queries. An index table consists of search keys and pointers. The four types of indexing covered here are primary, clustered, secondary (non-clustered), and multilevel indexing. Ordered indexes, including primary indexes, are either dense or sparse: a dense index has an index record for every search-key value, while a sparse index has records for only some of them. Multilevel indexes are commonly implemented as B+ trees. The main purpose of indexing is to provide better performance for data retrieval.

