Database Indexing and Hashing Techniques

The document discusses indexing and hashing in database systems, explaining how indexing improves data retrieval efficiency through ordered and hash indices. It outlines the purpose of indexing, evaluation metrics, and basic concepts such as search keys and index types, including primary, secondary, dense, and sparse indices. Additionally, it covers hashing techniques, including static and dynamic hashing, and addresses issues like bucket overflows and the implementation of hash indices.

Uploaded by

sauravyadv31

Indexing and Hashing

Database systems
Indexing
• Indexing is a data structure technique to efficiently
retrieve records from database files based on some
attributes on which the indexing has been done.
• Indexing in database systems is similar to the one
we see in books.
• Two basic kinds of indices:
– Ordered indices: search keys are stored in sorted order
– Hash indices: used to access data that is distributed
uniformly across a range of buckets using a “hash
function”.
Purpose of Indexing
• It is a data structure that is added to a file to
provide faster access to the data.
• It reduces the number of blocks that the
DBMS has to check.
Index Evaluation Metrics
• Access types: The types of access that are supported efficiently. E.g.,
– records with a specified value in the attribute
– or records with an attribute value falling in a specified range of
values.
• Access time: Time it takes to find a particular data item, or set
of items, using the technique
• Insertion time: Time it takes to insert a new data item. This
value includes the time it takes to find the correct place to
insert the new data item, as well as the time it takes to update
the index structure
• Deletion time: Time it takes to delete a data item. This value
includes the time it takes to find the item to be deleted, as well
as the time it takes to update the index structure
• Space overhead: The additional space occupied by an index
structure.
Basic Concepts
• Search key: An attribute or set of attributes
used to look up records in a file is called a
search key.
• An index file consists of records (called index
entries) of the form (search-key, pointer).
Ordered Indices
• Each index structure is associated with a
particular search key.
• An ordered index stores the values of the
search keys in sorted order and associates
with each search key the records that contain
it. E.g., index of a book, library catalog.
• A file may have several indices, on different
search keys.
Ordered Indices
• Primary index: in a sequentially ordered file, the index
whose search key specifies the sequential order of the file.
– Also called clustering index
– The search key of a primary index is usually but not necessarily
the primary key.
• Secondary index: an index whose search key specifies an
order different from the sequential order of the file.
– Also called non-clustering index.
• Index-sequential file: ordered sequential file with a
primary index.
Dense Index Files
• Dense index: It has an index entry for every
search-key value (and hence every record) in
the database file. A dense index can be built
on ordered as well as unordered fields of the
database file.
Sparse Index Files
• Sparse index: It has index entries for only some of the
search-key values (records) in the database file.
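The difference between the two lookups can be sketched in a few lines of Python. This is a minimal illustration, not a DBMS implementation: the block layout, the keys and the names `BLOCKS`, `dense_lookup` and `sparse_lookup` are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical sorted file: three blocks of (search_key, payload) records.
BLOCKS = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Dense index: one entry per search-key value -> (block number, slot).
dense_index = {
    key: (b, s)
    for b, block in enumerate(BLOCKS)
    for s, (key, _) in enumerate(block)
}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0][0] for block in BLOCKS]


def dense_lookup(key):
    """Follow the index entry straight to the record, if the key exists."""
    pos = dense_index.get(key)
    if pos is None:
        return None
    b, s = pos
    return BLOCKS[b][s][1]


def sparse_lookup(key):
    """Find the last index entry <= key, then scan that block sequentially."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None  # key is smaller than every key in the file
    for k, payload in BLOCKS[b]:
        if k == key:
            return payload
    return None
```

Here the dense index holds nine entries and the sparse index only three, at the cost of a short sequential scan inside the chosen block.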
Multilevel Index
Secondary Indices
• Secondary indices must be dense, with an
index entry for every search-key value, and a
pointer to every record in the file.
• A primary index may be sparse, storing only
some of the search-key values, since it is
always possible to find records with
intermediate search-key values by a
sequential access to a part of the file.
Secondary Indices
• If the search key of a secondary index is not a candidate
key, it is not enough to point to just the first record with
each search-key value. The remaining records with the
same search-key value could be anywhere in the file, since
the records are ordered by the search key of the primary
index, rather than by the search key of the secondary index.
• Therefore, a secondary index must contain pointers to all
the records.
• An extra level of indirection is used to implement
secondary indices on search keys that are not candidate
keys.
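The extra level of indirection can be pictured as a bucket of record pointers per search-key value. The sketch below assumes a small in-memory file; the account records, the field positions and the function names are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical file, ordered by account number (the primary search key).
records = [
    ("A-101", "Downtown", 500),
    ("A-110", "Perryridge", 600),
    ("A-215", "Downtown", 700),
    ("A-217", "Brighton", 750),
    ("A-305", "Perryridge", 350),
]


def build_secondary_index(records, field):
    """Map each value of `field` to a bucket (list) of record positions."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return dict(index)


def lookup(index, key):
    """Follow the index entry to its bucket, then each pointer to a record."""
    return [records[p] for p in index.get(key, [])]


# Secondary index on branch name (field 1), which is not a candidate key.
branch_index = build_secondary_index(records, field=1)
```

Looking up "Downtown" returns the records at positions 0 and 2, which are not adjacent in the file because the file is ordered by account number.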
Hashing
• One disadvantage of sequential file
organization is that we must access an index
structure to locate data, or must use binary
search, and that results in more I/O operations.
• File organizations based on the technique of
hashing allow us to avoid accessing an index
structure.
• Hashing also provides a way of constructing
indices.
Example
• Hash file organization of account file, using
branch_name as key
• There are 10 buckets.
• The binary representation of the ith character of the
alphabet is assumed to be the integer i.
• The hash function returns the sum of the binary
representations of the characters modulo 10
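The hash function described above can be written out as a short Python sketch. The branch names in the usage note are illustrative values, and skipping spaces and other non-letter characters is an assumption the slides do not state.

```python
NUM_BUCKETS = 10


def h(branch_name):
    """Sum the alphabet positions of the letters (a=1 ... z=26), modulo 10."""
    total = 0
    for ch in branch_name.lower():
        if "a" <= ch <= "z":  # assumption: spaces and other characters are skipped
            total += ord(ch) - ord("a") + 1
    return total % NUM_BUCKETS
```

For instance, the letters of "Perryridge" sum to 125, so h("Perryridge") is 5, while "Brighton" (sum 93) and "Round Hill" (sum 113) both land in bucket 3.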
Hashing
• A bucket is a unit of storage containing one or more records
(a bucket is typically a disk block).
• In a hash file organization we obtain the bucket of a record
directly from its search-key value using a hash function.
• Hash function h is a function from the set of all search-key
values K to the set of all bucket addresses B.
• Hash function is used to locate records for access, insertion
as well as deletion.
• Records with different search-key values may be mapped to
the same bucket; thus the entire bucket has to be searched
sequentially to locate a record.
Static Hashing
• In static hashing, when a search-key value is
provided the hash function always computes
the same address.
• For example, if a mod-4 hash function is used
then it generates only 4 values (0 to 3). The
output address is always the same for a given
search-key value, and the number of buckets
provided remains the same at all times.
Hash Function
• The worst hash function maps all search-key values to the
same bucket; this makes access time proportional to the
number of search-key values in the file.
• An ideal hash function has the following properties:
– The distribution is uniform. That is, the hash function
assigns each bucket the same number of search-key values
from the set of all possible search-key values.
– The distribution is random. That is, in the average case,
each bucket will have nearly the same number of values
assigned to it, regardless of the actual distribution of
search-key values.
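A quick way to see the difference between a worst-case and a well-spread hash function is to count how many keys land in each bucket. The sketch below uses hypothetical integer keys and two toy hash functions; `bucket_loads` is a name introduced here for illustration.

```python
from collections import Counter


def bucket_loads(keys, hash_fn, num_buckets):
    """Return how many keys land in each bucket, as a list of counts."""
    counts = Counter(hash_fn(k) for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]


keys = range(100)  # hypothetical integer search keys 0..99

# Worst case: every key is sent to bucket 0.
worst = bucket_loads(keys, lambda k: 0, 4)

# A mod-4 hash spreads these particular keys evenly over four buckets.
spread = bucket_loads(keys, lambda k: k % 4, 4)
```

With these keys, `worst` puts all 100 keys in one bucket while `spread` assigns 25 to each. Real key sets are rarely this regular, which is why the random-distribution property matters.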
Handling of Bucket Overflows
• If the bucket does not have enough space, a bucket
overflow is said to occur.
• Bucket overflow can occur for several reasons:
– Insufficient buckets: The number of buckets, denoted by
nB, must be chosen such that nB > nr / fr, where nr denotes the
total number of records that will be stored and fr denotes
the number of records that will fit in a bucket.
– Skew: Some buckets are assigned more records than are
others, so a bucket may overflow even when other buckets
still have space. This situation is called bucket skew. This
can occur due to two reasons:
• multiple records have same search-key value
• chosen hash function produces non-uniform distribution of key
values
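As a small worked example of the condition nB > nr / fr, the helper below returns the smallest bucket count that satisfies it. The function name and the sample figures are assumptions for illustration; in practice extra buckets are usually added on top of this minimum to reduce the chance of overflow.

```python
def min_buckets(nr, fr):
    """Smallest integer nB that satisfies nB > nr / fr (nr, fr positive integers)."""
    return nr // fr + 1
```

For example, 1000 records at 10 records per bucket need at least 101 buckets, since exactly 100 would not satisfy the strict inequality.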
Handling of Bucket Overflows
• Although the probability of bucket overflow can be reduced, it
cannot be eliminated; it is handled by using overflow buckets.
• Overflow chaining – the overflow buckets of a given bucket
are chained together in a linked list.
• The above scheme is called closed hashing.
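Overflow chaining can be sketched as a list of fixed-capacity buckets, each with a link to its next overflow bucket. This is a minimal in-memory illustration under assumed names (`Bucket`, `ClosedHashFile`) and an assumed capacity; a real system would use disk blocks.

```python
class Bucket:
    """A fixed-capacity bucket with a link to its next overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next bucket in the chain, if any


class ClosedHashFile:
    def __init__(self, num_buckets, capacity, hash_fn):
        self.hash_fn = hash_fn
        self.capacity = capacity
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]

    def insert(self, key, record):
        bucket = self.buckets[self.hash_fn(key)]
        # Walk the chain until a bucket with free space is found,
        # appending a new overflow bucket when the chain is full.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(self.capacity)
            bucket = bucket.overflow
        bucket.records.append((key, record))

    def lookup(self, key):
        """Search the primary bucket and every overflow bucket chained to it."""
        bucket = self.buckets[self.hash_fn(key)]
        found = []
        while bucket is not None:
            found.extend(r for k, r in bucket.records if k == key)
            bucket = bucket.overflow
        return found

    def chain_length(self, address):
        """Number of buckets (primary plus overflow) at a bucket address."""
        n, bucket = 0, self.buckets[address]
        while bucket is not None:
            n, bucket = n + 1, bucket.overflow
        return n
```

With four buckets of capacity 2 and a mod-4 hash, inserting the keys 1, 5, 9 and 13 sends all of them to address 1, so that address ends up with one primary bucket and one overflow bucket.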
Handling of Bucket Overflows
• Linear probing: When the hash function generates an address at
which data is already stored, the next free bucket is allocated
to the record. This mechanism is called open hashing.
• Open hashing does not use overflow buckets and is not suitable
for database applications.
Hash Indices
• Hashing can be used not only for file organization, but
also for index-structure creation.
• A hash index organizes the search keys, with their
associated record pointers, into a hash file structure.
• The hash index is constructed as follows:
– Apply the hash function to a search key to identify a bucket, and
store the key and its associated pointers in that bucket.
• Strictly speaking, hash indices are always secondary
indices
– if the file itself is organized using hashing, a separate primary
hash index on it using the same search-key is unnecessary.
– However, we use the term hash index to refer to both
secondary index structures and hash organized files.
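A hash index over an existing file might look like the sketch below: each bucket stores (search key, pointer) pairs, where a pointer is simply the record's position in a list. The account data, the digit-sum hash and the bucket count of 7 are assumptions chosen for the example.

```python
# Hypothetical file: a list of records; a "pointer" is a position in the list.
accounts = [("A-217", 750), ("A-101", 500), ("A-110", 600), ("A-215", 700)]

NUM_BUCKETS = 7


def h(account_number):
    """Toy hash: sum of the digits in the account number, modulo NUM_BUCKETS."""
    return sum(int(c) for c in account_number if c.isdigit()) % NUM_BUCKETS


def build_hash_index(records):
    """Place each search key, with its record pointer, in the bucket h(key)."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for pointer, (key, _) in enumerate(records):
        buckets[h(key)].append((key, pointer))
    return buckets


def find(buckets, records, key):
    """Scan only the one bucket the key hashes to, then follow the pointers."""
    return [records[p] for k, p in buckets[h(key)] if k == key]


index = build_hash_index(accounts)
```

Note that "A-101" and "A-110" have the same digit sum and so share a bucket, which is why `find` still compares keys inside the bucket.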
Deficiencies of Static Hashing
• In static hashing, function h maps search-key values to a
fixed set B of bucket addresses. Databases grow or shrink
with time.
– If the initial number of buckets is too small, the file grows, and
the hash function was chosen based on the current file size,
performance will degrade due to too many overflows.
– If space is allocated for anticipated growth, a significant amount
of space will be wasted initially (and buckets will be underfull).
– If database shrinks, again space will be wasted.
• One solution: periodic re-organization of the file with a new
hash function
– Expensive, disrupts normal operations
• Better solution: allow the number of buckets to be modified
dynamically.
Dynamic Hashing
• Dynamic hashing provides a mechanism in
which data buckets are added and removed
dynamically and on-demand.
• Dynamic hashing is also known as extended
hashing; its best-known form is extendible hashing.
• The hash function, in dynamic hashing, is made to
produce a large number of values, of which only a
few are used initially.
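One way to picture "a large number of values, of which only a few are used" is to compute a 32-bit hash and look at only its first i bits. The sketch below assumes the prefix-bit scheme of extendible hashing and uses `zlib.crc32` purely as a convenient 32-bit stand-in; it shows the addressing only, not bucket splitting.

```python
import zlib

HASH_BITS = 32


def full_hash(key):
    """A 32-bit hash value; zlib.crc32 is only a convenient stand-in here."""
    return zlib.crc32(key.encode("utf-8"))


def bucket_slot(key, i):
    """Use only the first (high-order) i bits of the 32-bit hash value.

    With i = 0 every key maps to slot 0; each extra bit doubles the
    number of addressable slots without changing the hash function.
    """
    return full_hash(key) >> (HASH_BITS - i)
```

Because slot numbers are prefixes of the same hash value, growing from i to i + 1 bits sends each key either to slot 2s or to slot 2s + 1, where s was its old slot. That property lets a single bucket be split without rehashing the whole file.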
Hashing Practice Problems
Problem 1

• Consider a hash table of size seven, with starting index
zero, and a hash function (3x + 4) mod 7. Assuming the
hash table is initially empty, which of the following is
the contents of the table when the sequence 1, 3, 8,
10 is inserted into the table using open hashing? Note
that ‘_’ denotes an empty location in the table.
(A) 8, _, _, _, _, _, 10
(B) 1, 8, 10, _, _, _, 3
(C) 1, _, _, _, _, _, 3
(D) 1, 10, 8, _, _, _, 3
• 1 => (3x + 4) mod 7 = 7 mod 7 = 0
• 3 => (3x + 4) mod 7 = 13 mod 7 = 6
• 8 => (3x + 4) mod 7 = 28 mod 7 = 0
Because address ‘0’ is not empty, store 8 at the next
empty data bucket, ‘1’
• 10 => (3x + 4) mod 7 = 34 mod 7 = 6
Because address ‘6’ is not empty, probing wraps around past the
occupied buckets ‘0’ and ‘1’ and stores 10 at the next empty data
bucket, ‘2’
Correct option is B
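The walk-through above can be replayed with a short linear-probing sketch. The function name `linear_probe_insert` is introduced here for illustration, and `None` plays the role of ‘_’.

```python
def linear_probe_insert(table, key, hash_fn):
    """Place key in the first free slot at or after hash_fn(key), wrapping around."""
    size = len(table)
    start = hash_fn(key)
    for step in range(size):
        slot = (start + step) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


# Problem 1: table of size 7, hash function (3x + 4) mod 7.
table = [None] * 7
for key in (1, 3, 8, 10):
    linear_probe_insert(table, key, lambda x: (3 * x + 4) % 7)
```

After the four insertions `table` is `[1, 8, 10, None, None, None, 3]`, which corresponds to option B.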
Problem 2
• The keys 12, 18, 13, 2, 3, 23, 5 and 15 are
inserted into an initially empty hash table of
length 10 using open addressing with hash
function h(k) = k mod 10 and linear probing.
What is the resultant hash table?
• h(k) = k mod 10
• 12 => 12 mod 10 = 2
• 18 => 18 mod 10 = 8
• 13 => 13 mod 10 = 3
• 2 => 2 mod 10 = 2, not empty, next available = 4
• 3 => 3 mod 10 = 3, not empty, next available = 5
• 23 => 23 mod 10 = 3, not empty, next available = 6
• 5 => 5 mod 10 = 5, not empty, next available = 7
• 15 => 15 mod 10 = 5, not empty, next available = 9
• Resulting table (indices 0 to 9): _, _, 12, 13, 2, 3, 23, 5, 18, 15
Correct option is C
Problem 3
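• For Problem 2, what would be the correct option if the method
used is closed hashing?
• h(k) = k mod 10
• 12 => 12 mod 10 = 2
• 18 => 18 mod 10 = 8
• 13 => 13 mod 10 = 3
• 2 => 2 mod 10 = 2
• 3 => 3 mod 10 = 3
• 23 => 23 mod 10 = 3
• 5 => 5 mod 10 = 5
• 15 => 15 mod 10 = 5
• With closed hashing, colliding keys are chained at the same
address: bucket 2 holds 12, 2; bucket 3 holds 13, 3, 23;
bucket 5 holds 5, 15; bucket 8 holds 18.
Correct option is D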
Problem 4
• A hash table of length 10 uses open
addressing with hash function h(k)=k mod 10,
and linear probing. After inserting 6 values
into an empty hash table, the table is as
shown below.
Which one of the following choices gives a
possible order in which the key values could
have been inserted in the table?
(A) 46, 42, 34, 52, 23, 33
(B) 34, 42, 23, 52, 33, 46
(C) 46, 34, 42, 23, 52, 33
(D) 42, 46, 33, 23, 34, 52
• Solution: We check whether the sequence given in option A can
lead to the hash table given in the question. Option A inserts 46,
42, 34, 52, 23, 33 as follows:
• For key 46, h(46) is 46 % 10 = 6. Therefore, 46 is placed at index 6.
For key 42, h(42) is 42 % 10 = 2. Therefore, 42 is placed at index 2.
For key 34, h(34) is 34 % 10 = 4. Therefore, 34 is placed at index 4.
• For key 52, h(52) is 52 % 10 = 2. However, index 2 is occupied by
42. Therefore, 52 is placed at index 3 in the hash table. But in the
given hash table, 52 is placed at index 5. Therefore, the sequence in
option A can’t generate the hash table given in the question.
• In a similar way, we can check the other options, which
leads to the answer (C).
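The same check can be run mechanically over all four options. One caveat: the table figure from the question is missing from this text, so the `target` below is an assumption, reconstructed by replaying option C (the stated answer) rather than copied from the question. It does agree with the remark above that 52 sits at index 5.

```python
def build_table(keys, size=10):
    """Insert keys with h(k) = k mod size and linear probing; return the table."""
    table = [None] * size
    for key in keys:
        slot = key % size
        while table[slot] is not None:  # safe here: fewer keys than slots
            slot = (slot + 1) % size
        table[slot] = key
    return table


options = {
    "A": [46, 42, 34, 52, 23, 33],
    "B": [34, 42, 23, 52, 33, 46],
    "C": [46, 34, 42, 23, 52, 33],
    "D": [42, 46, 33, 23, 34, 52],
}

# Assumed target: reconstructed from the stated answer (C), because the
# table shown in the original question is not present in the text.
target = build_table(options["C"])

# Names of the options whose insertion order reproduces the target table.
matches = [name for name, keys in options.items() if build_table(keys) == target]
```

Under that assumption only option C reproduces the table; option A, for instance, leaves 52 at index 3 instead of index 5.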
