Hashing in DBMS
In a very large database, searching through all the index values to reach the desired
data is inefficient. Hashing computes the location of a data record on the disk directly,
without using an index structure.
In this technique, data is stored in data blocks whose addresses are generated by a
hash function. The storage location that holds these records is known as a data bucket
or data block.
The hash function can use any column value to generate the address. Most of the time,
it uses the primary key to generate the address of the data block. The hash function can
be anything from a simple mathematical function to a complex one. We can even use the
primary key itself as the address of the data block. In that case, each row is stored in
the data block whose address is the same as its primary key.
The above diagram shows data block addresses that are the same as the primary key
values. The hash function can also be a simple mathematical function like exponential,
mod, cos, sin, etc. Suppose we use a mod (5) hash function to determine the address of
the data block. Applied to the primary keys, it generates 3, 3, 1, 4 and 2 respectively,
and each record is stored at the corresponding data block address.
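As a minimal sketch of the mod (5) hash function described above, the following Python
snippet maps a primary key to a data block address. The names bucket_address and
NUM_BUCKETS are illustrative only and do not belong to any DBMS API. The key 103 is the
EMP_ID value used in the static hashing example of this article.

```python
NUM_BUCKETS = 5  # assumption: five data blocks, addressed 0..4


def bucket_address(primary_key: int) -> int:
    """Return the data block address for a key using the mod (5) hash."""
    return primary_key % NUM_BUCKETS


print(bucket_address(103))  # 103 mod 5 = 3
```

Because the function depends only on the key, the same key always yields the same
address, which is what lets a lookup go straight to the right block.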
Types of Hashing:
o Static Hashing
o Dynamic Hashing
Static Hashing
In static hashing, the hash function always produces the same data bucket address for
a given key. For example, if we generate an address for EMP_ID = 103 using the hash
function mod (5), the result is always bucket address 3. The bucket address never
changes.
Hence, in static hashing the number of data buckets in memory remains constant
throughout. In this example, five data buckets are used to store the data.
o Search a Record
When a record needs to be found, the same hash function is applied to the search key to
compute the address of the bucket where the data is stored.
o Insert a Record
When a new record is inserted into the table, an address is generated for it from the
hash key, and the record is stored at that location.
o Delete a Record
To delete a record, we first locate it using the hash function. Then we remove the
record from the bucket at that address.
o Update a Record
To update a record, we first search for it using the hash function, and then the data
record is updated.
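The operations above can be sketched in Python as follows. This is only an in-memory
illustration under stated assumptions: each bucket is modelled as a dict, the hash
function is mod (5), and the function names and the sample record values are
hypothetical. A real DBMS would work with fixed-size disk blocks.

```python
NUM_BUCKETS = 5  # static hashing: the number of buckets never changes

# Assumption: each bucket is a dict of {key: record} held in memory.
buckets = [dict() for _ in range(NUM_BUCKETS)]


def address(key: int) -> int:
    """Hash function: the same key always maps to the same bucket."""
    return key % NUM_BUCKETS


def insert(key, record):
    buckets[address(key)][key] = record


def search(key):
    """Return the record for key, or None if it is not stored."""
    return buckets[address(key)].get(key)


def update(key, record):
    bucket = buckets[address(key)]
    if key not in bucket:
        raise KeyError(key)
    bucket[key] = record


def delete(key):
    """Remove and return the record for key, or None if absent."""
    return buckets[address(key)].pop(key, None)


insert(103, "first version")    # stored in bucket 3
update(103, "second version")
print(address(103), search(103))
delete(103)
print(search(103))              # the record is gone
```

Every operation starts with the same address computation, so none of them needs to scan
buckets other than the one the key hashes to.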
If we want to insert a new record into the file but the data bucket at the address
generated by the hash function is not empty, that is, data already exists at that
address, the situation is known as bucket overflow. This is a critical situation in
static hashing.
There are various methods to overcome this situation. Some commonly used methods
are as follows:
1. Open Hashing
When a hash function generates an address at which data is already stored, the record
is placed in the next available bucket instead. This mechanism is called Linear Probing.
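Linear probing as described above can be sketched in Python as follows. The sketch
assumes that each bucket holds a single record, that the hash function is mod (5), and
that the search wraps around to bucket 0 after the last bucket. The keys and the record
labels are hypothetical.

```python
NUM_BUCKETS = 5
# Assumption: each bucket holds at most one (key, record) pair.
slots = [None] * NUM_BUCKETS


def insert_linear_probing(key, record):
    """Store the record in its hashed bucket, or in the next free one."""
    start = key % NUM_BUCKETS
    for step in range(NUM_BUCKETS):
        addr = (start + step) % NUM_BUCKETS  # wrap around at the end
        if slots[addr] is None or slots[addr][0] == key:
            slots[addr] = (key, record)
            return addr
    raise OverflowError("all buckets are full")


print(insert_linear_probing(103, "R1"))  # 103 mod 5 = 3, bucket 3 is free
print(insert_linear_probing(108, "R2"))  # bucket 3 is taken, so bucket 4
print(insert_linear_probing(113, "R3"))  # 3 and 4 are taken, wraps to 0
```

A search has to follow the same probe sequence, starting at the hashed address and
moving forward until it finds the key or an empty bucket.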
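2. Closed Hashing
When a bucket is full, a new data bucket is allocated for the same hash result and is
linked after the full one. This mechanism is known as Overflow Chaining.
For example: Suppose R3 is a new record which needs to be inserted into the table, and
the hash function generates the address 110 for it. This bucket is already full and
cannot store the new data. In this case, a new bucket is added at the end of the chain
for bucket 110 and is linked to it.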
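The linked-bucket behaviour in the example above, commonly called overflow chaining, can
be sketched in Python as follows. The bucket capacity of two records, the class and
function names, and the records R1 and R2 that already fill bucket 110 are assumptions
made for illustration. Only R3 and the address 110 come from the example.

```python
BUCKET_CAPACITY = 2  # assumption: each bucket holds two records


class Bucket:
    def __init__(self):
        self.records = []
        self.overflow = None  # link to the next overflow bucket, if any


def insert_chained(table, address, record):
    """Insert into the bucket at address, chaining a new bucket if full."""
    bucket = table.setdefault(address, Bucket())
    while len(bucket.records) >= BUCKET_CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = Bucket()  # allocate and link a new bucket
        bucket = bucket.overflow
    bucket.records.append(record)


table = {}
for rec in ("R1", "R2", "R3"):
    insert_chained(table, 110, rec)

print(table[110].records)           # ['R1', 'R2']
print(table[110].overflow.records)  # ['R3']
```

The primary bucket address never changes, but a lookup may now have to walk the chain,
so long chains erode the direct-access benefit of hashing.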
Dynamic Hashing
o The dynamic hashing method is used to overcome the problems of static hashing,
such as bucket overflow.
o In this method, data buckets grow or shrink as the number of records increases or
decreases. This method is also known as the extendible hashing method.
o This method makes hashing dynamic, i.e., it allows insertion or deletion without
resulting in poor performance.
For example:
Consider the following grouping of keys into buckets, depending on the last two bits of
their hash address (the bits of the hash address are used, not the bits of the key
itself):
The hash addresses of keys 2 and 4 end in 00, so they go into bucket B0. The hash
addresses of keys 5 and 6 end in 01, so they go into bucket B1. The hash addresses of
keys 1 and 3 end in 10, so they go into bucket B2. The hash address of key 7 ends in 11,
so it goes into bucket B3.
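The grouping above can be sketched in Python as follows. The hash addresses in the
HASH_ADDRESS table are hypothetical values, chosen only so that their last two bits match
the grouping described in the text. The function name bucket_for and the depth parameter
are also illustrative.

```python
from collections import defaultdict

# Hypothetical 5-bit hash addresses for keys 1..7 (assumed values).
HASH_ADDRESS = {
    1: 0b11010,
    2: 0b00000,
    3: 0b11110,
    4: 0b00000,
    5: 0b01001,
    6: 0b10101,
    7: 0b10111,
}


def bucket_for(key: int, depth: int = 2) -> int:
    """Return the bucket number given by the last `depth` bits."""
    mask = (1 << depth) - 1  # depth = 2 gives the mask 0b11
    return HASH_ADDRESS[key] & mask


buckets = defaultdict(list)
for key in sorted(HASH_ADDRESS):
    buckets[f"B{bucket_for(key)}"].append(key)

for name in sorted(buckets):
    print(name, buckets[name])
# B0 [2, 4]
# B1 [5, 6]
# B2 [1, 3]
# B3 [7]
```

Using more bits of the hash address (a larger depth) produces more buckets, which is how
this scheme can grow as records are added.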