Hashing

Uploaded by

Ankit Dahiya
Copyright
© All Rights Reserved

Hashing

Hashing is the process of generating a fixed-size output from an input of variable size
using mathematical formulas known as hash functions. This technique determines an index or
location for storing an item in a data structure.
In data structures, hashing means transforming a given key into another value. A hash function
maps the key to a specific index in a hash table, which enables fast retrieval of information
based on its key. The value obtained from the hash function is called the hash code.

Need for Hash data structure


Every day, the amount of data on the internet grows, and storing it efficiently is a constant
struggle. In day-to-day programming the amount of data may not be that large, but it still
needs to be stored, accessed, and processed easily and efficiently. A very common data structure
used for this purpose is the array.
So the question arises: if the array was already there, why was a new data structure needed?
The answer lies in the word "efficiency". Although storing an item in an array takes O(1) time,
searching for one takes at least O(log n) time (binary search on a sorted array; an unsorted
array needs O(n)). This may appear small, but for a large data set it can cause a lot of
problems, which in turn makes the array inefficient.
So we are looking for a data structure that can store data and search it in constant time,
i.e. in O(1) time. This is how the hash data structure came into play. With a hash table, it is
possible to store data and retrieve it in constant time on average.

Components of Hashing

There are three major components of hashing:
1. Key: A key can be anything, such as a string or an integer, that is fed as input to the
hash function, which determines an index or location for storing the item in a data
structure.
2. Hash Function: The hash function receives the input key and returns the index of an
element in an array called a hash table. The index is known as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to values using a
hash function. It stores the data associatively in an array, where ideally each data
value has its own index.
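
The three components can be sketched in a few lines of Python. This is only a minimal illustration: the table size of 10, the names hash_index, put and get, and the sample keys are assumptions made for the example, and collisions are deliberately ignored here.

```python
# Minimal sketch of key -> hash function -> hash table.
# TABLE_SIZE, hash_index, put, get and the sample keys are illustrative assumptions.
TABLE_SIZE = 10

def hash_index(key: str) -> int:
    """Hash function: turn a string key into an index in [0, TABLE_SIZE)."""
    return sum(ord(ch) for ch in key) % TABLE_SIZE

# Hash table: a plain array (Python list) with one slot per index.
table = [None] * TABLE_SIZE

def put(key: str, value) -> None:
    # No collision handling: a later key with the same index overwrites the slot.
    table[hash_index(key)] = (key, value)

def get(key: str):
    entry = table[hash_index(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

put("ab", 1)   # ord('a') + ord('b') = 195, so the hash index is 5
put("cd", 2)   # ord('c') + ord('d') = 199, so the hash index is 9
print(get("ab"), get("cd"))
```

Both lookups go straight to one slot without scanning the array, which is where the constant-time behaviour comes from.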

What is a Hash function?


The hash function creates a mapping between key and value, this is done through the use of
mathematical formulas known as hash functions. The result of the hash function is referred to as
a hash value or hash. The hash value is a representation of the original string of characters but
usually smaller than the original.

Types of Hash functions:

There are many hash functions that use numeric or alphanumeric keys.
1. Division Method
The division method involves dividing the key by a prime number and using the remainder as the
hash value.
h(k) = k mod m

where k is the key and m is a prime number, typically chosen as the size of the hash table.
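
As a small sketch of the division method in Python (the function name division_hash and the choice m = 13 are assumptions for illustration, not values from the text):

```python
def division_hash(k: int, m: int = 13) -> int:
    """h(k) = k mod m, with m assumed to be prime (13 is an arbitrary choice)."""
    return k % m

print(division_hash(100))  # 100 = 7 * 13 + 9, so this prints 9
```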


2. Multiplication Method
In the multiplication method, the key is multiplied by a constant A (0 < A < 1). The
fractional part of the product is then multiplied by m to get the hash value.
h(k) = ⌊m (kA mod 1)⌋

where ⌊ ⌋ denotes the floor function and kA mod 1 is the fractional part of kA.
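
The formula can be tried out with a short Python sketch. The name multiplication_hash, the table size m = 16, and the constant A = (√5 − 1)/2 (a commonly suggested value) are assumptions for this example; any A strictly between 0 and 1 fits the definition above.

```python
import math

# Assumed constant: (sqrt(5) - 1) / 2, roughly 0.618. Any 0 < A < 1 is allowed.
A_DEFAULT = (math.sqrt(5) - 1) / 2

def multiplication_hash(k: int, m: int = 16, A: float = A_DEFAULT) -> int:
    """h(k) = floor(m * (k*A mod 1))."""
    fractional_part = (k * A) % 1.0
    return math.floor(m * fractional_part)

print(multiplication_hash(1))  # fractional part is about 0.618, 16 * 0.618 = 9.88..., prints 9
```

Unlike the division method, m does not need to be prime here, because the spreading is done by the fractional part of kA.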

3. Mid-Square Method
In the mid-square method, the key is squared, and the middle digits of the result are taken as the
hash value.
Steps:
1. Square the key.
2. Extract the middle digits of the squared value.
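
The two steps might look like this in Python. The function name mid_square_hash and the choice of two middle digits are assumptions; when the middle is not exactly centred, this sketch leans one position to the left.

```python
def mid_square_hash(k: int, digits: int = 2) -> int:
    """Square the key and return its middle `digits` decimal digits."""
    squared = str(k * k)                      # step 1: square the key
    if len(squared) <= digits:
        return k * k                          # too short to have a "middle"
    start = (len(squared) - digits) // 2      # step 2: locate the middle digits
    return int(squared[start:start + digits])

print(mid_square_hash(1234))  # 1234^2 = 1522756, middle two digits "22", prints 22
```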

4. Folding Method
The folding method involves dividing the key into equal parts, summing the parts, and then
taking the modulo with respect to m.
Steps:
1. Divide the key into parts.
2. Sum the parts.
3. Take the modulo m of the sum.
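
A possible Python sketch of these steps follows. The name folding_hash, the part size of two decimal digits, and m = 97 are assumptions for illustration; if the key length is not a multiple of the part size, the last part is simply shorter.

```python
def folding_hash(k: int, part_size: int = 2, m: int = 97) -> int:
    """Split the decimal digits of k into parts, sum them, and reduce modulo m."""
    digits = str(k)
    # Step 1: divide the key into parts of `part_size` digits.
    parts = [int(digits[i:i + part_size]) for i in range(0, len(digits), part_size)]
    # Steps 2 and 3: sum the parts, then take the sum modulo m.
    return sum(parts) % m

print(folding_hash(123456))  # parts 12 + 34 + 56 = 102, and 102 mod 97 = 5
```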

What is a Hash Collision?


A hash collision occurs when two different keys map to the same index in a hash table. This can
happen even with a good hash function, because there are usually far more possible keys than
table slots, and it becomes more likely as the table fills up.
Causes of Hash Collisions:
● Poor Hash Function: A hash function that does not distribute keys evenly across the
hash table can lead to more collisions.
● High Load Factor: A high load factor (ratio of keys to hash table size) increases the
probability of collisions.
● Similar Keys: Keys that are similar in value or structure are more likely to collide
when the hash function does not mix its input well.
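
A collision and the load factor can be illustrated with a tiny Python sketch. The hash function k mod m, the table size m = 10, and the sample keys are assumptions chosen so that two keys collide.

```python
def h(k: int, m: int) -> int:
    return k % m

m = 10                      # number of slots in the (hypothetical) table
keys = [12, 22, 35, 47]     # sample keys; 12 and 22 both end in 2

# Group keys by the index they hash to.
slots = {}
for k in keys:
    slots.setdefault(h(k, m), []).append(k)

# Any index that received more than one key is a collision.
collisions = {index: ks for index, ks in slots.items() if len(ks) > 1}
load_factor = len(keys) / m

print(collisions)    # {2: [12, 22]}
print(load_factor)   # 0.4
```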

Collision Resolution Techniques


There are two main types of collision resolution techniques:
1. Open Addressing: In open addressing, all elements are stored in the hash table itself.
Each table entry contains either a record or NIL. When searching for an element, we
examine the table slots one by one until the desired element is found or it is clear that
the element is not in the table.
● Linear Probing: In linear probing, the hash table is searched sequentially, starting
from the original hash location. If that location is already occupied, we check the
next location, and so on, wrapping around at the end of the table.
● Quadratic Probing: Quadratic probing takes the original hash index and adds
successive values of an arbitrary quadratic polynomial until an open slot is found.
● Double Hashing: Double hashing is a collision resolution technique for open-addressed
hash tables that makes use of two hash functions.

The first hash function, h1(k), takes the key and gives a location in the hash table.
If that location is empty, we can simply place our key there.

If the location is occupied (a collision), we use a secondary hash function h2(k) in
combination with the first hash function h1(k) to find a new location in the hash
table.

This combination of hash functions is of the form

h(k, i) = (h1(k) + i * h2(k)) % n

where i is the probe number (0, 1, 2, ...) and n is the size of the hash table.
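2. Closed Addressing: Closed addressing, also called open hashing or separate chaining,
deals with collisions by letting each table index hold more than one key.

● Chaining: Store colliding keys in a linked list or binary search tree at each index.
● Cuckoo Hashing: Use multiple hash functions to give each key several candidate slots,
relocating existing keys when all of them are occupied. It is usually classified as a
form of open addressing.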
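
The probing schemes and chaining described above can be compared side by side with a short Python sketch. Everything concrete here is an assumption for illustration: the table size N = 7, the functions h1 and h2, the quadratic polynomial i², and the three sample keys, which were picked because they all hash to index 3.

```python
from typing import Callable, List, Optional

N = 7  # table size, assumed prime so that double hashing can reach every slot

def h1(k: int) -> int:
    return k % N

def h2(k: int) -> int:
    # Secondary hash; it must never return 0, or the probe would not move.
    return 1 + (k % (N - 1))

def linear_probe(k: int, i: int) -> int:
    return (h1(k) + i) % N

def quadratic_probe(k: int, i: int) -> int:
    return (h1(k) + i * i) % N          # the polynomial i^2 is one possible choice

def double_hash_probe(k: int, i: int) -> int:
    return (h1(k) + i * h2(k)) % N      # h(k, i) = (h1(k) + i * h2(k)) % n

def insert_open(table: List[Optional[int]], k: int,
                probe: Callable[[int, int], int]) -> int:
    """Open addressing: try probe(k, 0), probe(k, 1), ... until a slot is free."""
    for i in range(N):
        slot = probe(k, i)
        if table[slot] is None:
            table[slot] = k
            return slot
    # Quadratic probing may give up here even if some slot is still free.
    raise RuntimeError("no free slot found")

def place_all(keys: List[int], probe: Callable[[int, int], int]) -> List[int]:
    table: List[Optional[int]] = [None] * N
    return [insert_open(table, k, probe) for k in keys]

keys = [10, 17, 24]                     # all three have h1(k) == 3

print(place_all(keys, linear_probe))       # [3, 4, 5]
print(place_all(keys, quadratic_probe))    # [3, 4, 0]
print(place_all(keys, double_hash_probe))  # [3, 2, 4]

# Separate chaining: each index holds a list, so colliding keys share a bucket.
chains: List[List[int]] = [[] for _ in range(N)]
for k in keys:
    chains[h1(k)].append(k)
print(chains[3])                           # [10, 17, 24]
```

With open addressing the colliding keys end up in different slots of the same array, while with chaining they stay at index 3 and the lookup cost moves into the list stored there.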

Applications of Hash Data structure


● Hashing is used in databases for indexing.
● Hashing is used in disk-based data structures.
● In some programming languages, such as Python and JavaScript, hash tables are used to
implement objects.

Advantages of Hash Data structure


● Hash tables can provide better synchronization than some other data structures.
● Hash tables are often more efficient than search trees or other lookup structures for
key-based access.
● Hashing provides constant time for searching, insertion, and deletion operations on
average.

Disadvantages of Hash Data structure

● Hashing is inefficient when there are many collisions.
● Hash collisions cannot practically be avoided for a large set of possible keys.
● Some hash table implementations do not allow null keys or values.

What is Static Hashing?


In static hashing, the hashing algorithm always returns the same address for a given search
key. For example, with a mod-5 hash function only five values (0 to 4) can be produced. The
number of buckets is constant at all times.
The data bucket address obtained by static hashing will always be the same. So, if we use the
mod-5 hash function to get the address of EmpId = 103, we always get the same data bucket
address, 3. The data bucket position remains unchanged, and all existing data buckets in
memory stay where they are throughout the hashing process. In this case, there are five
partitions in memory used to hold data.
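
The EmpId example can be checked with a few lines of Python. The names NUM_BUCKETS and bucket_address and the extra employee IDs besides 103 are assumptions added for illustration.

```python
NUM_BUCKETS = 5   # fixed for the lifetime of the table (static hashing)

def bucket_address(emp_id: int) -> int:
    """mod-5 hash function: always one of the addresses 0, 1, 2, 3, 4."""
    return emp_id % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]
for emp_id in (101, 102, 103, 108):        # sample IDs, only 103 is from the text
    buckets[bucket_address(emp_id)].append(emp_id)

print(bucket_address(103))  # 103 = 20 * 5 + 3, so this always prints 3
print(buckets[3])           # [103, 108]: both IDs share bucket 3
```

Because the number of buckets never changes, bucket 3 keeps receiving every key that is 3 modulo 5, which is what eventually leads to bucket overflow.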

What is Dynamic Hashing in DBMS?

Dynamic hashing is a technique in which data buckets are added and removed dynamically, on
demand. It can be used to solve problems such as bucket overflow, which can occur in static
hashing. In this method, the number of data buckets grows or shrinks as the number of
records increases or decreases. This allows easy insertion into and deletion from the
database and reduces performance issues.
Bucket Overflow: Bucket overflow occurs when the bucket at the memory address generated by
the hash function is already full of data records.
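
The idea of growing the number of buckets on overflow can be sketched in Python as follows. This is a deliberately simplified stand-in, not the extendible or linear hashing schemes a real DBMS would use: the class name GrowingHashFile, the bucket capacity of 2, the doubling strategy, and the MAX_BUCKETS safety limit are all assumptions made for the example.

```python
BUCKET_CAPACITY = 2   # records per bucket before it overflows (assumed)
MAX_BUCKETS = 1024    # safety limit so repeated keys cannot grow the file forever

class GrowingHashFile:
    def __init__(self, num_buckets: int = 2) -> None:
        self.buckets = [[] for _ in range(num_buckets)]

    def _address(self, key: int) -> int:
        return key % len(self.buckets)

    def _grow(self) -> None:
        """Double the number of buckets and redistribute every stored key."""
        if 2 * len(self.buckets) > MAX_BUCKETS:
            raise OverflowError("bucket limit reached")
        old_keys = [k for bucket in self.buckets for k in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for k in old_keys:
            self.buckets[self._address(k)].append(k)

    def insert(self, key: int) -> None:
        # Bucket overflow: the target bucket is already full, so grow first.
        while len(self.buckets[self._address(key)]) >= BUCKET_CAPACITY:
            self._grow()
        self.buckets[self._address(key)].append(key)

    def contains(self, key: int) -> bool:
        return key in self.buckets[self._address(key)]

f = GrowingHashFile()
for key in (4, 8, 12, 16):
    f.insert(key)

print(len(f.buckets))   # 8: the file grew from 2 to 4 to 8 buckets
print(f.buckets[0])     # [8, 16]
print(f.buckets[4])     # [4, 12]
```

Inserting 12 overflows bucket 0 twice in a row, so the file doubles twice before the key fits. Schemes used in practice avoid rehashing every record by splitting only the bucket that overflowed.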
