3.4 Hashing
Hashing is a key-to-address transformation technique.
A hash table (or hash map) is a data structure that associates keys with values.
The implementation of hash tables is frequently called hashing.
Hashing is used for fast access to elements and records in collections such as tables and files.
It supports insertions, deletions and search operations in constant average time, O(1).
Hashing is useful where the collection is large and searching for an element by scanning would take too long.
It works by transforming the key, using a hash function, into a hash: a number that is used to
index into an array and locate the cell where the value should be.
A hash table supports the efficient addition of new entries, and the average time spent searching for the
required data is independent of the number of items stored.
3.4.1 Hash table:
In hashing, the ideal hash table data structure is merely an array of fixed size, containing the keys.
A key is a string or integer associated with a value. Each key is mapped into some number in the
range 0 to TableSize - 1 and placed in the appropriate cell.
The mapping is called the hash function. It should be simple to compute and should, ideally, ensure that
any two distinct keys get different cells.
0
1
2
3 A
4 B
5
6 C
7
8
9
3.4.2 Hash function:
A hash function is a key-to-address transformation that acts upon a given key to compute the
relative position of the key in an array.
The mapping of each key into some number in the range 0 to TableSize - 1 is known as hashing.
Ideally, the hash function is used to determine the location of any record given its key value.
The hash function should have the following properties:
It must be simple to compute.
It must distribute the data evenly.
It should generate a low number of collisions.
It should reduce the storage requirement.
Hashing is a method to transform a key to an address. The transformation involves application of a
function to the key value, as shown below.
Key -> Hash Function -> Address
In other words, the hash function maps a key value to an address in the table; the range of
the hash function is the non-negative integers from 0 to TableSize - 1.
If x is the key and h is the hash function, then h(x) is the address at which the key is stored.
Simple Hash function:
Consider a simple hash function, h(x) = x mod 10. The following table shows a list of key values and
the corresponding hash function values.
Key (x) Hash Function Value h(x)
10 0
25 5
33 3
47 7
64 4
88 8
39 9
The following figure shows the storage of the key values from the above table, using the hash function
value as the index.
Index h(x)   Key (x)
0            10
1
2
3            33
4            64
5            25
6
7            47
8            88
9            39
Suppose we want to store the value 25. We first apply the hash function: 25 mod 10 = 5,
so 5 is used as the address for storing 25.
The simple hash function is HashValue = key % TableSize.
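The storage step described above can be sketched in C. This is a minimal illustration only: the names hash_mod and store_keys and the EMPTY sentinel are assumptions made for this example, and the sketch assumes the keys do not collide (a colliding key is simply skipped).

```c
#include <stddef.h>

#define TABLE_SIZE 10
#define EMPTY (-1)   /* assumed sentinel marking an unused cell */

/* h(x) = x mod table_size, for non-negative integer keys */
unsigned int hash_mod(unsigned int key, unsigned int table_size)
{
    return key % table_size;
}

/* Store each key at index h(key) and return how many keys were stored.
   A key whose cell is already occupied is skipped, because collision
   handling is outside the scope of this sketch. */
size_t store_keys(const unsigned int *keys, size_t n, int table[TABLE_SIZE])
{
    size_t stored = 0;
    for (size_t i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
    for (size_t i = 0; i < n; i++) {
        unsigned int idx = hash_mod(keys[i], TABLE_SIZE);
        if (table[idx] == EMPTY) {
            table[idx] = (int)keys[i];
            stored++;
        }
    }
    return stored;
}
```

Calling store_keys with the keys 10, 25, 33, 47, 64, 88, 39 places each key in the cell given by its last digit, matching the figure.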
Routine for simple Hash Function
typedef unsigned int index;
index Hash (const char *key, int TableSize)
{
    index HashVal = 0;
    /* Add up the character codes of the key */
    while (*key != '\0')
        HashVal += *key++;
    return HashVal % TableSize;
}
For instance, suppose the table size is 10007 and every input key is at most 8 characters long.
A char has an integer value that is always at most 127, since an ASCII code is represented in 7 bits
and 2^7 = 128.
The hash function can therefore only take values between 0 and 1016 (i.e. 127 x 8), so most of the
10007 cells are never used. Hence it cannot provide an even distribution.
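The bound of 1016 can be checked with a small sketch. The function name hash_additive is an assumption for this example; it follows the additive routine above, with the character cast to unsigned char so that the sum never goes negative.

```c
/* Additive hash: sum of the character codes, reduced modulo table_size. */
unsigned int hash_additive(const char *key, unsigned int table_size)
{
    unsigned int hash_val = 0;
    while (*key != '\0')
        hash_val += (unsigned char)*key++;
    return hash_val % table_size;
}
```

With a table size of 10007, an 8-character key made entirely of the largest ASCII code (127) hashes to 1016, so no 8-character ASCII key can reach a higher cell. The sketch also shows a second weakness of summing: keys containing the same characters in a different order, such as "abc" and "cba", always land in the same cell.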
Routine for Second Hash function:
typedef unsigned int index;
index Hash (const char *key, int TableSize)
{
    /* Weight the first three characters by powers of 27 */
    return (key[0] + 27 * key[1] + 729 * key[2]) % TableSize;
}
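This hash function assumes the key has at least 2 characters plus the null terminator. The value 27
represents the number of letters in the English alphabet (26) plus a blank space, and 729 is 27^2.
This function examines only the first 3 characters. If these were random and the table size were 10007,
the distribution would be reasonably even, but English is not random. Ignoring the blank space there are
26^3 = 17576 possible combinations of 3 characters, yet only a fraction of them begin real words, so a
large part of the table is never used.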
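A short sketch makes the weakness concrete: any two keys that share their first three characters collide, whatever follows. The name hash_first3 is an assumption for this example, and the keys are assumed to hold at least two characters plus the terminator.

```c
/* Hash on the first three characters only, weighted by 1, 27 and 27^2. */
unsigned int hash_first3(const char *key, unsigned int table_size)
{
    return ((unsigned char)key[0]
            + 27u * (unsigned char)key[1]
            + 729u * (unsigned char)key[2]) % table_size;
}
```

For example, hash_first3("table", 10007) and hash_first3("tablet", 10007) are equal, because both keys start with "tab".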
Routine for Third Hash function:
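typedef unsigned int index;
index Hash (const char *key, int TableSize)
{
    index HashVal = 0;
    /* Multiply the running value by 32 (shift left by 5) and add the next character */
    while (*key != '\0')
        HashVal = (HashVal << 5) + *key++;
    return HashVal % TableSize;
}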
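Shifting left by 5 is the same as multiplying by 32, so the third routine evaluates a polynomial in 32 over all the characters of the key (Horner's rule). The sketch below, with the assumed name hash_shift, follows that routine and can be used to check small cases by hand.

```c
/* Shift-and-add hash: hash_val = hash_val * 32 + next character.
   Unsigned arithmetic wraps around on overflow, which is well defined in C. */
unsigned int hash_shift(const char *key, unsigned int table_size)
{
    unsigned int hash_val = 0;
    while (*key != '\0')
        hash_val = (hash_val << 5) + (unsigned char)*key++;
    return hash_val % table_size;
}
```

For the key "ab" the value is 97 x 32 + 98 = 3202, while "ba" gives 98 x 32 + 97 = 3233. Unlike a plain sum of character codes, the position of each character now affects the result.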
3.4.3 Collision:
A collision occurs when a key hashes to a memory location that is already filled by another key.
That is, when an element is inserted and it hashes to the same value as an already inserted element,
a collision is produced and needs to be resolved.
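As a rough illustration, the following sketch counts how many keys in a sequence hash to a cell that an earlier key has already claimed, using h(x) = x mod 10. The name count_collisions is an assumption for this example, and nothing is resolved here; the collisions are only detected.

```c
#include <stddef.h>
#include <stdbool.h>

#define TABLE_SIZE 10

/* Count the keys whose cell (key % TABLE_SIZE) was already claimed
   by an earlier key in the sequence. */
size_t count_collisions(const unsigned int *keys, size_t n)
{
    bool occupied[TABLE_SIZE] = { false };
    size_t collisions = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned int idx = keys[i] % TABLE_SIZE;
        if (occupied[idx])
            collisions++;
        else
            occupied[idx] = true;
    }
    return collisions;
}
```

For instance, the keys 25 and 65 both hash to 5, so inserting 65 after 25 counts as one collision.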
Collision resolving methods:
1. Separate chaining (or) External Hashing
2. Open addressing (or) Closed Hashing
1. Separate chaining (or) external hashing:
Separate chaining is a collision resolution technique in which we keep a list of all elements
that hash to the same value. It is called separate chaining because each hash table element is a
separate chain (linked list).
Each linked list contains all the elements whose keys hash to the same index.
Collision diagram:
Key (x)    Hash Function Value h(x)
10         10 % 10 = 0
25         25 % 10 = 5
33         33 % 10 = 3
47         47 % 10 = 7
65         65 % 10 = 5
83         83 % 10 = 3
30         30 % 10 = 0
Table contents before the colliding keys 65, 83 and 30 are inserted:
0    10
1
2
3    33
4
5    25
6
7    47
8
9
Separate chaining is an open hashing technique.
A pointer field is added to each record location.
When an overflow occurs, this pointer is set to point to the overflow blocks, forming a linked list.
In this method the table can never overflow, since the linked lists are only extended upon the
arrival of new keys.
The following figure shows a separate chaining hash table.
0 -> 10 -> 30
1
2
3 -> 83 -> 33
4
5 -> 25 -> 65
6
7 -> 47
8
9
List_Node: this structure is the same as the linked list declaration.
Hash_Table: this structure contains an array of linked lists, which are dynamically allocated when the
table is initialized.
The_Lists: a pointer to a pointer to a List_Node structure.
Type declaration for open hash table
typedef struct List_Node *Node_Ptr;
struct List_Node
{
Element_Type element;
Node_Ptr next;
};
typedef Node_Ptr List;
typedef Node_Ptr Position;
/* LIST *the_list will be an array of lists, allocated later */
/* The lists will use headers, allocated later */
struct Hash_Tbl
{
unsigned int Table_Size;
List *The_Lists;
};
typedef struct Hash_Tbl *Hash_Table;
Initialization routine for open hash table
Hash_Table Initialize_Table( unsigned int Table_Size )
{
    Hash_Table H;
    unsigned int i;
    if( Table_Size < MIN_TABLE_SIZE )
    {
        error("Table size too small");
        return NULL;
    }
    /* Allocate table */
    H = (Hash_Table) malloc( sizeof (struct Hash_Tbl) );
    if( H == NULL )
        fatal_error("Out of space!!!");
    /* Use a prime table size */
    H->Table_Size = next_prime( Table_Size );
    /* Allocate list pointers */
    H->The_Lists = (List *) malloc( sizeof (List) * H->Table_Size );
    if( H->The_Lists == NULL )
        fatal_error("Out of space!!!");
    /* Allocate list headers */
    for( i = 0; i < H->Table_Size; i++ )
    {
        H->The_Lists[i] = (List) malloc( sizeof (struct List_Node) );
        if( H->The_Lists[i] == NULL )
            fatal_error("Out of space!!!");
        else
            H->The_Lists[i]->next = NULL;
    }
    return H;
}
Find Function:
1. Use the hash function to determine which list to traverse.
2. Traverse the list in the normal manner.
3. Return the position where the item is found.
4. The call Find(Key, H) returns a pointer to the cell containing Key, or NULL if Key is not present.
5. If Element_Type is a string, comparison and assignment must be done with
strcmp and strcpy respectively.
Find routine for open hash table
Position Find( Element_Type Key, Hash_Table H )
{
    Position P;
    List L;
    /* Select the list for this key and skip its header node */
    L = H->The_Lists[ Hash( Key, H->Table_Size ) ];
    P = L->next;
    while( (P != NULL) && (P->element != Key) )
        /* Probably need strcmp!! */
        P = P->next;
    return P;
}
Insert Function:
1. Use the hash function to go to the list for the item X.
2. Traverse the list to see if X exists already.
3. If not, insert a new node at the front of the list.
Insert routine for open hash table
void Insert( Element_Type Key, Hash_Table H )
{
    Position Pos, New_Cell;
    List L;
    Pos = Find( Key, H );
    if( Pos == NULL )   /* Key is not found, so insert it */
    {
        New_Cell = (Position) malloc( sizeof(struct List_Node) );
        if( New_Cell == NULL )
            fatal_error("Out of space!!!");
        else
        {
            /* Link the new cell in directly after the header */
            L = H->The_Lists[ Hash( Key, H->Table_Size ) ];
            New_Cell->next = L->next;
            New_Cell->element = Key; /* Probably need strcpy!! */
            L->next = New_Cell;
        }
    }
}
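The Find and Insert steps can be exercised with a compact, self-contained sketch. It is a simplification under stated assumptions, not the routines above: keys are non-negative integers, the table size is fixed at 10, the lists have no header nodes, and the names chain_find, chain_insert and chain_length are invented for this example.

```c
#include <stdlib.h>

#define TABLE_SIZE 10

struct node {
    int key;
    struct node *next;
};

/* h(x) = x mod TABLE_SIZE; keys are assumed to be non-negative */
static unsigned int hash_int(int key)
{
    return (unsigned int)key % TABLE_SIZE;
}

/* Return the node holding key, or NULL if key is not in the table.
   table[i] is the head of chain i (NULL when the chain is empty). */
struct node *chain_find(struct node *table[], int key)
{
    struct node *p = table[hash_int(key)];
    while (p != NULL && p->key != key)
        p = p->next;
    return p;
}

/* Insert key at the front of its chain.
   Returns 1 on success, 0 if key is already present or memory runs out. */
int chain_insert(struct node *table[], int key)
{
    if (chain_find(table, key) != NULL)
        return 0;
    struct node *cell = malloc(sizeof *cell);
    if (cell == NULL)
        return 0;
    unsigned int idx = hash_int(key);
    cell->key = key;
    cell->next = table[idx];
    table[idx] = cell;
    return 1;
}

/* Number of nodes in chain idx */
size_t chain_length(struct node *table[], unsigned int idx)
{
    size_t n = 0;
    for (struct node *p = table[idx]; p != NULL; p = p->next)
        n++;
    return n;
}
```

Starting from an array of ten NULL pointers and inserting 10, 25, 33, 47, 65, 83, 30 leaves chains of length 2 at indices 0, 3 and 5, and a chain of length 1 at index 7. Because new nodes go at the front, the most recently inserted key is the first one in each chain.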
Advantages:
i) More elements than the table size can be inserted, since it uses an array of linked lists.
ii) Elements that hash to the same address are kept in the same chain, so a search examines only
that chain.
iii) It does not require prior knowledge of the number of elements to be stored in the hash
table (i.e.) dynamic allocation is done.
iv) Collisions are resolved simply: a colliding key is added to its chain without disturbing any other cell.
Disadvantages:
i) The elements are not evenly distributed. Some hash indices may hold many elements and some
may hold none.
ii) It requires pointers, which take more memory space. It also slows the algorithm down a
bit because of the time required to allocate new cells, and it essentially requires the
implementation of a second data structure.
2. Open addressing or Closed Hashing:
Open addressing hashing is an alternative to resolving collisions with linked lists. In this
system, if a collision occurs, alternative cells are tried until an empty cell is found.
The cells h0(x), h1(x), h2(x), ... are tried in succession, where hi(x) = (Hash(x) + F(i)) mod Table_Size,
with F(0) = 0.
The function F is the collision resolution strategy. Because all the data goes inside the table, a bigger
table is needed than for separate chaining, so this technique is generally used where storage space is large.
Arrays are used here as hash tables.
Definition: The technique of finding another suitable empty location in the hash table
when the calculated hash address is already occupied is known as open addressing. There are three
common collision resolution strategies.
1. Linear probing
2. Quadratic Probing
3. Double hashing
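The probe formula hi(x) = (Hash(x) + F(i)) mod Table_Size can be sketched with F passed in as a function. The names probe_cell, linear_f and quadratic_f are assumptions for this example, Hash(x) is taken to be x mod Table_Size, and F(i) = i*i is the usual textbook choice for quadratic probing rather than something defined in this section. Double hashing is left out because it needs a second hash function.

```c
/* F(i): the collision resolution strategy, with F(0) = 0 */
typedef unsigned int (*probe_fn)(unsigned int i);

unsigned int linear_f(unsigned int i)    { return i; }
unsigned int quadratic_f(unsigned int i) { return i * i; }

/* Cell tried on the i-th probe:
   h_i(x) = (Hash(x) + F(i)) mod table_size, with Hash(x) = x mod table_size */
unsigned int probe_cell(unsigned int key, unsigned int i,
                        unsigned int table_size, probe_fn f)
{
    return (key % table_size + f(i)) % table_size;
}
```

With a table size of 10 and the key 49, linear probing tries cells 9, 0, 1, ... in that order, while quadratic probing tries cells 9, 0, 3, ... (offsets 0, 1, 4).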
Linear probing:
Probing is the process of finding the next available empty position in the hash table. The linear
probing method searches for the next suitable position in a linear manner (one cell after the next),
which is how it gets its name.
In linear probing, the position tried on the ith probe is (h(k) + i) mod Table_Size; that is, F is a
linear function of i, F(i) = i. This amounts to trying cells sequentially in search of an empty cell.
Example for Linear Probing:
Insert the keys 89, 18, 49, 58, 69 into a hash table using the same hash function as before,
h(x) = x mod 10, and the collision resolution strategy F(i) = i.
Solution:
Initially 89 is inserted at index 9. Then 18 is inserted at index 8.
The first collision occurs when 49 is inserted: index 9 is taken, so it is put in the next available
index, namely 0, which is open.
The key 58 collides with 18, 89 and then 49 before an empty cell is found at index 1.
The collision for 69 is handled similarly, and it is placed at index 2.
If the table is big enough, a free cell can always be found, but the time to do so can get quite
large.
Even if the table is relatively empty, blocks of occupied cells start forming. This effect is known as
primary clustering: any key that hashes into the cluster will require several attempts to resolve the
collision, and then it will add to the cluster.
Index  Empty Table  After 89  After 18  After 49  After 58  After 69
0                                       49        49        49
1                                                 58        58
2                                                           69
3
4
5
6
7
8                             18        18        18        18
9                   89        89        89        89        89
Algorithm for linear probing:
1. Apply the hash function to the key value to get the address of the location.
2. If the location is free, then
   i) store the key value at this location;
   else
   ii) check the remaining locations of the table one after the other until an empty
       location is reached. Wrap around the table: on reaching the end of the
       table, continue looking from the beginning.
   iii) Store the key in this empty location.
3. End
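The steps above can be sketched as a short insertion routine. This is a minimal sketch under stated assumptions: keys are non-negative integers, the table has 10 cells, h(x) = x mod 10, and the names lp_init and lp_insert and the EMPTY sentinel are invented for this example.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)   /* assumed sentinel marking a free cell */

/* Mark every cell as free */
void lp_init(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key using linear probing, F(i) = i.
   Returns the index where the key ends up, or -1 if the table is full. */
int lp_insert(int table[TABLE_SIZE], int key)
{
    unsigned int home = (unsigned int)key % TABLE_SIZE;
    for (unsigned int i = 0; i < TABLE_SIZE; i++) {
        unsigned int idx = (home + i) % TABLE_SIZE;   /* wraps around the table */
        if (table[idx] == EMPTY || table[idx] == key) {
            table[idx] = key;
            return (int)idx;
        }
    }
    return -1;
}
```

After lp_init, inserting 89, 18, 49, 58 and 69 in that order places them at indices 9, 8, 0, 1 and 2, which matches the worked example.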
Advantages of linear probing:
1. It does not require pointers.
2. It is very simple to implement.
Disadvantages of linear probing:
1. It forms clusters, which degrade the performance of the hash table for
storing and retrieving data.
2. Once the hash table becomes more than about half full, collisions make it
increasingly difficult to find an empty location, and hence the insertion process
takes longer.
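The cost of clustering can be made concrete by counting how many cells each insertion has to examine. The sketch below is illustrative only; the names cluster_clear and cluster_insert and the EMPTY sentinel are assumptions, with h(x) = x mod 10 and non-negative integer keys.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)   /* assumed sentinel marking a free cell */

/* Mark every cell as free */
void cluster_clear(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key with linear probing.
   Returns the number of cells examined, or 0 if the table is full. */
unsigned int cluster_insert(int table[TABLE_SIZE], int key)
{
    unsigned int home = (unsigned int)key % TABLE_SIZE;
    for (unsigned int i = 0; i < TABLE_SIZE; i++) {
        unsigned int idx = (home + i) % TABLE_SIZE;
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return i + 1;
        }
    }
    return 0;
}
```

For the keys 89, 18, 49, 58, 69 inserted in that order, the counts are 1, 1, 2, 4 and 4. The table is only half full, yet the last two insertions each walk through the whole cluster at indices 8, 9, 0, 1 before finding a free cell.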