Graph 1
Uploaded by amitjangir972

DATA STRUCTURES
Graphs
Graphs:
• Basic Concepts
• Different representations of Graphs
• Graph Traversals (BFS & DFS)
• Minimum Spanning Tree (Prim & Kruskal algorithms)
• Dijkstra’s shortest path algorithms
• Hashing (Hash functions, Address calculation techniques, Common hashing functions)
• Collision resolution: Linear and Quadratic probing
• Double hashing
Graph:
A graph is a non-linear data structure. It can be defined as a group of vertices together with the edges that connect those vertices. In a graph the relationships are less restricted than in a tree: many-to-many relationships are allowed rather than only parent-to-child. A graph can thus be seen as a generalization of a tree in which the vertices (nodes) may be connected in arbitrary ways, including cycles, instead of maintaining a strict parent-child relationship.

A graph G can be defined as an ordered set G(V, E) where V represents the set of vertices and E represents the
set of edges which are used to connect these vertices.

A Graph G(V, E) with 5 vertices (A, B, C, D, E) and six edges ((A,B), (B,C), (C,E), (E,D), (D,B), (D,A)) is shown in the
following fig:

[Figure: graph with vertices A, B, C, D, E and the six edges listed above]
A Graph can be of two types:
1. Undirected Graph
2. Directed Graph

Undirected Graph:
If the pairs of vertices are unordered then the graph G is called an undirected graph. If an edge exists between vertices A and B, then it can be traversed from B to A as well as from A to B.

[Figure: undirected edge between A and B]

Directed Graph:
If the pairs of vertices are ordered then the graph G is called a directed graph. Each edge represents a specific path from some vertex A to another vertex B. Node A is called the initial node while node B is called the terminal node.

[Figure: directed edge from A to B]
Path:
A path can be defined as the sequence of nodes that are followed in order to reach some terminal
node V from the initial node U.

Cycle:
A cycle can be defined as the path which has no repeated edges or vertices except the first and last
vertices.

Connected Graph:
A connected graph is the one in which some path exists between every two vertices (u, v) in V.
There are no isolated nodes in connected graph.
Weighted Graph:
In a weighted graph, each edge is assigned some data such as a length or weight. The weight of an edge e is written w(e) and must be a positive value indicating the cost of traversing the edge.
[Figure: weighted graph on vertices A, B, C, D, E with edge weights 1, 2, 2, 3, 4 and 5]
Graph Representation:
A graph can be represented in two ways:
1. Sequential representation (Adjacency Matrix)
2. Linked representation

Sequential representation(Adjacency Matrix):


In sequential representation, we use an adjacency matrix to store the mapping represented by vertices and edges. In an adjacency matrix, the rows and columns correspond to the graph's vertices. For a graph having n vertices, the matrix has dimension n x n.

An entry Aij in the adjacency matrix representation of an undirected graph G will be 1 if there exists an edge between Vi and Vj, and 0 otherwise.
[Figure: the undirected graph on vertices A–E from above]

    A  B  C  D  E
A   0  1  0  1  0
B   1  0  1  1  0
C   0  1  0  0  1
D   1  1  0  0  1
E   0  0  1  1  0
Linked Representation:
In the linked representation, an adjacency list is used to store the Graph into the computer's memory.

[Figure: the same graph on vertices A–E]

A → B → D
B → A → D → C
C → B → E
D → A → B → E
E → D → C

An adjacency list is maintained for each node present in the graph; each list node stores a vertex value and a pointer to the next adjacent node. When all the adjacent nodes have been listed, NULL is stored in the pointer field of the last node of the list.
Traversing a Graph:
Traversing the graph means visiting all of its vertices. There are two standard methods by which we can traverse a graph:
• Breadth First Search
• Depth First Search

Breadth First Search(BFS) :


It is a graph traversal algorithm in which we take any node as the starting node (root) and explore all of its neighbouring nodes. Then we select the nearest node and explore all of its untraversed neighbours. The algorithm follows the same approach for each of the other adjacent nodes.

Steps:
1. Define a queue whose size is the total number of vertices in the graph.
2. Select any vertex as the starting point (root) for the traversal. Visit the vertex and insert it into the queue.
3. Visit all unvisited adjacent vertices of the vertex at the front of the queue and insert them into the queue.
4. When there is no new vertex to visit from the vertex at the front of the queue, delete that vertex from the queue.
5. Repeat steps 3 and 4 until the queue becomes empty.
Consider the graph given below:

[Figure: graph with vertices A, B, C, D, E, F, G]

Step 1: Select vertex A as the starting vertex: visit A and insert it into the queue.
Queue: A

Step 2: Visit all unvisited adjacent vertices of A (B, E, D), insert them into the queue and delete A from the queue.
Queue: B E D

Step 3: The adjacent vertices of B are A, E and C; A and E are already visited, so visit C. Insert C into the queue and delete B.
Queue: E D C

Step 4: The adjacent vertices of E are D, A, B, C and F; D, A, B and C are already visited, so visit F. Insert F into the queue and delete E.
Queue: D C F

Step 5: Repeat steps 3 and 4 until the queue becomes empty (G is visited along the way).

Final sequence of traversal: A B E D C F G


Depth First Search (DFS) Algorithm:
The depth first search (DFS) algorithm starts with the initial node of the graph G and goes deeper and deeper until it finds the goal node or a node with no children. The algorithm then backtracks from the dead end towards the most recent node that is not yet completely explored.
The data structure used in DFS is a stack. The process is similar to the BFS algorithm. In DFS, the edges that lead to an unvisited node are called discovery edges, while the edges that lead to an already visited node are called back edges.

Steps:
1. Define a stack whose size is the total number of vertices in the graph.
2. Select any vertex as the starting vertex. Visit the vertex and push it onto the stack.
3. Visit any one unvisited adjacent vertex of the vertex at the top of the stack and push it onto the stack.
4. Repeat step 3 until there is no new vertex to visit from the vertex at the top of the stack.
5. When there is no new vertex to visit, backtrack and pop one vertex from the stack.
6. Repeat steps 3, 4 and 5 until the stack is empty.
Consider the graph given below:

[Figure: the same graph with vertices A, B, C, D, E, F, G]

Step 1: Select vertex A as the starting point: visit A and push it onto the stack.
Stack (top to bottom): A

Step 2: Visit any unvisited adjacent vertex of A (suppose we choose B): visit B and push it onto the stack.
Stack: B A

Step 3: Visit any unvisited adjacent vertex of B (suppose we choose C): visit C and push it onto the stack.
Stack: C B A

Step 4: Visit any unvisited adjacent vertex of C (suppose we choose G): visit G and push it onto the stack.
Stack: G C B A

Step 5: Visit any unvisited adjacent vertex of G (suppose we choose F): visit F and push it onto the stack.
Stack: F G C B A

Step 6: Visit any unvisited adjacent vertex of F (suppose we choose E): visit E and push it onto the stack.
Stack: E F G C B A

Step 7: Visit any unvisited adjacent vertex of E (suppose we choose D): visit D and push it onto the stack.
Stack: D E F G C B A

Step 8: All adjacent vertices of D are visited, so we backtrack: pop D from the stack.
Stack: E F G C B A

Step 9: E is at the top of the stack and all its adjacent vertices are visited: pop E from the stack.
Stack: F G C B A

The process continues until the stack becomes empty. The pop order is D E F G C B A; reversing it gives the order in which the vertices were visited:

A B C G F E D (final sequence of traversal)
void BFS(int v)
{
    int i;
    insert_queue(v);
    state[v] = waiting;
    while(!isEmpty_queue())
    {
        v = delete_queue();
        printf("%d ", v);
        state[v] = visited;
        for(i = 0; i < n; i++)
        {
            if(adj[v][i] == 1 && state[i] == initial)
            {
                insert_queue(i);
                state[i] = waiting;
            }
        }
    }
    printf("\n");
}

void insert_queue(int vertex)
{
    if(rear == MAX-1)
        printf("Queue Overflow\n");
    else
    {
        if(front == -1)
            front = 0;
        rear = rear + 1;
        queue[rear] = vertex;
    }
}

int isEmpty_queue()
{
    if(front == -1 || front > rear)
        return 1;
    else
        return 0;
}

int delete_queue()
{
    int delete_item;
    if(front == -1 || front > rear)
    {
        printf("Queue Underflow\n");
        exit(1);
    }
    delete_item = queue[front];
    front = front + 1;
    return delete_item;
}
/* menu-driven BFS and DFS on an adjacency matrix */
#include<stdio.h>
#include<conio.h>
void bfs(int, int);
void dfs(int, int);
int adj[10][10], visited[10];

void main()
{
    int i, j, n, s, ch;
    clrscr();
    printf("\nEnter number of vertices:");
    scanf("%d", &n);
    printf("\nEnter adjacency matrix:");
    for(i = 1; i <= n; i++)
        for(j = 1; j <= n; j++)
            scanf("%d", &adj[i][j]);
    printf("\nAdjacency matrix:\n");
    for(i = 1; i <= n; i++)
    {
        for(j = 1; j <= n; j++)
            printf("%d ", adj[i][j]);
        printf("\n");
    }
    for(i = 1; i <= n; i++)
        visited[i] = 0;
    printf("\nEnter source vertex:");
    scanf("%d", &s);
    printf("\nEnter your choice\n1. BFS\n2. DFS\n");
    scanf("%d", &ch);
    switch(ch)
    {
        case 1: bfs(s, n); break;
        case 2: dfs(s, n); break;
    }
}

Sample graph (vertices numbered 1 A, 2 B, 3 C, 4 D, 5 E, 6 F, 7 G) and its adjacency matrix:

       1(A) 2(B) 3(C) 4(D) 5(E) 6(F) 7(G)
1(A)    0    1    1    1    0    0    0
2(B)    1    0    0    0    1    0    0
3(C)    1    0    0    0    0    0    1
4(D)    1    0    0    0    0    1    0
5(E)    0    1    0    0    0    0    1
6(F)    0    0    0    1    0    0    1
7(G)    0    0    1    0    1    1    0

/* bfs */
void bfs(int s, int n)
{
    int i, f = -1, r = -1, q[10];
    printf("\nBFS traversal: ");
    printf("%d ", s);
    visited[s] = 1;
    r++;
    f++;
    q[r] = s;
    while(f <= r)
    {
        s = q[f];
        f++;
        for(i = 1; i <= n; i++)
        {
            if(adj[s][i] == 1 && visited[i] == 0)
            {
                printf("%d ", i);
                visited[i] = 1;
                r++;
                q[r] = i;
            }
        }
    }
}

/* dfs */
void dfs(int s, int n)
{
    int i, st[10], top = -1, pop;
    printf("\nDFS traversal: ");
    top++;
    st[top] = s;
    while(top >= 0)
    {
        pop = st[top];
        top--;
        if(visited[pop] == 0)
        {
            printf("%d ", pop);
            visited[pop] = 1;
        }
        else
            continue;
        for(i = n; i >= 1; i--)
        {
            if(adj[pop][i] == 1 && visited[i] == 0)
            {
                top++;
                st[top] = i;
            }
        }
    }
}
Spanning Tree:
A spanning tree of a graph is a sub-graph that contains all the vertices and is a tree. A spanning tree of a connected graph G contains all the vertices and the edges needed to connect them. In other words, a spanning tree is an acyclic sub-graph of a connected, undirected graph G that connects all the vertices together. A graph G can have multiple spanning trees.

Minimum Spanning Tree:


For a connected graph G, create a spanning tree T such that the sum of the weights of the edges in T is as small as possible. Such a tree is called a Minimum Spanning Tree (MST).
Every edge in a weighted graph is assigned a weight; a minimum spanning tree is a spanning tree with minimal total weight. In other words, it is the spanning tree whose total weight is least among all spanning trees of the particular graph.
MST represents a cheaper way of connecting all the nodes in G.

Famous techniques for MST:


1. Prim’s algorithm
2. Kruskal’s algorithm
Prim's Algorithm:
Prim's algorithm is used to find a minimum spanning tree of a graph. It finds the subset of edges that includes every vertex of the graph such that the sum of the weights of the edges is minimized.
Prim's algorithm starts with a single node and at every step explores all the adjacent nodes along with their connecting edges. At each step, the minimum-weight edge that causes no cycle in the tree built so far is selected.
Prim's algorithm has the property that the selected edges always form a single tree. Begin with some vertex in the graph G. In each iteration we choose the minimum-weight edge (U, V) connecting a vertex U already in the tree set A to a vertex V outside A; then vertex V is brought into A. This process is repeated until a spanning tree is formed.

Algorithm
1. Select a starting vertex
2. Repeat steps 3 and 4 while there are fringe vertices
3. Select an edge e of minimum weight connecting a tree vertex and a fringe vertex
4. Add the selected edge and the fringe vertex to the minimum spanning tree T
[END OF LOOP]
5. EXIT
e.g.
Construct a minimum spanning tree of the graph given in the following figure by using prim's algorithm

[Figure: weighted undirected graph on vertices A, B, C, D and E]
Solution
1. Choose a starting vertex, B.
2. Add the vertices that are adjacent to B. The edges connecting the vertices are shown by dotted lines.
3. Choose the edge with the minimum weight among them, i.e. BD, and add it to the MST. Add the adjacent vertices of D, i.e. C and E.
4. Choose the edge with the minimum weight among them. In this case it is DE; add it to the MST. Add the adjacent vertex of E, i.e. C.
5. Choose the edge with the minimum weight, i.e. CA. We can't choose CE as it would cause a cycle in the graph.

The graph produced in the final step is the minimum spanning tree of the graph shown in the figure above.
The cost of the MST is calculated as:
cost(MST) = 4 + 2 + 1 + 3 = 10 units.
[Figures: Steps 1–5 of building the MST edge by edge; final cost = 4 + 1 + 2 + 3 = 10]
/*program for the implementation of prims algorithm*/
#include<stdio.h>
#include<conio.h>
int i, j, a, b, u, v, n, ne = 1;
int min, mincost = 0, cost[10][10], visited[10] = {0};
void prims();
void main()
{
    clrscr();
    printf("\nEnter number of vertices:");
    scanf("%d", &n);
    printf("\nEnter cost adjacency matrix:");
    for(i = 1; i <= n; i++)
    {
        for(j = 1; j <= n; j++)
        {
            scanf("%d", &cost[i][j]);
            if(cost[i][j] == 0)
                cost[i][j] = 999;
        }
    }
    printf("\nCost Adjacency matrix:\n");
    for(i = 1; i <= n; i++)
    {
        for(j = 1; j <= n; j++)
            printf("%d ", cost[i][j]);
        printf("\n");
    }
    prims();
    getch();
}

/*prims function*/
void prims()
{
    visited[1] = 1;
    while(ne < n)
    {
        for(i = 1, min = 999; i <= n; i++)
            for(j = 1; j <= n; j++)
                if(cost[i][j] < min)
                    if(visited[i] != 0)
                    {
                        min = cost[i][j];
                        a = u = i;
                        b = v = j;
                    }
        if(visited[u] == 0 || visited[v] == 0)
        {
            printf("\n%d edge (%d,%d)=%d", ne++, a, b, min);
            mincost += min;
            visited[b] = 1;
        }
        cost[a][b] = cost[b][a] = 999;
    }
    printf("\nMinimum cost=%d", mincost);
}
Kruskal's Algorithm:
Kruskal's Algorithm is used to find the minimum spanning tree of a connected weighted graph. The main target of the algorithm is to find the subset of edges by using which we can reach every vertex of the graph. Kruskal's algorithm follows a greedy approach, making a locally optimal choice at every stage in the hope of reaching the global optimum.
The algorithm creates a forest of trees. Initially the forest consists of n single-node trees and no edges. At each step, we add the cheapest edge that joins two trees together.

Algorithm
1. Create a forest in such a way that each vertex is a separate tree.
2. Create a priority queue Q that contains all the edges of the graph.
3. Repeat Steps 4 and 5 while Q is NOT EMPTY
4. Remove an edge from Q
5. IF the edge obtained in Step 4 connects two different trees, then Add it to the forest (for combining two trees
into one tree).
ELSE
Discard the edge
6. END
e.g.
Apply the Kruskal's algorithm on the graph given as follows:

[Figure: weighted graph on vertices A–E with edges AE = 5, AD = 10, AC = 7, AB = 1, BC = 3, CD = 4 and DE = 2]

Solution:
The weights of the edges are given as:

Edge AE AD AC AB BC CD DE
Weight 5 10 7 1 3 4 2
Sort the edges according to their weights.
Edge AB DE BC CD AE AC AD
Weight 1 2 3 4 5 7 10

Start constructing the tree:

1. Add AB (weight 1) to the MST.
2. Add DE (weight 2) to the MST.
3. Add BC (weight 3) to the MST.
4. Add CD (weight 4) to the MST.

• The next candidate edge is AE, but we can't add it as it would cause a cycle.
• The next candidate edge is AC, but it can't be added as it would cause a cycle.
• The next candidate edge is AD, but it can't be added as it would cause a cycle.

Hence, the final MST is the one shown in step 4.
The cost of MST = 1 + 2 + 3 + 4 = 10.
/*program for the implementation of kruskals algorithm*/
#include<stdio.h>
#include<conio.h>
int i, j, k, a, b, u, v, n, ne = 1;
int min, mincost = 0, cost[10][10], parent[10];
int find(int);
int uni(int, int);
void kruskal();
void main()
{
    printf("\nEnter number of vertices:");
    scanf("%d", &n);
    printf("\nEnter cost adjacency matrix:");
    for(i = 1; i <= n; i++)
    {
        for(j = 1; j <= n; j++)
        {
            scanf("%d", &cost[i][j]);
            if(cost[i][j] == 0)
                cost[i][j] = 999;
        }
    }
    printf("\nCost Adjacency matrix:\n");
    for(i = 1; i <= n; i++)
    {
        for(j = 1; j <= n; j++)
            printf("%d ", cost[i][j]);
        printf("\n");
    }
    kruskal();
    getch();
}

/*kruskal function*/
void kruskal()
{
    while(ne < n)
    {
        for(i = 1, min = 999; i <= n; i++)
        {
            for(j = 1; j <= n; j++)
            {
                if(cost[i][j] < min)
                {
                    min = cost[i][j];
                    a = u = i;
                    b = v = j;
                }
            }
        }
        u = find(u);
        v = find(v);
        if(uni(u, v))
        {
            printf("\n%d edge (%d,%d)=%d", ne++, a, b, min);
            mincost += min;
        }
        cost[a][b] = cost[b][a] = 999;
    }
    printf("\nMinimum cost=%d", mincost);
}
/*find function*/
int find(int i)
{
while(parent[i])
i=parent[i];
return i;
}

/*union function*/
int uni(int i,int j)
{
if(i!=j)
{
parent[j]=i;
return 1;
}
return 0;
}
Shortest Path Problem:
SPP is the problem of finding a path between two vertices (source and destination) such that the sum of the weights of its edges is minimized.

Dijkstra’s Algorithm:
It is an algorithm for finding the shortest paths between nodes in a graph. It solves the single-source shortest path problem for graphs in which all edges have non-negative weights. It is similar in structure to Prim’s algorithm.

Algorithm Steps:
• Set all vertex distances to infinity except for the source vertex; set the source distance to 0.
• Push the source vertex into a min-priority queue in the form (distance, vertex); comparisons in the min-priority queue are made according to the vertices' distances.
• Pop the vertex with the minimum distance from the priority queue (at first the popped vertex is the source).
• For each vertex connected to the popped vertex, if current vertex distance + edge weight < next vertex distance, update the next vertex's distance and push it with the new distance into the priority queue.
• If the popped vertex was visited before, just continue without using it.
• Repeat until the priority queue is empty.
Consider the graph given below:

[Figure: weighted graph on vertices A–I with edges A–B = 4, A–H = 8, B–C = 8, B–H = 11, C–D = 7, C–F = 4, C–I = 2, D–E = 9, D–F = 14, E–F = 10, F–G = 2, G–H = 1, G–I = 6 and H–I = 7]

Suppose the set S is initially empty and the distances assigned to the vertices are:

A B C D E F G H I
0 ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞

Pick the vertex with minimum distance: A. S = {A}.
Update the distance values of the vertices adjacent to A (B and H): the distances become 4 and 8.

A B C D E F G H I
0 4 ∞ ∞ ∞ ∞ ∞ 8 ∞

Pick the vertex with minimum distance not in S: B. S = {A, B}.
Update the distance values of the vertices adjacent to B: the distance of C becomes 4 + 8 = 12.

A B C D E F G H I
0 4 12 ∞ ∞ ∞ ∞ 8 ∞

Pick the vertex with minimum distance not in S: H. S = {A, B, H}.
Update the distance values of the vertices adjacent to H, i.e. G and I: the distances become 9 and 15.

A B C D E F G H I
0 4 12 ∞ ∞ ∞ 9 8 15

Pick the vertex with minimum distance not in S: G. S = {A, B, H, G}.
Update the distance values of the vertices adjacent to G, i.e. F and I: the distance of F becomes 11, while I stays at 15.

A B C D E F G H I
0 4 12 ∞ ∞ 11 9 8 15

Pick the vertex with minimum distance not in S: F. S = {A, B, H, G, F}.
Update the distance values of the vertices adjacent to F, i.e. C, D and E: through F these are 15, 25 and 21, so C keeps 12 while D and E become 25 and 21.

A B C D E F G H I
0 4 12 25 21 11 9 8 15

Pick the vertex with minimum distance not in S: C (reached through B). S = {A, B, H, G, F, C}.
Update the distance values of the vertices adjacent to C, i.e. D and I: the distances become 19 and 14.

A B C D E F G H I
0 4 12 19 21 11 9 8 14

Pick the vertex with minimum distance not in S: I (reached through C). S = {A, B, H, G, F, C, I}. No distances improve.

Pick the vertex with minimum distance not in S: D (reached through C). S = {A, B, H, G, F, C, I, D}.
Update the distance value of the vertex adjacent to D, i.e. E: through D the distance would be 28, but through F it is 21, so E keeps its distance through F.

A B C D E F G H I
0 4 12 19 21 11 9 8 14

Finally E is picked and the algorithm terminates; the table above gives the shortest distances from A.
Hashing:
Hashing is a technique used to uniquely identify an object from a group of objects.
• It maps a given value to a particular key for faster access to elements.
• It converts a range of key values into a range of indexes of an array.
• Hashing allows us to update and retrieve data in constant time, O(1).
Constant time means the operation does not depend on the size of the data.
• Hashing is used with databases to retrieve elements more quickly.
• It is used in the encryption and decryption of digital signatures.

e.g.
In universities each student is assigned a unique roll number that can be used to retrieve
information about them.

In libraries , each book is assigned a unique number that can be used to determine
information about book.

So students and books were hashed to a unique number.


In hashing, large keys are converted into small keys by using a hash function. The values are then stored in a data structure called a hash table. The idea of hashing is to distribute the entries (key/value pairs) uniformly across an array.

Each element is assigned a key (converted key). By using that key we can access the element in
O(1) time.

Using the key , the algorithm (hash function) computes an index that finds where an entry can
be found or inserted.

Hashing is a technique to convert a range of key values into a range of indexes of an array. Items are stored in (key, value) format.
[Figure: keys passed through a hash function to produce array indexes that locate the stored values]
Consider the items (1,20) (2,70) (42,80) (4,25) (12,44) (14,32) (17,11) stored in a hash table of size 20:

S. No.   Key   Hash           Array Index
1        1     1 % 20 = 1     1
2        2     2 % 20 = 2     2
3        42    42 % 20 = 2    2
4        4     4 % 20 = 4     4
5        12    12 % 20 = 12   12
6        14    14 % 20 = 14   14
7        17    17 % 20 = 17   17

(Keys 2 and 42 hash to the same index, which is a collision.)
Hash Function:
It is a function that maps a data set of arbitrary size to a data set of fixed size which fits into the hash table. The values returned by the hash function are called hash values or hash codes.

To achieve a good hashing mechanism, it is important to have a hash function with the following properties:
1. Easy to compute: it should be easy and fast to compute.
2. Uniform distribution: it should spread keys uniformly across the hash table and not cause clustering.
3. Few collisions: a collision occurs when a pair of elements is mapped to the same hash value. Collisions should be minimized.

*Good hash function minimizes collisions by spreading the elements uniformly throughout
the array.
There are various types of hash function which are used to place data in a hash table.
1. Division method: the hash value is the remainder when the key is divided by the table size:
   h(key) = key mod m
   e.g. with table size m = 10:
   h(52) = 52 % 10 = 2
   h(68) = 68 % 10 = 8
   h(44) = 44 % 10 = 4
   so 52 is stored at index 2, 44 at index 4 and 68 at index 8.

2. Mid-square method: the key is squared and the middle part of the result is taken as the index.
   e.g. consider 3101 as the key:
   3101 * 3101 = 9616201
   h(3101) = 162 (the middle 3 digits)
3. Folding method: the key is divided into separate parts, which are added together to get the hash key (hash address).
   e.g. key = 12465512
   Divide the key into three parts: 124, 655 and 12
   h(12465512) = 124 + 655 + 12 = 791

Characteristics of a good hash function:


1. The hash function should generate different hash values even for similar strings.
2. The hash function is easy to understand and simple to compute.
3. The hash function should produce the keys which will be distributed uniformly over the
array.
4. A number of collisions should be less while placing data in hash table.

Hash function for strings: strings can also be used as keys; they may be alphabetic or alphanumeric. Each character has an ASCII value that can be used to calculate the hash key, and that value can be used with the modulus operation to map the key into the hash table.
Suppose the key is “abcdef” and the table size is 97:
abcdef = a + b + c + d + e + f
       = 97 + 98 + 99 + 100 + 101 + 102
       = 597
H(abcdef) = 597 % 97 = 15
So the key “abcdef” is mapped to position 15 in the hash table.

Second method:
Multiply each character's ASCII value by 127, add all the values, then take the result modulo the table size, here 997.

“abcd”= a*127+b*127+c*127+d*127
= 97*127 + 98*127 +99*127 + 100*127
= 50038

H(abcd)=50038 % 997
=188
Hash Table:
A hash table is a data structure that is used to store key/value pairs. It stores data in an associative manner.
A hash table consists of an array in which data is accessed via a special index called a key.
The primary idea behind a hash table is to establish a mapping between the set of all possible keys and positions in the array using a hash function.
Access to data becomes very fast when we know the index of the desired data.
The average time required to search for an element in a hash table is O(1).
Insertion of data into the table is based on the key value; hence every entry in the hash table is defined by some key.

Application of Hash table:


1. Database systems: for applications requiring efficient random access, hash tables provide a way to locate data in a constant amount of time.
2. Data dictionaries: data structures that support adding, deleting and searching for data. The operations of a hash table and a data dictionary are similar; other data structures may also be used to implement data dictionaries.
Collision Resolution Techniques:
Suppose we want to add a new record R with key K to file F, but the memory location H(K) is already occupied. This situation is called a collision: a collision occurs when more than one key maps to the same hash value in the hash table.

1. Open addressing: in open addressing all records are stored in the array itself. When a new record has to be inserted, the hash index of the key is computed and the array is examined starting at that index. If the slot at the hash index is unoccupied, the record is inserted there; otherwise the search proceeds along a probe sequence until an unoccupied slot is found.
The probe sequence is the sequence of slots examined while traversing the entries. Different probe sequences use different intervals between successive slots. The name "open addressing" refers to the fact that the final location (address) of an item is not determined solely by its hash value.

Linear Probing: an open addressing scheme in which the interval between successive probes is fixed (usually 1):

index = index % tablesize
index = (index + 1) % tablesize
index = (index + 2) % tablesize
...

Hash collision is resolved with linear probing.


h(k, i) = (h’(k) + i) mod m
h’(k) = k mod m
i = 0, 1, 2, ..., m-1
m is the size of the table

e.g. Insert the keys 26, 37, 59, 76 into a hash table of size m = 11 using linear probing, with h’(k) = k mod m.

1. Insert 26:
   i = 0: h(26, 0) = (26 mod 11 + 0) mod 11 = 4
   T(4) is free, so insert 26 at T(4).

2. Insert 37:
   i = 0: h(37, 0) = (37 mod 11 + 0) mod 11 = 4
   T(4) is not free, so the next probe is computed.
   i = 1: h(37, 1) = (4 + 1) mod 11 = 5
   T(5) is free, so insert 37 at T(5).

3. Insert 59:
   59 mod 11 = 4; T(4) and T(5) are occupied, so probing continues.
   i = 2: h(59, 2) = (4 + 2) mod 11 = 6
   T(6) is free, so insert 59 at T(6).

4. Insert 76:
   i = 0: h(76, 0) = (76 mod 11 + 0) mod 11 = 10
   T(10) is free, so insert 76 at T(10).
Quadratic Probing:
It is similar to linear probing; the only difference is the interval between successive probes or entry slots. When the slot at the hashed index for an entry is already occupied, we keep probing until we find an unoccupied slot:

index = index % tablesize
index = (index + 1^2) % tablesize
index = (index + 2^2) % tablesize
index = (index + 3^2) % tablesize

Quadratic probing uses a hash function of the form:

h(k, i) = (h’(k) + c1*i + c2*i^2) mod m

h’ is an auxiliary hash function
c1 and c2 are auxiliary constants
i = 0, 1, 2, ..., m-1

This method is better than linear probing, but to make full use of the hash table the values of c1 and c2 are constrained.
e.g.
Insert 76, 26, 37, 59 and 21 into a hash table of size m = 11 using quadratic probing with c1 = 1 and c2 = 3:
h(k, i) = (h’(k) + c1*i + c2*i^2) mod m

1. Insert 76:
   i = 0: h(76, 0) = (76 mod 11 + 0 + 0) mod 11 = 10
   T(10) is free, so insert 76 at T(10).

2. Insert 26:
   i = 0: h(26, 0) = (26 mod 11 + 0 + 0) mod 11 = 4
   T(4) is free, so insert 26 at T(4).

3. Insert 37:
   i = 0: h(37, 0) = (37 mod 11 + 0 + 0) mod 11 = 4; T(4) is occupied, so the next probe is computed.
   i = 1: h(37, 1) = (4 + 1 + 3) mod 11 = 8
   T(8) is free, so insert 37 at T(8).

4. Insert 59:
   59 mod 11 = 4; T(4) is occupied, and i = 1 gives index 8 (occupied), so
   i = 2: h(59, 2) = (4 + 2 + 12) mod 11 = 18 mod 11 = 7
   T(7) is free, so insert 59 at T(7).

5. Insert 21:
   21 mod 11 = 10; T(10) is occupied, so
   i = 1: h(21, 1) = (10 + 1 + 3) mod 11 = 14 mod 11 = 3
   T(3) is free, so insert 21 at T(3).
Double Hashing:
It is also an open addressing scheme, similar to linear probing; the only difference is the interval between two successive probes. Here the interval between probes is computed using two hash functions.
Double hashing uses a hash function of the form:

h(k, i) = (h1(k) + i * h2(k)) mod m

h1 and h2 are two auxiliary hash functions
m is the size of the hash table
h1(k) = k mod m
h2(k) = k mod m1
m1 is slightly less than m (say m-1 or m-2)

Double hashing represents an improvement over linear and quadratic probing: each (h1(k), h2(k)) pair yields a distinct probe sequence, and h1(k) and h2(k) may vary independently. As a result, the performance of double hashing appears to be very close to the performance of the "ideal" scheme of uniform hashing.
e.g.
Insert 76, 26, 37, 59 and 21 into a hash table of size m=11 using double hashing
Consider h1(k) = k mod 11
h2(k) = k mod 9

1. Insert 76:
   h1(76) = 76 mod 11 = 10
   h2(76) = 76 mod 9 = 4
   h(76, 0) = (10 + 0 * 4) mod 11 = 10
   T(10) is free, so insert 76 at T(10).

2. Insert 26: h1(26) = 4, h2(26) = 8; h(26, 0) = 4 and T(4) is free, so insert 26 at T(4).

3. Insert 37: h1(37) = 4, h2(37) = 1; T(4) is occupied, and h(37, 1) = (4 + 1) mod 11 = 5, so insert 37 at T(5).

4. Insert 59: h1(59) = 4, h2(59) = 5; T(4) is occupied, and h(59, 1) = (4 + 5) mod 11 = 9, so insert 59 at T(9).

5. Insert 21: h1(21) = 10, h2(21) = 3; T(10) is occupied, and h(21, 1) = (10 + 3) mod 11 = 2, so insert 21 at T(2).
