UNIT-4
Topic : Trees
Classification of Data Structures
Introduction to Trees
• The study of trees in mathematics can be traced to Gustav Kirchhoff in the
middle nineteenth century and several years later to Arthur Cayley, who
used trees to study the structure of algebraic formulas.
• Cayley’s work undoubtedly laid the framework for Grace Hopper’s use of
trees in 1951 to represent arithmetic expressions.
• Hopper’s work bears a strong resemblance to today’s binary tree formats.
• Trees are used extensively in computer science to represent algebraic formulas; as an efficient method for searching large, dynamic lists; and for such diverse applications as artificial intelligence systems and encoding algorithms.
General Trees
• Linear access time of linked lists is too high for searching large lists.
• Solution – group the data into trees.
• Trees – a non-linear data structure.
• Used to represent data that contains a hierarchical relationship among elements, e.g. family trees, records.
• Worst-case search time for a balanced tree – O(log n).
• A tree can be defined in many ways, e.g. recursively.
General Trees
• A tree is a non-linear data structure.
• A tree organizes data in a hierarchical structure, and its definition is recursive.
• A tree is a collection of nodes connected by directed (or undirected) edges.
• A tree can be empty, with no nodes, or it is a structure consisting of one node called the root and zero or more subtrees.
What is a tree data structure?
• A tree is a non-linear data structure that consists of nodes connected by edges.
• A tree data structure is a hierarchical structure that is used to represent and organize data in a way that is easy to navigate and search.
• It is a collection of nodes that are connected by edges, with a hierarchical relationship between the nodes.
Tree Terminology - Degree
• The total number of children of a node is called the DEGREE of that node.
• The degree of the entire tree is the maximum degree among its nodes.
Example (for the tree shown in the figure):
Degree A – 2
Degree B – 3
Degree C – 0
Degree D – 0
Degree E – 0
Degree F – 0
Degree of entire tree – 3
Tree Terminology – Internal Node
• A node with at least one child is called an INTERNAL node.
• Also known as a non-terminal node.
Tree Terminology – Leaf Node
• A node which does not have a child is called a LEAF node.
• Also known as an external or terminal node.
Tree Terminology – Level
• Each step of the tree has a level number, starting with the root at 0.
• The root node is said to be at level 0, and the children of the root node are at level 1.
Tree Terminology – Height
• Height of a node: the total number of edges in the longest path from that node down to a leaf.
• The height of any leaf node is 0.
• Height of the tree = height of the root node.
Example (for the tree shown in the figure):
Height A – 3
Height B – 2
Height D – 1
Height C, G, E, F – 0
Height of tree – 3
Tree Terminology – Depth
• Depth of a node: the total number of edges from the root node to that node.
• The depth of the root node is 0.
• Depth of the tree = depth of the longest path from the root to a leaf.
Example (for the tree shown in the figure):
Depth A – 0
Depth B, C – 1
Depth D, E, F – 2
Depth G – 3
Depth of tree – 3
Tree Terminology – Subtree
• Each child of a node is the root of a subtree; the definition applies recursively.
Tree Terminology – Forest
• Set of disjoint trees
Tree Terminology - Path
• The sequence of nodes and edges from one node to another node is called the PATH between those two nodes.
• The length of a path is the number of edges on it (one less than the number of nodes).
Tree Terminology - Path
• A path from node a1 to ak is defined as a sequence of nodes a1, a2, ..., ak such that ai is the parent of ai+1 for 1 ≤ i < k.
Path A-G: A – B – D – G
• A is an ancestor of B, D and G.
• G is a descendant of D, B and A.
Tree Representations
A tree data structure can be represented in two methods.
• List Representation
• Left Child - Right Sibling Representation
Tree Representations
List Representation
1. Uses two types of nodes: one for representing a node with data, called a 'data node', and another for representing only references, called a 'reference node'.
2. We start with a 'data node' for the root node of the tree. It is then linked to an internal node through a 'reference node', which is further linked to any other node directly (as shown in the sketch below).
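A minimal C sketch of this idea, assuming each data node keeps a linked list of reference nodes pointing to its children (the type and field names are illustrative, not from the slides):

/* A general-tree "data node" that keeps a linked list of
   "reference nodes", each pointing to one child. */
struct ref_node;                          /* forward declaration */

struct data_node
{
    int data;
    struct ref_node *children;            /* head of the list of child references */
};

struct ref_node
{
    struct data_node *child;              /* the child this reference points to */
    struct ref_node  *next;               /* next reference in the list         */
};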
Tree Representations
Left Child - Right Sibling Representation
• A list with one type of node, which consists of three fields: a data field, a left child reference field and a right sibling reference field.
• The data field stores the actual value of the node.
• The left reference field stores the address of the left child.
• The right reference field stores the address of the right sibling node.
Tree Representations
Left Child - Right Sibling Representation
• In this representation, every node's data field stores the actual value of that node.
• If the node has a left child, the left reference field stores the address of that left child node; otherwise it stores NULL.
• If the node has a right sibling, the right reference field stores the address of that right sibling node; otherwise it stores NULL.
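A minimal C sketch of the left child - right sibling node (the names are illustrative, not from the slides):

#include <stdlib.h>

/* Left child - right sibling node: one node type for a general tree */
struct lcrs_node
{
    int data;                            /* value stored in the node        */
    struct lcrs_node *left_child;        /* first (leftmost) child, or NULL */
    struct lcrs_node *right_sibling;     /* next sibling, or NULL           */
};

/* Create a node with no children and no siblings yet */
struct lcrs_node *lcrs_create(int data)
{
    struct lcrs_node *n = malloc(sizeof *n);
    if (n != NULL)
    {
        n->data = data;
        n->left_child = NULL;
        n->right_sibling = NULL;
    }
    return n;
}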
Binary Tree
• A tree in which every node can have a maximum of two children is
called Binary Tree.
• In a binary tree, every node can have either 0 children or 1 child or 2
children but not more than 2 children.
Binary Tree
• A binary tree is a data structure specified as a set of node elements.
• The topmost element in a binary tree is called the root node, and each node has 0, 1, or at most 2 children.
Binary Tree
Properties of Binary Trees
• At level n, the maximum number of nodes is 2^n (with the root at level 0).
• The maximum number of nodes possible in a binary tree of height h is 2^(h+1) − 1.
• The minimum number of nodes possible in a binary tree of height h is h + 1.
Strict Binary Tree
• Each node must contain either 0 or 2 children.
• Equivalently, it is a tree in which every node except the leaf nodes has exactly 2 children.
Strict Binary Tree / full Binary tree
• A full Binary tree is a special type of binary tree in which every parent
node/internal node has either two or no children.
Complete Binary Tree
• All levels are completely filled except possibly the last level.
• In the last level, all nodes must be as far left as possible; nodes are added from left to right.
Representation of Binary Tree
• Each node has three parts – a data part, a left child pointer and a right child pointer.
Linked List Representation of Tree
struct node
{
struct node *left;
int value;
struct node *right;
};
Linked List Representation of Tree
#include <stdio.h>
#include <stdlib.h>

struct node
{
    int data;
    struct node *left, *right;
};

struct node *create()
{
    struct node *temp;
    int data, choice;
    printf("Press 0 to exit");
    printf("\nPress 1 for new node");
    printf("\nEnter your choice : ");
    scanf("%d", &choice);
    if (choice == 0)
        return NULL;                          /* no node at this position */
    temp = (struct node *)malloc(sizeof(struct node));
    printf("Enter the data:");
    scanf("%d", &data);
    temp->data = data;
    printf("Enter the left child of %d\n", data);
    temp->left = create();
    printf("Enter the right child of %d\n", data);
    temp->right = create();
    return temp;
}

int main()
{
    struct node *root = create();
    return 0;
}
UNIT-4
Topic : Tree Traversal
Tree Traversal
• Traversal – visiting each node of the tree exactly once
• Based on the order of visiting :
• In – order traversal
• Pre – order traversal
• Post – order traversal
Tree traversal
• The term 'tree traversal' means traversing or visiting each node of a
tree. Traversing can be performed in three ways
Pre-order traversal
In-order traversal
Post-order traversal
Pre-order traversal
Algorithm
Step 1 - Visit the root node
Step 2 - Traverse the left subtree recursively.
Step 3 - Traverse the right subtree recursively.
The output of the preorder traversal of the above tree is -
A→B→D→E→C→F→G
Tree TRAVERSAL
In-order traversal
• First the left subtree is visited, then the root node, and finally the right subtree.
Algorithm
Step 1 - Traverse the left subtree recursively.
Step 2 - Visit the root node.
Step 3 - Traverse the right subtree recursively.
Pre-order Traversal – step-by-step example
Algorithm: Preorder(Tree)
1. Repeat steps 2 – 4 while Tree != NULL
2. Write(Tree->Data)
3. Preorder(Tree->left)
4. Preorder(Tree->right)
5. End
The result is built up one node at a time as the traversal proceeds through the tree shown:
Result: A, B, D, G, H, L, E, C, F, I, J, K
Post-order Traversal
• Traverse the left subtree, then traverse the right subtree, and finally visit the root node.
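A minimal C sketch of the three traversals (recursive versions on the same node layout as the linked representation earlier; function names are illustrative):

#include <stdio.h>

struct node
{
    int data;
    struct node *left, *right;
};

/* Pre-order: root, left subtree, right subtree */
void preorder(struct node *t)
{
    if (t != NULL)
    {
        printf("%d ", t->data);
        preorder(t->left);
        preorder(t->right);
    }
}

/* In-order: left subtree, root, right subtree */
void inorder(struct node *t)
{
    if (t != NULL)
    {
        inorder(t->left);
        printf("%d ", t->data);
        inorder(t->right);
    }
}

/* Post-order: left subtree, right subtree, root */
void postorder(struct node *t)
{
    if (t != NULL)
    {
        postorder(t->left);
        postorder(t->right);
        printf("%d ", t->data);
    }
}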
UNIT-4
Topic : BINARY SEARCH TREE
BINARY SEARCH TREE
• Binary Search Tree is a binary tree in which every node contains only smaller
values in its left sub tree and only larger values in its right sub tree.
• Also called an ORDERED binary tree
BST– properties:
• It should be Binary tree.
• Left subtree < Root Node <= Right subtree
(or)
Left subtree <= Root Node < Right subtree
BINARY SEARCH TREE
Exercise: decide whether each of the given trees is a binary search tree or not.
• Operations: Searching, Insertion, Deletion of a node
• Time complexity: best case O(log n) (balanced tree), worst case O(n) (skewed tree)
EXAMPLE: Inserting a node with value 55 into the given binary search tree
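A minimal C sketch of BST search and insertion (recursive; sending duplicates to the right subtree is an assumption, and the names are illustrative):

#include <stdlib.h>

struct node
{
    int data;
    struct node *left, *right;
};

/* Search: go left for smaller keys, right for larger keys */
struct node *search(struct node *root, int key)
{
    if (root == NULL || root->data == key)
        return root;
    if (key < root->data)
        return search(root->left, key);
    return search(root->right, key);
}

/* Insert a key and return the (possibly new) root of this subtree */
struct node *insert(struct node *root, int key)
{
    if (root == NULL)
    {
        struct node *n = malloc(sizeof(struct node));
        n->data = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->data)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);   /* duplicates go right */
    return root;
}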
UNIT-4
Topic : AVL TREE
AVL TREE
• An AVL tree is a height-balanced binary search tree.
• The balance factor of a node is the height of its left subtree minus the height of its right subtree.
• In an AVL tree, the balance factor of every node is either -1, 0 or +1.
operations on AVL tree
1.SEARCHING
2.INSERTION
3.DELETION
Search Operation in AVL Tree
ALGORITHM:
STEPS:
1 : Read the search element.
2 : Compare the search element with the root node of the tree.
3 : If they match exactly, display "element found" and end.
4 : If they do not match, check whether the search element is < or > than that node's value.
5 : If the search element is <, continue the search in the left subtree.
6 : If the search element is >, continue the search in the right subtree.
7 : Repeat steps 2 to 6 until the exact element is found.
8 : If the search element is still not found after reaching a leaf node, display "element not found".
INSERTION or DELETION
• After performing any operation on an AVL tree, the balance factor of each node has to be checked.
• After insertion or deletion, exactly one of the following scenarios holds:
Scenario 1:
• After insertion or deletion, the balance factor of each node is either 0, 1 or -1.
• If so, the AVL tree is considered balanced.
• The operation ends.
Scenario 2:
• After insertion or deletion, the balance factor is not 0, 1 or -1 for at least one node.
• The AVL tree is then considered imbalanced.
• If so, rotations need to be performed to balance the tree in order to keep it an AVL tree.
Rotation
• After insertion / deletion, the balance factor of every node in the tree must be 0, 1 or -1.
• Otherwise we must make the tree balanced.
• Whenever the tree becomes imbalanced due to any operation, we perform rotation operations to make the tree balanced.
• Rotation is the process of moving nodes either to the left or to the right to make the tree balanced.
AVL TREE ROTATION
LL ROTATION
• Performed when a new node is inserted in the left sub-tree of the left sub-tree of the critical node (LL imbalance). A single right rotation restores the balance.
LL ROTATION - Example
RR ROTATION
• Performed when a new node is inserted in the right sub-tree of the right sub-tree of the critical node (RR imbalance). A single left rotation restores the balance.
RR Rotation - Example
LR ROTATION
• Performed when a new node is inserted in the right sub-tree of the left sub-tree of the critical node (LR imbalance). It is a left rotation on the left child followed by a right rotation on the critical node.
LR Rotation - Example
RL ROTATION
• Performed when a new node is inserted in the left sub-tree of the right sub-tree of the critical node (RL imbalance). It is a right rotation on the right child followed by a left rotation on the critical node.
RL Rotation - Example
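A minimal C sketch of the two single rotations from which all four cases are built (the node layout and height-tracking helpers are illustrative assumptions, not from the slides):

struct avl_node
{
    int data;
    int height;                        /* height of the subtree rooted here */
    struct avl_node *left, *right;
};

static int node_height(struct avl_node *n) { return n ? n->height : -1; }
static int max_int(int a, int b)           { return a > b ? a : b; }

/* Right rotation: fixes an LL imbalance at node y */
struct avl_node *rotate_right(struct avl_node *y)
{
    struct avl_node *x = y->left;
    y->left = x->right;
    x->right = y;
    y->height = 1 + max_int(node_height(y->left), node_height(y->right));
    x->height = 1 + max_int(node_height(x->left), node_height(x->right));
    return x;                           /* new root of this subtree */
}

/* Left rotation: fixes an RR imbalance at node x */
struct avl_node *rotate_left(struct avl_node *x)
{
    struct avl_node *y = x->right;
    x->right = y->left;
    y->left = x;
    x->height = 1 + max_int(node_height(x->left), node_height(x->right));
    y->height = 1 + max_int(node_height(y->left), node_height(y->right));
    return y;                           /* new root of this subtree */
}

/* LR case: rotate the left child left, then rotate the node right.
   RL case: rotate the right child right, then rotate the node left. */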
AVL TREE CONSTRUCTION / NODE INSERTION - Example
Construct an AVL tree by inserting the following elements in the given order 63, 9, 19, 27, 18, 108,
99, 81.
AVL TREE CONSTRUCTION / NODE INSERTION - Example
In an organization, 10 employees joined with the IDs 50, 20, 60, 10, 8, 15, 32, 46, 11, 48. Insert them one by one while maintaining the height balance of the tree. Write the steps followed for each insertion and solve it stepwise.
AVL TREE – NODE DELETION - Example
• Delete nodes 52, 36, and 61 from the AVL tree given
Construct AVL Tree
• 35, 15,5,20,25,17,45
UNIT-4
Topic : B TREE
B-Trees
• A B-tree of order m is a balanced m-way search tree: every node has at most m children, every internal node other than the root has at least ⌈m/2⌉ children, and all leaves appear at the same level.
B-Tree of order 3
B-Trees - Examples
B-Tree of order 4
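A minimal C sketch of a B-tree node of order M (the fixed order and the field names are illustrative assumptions, not from the slides):

#include <stdbool.h>

#define M 4                            /* order of the B-tree: at most M children */

struct btree_node
{
    int  nkeys;                        /* number of keys currently stored          */
    int  keys[M - 1];                  /* keys kept in ascending order             */
    struct btree_node *child[M];       /* child[i] covers keys between keys[i-1] and keys[i] */
    bool leaf;                         /* true if the node has no children         */
};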
Searching in a B-Tree
• Searching is similar to searching in a binary search tree.
• Consider the B-Tree shown here.
• To search for 72, first look at the root: the search key is greater than the values in the root node, so go to the right subtree.
• The root of the right subtree holds two values, and 72 is greater than 63, so traverse to the right subtree of 63.
• That subtree's node holds the two values 72 and 81, so we have found 72.
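A minimal C sketch of this search, relying on the illustrative node layout sketched above:

#include <stddef.h>

/* Search for key in the B-tree rooted at x; returns the node that holds it, or NULL */
struct btree_node *btree_search(struct btree_node *x, int key)
{
    int i = 0;
    while (i < x->nkeys && key > x->keys[i])
        i++;                                   /* find the first key >= search key */
    if (i < x->nkeys && key == x->keys[i])
        return x;                              /* found in this node */
    if (x->leaf)
        return NULL;                           /* reached a leaf without finding it */
    return btree_search(x->child[i], key);     /* descend into the proper child */
}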
Insertion in a B-Tree
• Insertions are performed at the leaf level.
• Search the B-Tree to find a suitable place to insert the new element.
• If the leaf node is not full, the new element can be inserted at the leaf level.
• If the leaf is full:
• insert the new element, in order, into the existing set of keys,
• split the node at its median into two nodes,
• push the median element up to its parent node. If the parent node is already full, split the parent node by following the same steps.
Insertion - Example
• Consider the B-Tree of order 5
UNIT-4
Topic : Heap
Heaps
• A heap is a complete binary tree in which each node can have at most two children.
• A heap is a balanced binary tree in which the root-node key is compared with its children and arranged accordingly.
• Min-Heap − the value of every node is less than or equal to the values of its children.
• Max-Heap − the value of every node is greater than or equal to the values of its children.
Heaps
• A max-heap is a binary tree whose left and right subtrees contain values less than or equal to that of their parent.
• The root of a max-heap is guaranteed to hold the largest value in the tree; its subtrees contain data with smaller values.
• Unlike the binary search tree, however, the smaller-valued nodes of a heap can be placed on either the right or the left subtree.
• Heaps have another interesting facet: they are often implemented in
an array rather than a linked list.
Heaps
• Max-Heap − the value of every node is greater than or equal to the values of its children.
Heaps : Max-Heap Construction/Insertion
• Step 1 − Create a new node at the end of heap.
• Step 2 − Assign new value to the node.
• Step 3 − Compare the value of this child node with its parent.
• Step 4 − If value of parent is less than child, then swap them.
• Step 5 − Repeat step 3 & 4 until Heap property holds.
Heaps : Max-Heap Construction/Insertion
• Construct heap data structure: 35, 33, 42, 10, 14, 19, 27, 44, 26, 31
Heaps : Max-Heap Deletion
• Step 1 − Remove the root node.
• Step 2 − Move the last element of the last level to the root.
• Step 3 − Compare the value of this node with its children.
• Step 4 − If the value of the parent is less than that of its larger child, swap them.
• Step 5 − Repeat steps 3 & 4 until the heap property holds.
Heaps : Max-Heap Deletion
• In deletion from a heap tree, the root node is always deleted and it is replaced with the last element.
Definition
• To implement the insert and delete operations, we need two basic algorithms: reheap up and reheap down.
Reheap Up
• The reheap up operation reorders a "broken" heap by floating the last element up the tree until it is in its correct location in the heap.
Reheap Down
• The reheap down operation reorders a "broken" heap by pushing the root element down the tree until it is in its correct location in the heap.
Heap Implementation
• Although a heap can be built in a dynamic tree structure, it is most often implemented in an array. This implementation is possible because the heap is, by definition, complete or nearly complete; therefore the relationship between a node and its children is fixed and can be calculated as shown below.
• 1. For a node located at index i, its children are found at:
a. Left child: 2i + 1
b. Right child: 2i + 2
• 2. The parent of a node located at index i is located at ⌊(i – 1) / 2⌋.
• 3. Given the index of a left child, j, its right sibling, if any, is found at j + 1. Conversely, given the index of a right child, k, its left sibling, which must exist, is found at k – 1.
• 4. Given the size, n, of a complete heap, the location of the first leaf is ⌊n / 2⌋. Given the location of the first leaf element, the location of the last non-leaf element is one less.
Example: parent = ⌊(i – 1) / 2⌋; for the node 40 at index i = 4, parent = ⌊3 / 2⌋ = 1, and index 1 holds the value 75.
A heap can be implemented in an array because it must be a complete or nearly complete binary tree, which allows a fixed relationship between each node and its children.
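A minimal C sketch of the array-based reheap up and reheap down operations for a max-heap, using the index formulas above (function and variable names are illustrative):

/* Swap two heap entries */
static void swap_entries(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Float the element at index i up until its parent is not smaller */
void reheap_up(int heap[], int i)
{
    while (i > 0 && heap[(i - 1) / 2] < heap[i])
    {
        swap_entries(&heap[(i - 1) / 2], &heap[i]);
        i = (i - 1) / 2;                    /* move to the parent's index */
    }
}

/* Push the element at index i down until neither child is larger (n = heap size) */
void reheap_down(int heap[], int n, int i)
{
    for (;;)
    {
        int largest = i;
        int left  = 2 * i + 1;
        int right = 2 * i + 2;
        if (left  < n && heap[left]  > heap[largest]) largest = left;
        if (right < n && heap[right] > heap[largest]) largest = right;
        if (largest == i) break;            /* heap property restored */
        swap_entries(&heap[i], &heap[largest]);
        i = largest;
    }
}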
Max Heap Construction Algorithm
Example: given the max heap shown, insert a new node with value 85.
• Step 1 - Insert the newNode with value 85 as the last leaf from left to right. That means newNode is added as the right child of the node with value 75. After adding, the max heap is as follows...
• Step 2 - Compare the newNode value (85) with its parent node value (75). That means 85 > 75.
• Step 3 - Since the newNode value (85) is greater than its parent value (75), swap them. After swapping, the max heap is as follows...
• Step 4 - Now, again compare the newNode value (85) with its new parent node value (89).
• Here, the newNode value (85) is smaller than its parent node value (89), so we stop the insertion process. Finally, the max heap after insertion of the new node with value 85 is as follows...
Max Heap Deletion Algorithm
Problem Example:
• The value 7 at the root of the tree is less than both of its children, the nodes containing the values 8 and 9. We need to swap the 7 with the larger child, the node containing the value 9.
Heap sort
• Heaps can be used in sorting an array.
• In max-heaps, maximum element will always be at the root. Heap Sort
uses this property of heap to sort the array.
• Consider an array Arr which is to be sorted using Heap Sort.
• 1. Initially build a max heap of elements in Arr.
• 2. The root element, that is Arr[1], will contain the maximum element of Arr. After that, swap this element with the last element of Arr and heapify the max heap excluding the last element, which is already in its correct position, and then decrease the length of the heap by one.
• 3. Repeat step 2 until all the elements are in their correct positions.
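A minimal C sketch of heap sort built on the reheap down idea above, using 0-based indexing and the sample data from the construction exercise (names are illustrative):

#include <stdio.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Push arr[i] down within arr[0..n-1] until the max-heap property holds */
static void heapify(int arr[], int n, int i)
{
    int largest = i, left = 2 * i + 1, right = 2 * i + 2;
    if (left  < n && arr[left]  > arr[largest]) largest = left;
    if (right < n && arr[right] > arr[largest]) largest = right;
    if (largest != i)
    {
        swap_int(&arr[i], &arr[largest]);
        heapify(arr, n, largest);
    }
}

void heap_sort(int arr[], int n)
{
    /* 1. Build a max heap */
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(arr, n, i);
    /* 2. Repeatedly move the root (maximum) to the end and re-heapify the rest */
    for (int i = n - 1; i > 0; i--)
    {
        swap_int(&arr[0], &arr[i]);
        heapify(arr, i, 0);
    }
}

int main(void)
{
    int a[] = {35, 33, 42, 10, 14, 19, 27, 44, 26, 31};
    int n = sizeof a / sizeof a[0];
    heap_sort(a, n);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}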
Heap sort - complexity
• Heap sort runs in O(n log n) time in the best, average and worst cases, and it needs only O(1) extra space.
Hashing
Hashing is the process of mapping a large amount of data items to a smaller table with the help of a hashing function. Hashing is also known as a hashing algorithm or message digest function.
Hashing Mechanism
In hashing, an array data structure called a hash table is used to store the data items.
Based on the hash key value, data items are inserted into the hash table.
Hash Key Value (index)
The hash key value is a special value that serves as an index for a data item. It indicates where the data item should be stored in the hash table. The hash key value is generated using a hash function.
Hash Table
Hash table or hash map is a data structure used to store key-value pairs. It is a collection
of items stored to make it easy to find them later. It uses a hash function to compute an index
into an array of buckets or slots from which the desired value can be found.
Data Item Value % No. of Slots Hash Value
26 26 % 10 = 6 6
70 70 % 10 = 0 0
18 18 % 10 = 8 8
31 31 % 10 = 1 1
54 54 % 10 = 4 4
93 93 % 10 = 3 3
Figure 4: Collision Handling
Keys: 5, 28, 19, 15, 20, 33, 12, 17, 10
HT slots: 9
hash function = h(k) = k % 9
h(5) = 5 % 9 = 5
h(28) = 28 % 9 = 1
h(19) = 19 % 9 = 1
h(15) = 15 % 9 = 6
h(20) = 20 % 9 = 2
h(33) = 33 % 9 = 6
h(12) = 12 % 9 = 3
h(17) = 17 % 9 = 8
h(10) = 10 % 9 = 1
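A minimal C snippet that reproduces the hash computations above for h(k) = k % 9:

#include <stdio.h>

int main(void)
{
    int keys[] = {5, 28, 19, 15, 20, 33, 12, 17, 10};
    int slots = 9;                             /* number of hash table slots */
    int n = sizeof keys / sizeof keys[0];
    for (int i = 0; i < n; i++)
        printf("h(%d) = %d %% %d = %d\n", keys[i], keys[i], slots, keys[i] % slots);
    return 0;
}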
Collision resolution techniques
There are two types of collision resolution techniques.
1. Separate chaining (open hashing)
2. Open addressing (closed hashing)
Figure 5: Techniques in Collision Resolution
Separate chaining
In this technique, a linked list is created from the slot in which the collision has occurred, after which the new key is inserted into the linked list. This linked list of slots looks like a chain, so it is called separate chaining. It is used more when we do not know how many keys will be inserted or deleted.
Problem-
Using the hash function ‘key mod 7’, insert the following sequence of keys in the hash table-
50, 700, 76, 85, 92, 73 and 101
Use separate chaining technique for collision resolution.
Solution-
The given sequence of keys will be inserted in the hash table as-
Step-01:
Draw an empty hash table.
For the given hash function, the possible range of hash values is [0, 6].
So, draw an empty hash table consisting of 7 buckets as-
Figure 6: Empty Hash Table
Step-02:
Insert the given keys in the hash table one by one.
The first key to be inserted in the hash table = 50.
Bucket of the hash table to which key 50 maps = 50 mod 7 = 1.
So, key 50 will be inserted in bucket-1 of the hash table as-
Figure 7: Insert 50
Step-03:
The next key to be inserted in the hash table = 700.
Bucket of the hash table to which key 700 maps = 700 mod 7 = 0.
So, key 700 will be inserted in bucket-0 of the hash table as-
Figure 8: Insert 700
Step-04:
The next key to be inserted in the hash table = 76.
Bucket of the hash table to which key 76 maps = 76 mod 7 = 6.
So, key 76 will be inserted in bucket-6 of the hash table as-
Figure 9: Insert 76
Step-05:
The next key to be inserted in the hash table = 85.
Bucket of the hash table to which key 85 maps = 85 mod 7 = 1.
Since bucket-1 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to bucket-1.
So, key 85 will be inserted in bucket-1 of the hash table as-
Figure 10: Insert 85
Step-06:
The next key to be inserted in the hash table = 92.
Bucket of the hash table to which key 92 maps = 92 mod 7 = 1.
Since bucket-1 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to bucket-1.
So, key 92 will be inserted in bucket-1 of the hash table as-
Step-07:
The next key to be inserted in the hash table = 73.
Bucket of the hash table to which key 73 maps = 73 mod 7 = 3.
So, key 73 will be inserted in bucket-3 of the hash table as-
Step-08:
The next key to be inserted in the hash table = 101.
Bucket of the hash table to which key 101 maps = 101 mod 7 = 3.
Since bucket-3 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to bucket-3.
So, key 101 will be inserted in bucket-3 of the hash table as-
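A minimal C sketch of insertion with separate chaining for the same keys (names are illustrative; keys are prepended to each chain for simplicity, so the order within a bucket may differ from the figures):

#include <stdio.h>
#include <stdlib.h>

#define BUCKETS 7                              /* hash function: key mod 7 */

struct chain_node
{
    int key;
    struct chain_node *next;
};

struct chain_node *table[BUCKETS];             /* each slot heads a linked list */

void chain_insert(int key)
{
    int index = key % BUCKETS;
    struct chain_node *n = malloc(sizeof *n);
    n->key = key;
    n->next = table[index];                    /* prepend to the chain at this slot */
    table[index] = n;
}

int main(void)
{
    int keys[] = {50, 700, 76, 85, 92, 73, 101};
    for (int i = 0; i < 7; i++)
        chain_insert(keys[i]);
    for (int b = 0; b < BUCKETS; b++)
    {
        printf("bucket %d:", b);
        for (struct chain_node *p = table[b]; p; p = p->next)
            printf(" %d", p->key);
        printf("\n");
    }
    return 0;
}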
Open Addressing (Linear Probing)
Now solve the same problem using the linear probing technique for collision resolution.
Step-01:
Draw an empty hash table.
For the given hash function, the possible range of hash values is [0, 6].
So, draw an empty hash table consisting of 7 buckets as-
Step-02:
Insert the given keys in the hash table one by one.
The first key to be inserted in the hash table = 50.
Bucket of the hash table to which key 50 maps = 50 mod 7 = 1.
So, key 50 will be inserted in bucket-1 of the hash table as-
Figure 16: Insert 700
Step-04:
The next key to be inserted in the hash table = 76.
Bucket of the hash table to which key 76 maps = 76 mod 7 = 6.
So, key 76 will be inserted in bucket-6 of the hash table as-
Figure 18: Insert 85
Step-06:
The next key to be inserted in the hash table = 92.
Bucket of the hash table to which key 92 maps = 92 mod 7 = 1.
Since bucket-1 is already occupied, so collision occurs.
To handle the collision, linear probing technique keeps probing linearly until an empty
bucket is found.
The first empty bucket is bucket-3.
So, key 92 will be inserted in bucket-3 of the hash table as-
Figure 20: Insert 73
Step-08:
The next key to be inserted in the hash table = 101.
Bucket of the hash table to which key 101 maps = 101 mod 7 = 3.
Since bucket-3 is already occupied, so collision occurs.
To handle the collision, linear probing technique keeps probing linearly until an empty
bucket is found.
The first empty bucket is bucket-5.
So, key 101 will be inserted in bucket-5 of the hash table as-
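A minimal C sketch of insertion with linear probing for the same keys (an empty slot is marked with -1; names are illustrative):

#include <stdio.h>

#define SLOTS 7                                 /* hash function: key mod 7 */

int table[SLOTS] = {-1, -1, -1, -1, -1, -1, -1};   /* -1 marks an empty slot */

/* Probe linearly from the home bucket until an empty slot is found.
   Returns the index used, or -1 if the table is full. */
int linear_probe_insert(int key)
{
    int home = key % SLOTS;
    for (int i = 0; i < SLOTS; i++)
    {
        int index = (home + i) % SLOTS;         /* wrap around at the end of the table */
        if (table[index] == -1)
        {
            table[index] = key;
            return index;
        }
    }
    return -1;                                  /* overflow: no empty slot left */
}

int main(void)
{
    int keys[] = {50, 700, 76, 85, 92, 73, 101};
    for (int i = 0; i < 7; i++)
        printf("%d -> bucket %d\n", keys[i], linear_probe_insert(keys[i]));
    return 0;
}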
COLLISION
The hash function is a function that returns the hash key value using which the record can be placed in the hash table. Thus this function helps us place the record at an appropriate position in the hash table, and because of this we can retrieve the record directly from that location. This function needs to be designed very carefully; it should not return the same hash key address for two different records, since this is an undesirable situation in hashing.
Definition: The situation in which the hash function returns the same hash key (home bucket) for more than one record is called a collision, and two different records that produce the same hash key are called synonyms.
Similarly, when there is no room for a new pair in the hash table, the situation is called overflow. Sometimes, while handling a collision, we may end up in an overflow condition. Frequent collisions and overflows indicate a poor hash function.
For example, consider the hash function
H(key) = record key % 10
with a hash table of size 10. The record keys to be placed are 131, 44, 43, 78, 19, 36, 57 and 77.
131 % 10 = 1
44 % 10 = 4
43 % 10 = 3
78 % 10 = 8
19 % 10 = 9
36 % 10 = 6
57 % 10 = 7
77 % 10 = 7
The resulting hash table is:
0 : empty
1 : 131
2 : empty
3 : 43
4 : 44
5 : empty
6 : 36
7 : 57
8 : 78
9 : 19
Now if we try to place 77 in the hash table, its hash key is 7, and at index 7 the record key 57 is already placed. This situation is called a collision. From index 7, if we look for the next vacant position at the subsequent indices 8 and 9, we find that there is no room to place 77 in the hash table. This situation is called overflow.
CHAINING
In the collision handling method called chaining, an additional field, a chain, is introduced with the data. A separate chain table is maintained for colliding data: when a collision occurs, a linked list (chain) is maintained at the home bucket.
For example:
Here D = 10
bucket 1 : 131 -> 21 -> 61 -> NULL
bucket 7 : 97 -> NULL
(the remaining buckets are empty / NULL)
A chain is maintained for colliding elements. For instance, 131 has home bucket (key) 1; similarly, keys 21 and 61 also demand home bucket 1. Hence a chain is maintained at index 1.
LINEAR PROBING (OPEN ADDRESSING)
In open addressing, all keys are stored in the hash table itself. With linear probing, when a collision occurs we examine the subsequent slots one by one until an empty slot is found.
For example:
H(key) = 131 % 10 = 1
Index 1 will be the home bucket for 131. Continuing in this fashion, we place 4, 8 and 7. Now the next key to be inserted is 21. According to the hash function,
H(key) = 21 % 10 = 1
But index 1 is already occupied by 131, i.e. a collision occurs. To resolve this collision we move down linearly and place the element at the next empty location. Therefore 21 will be placed at index 2. If the next element is 5, its home bucket is index 5, and since this bucket is empty we put the element 5 at index 5.
The next record key is 9. According to the hash function it demands home bucket 9, hence we place 9 at index 9. The next and final record key is 29, and it also hashes to key 9. But home bucket 9 is already occupied, and there is no next empty bucket because the table size is limited to index 9, so an overflow occurs. To handle it we wrap around to bucket 0, and since that location is empty, 29 is placed at index 0.
Problem with linear probing:
One major problem with linear probing is primary clustering. Primary clustering is a process in which a block of occupied slots builds up in the hash table as collisions are resolved.
Example (table size 10):
19 % 10 = 9
18 % 10 = 8
39 % 10 = 9
29 % 10 = 9
8 % 10 = 8
Because these keys collide and are placed in consecutive slots, a cluster is formed.
QUADRATIC PROBING:
Quadratic probing operates by taking the original hash value and adding successive values of an arbitrary quadratic polynomial to the starting value. This method uses the following formula:
H(key, i) = (Hash(key) + i^2) % table size, for i = 0, 1, 2, ...
For example, suppose we have to insert the elements 37, 90, 45, 22, 49 and then 17 into a hash table of size 10:
H1(37) = 37 % 10 = 7
H1(90) = 90 % 10 = 0
H1(45) = 45 % 10 = 5
H1(22) = 22 % 10 = 2
H1(49) = 49 % 10 = 9
Now 17 is to be inserted: H1(17) = 17 % 10 = 7, which collides with 37.
Consider i = 0: (17 + 0^2) % 10 = 7, still occupied.
When i = 1: (17 + 1^2) % 10 = 8, so 17 is placed at index 8.
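A minimal C sketch of the quadratic probe sequence (names are illustrative; an empty slot is assumed to hold -1):

#define SIZE 10

/* Return the slot where key can be placed using quadratic probing,
   or -1 if no empty slot is found within SIZE probes. */
int quadratic_probe(int table[], int key)
{
    for (int i = 0; i < SIZE; i++)
    {
        int index = (key % SIZE + i * i) % SIZE;   /* (Hash(key) + i^2) % SIZE */
        if (table[index] == -1)
            return index;
    }
    return -1;
}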
DOUBLE HASHING
Double hashing resolves a collision by using a second hash function to decide how far to jump from the home bucket.
Continuing the same example, suppose 17 is to be inserted:
H1(17) = 17 % 10 = 7 (collision, since 37 already occupies index 7)
H2(key) = M – (key % M)
Here M is a prime number smaller than the size of the table. The prime number smaller than the table size 10 is 7, hence M = 7.
H2(17) = 7 – (17 % 7) = 7 – 3 = 4
That means we have to insert the element 17 at 4 places (jumps) from index 7. Therefore 17 will be placed at index (7 + 4) % 10 = 1.
Now to insert the number 55:
H1(55) = 55 % 10 = 5 (collision, since 45 already occupies index 5)
H2(55) = 7 – (55 % 7) = 7 – 6 = 1
That means we have to take one jump from index 5, so 55 is placed at index 6.
Finally the hash table will be:
Index : 0    1    2    3    4    5    6    7    8    9
Key   : 90   17   22   -    -    45   55   37   -    49
Comparison of Quadratic Probing & Double Hashing
• Double hashing requires a second hash function, and its probing behaviour is close to that of random probing when resolving collisions.
• Double hashing is more complex to implement than quadratic probing; quadratic probing is the faster technique.
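A minimal C sketch of the double-hashing probe step used above, with M = 7 (names are illustrative; an empty slot is assumed to hold -1):

#define SIZE 10
#define M    7                          /* prime smaller than the table size */

/* Probe sequence: H1(key), H1(key)+H2(key), H1(key)+2*H2(key), ... (mod SIZE).
   Returns the first empty slot, or -1 if none is found within SIZE probes. */
int double_hash_probe(int table[], int key)
{
    int h1 = key % SIZE;
    int h2 = M - (key % M);             /* jump size; never zero because key % M < M */
    for (int i = 0; i < SIZE; i++)
    {
        int index = (h1 + i * h2) % SIZE;
        if (table[index] == -1)
            return index;
    }
    return -1;
}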
REHASHING
Rehashing is a technique in which the table is resized, i.e., the size of the table is doubled by creating a new table. It is preferable that the new table size is a prime number. There are situations in which rehashing is required.
In such situations, we have to transfer the entries from the old table to the new table by recomputing their positions using the new hash function.
Consider that we have to insert the elements 37, 90, 55, 22, 17, 49 and 87. The table size is 10 and the hash function is key % 10.
37 % 10 = 7
90 % 10 = 0
55 % 10 = 5
22 % 10 = 2
17 % 10 = 7 (collision, solved by linear probing)
49 % 10 = 9
Now this table is almost full, and if we try to insert more elements collisions will occur and eventually further insertions will fail. Hence we rehash by doubling the table size. The old table size is 10, so doubling gives 20; but 20 is not a prime number, so we prefer to make the new table size 23. The new hash function will be H(key) = key % 23.
Advantages:
1. This technique provides the programmer the flexibility to enlarge the table size if required.
2. Only the space gets doubled, while keeping a simple hash function, which avoids the occurrence of collisions.
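A minimal C sketch of the rehashing step, moving keys into a larger table and resolving collisions with linear probing (names and the -1 empty marker are illustrative):

#include <stdlib.h>

/* Move every key from the old table into a larger table, recomputing
   each position with the new table size. Returns the new table. */
int *rehash(const int *old_table, int old_size, int new_size)
{
    int *new_table = malloc(new_size * sizeof(int));
    for (int i = 0; i < new_size; i++)
        new_table[i] = -1;                      /* -1 marks an empty slot */

    for (int i = 0; i < old_size; i++)
    {
        if (old_table[i] == -1)
            continue;                           /* nothing stored here */
        int key = old_table[i];
        int index = key % new_size;             /* new hash function */
        while (new_table[index] != -1)
            index = (index + 1) % new_size;     /* linear probing */
        new_table[index] = key;
    }
    return new_table;
}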
EXTENSIBLE HASHING
Extensible hashing is a technique which handles a large amount of data. The position of a data item in the hash table is determined by extracting a certain number of bits of its key.
Extensible hashing grows and shrinks in a manner similar to B-trees.
In extensible hashing, the elements are placed into buckets by referring to the size of the directory. The levels (depths) are indicated in parentheses.
(Figure: a directory with entries 0 and 1 pointing to buckets of depth (0) and (1); the data items 001, 010 and 111 are placed in the buckets.)
The bucket can hold data up to its depth. If the data in a bucket exceeds that limit, the bucket is split and the directory is doubled.
Consider we have to insert 1, 4, 5, 7, 8, 10. Assume each page can hold 2 data entries (2 is the
depth).
Step 1: Insert 1, 4
1 = 001
4 = 100
We examine the last bit of each data item and insert the data into the corresponding bucket.
(Figure: the resulting directory and buckets.)