Data Structures-Sorting and Searching, Hashing
Sorting, Searching
Sorting:
Sorting refers to arranging data in a particular format. A sorting algorithm specifies the way to
arrange data in a particular order. The most common orders are numerical or lexicographical order.
The importance of sorting lies in the fact that data searching can be optimized to a very high
level if data is stored in a sorted manner. Sorting is also used to represent data in more readable
formats. Following are some examples of sorting in real-life scenarios.
Telephone Directory: the telephone directory stores the telephone numbers of people
sorted by their names, so that the names can be searched easily.
Dictionary: the dictionary stores words in alphabetical order so that searching for any
word becomes easy.
There are many types of sorting techniques, differentiated by their efficiency and space
requirements. Following are some sorting techniques which we will be covering in the next sections.
1. Bubble Sort
2. Insertion Sort
3. Selection Sort
4. Shell Sort
5. Quick Sort
6. Merge Sort
7. Radix Sort
Bubble Sort
Bubble Sort is an algorithm used to sort N elements given in memory, e.g., an array with N
elements. Bubble Sort compares the elements one by one and sorts them based on their values.
It is also known as Sinking Sort.
It is called Bubble sort, because with each iteration the largest element in the list bubbles up
towards the last place, just like a water bubble rises up to the water surface.
Sorting takes place by stepping through all the data items one-by-one in pairs and comparing
adjacent data items and swapping each pair that is out of order.
In this case, value 33 is greater than 14, so they are already in sorted order. Next, we compare 33
with 27.
We find that 27 is smaller than 33 and these two values must be swapped.
Next we compare 33 and 35. We find that both are in already sorted positions.
We know then that 10 is smaller than 35. Hence, they are not in sorted order.
We swap these values. We find that we have reached the end of the array. After one iteration, the
array should look like this
To be precise, we are now showing how an array should look like after each iteration. After the
second iteration, it should look like this
Notice that after each iteration, at least one value moves to the end.
And when no swap is required, bubble sort learns that the array is completely sorted.
Algorithm
We assume list is an array of n elements. We further assume that the swap function swaps the values
of the given array elements.

begin BubbleSort(list)
   for all elements of list
      if list[i] > list[i+1]
         swap(list[i], list[i+1])
      end if
   end for
   return list
end BubbleSort
Program:
#include <iostream>
using namespace std;

// Repeatedly compare adjacent elements and swap any pair that is out of order.
void BubbleSort(int arr[], int n)
{
    for (int i = 0; i < n - 1; i++)
        for (int j = 0; j < n - i - 1; j++)
            if (arr[j] > arr[j + 1])
                swap(arr[j], arr[j + 1]);
}

int main()
{
    int n, i;
    cout << "\nEnter the number of data elements to be sorted: ";
    cin >> n;
    int arr[n];   // variable-length array (GCC extension)
    for (i = 0; i < n; i++)
    {
        cout << "Enter element " << i + 1 << ": ";
        cin >> arr[i];
    }
    BubbleSort(arr, n);
    cout << "Sorted list: ";
    for (i = 0; i < n; i++)
        cout << arr[i] << " ";
    return 0;
}
Insertion Sort
It is a simple sorting algorithm which sorts the array by shifting elements one by one. Following
are some of the important characteristics of Insertion Sort.
1. It is efficient for smaller data sets, but very inefficient for larger lists.
2. Insertion Sort is adaptive: it reduces its total number of steps if given a partially
sorted list, which increases its efficiency.
3. It is better than Selection Sort and Bubble Sort algorithms.
4. Its space complexity is low. Like Bubble Sort, Insertion Sort requires only a single
additional memory space.
5. It is a stable sort, as it does not change the relative order of elements with equal keys.
It finds that both 14 and 33 are already in ascending order. For now, 14 is in the sorted sub-list.
It swaps 33 with 27. It also checks against all the elements of the sorted sub-list. Here we see that
the sorted sub-list has only one element, 14, and 27 is greater than 14. Hence, the sorted sub-list
remains sorted after swapping.
By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10. Since 10 is smaller,
they are not in sorted order, so we swap them. Swapping in turn leaves 27 and 10, and then 14 and 10,
out of order, so we swap them as well. By the end of the third iteration, we have a sorted sub-list of 4 items.
This process goes on until all the unsorted values are covered in a sorted sub-list.
#include <iostream>
using namespace std;

// Insert arr[i] into the sorted prefix arr[0..i-1] by shifting larger values right.
void insertionSort(int arr[], int n)
{
    for (int i = 1; i < n; i++) {
        int key = arr[i], j = i - 1;
        for (; j >= 0 && arr[j] > key; j--)
            arr[j + 1] = arr[j];
        arr[j + 1] = key;
    }
}

int main() {
    int array[5] = {5, 4, 3, 2, 1};
    insertionSort(array, 5);
    for (int i = 0; i < 5; i++)
        cout << array[i] << " ";
    return 0;
}
Selection Sort
Selection sort is conceptually the simplest sorting algorithm. This algorithm first finds
the smallest element in the array and exchanges it with the element in the first position, then finds
the second smallest element and exchanges it with the element in the second position, and
continues in this way until the entire array is sorted.
Note: Selection sort is an unstable sort, i.e., it might change the relative order of two equal elements
in the list while sorting. But it can be made a stable sort when it is implemented using a linked list
data structure.
How Selection Sort Works?
For the first position in the sorted list, the whole list is scanned sequentially. At the first position,
where 14 is stored presently, we search the whole list and find that 10 is the lowest value.
So we replace 14 with 10. After one iteration 10, which happens to be the minimum value in the
list, appears in the first position of the sorted list.
For the second position, where 33 is residing, we start scanning the rest of the list in a linear
manner.
We find that 14 is the second lowest value in the list and it should appear at the second place. We
swap these values.
After two iterations, two least values are positioned at the beginning in a sorted manner.
The same process is applied to the rest of the items in the array.
#include <iostream>
using namespace std;

// Find the minimum of the unsorted part and swap it into position i.
void SelectionSort(int arr[], int n)
{
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (arr[j] < arr[min])
                min = j;
        swap(arr[i], arr[min]);
    }
}

int main()
{
    int n, i;
    cout << "\nEnter the number of data elements to be sorted: ";
    cin >> n;
    int arr[n];   // variable-length array (GCC extension)
    for (i = 0; i < n; i++)
    {
        cout << "Enter element " << i + 1 << ": ";
        cin >> arr[i];
    }
    SelectionSort(arr, n);
    cout << "Sorted list: ";
    for (i = 0; i < n; i++)
        cout << arr[i] << " ";
    return 0;
}
Quick Sort
Quick Sort, as the name suggests, sorts any list very quickly. Quick Sort is not a stable sort,
but it is very fast and requires very little additional space. It is based on the rule of Divide and
Conquer (it is also called partition-exchange sort). This algorithm divides the list into three main
parts:
1. Elements less than the Pivot element
2. Pivot element (central element)
3. Elements greater than the pivot element
In the list of elements mentioned in the example below, we have taken 25 as the pivot. So after the first
pass, the list will be changed like this.
6 8 17 14 25 63 37 52
Hence after the first pass, the pivot is set at its position, with all the elements smaller than it on its
left and all the elements larger than it on its right. Now 6 8 17 14 and 63 37 52 are considered as
two separate lists, the same logic is applied to them, and we keep doing this until the complete
list is sorted.
#include <iostream>
#include <cstdlib>
using namespace std;

void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

int Partition(int a[], int low, int high)
{
    // Swapping the pvt value to high, so the pvt value will be taken as pivot
    // while partitioning.
    int pvt = low + (high - low) / 2;
    swap(&a[high], &a[pvt]);
    int pivot = a[high], index = low;
    for (int i = low; i < high; i++)
        if (a[i] < pivot)
            swap(&a[i], &a[index++]);
    swap(&a[index], &a[high]);   // place the pivot at its final position
    return index;
}

void QuickSort(int a[], int low, int high)
{
    if (low < high) {
        int p = Partition(a, low, high);
        QuickSort(a, low, p - 1);    // sort the elements smaller than the pivot
        QuickSort(a, p + 1, high);   // sort the elements greater than the pivot
    }
}

int main()
{
    int n, i;
    cout << "\nEnter the number of data elements to be sorted: ";
    cin >> n;
    int arr[n];   // variable-length array (GCC extension)
    for (i = 0; i < n; i++)
    {
        cout << "Enter element " << i + 1 << ": ";
        cin >> arr[i];
    }
    QuickSort(arr, 0, n - 1);
    cout << "Sorted list: ";
    for (i = 0; i < n; i++)
        cout << arr[i] << " ";
    return 0;
}
Merge Sort
Merge Sort follows the rule of Divide and Conquer. In merge sort the unsorted list is divided
into N sublists, each having one element, because a list consisting of one element is always
sorted. Then, it repeatedly merges these sublists, to produce new sorted sublists, and in the end,
only one sorted list is produced.
Merge sort first divides the array into equal halves and then combines them in a sorted manner.
How Merge Sort Works?
To understand merge sort, we take an unsorted array as the following.
We know that merge sort first divides the whole array iteratively into equal halves until the
atomic values are reached. We see here that an array of 8 items is divided into two arrays of size
4.
This does not change the sequence of appearance of items in the original. Now we divide these
two arrays into halves.
We further divide these arrays and we reach atomic values which can no longer be divided.
Now, we combine them in exactly the same manner as they were broken down. Please note the
color codes given to these lists.
We first compare the element for each list and then combine them into another list in a sorted
manner. We see that 14 and 33 are in sorted positions. We compare 27 and 10, and in the target
list of 2 values we put 10 first, followed by 27. We change the order of 19 and 35, whereas 42
and 44 are placed sequentially.
In the next iteration of the combining phase, we compare lists of two data values and merge
them into a list of four data values, placing all in sorted order.
After the final merging, the list should look like this
#include <iostream>
using namespace std;

// Merge the sorted halves arr[l..m] and arr[m+1..r] through a temporary buffer.
void Merge(int arr[], int l, int m, int r)
{
    int temp[r - l + 1], i = l, j = m + 1, k = 0;
    while (i <= m && j <= r)
        temp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
    while (i <= m) temp[k++] = arr[i++];
    while (j <= r) temp[k++] = arr[j++];
    for (i = l, k = 0; i <= r; i++, k++)
        arr[i] = temp[k];
}

void MergeSort(int arr[], int l, int r)
{
    if (l < r) {
        int m = l + (r - l) / 2;
        MergeSort(arr, l, m);       // sort the left half
        MergeSort(arr, m + 1, r);   // sort the right half
        Merge(arr, l, m, r);        // combine them in sorted order
    }
}

int main()
{
    int n, i;
    cout << "\nEnter the number of data elements to be sorted: ";
    cin >> n;
    int arr[n];   // variable-length array (GCC extension)
    for (i = 0; i < n; i++)
    {
        cout << "Enter element " << i + 1 << ": ";
        cin >> arr[i];
    }
    MergeSort(arr, 0, n - 1);
    cout << "Sorted list: ";
    for (i = 0; i < n; i++)
        cout << arr[i] << " ";
    return 0;
}
Shell Sort
Shell sort is a highly efficient sorting algorithm based on the insertion sort algorithm. This
algorithm avoids the large shifts that insertion sort makes when a smaller value is at the far right and
has to be moved to the far left.
This algorithm uses insertion sort on widely spread elements first to sort them, and then sorts
the less widely spaced elements. This spacing is termed the interval. The interval is calculated
based on Knuth's formula:
Knuth's Formula
h = h * 3 + 1
where h is the interval with initial value 1.
How Shell Sort Works?
Let us consider the following example to get an idea of how shell sort works. We take the same
array we have used in our previous examples. For our example and ease of understanding, we
take an interval of 4 and make a virtual sub-list of all values located at an interval of 4 positions.
Here these values are {35, 14}, {33, 19}, {42, 27} and {10, 44}.
We compare values in each sub-list and swap them (if necessary) in the original array. After this
step, the new array should look like this
Then we take an interval of 2, and this gap generates two sub-lists: {14, 27, 35, 42} and {19, 10, 33,
44}.
We compare and swap the values, if required, in the original array. After this step, the array
should look like this
Finally, we sort the rest of the array using an interval of value 1. Shell sort uses insertion sort to sort
the array.
Following is the step-by-step depiction
#include <iostream>
using namespace std;

// Gapped insertion sort; the interval follows Knuth's formula h = h*3 + 1.
void shellSort(int arr[], int n)
{
    int h = 1;
    while (h < n / 3) h = h * 3 + 1;   // intervals 1, 4, 13, 40, ...
    for (; h >= 1; h /= 3)
        for (int i = h; i < n; i++) {
            int temp = arr[i], j;
            for (j = i; j >= h && arr[j - h] > temp; j -= h)
                arr[j] = arr[j - h];   // shift interval-spaced elements right
            arr[j] = temp;
        }
}

int main()
{
    int arr[] = {12, 34, 54, 2, 3}, i;
    int n = sizeof(arr)/sizeof(arr[0]);
    shellSort(arr, n);
    for (i = 0; i < n; i++)
        cout << arr[i] << " ";
    return 0;
}
Radix Sort
Radix sort is a method that many people intuitively use when alphabetizing a large list of
names. Specifically, the list of names is first sorted according to the first letter of each name; that
is, the names are arranged in 26 classes.
Intuitively, one might want to sort numbers on their most significant digit. However, radix sort
works counter-intuitively by sorting on the least significant digits first. On the first pass, all the
numbers are sorted on the least significant digit and combined in an array. Then on the second
pass, the entire list of numbers is sorted again on the second least significant digit and combined in
an array, and so on.
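The notes include no program for radix sort, so here is a minimal LSD radix sort sketch (our own
illustration, not part of the original text); it performs a stable counting sort on one decimal digit
per pass, least significant digit first:

#include <iostream>
using namespace std;

// Stable counting sort of arr by the decimal digit at place value exp.
void countingPass(int arr[], int n, int exp)
{
    int out[n], count[10] = {0};   // variable-length array (GCC extension)
    for (int i = 0; i < n; i++) count[(arr[i] / exp) % 10]++;
    for (int d = 1; d < 10; d++) count[d] += count[d - 1];
    for (int i = n - 1; i >= 0; i--)           // backwards keeps the sort stable
        out[--count[(arr[i] / exp) % 10]] = arr[i];
    for (int i = 0; i < n; i++) arr[i] = out[i];
}

// Sort on the least significant digit first, then the next digit, and so on.
void radixSort(int arr[], int n)
{
    int mx = arr[0];
    for (int i = 1; i < n; i++) if (arr[i] > mx) mx = arr[i];
    for (int exp = 1; mx / exp > 0; exp *= 10)
        countingPass(arr, n, exp);
}

int main()
{
    int arr[] = {170, 45, 75, 90, 802, 24, 2, 66};
    int n = sizeof(arr) / sizeof(arr[0]);
    radixSort(arr, n);
    for (int i = 0; i < n; i++) cout << arr[i] << " ";   // 2 24 45 66 75 90 170 802
    return 0;
}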
Searching:
Searching is an operation or a technique that helps find the place of a given
element or value in a list. Any search is said to be successful or unsuccessful depending upon
whether the element being searched for is found or not. Some of the standard searching
techniques followed in data structures are listed below.
Linear Search
This is the simplest method of searching. In this technique, the element to be found
is searched for sequentially in the list. This method can be performed on a sorted or an unsorted
list (usually arrays). In the case of a sorted list, searching starts from the 0th element and continues
until the element is found, or until an element whose value is greater than the value being searched
for (assuming the list is sorted in ascending order) is reached.
As against this, searching an unsorted list also begins from the 0th element and continues
until the element is found or the end of the list is reached.
Example:
The list given below is the list of elements in an unsorted array. The array contains 10 elements.
Suppose the element to be searched is 46; 46 is compared with all the elements starting from
the 0th element, and the searching process ends where 46 is found or the list ends.
The performance of linear search can be measured by counting the comparisons done to find
an element.
#include <iostream>
#include <conio.h>
#include <stdlib.h>
#define MAX_SIZE 5
using namespace std;

int main() {
    int arr_search[MAX_SIZE], i, element;
    cout << "Enter " << MAX_SIZE << " elements: ";
    for (i = 0; i < MAX_SIZE; i++)
        cin >> arr_search[i];
    cout << "Enter the element to be searched: ";
    cin >> element;
    // Compare the target with every element until it is found or the list ends.
    for (i = 0; i < MAX_SIZE; i++)
        if (arr_search[i] == element) {
            cout << "\nSearch Element : " << element << " : Found at index " << i << "\n";
            break;
        }
    if (i == MAX_SIZE)
        cout << "\nSearch Element : " << element << " : Not Found \n";
    getch();
}
Binary Search
Binary search is a very fast and efficient searching technique. It requires the list to be in sorted
order. In this method, to search for an element you compare it with the element at the
center of the list. If it matches, the search is successful; otherwise the list is divided into two
halves: one from the 0th element to the middle (center) element (the first half), and another from
the center element to the last element (the second half), where all values are greater than the
center element.
The searching mechanism proceeds from either of the two halves depending upon whether the
target element is greater or smaller than the central element. If the element is smaller than the
central element, searching is done in the first half; otherwise it is done in the second half.
Example:
For a binary search to work, it is mandatory for the target array to be sorted. We shall learn the
process of binary search with a pictorial example. The following is our sorted array and let us
assume that we need to search the location of value 31 using binary search.
First we determine the middle of the array using the formula mid = low + (high - low) / 2. Here it
is 0 + (9 - 0) / 2 = 4 (the integer value of 4.5). So 4 is the mid of the array.
Now we compare the value stored at location 4 with the value being searched, i.e. 31. We find
that the value at location 4 is 27, which is not a match. Since the target value 31 is greater than 27
and we have a sorted array, we know that the target value must be in the upper portion of the
array.
We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid = low + (high-low) / 2
Our new mid is 7 now. We compare the value stored at location 7 with our target value 31.
The value stored at location 7 is not a match, rather it is more than what we are looking for. So,
the value must be in the lower part from this location.
Binary search halves the searchable items and thus reduces the count of comparisons to be made
to a very small number.
#include <iostream>
#include <conio.h>
#include <stdlib.h>
#define MAX_SIZE 5
using namespace std;

int main() {
    int arr_search[MAX_SIZE], i, element;
    int f = 0, r = MAX_SIZE - 1, mid;
    cout << "Enter " << MAX_SIZE << " elements in ascending order: ";
    for (i = 0; i < MAX_SIZE; i++)
        cin >> arr_search[i];
    cout << "Enter the element to be searched: ";
    cin >> element;
    // Halve the search range until the element is found or the range is empty.
    while (f <= r) {
        mid = f + (r - f) / 2;
        if (arr_search[mid] == element) {
            cout << "\nSearch Element : " << element << " : Found at index " << mid << "\n";
            break;
        }
        else if (arr_search[mid] < element)
            f = mid + 1;   // search the upper half
        else
            r = mid - 1;   // search the lower half
    }
    if (f > r)
        cout << "\nSearch Element : " << element << " : Not Found \n";
    getch();
}
Hashing:
Hashing is a technique used for performing insertions, deletions and search operations in constant
average time by implementing the hash table data structure.
Types of hashing
1. Static hashing: the hash function maps search key values to a fixed set of locations.
2. Dynamic hashing: the hash table can grow to handle more items at run time.
Hash table
The hash table data structure is an array of some fixed size, containing the keys. A key
value is associated with each record. A hash table is partitioned into an array of buckets; each
bucket has many slots, and each slot holds one record.
Hashing functions
A hashing function is a key-to-address transformation which acts upon a given key to
compute the relative position of the key in the hash table. A good hash function should
minimize collisions.
For example, with a table of size 5:
Hash(24) = 24 % 5 = 4
The key value 24 is placed at relative location 4 in the hash table.
Some common hashing functions are described below.
1. Modulo division
This method computes the hash value from the key using the modulo (%) operator.
For example, with a table of size 4:
H(4) = 4 % 4 = 0

Index   Slot
0       4
1
2
3

2. Mid-square method
The key is squared and the middle part of the result is taken as the hash value, based on
the number of digits required for addressing.
3. Folding method
This method involves splitting the key into two or more parts, each of which has the same
length as the required address, and then adding the parts to form the hash value.
There are two types:
o Fold shifting method
o Fold boundary method
4. Digit extraction
In this method, selected digits are extracted from the key and used as the hash value.
Example:
Map the key 123203241 to a hash table of size 1000.
Select the digits from positions 2, 5 and 8.
Now the hash value = 204
5. Radix transformation
In this method, a key is transformed into a number in another base.
Example:
Map the key (8465)10 using base 15.
(8465)10 = (2795)15, so the hash value is 2795.
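To make the arithmetic of these methods concrete, the following small sketch (the function names
hashMod and hashRadix are ours, for illustration only) reproduces two of the examples above:
modulo division of 24 with a table of size 5, and the radix transformation of 8465 into base 15:

#include <iostream>
using namespace std;

// Modulo division: the hash value is the remainder of key / table size.
int hashMod(int key, int size) { return key % size; }

// Radix transformation: rewrite the key's digits in another base and
// read them back as a decimal number.
long hashRadix(long key, int base)
{
    long result = 0, place = 1;
    while (key > 0) {
        result += (key % base) * place;   // next base-`base` digit
        place *= 10;                      // becomes a decimal digit
        key /= base;
    }
    return result;
}

int main()
{
    cout << hashMod(24, 5) << "\n";       // 4, as in Hash(24) = 24 % 5
    cout << hashRadix(8465, 15) << "\n";  // 2795, since (8465)10 = (2795)15
    return 0;
}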
Applications of Hash tables
Database systems
Symbol tables
Data dictionaries
Network processing algorithms
Browser caches
Collision
A collision occurs when the hash value of a record being inserted hashes to an address that already
contains a different record, i.e., when two key values hash to the same position.
Example: insert 37, 24 and 7 into a hash table of size 5.
Index   Slot
0
1
2       37
3
4       24
37 is placed at index 2 and 24 is placed at index 4.
Now, inserting 7:
Hash(7) = 7 % 5 = 2
Index 2 is already occupied by 37, so 7 collides.
Collision Resolution strategies
The process of finding another position for the colliding record is called a collision resolution
strategy.
There are two categories:
1. Open hashing - separate chaining
Each bucket in the hash table is the head of a linked list. All elements that hash to the
same value are linked together.
2. Closed hashing - open addressing, rehashing and extendible hashing.
Colliding elements are stored at another slot in the table.
This ensures that all elements are stored directly in the hash table.
1) Separate chaining (see the sketch below)
Disadvantages
It requires pointers, which occupy more space.
It takes more effort to perform a search, since it takes time to evaluate the hash function
and also to traverse the list.
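As a rough sketch of separate chaining (our own illustration, reusing the size-5 table and the keys
37, 24 and 7 from the collision example above), each bucket heads a singly linked list and
colliding keys are simply prepended:

#include <iostream>
using namespace std;

struct Node { int key; Node *next; };

const int TABLE_SIZE = 5;
Node *table[TABLE_SIZE] = {nullptr};   // each bucket heads a linked list

// All keys that hash to the same index are linked together in that bucket.
void insertChained(int key)
{
    int idx = key % TABLE_SIZE;
    table[idx] = new Node{key, table[idx]};   // prepend to the bucket's list
}

bool findChained(int key)
{
    for (Node *p = table[key % TABLE_SIZE]; p != nullptr; p = p->next)
        if (p->key == key) return true;
    return false;
}

int main()
{
    int keys[] = {37, 24, 7};
    for (int k : keys) insertChained(k);   // 37 and 7 share bucket 2
    cout << findChained(7) << " " << findChained(8) << "\n";   // prints 1 0
    return 0;
}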
Open Addressing:
a) Linear probing
In linear probing the probe function is F(i) = i, so collisions are resolved by checking the next
slots in sequence: Hi(X) = (Hash(X) + i) mod TableSize.
Example: insert 42, 39, 69, 21, 71 and 55 into a hash table of size 10.
1. H0(42) = 42 % 10 = 2
2. H0(39) = 39 % 10 = 9
3. H0(69) = 69 % 10 = 9, collides with 39
   H1(69) = (9 + 1) % 10 = 10 % 10 = 0
4. H0(21) = 21 % 10 = 1
5. H0(71) = 71 % 10 = 1, collides with 21
   H1(71) = (1 + 1) % 10 = 2 % 10 = 2, collides with 42
   H2(71) = (2 + 1) % 10 = 3 % 10 = 3
6. H0(55) = 55 % 10 = 5
Index   Empty   After 42   After 39   After 69   After 21   After 71   After 55
0       -       -          -          69         69         69         69
1       -       -          -          -          21         21         21
2       -       42         42         42         42         42         42
3       -       -          -          -          -          71         71
5       -       -          -          -          -          -          55
9       -       -          39         39         39         39         39
Advantages
It doesn't require pointers.
Disadvantages
It forms clusters (primary clustering), which degrade the performance of the hash table.
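The sketch below (ours, for illustration) inserts the example keys into a size-10 table with linear
probing; replacing i with i² in the probe step would give the quadratic probing described next:

#include <iostream>
using namespace std;

const int SIZE = 10;
int table[SIZE];   // -1 marks an empty slot

// Probe slots (home + i) % SIZE for i = 0, 1, 2, ... until an empty one is found.
void insertLinear(int key)
{
    int home = key % SIZE;
    for (int i = 0; i < SIZE; i++) {
        int idx = (home + i) % SIZE;
        if (table[idx] == -1) { table[idx] = key; return; }
    }
    // Table full: insertion fails silently in this sketch.
}

int main()
{
    for (int i = 0; i < SIZE; i++) table[i] = -1;
    int keys[] = {42, 39, 69, 21, 71, 55};
    for (int k : keys) insertLinear(k);
    for (int i = 0; i < SIZE; i++)
        if (table[i] != -1) cout << i << " -> " << table[i] << "\n";
    // Output matches the table above: 0->69, 1->21, 2->42, 3->71, 5->55, 9->39.
    return 0;
}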
b) Quadratic probing
Based on the quadratic function F(i) = i², so that
Hi(X) = (Hash(X) + F(i)) mod TableSize
Example: insert 89, 18, 49, 58 and 69 into a hash table of size 10 using quadratic probing.
1. H0(89) = 89 % 10 = 9
2. H0(18) = 18 % 10 = 8
3. H0(49) = 49 % 10 = 9, collides with 89
   H1(49) = (9 + 1²) % 10 = 10 % 10 = 0
4. H0(58) = 58 % 10 = 8, collides with 18
   H1(58) = (8 + 1²) % 10 = 9 % 10 = 9, collides with 89
   H2(58) = (8 + 2²) % 10 = 12 % 10 = 2
5. H0(69) = 69 % 10 = 9, collides with 89
   H1(69) = (9 + 1²) % 10 = 10 % 10 = 0, collides with 49
   H2(69) = (9 + 2²) % 10 = 13 % 10 = 3
Index   After 89   After 18   After 49   After 58   After 69
0       -          -          49         49         49
2       -          -          -          58         58
3       -          -          -          -          69
8       -          18         18         18         18
9       89         89         89         89         89
Limitations:
It suffers from secondary clustering, and it can be difficult to find an empty slot once the table is
more than half full.
c) Double hashing
It uses the idea of applying a second hash function to the key when a collision occurs.
The result of the second hash function gives the number of positions from the point
of collision at which to insert.
F(i) = i * Hash2(X)
Hi(X) = (Hash(X) + F(i)) mod TableSize
      = (Hash(X) + i * Hash2(X)) mod TableSize
A popular second hash function is
Hash2(X) = R - (X % R)
where R is a prime number smaller than the table size.
Example: insert 89, 18, 49, 58 and 69 using Hash2(X) = R - (X % R) with R = 7.
Open addressing hash table using double hashing:

Index   After 89   After 18   After 49   After 58   After 69
0       -          -          -          -          69
3       -          -          -          58         58
6       -          -          49         49         49
8       -          18         18         18         18
9       89         89         89         89         89
1. H0(89) = 89 % 10 = 9
2. H0(18) = 18 % 10 = 8
3. H0(49) = 49 % 10 = 9, collides with 89
   H1(49) = ((49 % 10) + 1 * (7 - (49 % 7))) % 10 = 16 % 10 = 6
4. H0(58) = 58 % 10 = 8, collides with 18
   H1(58) = ((58 % 10) + 1 * (7 - (58 % 7))) % 10 = 13 % 10 = 3
5. H0(69) = 69 % 10 = 9, collides with 89
   H1(69) = ((69 % 10) + 1 * (7 - (69 % 7))) % 10 = 10 % 10 = 0
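A brief sketch (ours, for illustration) of the double-hash probe function with R = 7, reproducing
the three collision resolutions computed above:

#include <iostream>
using namespace std;

const int SIZE = 10, R = 7;
int hash1(int x) { return x % SIZE; }
int hash2(int x) { return R - (x % R); }   // never zero, so the probe always advances

// Hi(X) = (Hash(X) + i * Hash2(X)) mod TableSize
int probe(int x, int i) { return (hash1(x) + i * hash2(x)) % SIZE; }

int main()
{
    cout << probe(49, 1) << "\n";   // 6: (9 + 1*7) % 10
    cout << probe(58, 1) << "\n";   // 3: (8 + 1*5) % 10
    cout << probe(69, 1) << "\n";   // 0: (9 + 1*1) % 10
    return 0;
}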
Rehashing:
When the table becomes too full, rehashing builds a new table about twice as large and re-inserts
every legitimate element of the old table into it.

HashTable Rehash(HashTable H)
{
    int i, oldsize;
    cell *oldcells;
    oldcells = H->TheCells;
    oldsize = H->TableSize;
    /* Get a new, empty table of roughly twice the size */
    H = InitializeTable(2 * oldsize);
    /* Scan the old table, re-inserting the live elements */
    for (i = 0; i < oldsize; i++)
        if (oldcells[i].Info == Legitimate)
            Insert(oldcells[i].Element, H);
    free(oldcells);
    return H;
}
Example: Suppose the elements 13, 15, 24 and 6 are inserted into an open addressing hash table of
size 7, and linear probing is used when a collision occurs. (13 % 7 = 6, 15 % 7 = 1, 24 % 7 = 3;
6 % 7 = 6 collides with 13 and is placed by linear probing at index 0.)

Index   Slot
0       6
1       15
2
3       24
4
5
6       13
If 23 is inserted, the resulting table will be over 70 percent full.
Index   Slot
0       6
1       15
2       23
3       24
4
5
6       13
A new table is created. The size of the new table is 17, as this is the first prime number
that is at least twice as large as the old table size. Each element is rehashed with the new hash
function key % 17: 6 rehashes to index 6, 23 (23 % 17 = 6) collides with it and goes to index 7,
and 24 (24 % 17 = 7) then collides with 23 and goes to index 8.
Index   Slot
0
1
2
3
4
5
6       6
7       23
8       24
9
10
11
12
13      13
14
15      15
16
Advantages
The programmer doesn't need to worry about the table size.
It is simple to implement.
Extendible Hashing:
When open addressing or separate chaining is used, collisions could cause several
blocks to be examined during a Find operation, even for a well-distributed hash table.
Furthermore, when the table gets too full, an extremely expensive rehashing step must
be performed, which requires O(N) disk accesses.
These problems can be avoided by using extendible hashing.
Extendible hashing uses a tree (directory) to insert keys into the hash table.
Example:
Consider keys that consist of several 6-bit integers.
The root of the tree contains 4 pointers determined by the leading 2 bits of the keys.
In each leaf, the first 2 bits common to its keys are identified and indicated in parentheses.
D represents the number of bits used by the root (directory).
The number of entries in the directory is 2^D.
Extendible hashing provides quick access times for insert and find operations on large databases,
as illustrated below.
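As an illustration (our own sketch, assuming D = 2 and the 6-bit keys of the example), the
directory entry for a key is simply its leading D bits:

#include <iostream>
using namespace std;

// With 6-bit keys and a directory of 2^D entries, the leading D bits of the
// key select the directory pointer to the leaf that stores the key.
int directoryIndex(unsigned key, int D)
{
    return key >> (6 - D);
}

int main()
{
    // Key 36 is 100100 in binary; its leading bits are 10,
    // so it belongs to directory entry 2.
    cout << directoryIndex(36, 2) << "\n";   // prints 2
    return 0;
}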
Disadvantages
This algorithm does not work if there are more than M duplicates, where M is the number of
elements that fit in a leaf.