Data Structure-Bsc-Module4-22
Bubble sort is a simple comparison-based sorting algorithm that repeatedly goes
through the list, compares adjacent elements and swaps them if they are in the wrong order. It is
the simplest sorting algorithm, and an inefficient one; yet it is well worth learning,
as it illustrates the basic foundations of sorting.
Application
Bubble sort is mainly used for educational purposes, to help students understand the
foundations of sorting.
It can also be used to check whether a list is already sorted. When the list is already sorted
(which is the best-case scenario), the complexity of bubble sort is only O(n).
In real life, bubble sort can be visualised when people in a queue who want to stand in a
height-wise sorted manner swap their positions among themselves until everyone is standing
in increasing order of height.
Explanation
Algorithm: We compare adjacent elements and check whether their order is wrong (i.e.,
a[i] > a[i + 1] for 1 <= i < size of array, if the array is to be in ascending order, and vice versa). If it is,
we swap them.
Explanation:
Say we have an array of length n. To sort this array, we perform the above step (comparing and
swapping adjacent elements) for n - 1 passes.
In simple terms: first, the largest element moves to its place at the extreme right; then the second
largest moves to the place just before it; and so on. In the ith pass, the ith largest element is moved
to its correct place in the array by successive swaps.
In mathematical terms: in the ith pass, at least one element among the first (n - i + 1) elements will
reach its correct place, and that element will be the ith largest element of the array
(for 1 <= i <= n - 1). This is because in the jth iteration of the ith pass (for 1 <= j <= n - i), we
check whether a[j] > a[j + 1], and a[j] will always be greater than a[j + 1] when it is the largest
element in the range [1, n - i + 1]; in that case we swap them. This continues until the ith largest
element reaches the (n - i + 1)th position of the array.
First Pass:
Consider the array Arr = {14, 33, 27, 35, 10}.
We proceed with the first and second elements, i.e., Arr[0] and Arr[1]. Check if 14 >
33, which is false. So, no swapping happens and the array remains the same.
We proceed with the second and third elements, i.e., Arr[1] and Arr[2]. Check if 33 >
27, which is true. So, we swap Arr[1] and Arr[2].
Thus the array becomes {14, 27, 33, 35, 10}.
We proceed with the third and fourth elements, i.e., Arr[2] and Arr[3]. Check if 33 >
35, which is false. So, no swapping happens and the array remains the same.
We proceed with the fourth and fifth elements, i.e., Arr[3] and Arr[4]. Check if 35 >
10, which is true. So, we swap Arr[3] and Arr[4]. The array becomes {14, 27, 33, 10, 35}.
This marks the end of the first pass, where the largest element reaches its final (last) position.
Second Pass:
We proceed with the first and second elements, i.e., Arr[0] and Arr[1]. Check if 14 >
27, which is false. So, no swapping happens and the array remains the same.
We now proceed with the second and third elements, i.e., Arr[1] and Arr[2]. Check if 27 > 33, which is
false. So, no swapping happens and the array remains the same.
We now proceed with the third and fourth elements, i.e., Arr[2] and Arr[3]. Check if 33 >
10, which is true. So, we swap Arr[2] and Arr[3]. The array becomes {14, 27, 10, 33, 35}.
This marks the end of the second pass, where the second largest element in the array has occupied its
correct position.
Third Pass:
After the third pass, the third largest element will be at the third last position in the array.
.
.
i-th Pass:
After the ith pass, the ith largest element will be at the ith last position in the array.
.
.
n-th Pass:
After the nth pass, the nth largest element (the smallest element) will be at the nth last position (1st
position) in the array, where 'n' is the size of the array.
After doing all the passes, we can easily see that the array will be sorted.
Thus, the sorted array will look like this: {10, 14, 27, 33, 35}
Pseudocode:
begin BubbleSort(arr, n)
   for i = 1 to n - 1
      for j = 0 to n - i - 1
         if arr[j] > arr[j + 1]
            swap(arr[j], arr[j + 1])
         end if
      end for
   end for
end
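To make the pseudocode concrete, here is a minimal C sketch of bubble sort run on the example array from the walk-through above (the function name bubbleSort is illustrative):
#include <stdio.h>

// Bubble sort: after the ith pass, the ith largest element is in its final place
void bubbleSort(int arr[], int n)
{
    for (int i = 0; i < n - 1; i++) {          // n - 1 passes
        for (int j = 0; j < n - i - 1; j++) {  // compare adjacent pairs
            if (arr[j] > arr[j + 1]) {         // wrong order, so swap
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}

int main()
{
    int arr[] = {14, 33, 27, 35, 10};
    int n = sizeof(arr) / sizeof(arr[0]);
    bubbleSort(arr, n);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);   // prints: 10 14 27 33 35
    printf("\n");
    return 0;
}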
Selection Sort
Selection sort is a simple comparison-based sorting algorithm. It is in-place and needs no extra
memory.
The idea behind this algorithm is pretty simple. We divide the array into two parts: sorted and
unsorted. The left part is the sorted subarray and the right part is the unsorted subarray. Initially,
the sorted subarray is empty and the unsorted subarray is the complete given array.
We perform the steps given below until the unsorted subarray becomes empty (a C sketch follows the steps):
1. Pick the minimum element from the unsorted subarray.
2. Swap it with the leftmost element of the unsorted subarray.
3. Now that leftmost element becomes the rightmost element of the sorted subarray and is no
longer part of the unsorted subarray.
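A minimal C sketch of these steps (the function name selectionSort is illustrative):
#include <stdio.h>

// Selection sort: repeatedly pick the minimum of the unsorted part and
// swap it with the leftmost unsorted element
void selectionSort(int arr[], int n)
{
    for (int i = 0; i < n - 1; i++) {
        int minIndex = i;                 // leftmost element of the unsorted part
        for (int j = i + 1; j < n; j++) {
            if (arr[j] < arr[minIndex])
                minIndex = j;             // remember the smallest element seen so far
        }
        int temp = arr[i];                // swap the minimum into position i
        arr[i] = arr[minIndex];
        arr[minIndex] = temp;
    }
}

int main()
{
    int arr[] = {12, 11, 13, 5, 6};
    int n = sizeof(arr) / sizeof(arr[0]);
    selectionSort(arr, n);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);   // prints: 5 6 11 12 13
    printf("\n");
    return 0;
}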
Insertion Sort
Insertion sort builds the sorted subarray one element at a time: each new element is compared with
the elements before it and moved backwards by swaps until it reaches its correct position. It works
as follows. Consider the array:
12 11 13 5 6
First Pass:
Here, 12 is greater than 11, hence they are not in ascending order and 12 is not at its
correct position. Thus, we swap 11 and 12.
So, for now, 11 is stored in the sorted subarray.
11 12 13 5 6
Second Pass:
Now, move to the next two elements and compare them:
11 12 13 5 6
Here, 13 is greater than 12, so both elements are in ascending order and no
swapping occurs. 12 is now also stored in the sorted subarray, along with 11.
Third Pass:
Now, two elements are present in the sorted subarray: 11 and 12.
Moving forward to the next two elements, 13 and 5:
11 12 13 5 6
Both 5 and 13 are not at their correct places, so swap them:
11 12 5 13 6
After swapping, elements 12 and 5 are not sorted, thus swap again:
11 5 12 13 6
Here, 11 and 5 are also not sorted, so swap again:
5 11 12 13 6
Now, 5 is at its correct position.
Fourth Pass:
The sorted subarray now contains 5, 11 and 12. Moving to the next two elements, 13 and 6:
5 11 12 13 6
Clearly, they are not sorted, so perform a swap between them:
5 11 12 6 13
Now, 6 is smaller than 12, hence swap again:
5 11 6 12 13
Here, swapping makes 11 and 6 unsorted, hence swap again:
5 6 11 12 13
Finally, the array is completely sorted.
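The driver code below assumes definitions of insertionSort and printArray. A minimal sketch of both, matching the walk-through above, is given here so that together with the driver they form a complete program:
#include <stdio.h>

// Insertion sort: insert each element into its place in the sorted prefix,
// shifting larger elements one position to the right
void insertionSort(int arr[], int n)
{
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];   // shift the larger element right
            j--;
        }
        arr[j + 1] = key;
    }
}

void printArray(int arr[], int n)
{
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}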
// Driver code
int main()
{
int arr[] = {12, 11, 13, 5, 6};
int n = sizeof(arr) / sizeof(arr[0]);
insertionSort(arr, n);
printArray(arr, n);
return 0;
}
Searching
Searching is the process of finding a particular element in a list. If the element is present in the
list, the search is called successful and the process returns the location of that element;
otherwise, the search is called unsuccessful.
There are two popular search methods that are widely used to search for an item in a list.
The choice of algorithm, however, depends upon the arrangement of the list.
o Linear Search
o Binary Search
Linear Search
Linear search is the simplest search algorithm and is often called sequential search. In this type of
searching, we simply traverse the list completely and match each element of the list with the item
whose location is to be found. If a match is found, the location of the item is returned; otherwise,
the algorithm returns NULL.
Linear search is mostly used to search an unordered list in which the items are not sorted. The
algorithm of linear search is given as follows.
Algorithm
o LINEAR_SEARCH(A, N, VAL)
o Step 1: [INITIALIZE] SET POS = -1
o Step 2: [INITIALIZE] SET I = 1
o Step 3: Repeat Step 4 while I<=N
o Step 4: IF A[I] = VAL
SET POS = I
PRINT POS
Go to Step 6
[END OF IF]
SET I = I + 1
[END OF LOOP]
o Step 5: IF POS = -1
PRINT " VALUE IS NOT PRESENTIN THE ARRAY "
[END OF IF]
o Step 6: EXIT
Complexity of algorithm
Best case: O(1), when the item is found at the first position
Average case: O(n)
Worst case: O(n), when the item is at the last position or not present
C Program
#include <stdio.h>
int main()
{
    int a[10] = {10, 23, 40, 1, 2, 0, 14, 13, 50, 9};
    int item, i, flag;
    printf("\nEnter Item which is to be searched\n");
    scanf("%d", &item);
    for (i = 0; i < 10; i++)
    {
        if (a[i] == item)
        {
            flag = i + 1;   // store the 1-based location
            break;
        }
        else
            flag = 0;
    }
    if (flag != 0)
    {
        printf("\nItem found at location %d\n", flag);
    }
    else
    {
        printf("\nItem not found\n");
    }
    return 0;
}
Binary Search
Binary search is a search technique that works efficiently on sorted lists. Hence, to
search for an element in a list using binary search, we must first ensure that the list is
sorted.
Binary search follows the divide and conquer approach: the list is divided into two halves, and
the item is compared with the middle element of the list. If a match is found, the location of the
middle element is returned; otherwise, we search in one of the two halves depending on the result
of the comparison.
The binary search algorithm is given below.
Algorithm
o BINARY_SEARCH(A, lower_bound, upper_bound, VAL)
o Step 1: [INITIALIZE] SET BEG = lower_bound, END = upper_bound, POS = -1
o Step 2: Repeat Steps 3 and 4 while BEG <= END
o Step 3: SET MID = (BEG + END)/2
o Step 4: IF A[MID] = VAL
SET POS = MID
PRINT POS
Go to Step 6
ELSE IF A[MID] > VAL
SET END = MID - 1
ELSE
SET BEG = MID + 1
[END OF IF]
[END OF LOOP]
o Step 5: IF POS = -1
PRINT "VALUE IS NOT PRESENT IN THE ARRAY"
[END OF IF]
o Step 6: EXIT
Complexity
SN Performance Complexity
1 Best case O(1)
2 Average case O(log n)
3 Worst case O(log n)
Example
Let us consider an array arr = {1, 5, 7, 8, 13, 19, 20, 23, 29}. Find the location of the item 23 in the
array.
In the first step:
1. BEG = 0
2. END = 8
3. MID = (0 + 8) / 2 = 4
4. a[MID] = a[4] = 13 < 23; therefore, we search the right half.
In the second step:
1. BEG = MID + 1 = 5
2. END = 8
3. MID = (5 + 8) / 2 = 6
4. a[MID] = a[6] = 20 < 23; therefore, we search the right half again.
In the third step:
1. BEG = MID + 1 = 7
2. END = 8
3. MID = (7 + 8) / 2 = 7
4. a[MID] = a[7] = 23 = item; therefore, set LOCATION = MID.
5. The location of the item is 7.
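A minimal C sketch of binary search on the example array above (the function name binarySearch is illustrative):
#include <stdio.h>

// Binary search on a sorted array; returns the index of val, or -1 if absent
int binarySearch(int a[], int n, int val)
{
    int beg = 0, end = n - 1;
    while (beg <= end) {
        int mid = (beg + end) / 2;
        if (a[mid] == val)
            return mid;        // match found at the middle element
        else if (a[mid] < val)
            beg = mid + 1;     // search the right half
        else
            end = mid - 1;     // search the left half
    }
    return -1;                 // value not present
}

int main()
{
    int arr[] = {1, 5, 7, 8, 13, 19, 20, 23, 29};
    int n = sizeof(arr) / sizeof(arr[0]);
    printf("Location of 23: %d\n", binarySearch(arr, n, 23));   // prints 7
    return 0;
}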
Two-Dimensional Array
The two-dimensional array can be defined as an array of arrays. A 2D array is organized as a matrix,
which can be represented as a collection of rows and columns. 2D arrays are often created to
implement a relational-database-like data structure. They provide an easy way to hold a bulk of data
at once, which can be passed to any number of functions wherever required.
The syntax for declaring a two-dimensional array is:
data_type array_name[rows][columns];
Consider the following example.
int twodimen[4][3];
In C programming, you can create an array of arrays. These arrays are known as multidimensional
arrays. For example,
float x[3][4];
Here, x is a two-dimensional (2D) array. The array can hold 12 elements. You can think of the array
as a table with 3 rows, each having 4 columns.
Initialization of 2D Array in C
In the 1D array, we don't need to specify the size of the array if the declaration and initialization are
being done simultaneously. However, this will not work with 2D arrays. We will have to define at
least the second dimension of the array. The two-dimensional array can be declared and defined in the
following way.
int arr[4][3] = {{1,2,3},{2,3,4},{3,4,5},{4,5,6}};
#include <stdio.h>
int main()
{
    int i = 0, j = 0;
    int arr[4][3] = {{1,2,3},{2,3,4},{3,4,5},{4,5,6}};
    // traversing the 2D array
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 3; j++) {
            printf("arr[%d][%d] = %d\n", i, j, arr[i][j]);
        } // end of j
    } // end of i
    return 0;
}
Output
arr[0][0] = 1
arr[0][1] = 2
arr[0][2] = 3
arr[1][0] = 2
arr[1][1] = 3
arr[1][2] = 4
arr[2][0] = 3
arr[2][1] = 4
arr[2][2] = 5
arr[3][0] = 4
arr[3][1] = 5
arr[3][2] = 6
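As noted above, a 2D array can be passed to functions. A minimal sketch, assuming an illustrative helper printMatrix:
#include <stdio.h>

// When a 2D array is passed to a function, every dimension except the
// first must appear in the parameter declaration
void printMatrix(int arr[][3], int rows)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < 3; j++)
            printf("%d ", arr[i][j]);
        printf("\n");
    }
}

int main()
{
    int arr[4][3] = {{1,2,3},{2,3,4},{3,4,5},{4,5,6}};
    printMatrix(arr, 4);
    return 0;
}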
Hash Table
Hash Table is a data structure which stores data in an associative manner. In a hash table, data is
stored in an array format, where each data value has its own unique index value. Access of data
becomes very fast if we know the index of the desired data.
Thus, it becomes a data structure in which insertion and search operations are very fast irrespective
of the size of the data. A hash table uses an array as a storage medium and uses a hashing technique
to generate an index at which an element is to be inserted or from which it is to be located.
Hashing
Hashing is a technique to convert a range of key values into a range of indexes of an array. We are
going to use the modulo operator to get a range of key values. Consider an example of a hash table of
size 20, in which the following items are to be stored. Items are in the (key, value) format.
(1,20)
(2,70)
(42,80)
(4,25)
(12,44)
(14,32)
(17,11)
(13,78)
(37,98)
Sr.No. Key Hash Array Index
1 1 1 % 20 = 1 1
2 2 2 % 20 = 2 2
3 42 42 % 20 = 2 2
4 4 4 % 20 = 4 4
5 12 12 % 20 = 12 12
6 14 14 % 20 = 14 14
7 17 17 % 20 = 17 17
8 13 13 % 20 = 13 13
9 37 37 % 20 = 17 17
Linear Probing
As we can see, the hashing technique may produce an index of the array that is already in use. In
such a case, we can search for the next empty location in the array by looking into subsequent
cells until we find an empty cell. This technique is called linear probing.
Sr.No. Key Hash Array Index After Linear Probing
1 1 1 % 20 = 1 1 1
2 2 2 % 20 = 2 2 2
3 42 42 % 20 = 2 2 3
4 4 4 % 20 = 4 4 4
5 12 12 % 20 = 12 12 12
6 14 14 % 20 = 14 14 14
7 17 17 % 20 = 17 17 17
8 13 13 % 20 = 13 13 13
9 37 37 % 20 = 17 17 18
Application of Hash Tables:
1. Database Systems: Specifically, those that require efficient random access. Database
systems usually try to balance two types of access methods: sequential and random.
Hash tables are an integral part of efficient random access because they provide a way to locate
data in a constant amount of time.
2. Symbol Tables: The tables utilized by compilers to maintain data about symbols from a
program. Compilers access information about symbols frequently. Therefore, it is essential
that symbol tables be implemented very efficiently.
3. Data Dictionaries: Data Structure that supports adding, deleting, and searching for data.
Although the operation of hash tables and a data dictionary are similar, other Data Structures
may be used to implement data dictionaries.
4. Associative Arrays: Associative arrays consist of data arranged so that the nth element of one
array corresponds to the nth element of another. Associative arrays are helpful for indexing a
logical grouping of data by several key fields.
Hashing
There are many possibilities for representing a dictionary, and one of the best methods is
hashing. Hashing is a technique that can be used in almost all situations.
It uses few key comparisons: searching for an element takes O(n) time in the
worst case, while in the average case it is done in O(1) time. The method generally uses a hash
function to map the keys into a table, which is called a hash table.
1) Hash table
A hash table is a type of data structure used for storing and accessing data very quickly.
Insertion of data into the table is based on a key value; hence every entry in the hash table is
associated with some key. Using this key, data can be searched in the hash table with only a few
key comparisons, and the searching time then depends on the size of the hash table.
2) Hash function
A hash function is a function that is applied to a key and produces an integer, which can be
used as an address in the hash table. Hence, one can use the same hash function for accessing the
data from the hash table. The integer returned by the hash function is called the hash key.
There are various types of hash functions used to place the data in a hash table:
1. Division method
In this method the hash function depends upon the remainder of a division. For example, if the
records 52, 68, 99, 84 are to be placed in a hash table of size 10, then:
h(key) = record % table size
h(52) = 52 % 10 = 2
h(68) = 68 % 10 = 8
h(99) = 99 % 10 = 9
h(84) = 84 % 10 = 4
2. Mid square method
In this method the key is first squared and then the middle part of the result is taken as the index.
For example, consider placing a record with key 3101 in a table of size 1000:
3101 * 3101 = 9616201, i.e., h(3101) = 162 (the middle 3 digits)
3. Folding method
In this method the key is divided into separate parts, and these parts are combined using simple
operations to produce a hash key.
Example 1: The task is to fold the key 123456789 into a hash table of ten spaces (0 through 9).
It is given that the key, say X, is 123456789 and the table size M = 10.
Since we can break X into three parts in any order, let's divide it evenly.
Therefore, a = 123, b = 456, c = 789.
Now, H(X) = (a + b + c) mod M, i.e., H(123456789) = (123 + 456 + 789) mod 10 = 1368 mod
10 = 8.
Hence, 123456789 is inserted into the table at address 8.
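A brief C sketch of the three hash methods above, reproducing the worked examples (the helper names divisionHash, midSquareHash and foldingHash are illustrative; the mid-square helper is simplified to extract exactly three middle digits, as in the 3101 example):
#include <stdio.h>

// Division method: remainder of the key by the table size
int divisionHash(int key, int tableSize)
{
    return key % tableSize;
}

// Mid square method: square the key and take the middle digits
int midSquareHash(int key)
{
    long long squared = (long long)key * key;   // 3101 * 3101 = 9616201
    return (int)((squared / 100) % 1000);       // middle 3 digits -> 162
}

// Folding method: split the key into parts and add them up
int foldingHash(long long key, int tableSize)
{
    int sum = 0;
    while (key > 0) {
        sum += (int)(key % 1000);   // take three digits at a time
        key /= 1000;
    }
    return sum % tableSize;
}

int main()
{
    printf("%d\n", divisionHash(52, 10));         // prints 2
    printf("%d\n", midSquareHash(3101));          // prints 162
    printf("%d\n", foldingHash(123456789LL, 10)); // (789+456+123) % 10 = 8
    return 0;
}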
Characteristics of a good hashing function
1. The hash function should generate different hash values for similar strings.
2. The hash function should be easy to understand and simple to compute.
3. The hash function should produce keys that are distributed uniformly over the
array.
4. The number of collisions should be small when placing the data in the hash table.
5. A hash function is a perfect hash function when it uses all the input data.
Collision
It is a situation in which the hash function returns the same hash key for more than one record, it is
called as collision. Sometimes when we are going to resolve the collision it may lead to a overflow
condition and this overflow and collision condition makes the poor hash function.
Liner Probing
Quadratic probing
Double hashing
Hash table: a data structure where the data is stored based upon its hashed key, which is obtained
using a hashing function.
Hash function: a function which, for a given datum, outputs a value mapped to a fixed range. A hash
table leverages the hash function to efficiently map data such that it can be retrieved and updated
quickly. Simply put, assume S = {s1, s2, s3, ..., sn} to be a set of objects that we wish to store into a
map of size N; we use a hash function H such that for all s belonging to S, H(s) -> x, where x is
guaranteed to lie in the range [1, N].
Perfect Hash function: a hash function that maps each item into a unique slot (no collisions).
Hash Collisions: As per the pigeonhole principle, if the set of objects we intend to store within our
hash table is larger than the size of our hash table, we are bound to have two or more different objects
having the same hash value: a hash collision. Even if the size of the hash table is large enough to
accommodate all the objects, finding a hash function that generates a unique hash for each object in
the hash table is a difficult task. Collisions are bound to occur (unless we find a perfect hash function,
which in most cases is hard to find), but they can be significantly reduced with the help of various
collision resolution techniques.
1. Separate Chaining (Open Hashing)
Suppose you wish to store a set of numbers = {0, 1, 2, 4, 5, 7} into a hash table of size 5.
Now, assume that we have a hash function H, such that H(x) = x%5
So, if we were to map the given data with the given hash function we'll get the corresponding values
H(0)-> 0%5 = 0
H(1)-> 1%5 = 1
H(2)-> 2%5 = 2
H(4)-> 4%5 = 4
H(5)-> 5%5 = 0
H(7)-> 7%5 = 2
Clearly 0 and 5, as well as 2 and 7, will have the same hash value; in this case we'll simply append
the colliding values to a list pointed to by their hash key.
Obviously, in practice the table size can be significantly larger and the hash function even more
complex; the data being hashed would also be more complex and non-primitive. But the idea remains
the same.
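A compact C sketch of this chaining idea, using singly linked lists and the H(x) = x % 5 function from the example (the structure and names are illustrative):
#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 5

struct Node {
    int value;
    struct Node *next;
};

struct Node *table[TABLE_SIZE];   // each slot points to a chain of values

// Insert a value at the head of the chain for its hash slot
void insert(int value)
{
    int index = value % TABLE_SIZE;               // H(x) = x % 5
    struct Node *node = malloc(sizeof(struct Node));
    node->value = value;
    node->next = table[index];                    // prepend to the chain
    table[index] = node;
}

int main()
{
    int data[] = {0, 1, 2, 4, 5, 7};
    for (int i = 0; i < 6; i++)
        insert(data[i]);
    // 0 and 5 share slot 0; 2 and 7 share slot 2
    for (int i = 0; i < TABLE_SIZE; i++) {
        printf("slot %d:", i);
        for (struct Node *p = table[i]; p != NULL; p = p->next)
            printf(" %d", p->value);
        printf("\n");
    }
    return 0;
}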
This is an easy way to implement hashing but it has its own demerits.
The lookups/inserts/updates can become linear [O(N)] instead of constant time [O(1)] if the hash
function has too many collisions.
It doesn't account for any empty slots which can be leveraged for more efficient storage and lookups.
Ideally we require a good hash function to guarantee even distribution of the values.
Say, for a load factor
λ = (number of objects stored in the table) / (size of the table)    (can be > 1)
a good hash function would guarantee that the maximum length of the list associated with each key is
close to the load factor.
Note that the order in which the data is stored in the lists (or any other data structures) is based upon
the implementation requirements. Some general ways include insertion order, frequency of access etc.
2. Closed Hashing (Open Addressing)
This collision resolution technique requires a hash table with a fixed, known size. During insertion,
if a collision is encountered, alternative cells are tried until an empty bucket is found. These
techniques require the size of the hash table to be larger than the number of objects to be
stored (a load factor < 1 is ideal).
There are various methods to find these empty buckets:
a. Linear Probing
b. Quadratic probing
c. Double hashing
a. Linear Probing
The idea of linear probing is simple: we take a fixed-size hash table, and every time we face a hash
collision we linearly traverse the table in a cyclic manner to find the next empty slot.
Assume a scenario where we intend to store the following set of numbers = {0,1,2,4,5,7} into a hash
table of size 5 with the help of the following hash function H, such that H(x) = x%5.
So, if we were to map the given data with the given hash function we'll get the corresponding values
H(0)-> 0%5 = 0
H(1)-> 1%5 = 1
H(2)-> 2%5 = 2
H(4)-> 4%5 = 4
H(5)-> 5%5 = 0
In this case we see a collision of two terms (0 and 5). In this situation we move linearly down the
table to find the first empty slot. Note that this linear traversal is cyclic in nature, i.e., if we
exhaust the last element during the search, we start again from the beginning until the initial key is
reached. Here, 5 collides with 0 at index 0; indices 1 and 2 are occupied, so 5 is placed at index 3,
the first empty slot.
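A minimal C sketch of insertion with linear probing under the assumptions of this example (table of size 5, H(x) = x % 5, -1 marking an empty slot; the set is trimmed to {0, 1, 2, 4, 5} so that every key fits into the five slots):
#include <stdio.h>

#define TABLE_SIZE 5
#define EMPTY -1

// Insert with linear probing: on collision, scan cyclically for a free slot
void insertLinear(int table[], int key)
{
    int index = key % TABLE_SIZE;                 // H(x) = x % 5
    for (int i = 0; i < TABLE_SIZE; i++) {
        int probe = (index + i) % TABLE_SIZE;     // cyclic traversal
        if (table[probe] == EMPTY) {
            table[probe] = key;
            return;
        }
    }
    // if we get here the table is full; this sketch simply drops the key
}

int main()
{
    int table[TABLE_SIZE] = {EMPTY, EMPTY, EMPTY, EMPTY, EMPTY};
    int data[] = {0, 1, 2, 4, 5};
    for (int i = 0; i < 5; i++)
        insertLinear(table, data[i]);
    // 5 collides with 0 at slot 0 and lands in slot 3, the first free slot
    for (int i = 0; i < TABLE_SIZE; i++)
        printf("slot %d: %d\n", i, table[i]);
    return 0;
}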
b. Quadratic Probing
This method strikes a middle ground between great cache performance and the problem of clustering.
The general idea remains the same as in linear probing; the only difference is that, when looking for
an empty bucket, we use an increment Q(i) at the ith iteration, where Q(i) is some quadratic
expression of i. A simple choice is Q(i) = i^2, in which case the hash function looks something like
this:
H(x, i) = (H(x) + i^2)%N
In general, H(x, i) = (H(x) + (c1*i^2 + c2*i + c3)) % N, for some choice of constants c1, c2, and c3.
Despite significantly reducing the problem of clustering, this technique may, in some situations, fail
to find any available bucket, unlike linear probing, which always finds an empty bucket as long as
one exists.
Luckily, we can get good results from quadratic probing with the right combination of probing
function and hash table size which will guarantee that we will visit as many slots in the table as
possible. In particular, if the hash table's size is a prime number and the probing function is H(x, i) =
i^2, then at least 50% of the slots in the table will be visited. Thus, if the table is less than half full, we
can be certain that a free slot will eventually be found.
Alternatively, if the hash table size is a power of two and the probing function is H(x, i) = (i^2 + i)/2,
then every slot in the table will be visited by the probing function.
Assume a scenario where we intend to store the following set of numbers = {0, 1, 2, 5} into a hash table
of size 5 with the help of the following hash function H, such that H(x, i) = (x % 5 + i^2) % 5.
Clearly 5 and 0 will face a collision, in which case we'll do the following: for key 5, i = 0 gives
slot 0 (occupied by 0), i = 1 gives slot (0 + 1) % 5 = 1 (occupied by 1), and i = 2 gives slot
(0 + 4) % 5 = 4, which is empty, so 5 is stored at index 4.
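A minimal C sketch of insertion with quadratic probing under the same assumptions (table of size 5, -1 marking an empty slot); note that, unlike linear probing, the loop may give up even though a free slot exists:
#include <stdio.h>

#define TABLE_SIZE 5
#define EMPTY -1

// Quadratic probing: H(x, i) = (x % TABLE_SIZE + i*i) % TABLE_SIZE;
// gives up after TABLE_SIZE attempts
void insertQuadratic(int table[], int key)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        int probe = (key % TABLE_SIZE + i * i) % TABLE_SIZE;
        if (table[probe] == EMPTY) {
            table[probe] = key;
            return;
        }
    }
}

int main()
{
    int table[TABLE_SIZE] = {EMPTY, EMPTY, EMPTY, EMPTY, EMPTY};
    int data[] = {0, 1, 2, 5};
    for (int i = 0; i < 4; i++)
        insertQuadratic(table, data[i]);
    // 5 collides at slot 0, probes slot 1 (occupied), then slot 4 (free)
    for (int i = 0; i < TABLE_SIZE; i++)
        printf("slot %d: %d\n", i, table[i]);
    return 0;
}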
c. Double Hashing
This method is based upon the idea that, in the event of a collision, we use another hash function,
taking the key value as input, to find where in the open addressing scheme the data should actually
be placed.
In this case we use two hashing functions, such that the final hashing function looks like:
H(x, i) = (H1(x) + i*H2(x))%N
Typically for H1(x) = x%N a good H2 is H2(x) = P - (x%P), where P is a prime number smaller than
N.
A good H2 is a function which never evaluates to zero and ensures that all the cells of a table are
effectively traversed.
Assume a scenario where we intend to store the following set of numbers = {0,1,2,5} into a hash table
of size 5 with the help of the following hash function H, such that
H(x, i) = (H1(x) + i*H2(x))%5
H1(x) = x%5 and H2(x) = P - (x%P), where P = 3
(3 is a prime smaller than 5)
Clearly 5 and 0 will face a collision, in which case we'll do the following: H1(5) = 0 is occupied
by 0, and H2(5) = 3 - (5 % 3) = 1, so we probe (0 + 1*1) % 5 = 1 (occupied by 1), then
(0 + 2*1) % 5 = 2 (occupied by 2), then (0 + 3*1) % 5 = 3, which is empty, so 5 is stored at index 3.
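A minimal C sketch of insertion with double hashing under the assumptions of this example (table of size 5, P = 3, -1 marking an empty slot):
#include <stdio.h>

#define TABLE_SIZE 5
#define PRIME 3      // a prime smaller than the table size
#define EMPTY -1

// Double hashing: H(x, i) = (H1(x) + i*H2(x)) % TABLE_SIZE, with
// H1(x) = x % TABLE_SIZE and H2(x) = PRIME - (x % PRIME)
void insertDouble(int table[], int key)
{
    int h1 = key % TABLE_SIZE;
    int h2 = PRIME - (key % PRIME);   // never evaluates to zero
    for (int i = 0; i < TABLE_SIZE; i++) {
        int probe = (h1 + i * h2) % TABLE_SIZE;
        if (table[probe] == EMPTY) {
            table[probe] = key;
            return;
        }
    }
}

int main()
{
    int table[TABLE_SIZE] = {EMPTY, EMPTY, EMPTY, EMPTY, EMPTY};
    int data[] = {0, 1, 2, 5};
    for (int i = 0; i < 4; i++)
        insertDouble(table, data[i]);
    // 5 collides at slot 0 and, with step H2(5) = 1, settles in slot 3
    for (int i = 0; i < TABLE_SIZE; i++)
        printf("slot %d: %d\n", i, table[i]);
    return 0;
}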