
DATA STRUCTURES Search Sort

SEARCHING AND SORTING


SEARCHING
Information retrieval is one of the most important applications of computers. For each
particular structure used to hold data, the functions that allow access to elements in the structure
must be defined. In some cases, access is limited to the elements in specific positions in the
structure, such as the top element in a stack or the front element in a queue. Often, when data are
stored in a list or a table, we want to be able to access any element in the structure.
Searching refers to the operation of finding the location LOC of an ITEM in a file of data.
The search is successful if the item exists in the file, and unsuccessful if it does not exist in the
search list. The two common types of search are:
Linear Search
Binary Search

LINEAR SEARCH (Sequential Search)


The simplest search technique is the sequential search. In this technique, we start at the
beginning of a list or a table and search for the desired record by examining each subsequent record
until either the desired record is found or the list is exhausted. This technique is suitable for a table
organized either as an array or as a linked list. It can be applied to an unordered table, but the
efficiency of the search may be improved if the list is ordered.
Consider DATA, a linear array of n elements. Linear search looks for the search item ITEM
in the array DATA by comparing each element of DATA sequentially, from location 1 to n, with
ITEM: first DATA[1] = ITEM, then DATA[2] = ITEM, and so on. Since this method traverses
DATA sequentially to locate ITEM, it is called sequential search or linear search.

Algorithm: (To perform search using linear search method)


procedure linear_search (list, value)

   for each item in the list
      if item == value
         return the item's location
      end if
   end for

   return not found

end procedure
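As a sketch, the pseudocode above can be written in Python (the language and the name linear_search are illustrative, not part of the original text); returning -1 makes the unsuccessful case explicit:

```python
def linear_search(data, item):
    """Return the index LOC of item in data, or -1 if the search is unsuccessful."""
    for loc, value in enumerate(data):
        if value == item:
            return loc  # successful search: item found at position loc
    return -1           # unsuccessful search: item does not exist in the list

print(linear_search([15, 8, 42, 4, 23], 42))  # → 2
print(linear_search([15, 8, 42, 4, 23], 99))  # → -1
```

Note that the list does not need to be ordered; every element is examined in turn.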

Complexity Analysis:
The worst case occurs when ITEM does not appear in DATA, so the complete set of
positions from 1 to n is searched, i.e., f(n) = n + 1 comparisons.


In the average case, the key ITEM occurs at some position i, a successful search that
requires i key comparisons. The average number of key comparisons is therefore
f(n) = ∑ 1<=i<=n i/n = (n + 1) / 2

Limitation of linear search Algorithm:


For a large value of n, many comparisons need to be made, which makes linear search
inefficient.

5.3 BINARY SEARCH


Sequential search is a simple and easy method. It is efficient for small lists but highly
inefficient for large lists. In the worst case we must make N comparisons, since to find the last
record in the list we examine every record preceding it once. It is like searching for 'SARAH' in a
large telephone directory by reading one name at a time, starting at the front of the directory. If the
keys in the list are sorted in some order, we can improve the search time to a worst case of
O(log2 n).
To search for a particular item with a certain key value (the target), the approximate middle
entry of the table is located and its key value is examined. If the key value at the middle entry is
higher than the target, the procedure is repeated on the first half of the list; if it is lower than the
target, the procedure is repeated on the second half. This process continues until the required key is
found or the search interval becomes empty. This search mechanism can be applied only to an array
sorted in non-decreasing order of values. Based on the result of comparing the search key K with the
middle key, Km, the following conditions are analyzed:
If K < Km, the item being searched for, if present, lies in the lower half of the array.
If K = Km, the searched item is found as the middle item.
If K > Km, the item being searched for, if present, lies in the upper half of the array.

Algorithm: (To perform search using Binary search method)


Procedure binary_search
A ← sorted array
n ← size of array
x ← value to be searched

Set lowerBound = 1
Set upperBound = n

while x not found

   if upperBound < lowerBound
      EXIT: x does not exist
   end if

   set midPoint = lowerBound + ( upperBound - lowerBound ) / 2

   if A[midPoint] < x
      set lowerBound = midPoint + 1
   else if A[midPoint] > x
      set upperBound = midPoint - 1
   else
      EXIT: x found at location midPoint
   end if

end while
end procedure
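A Python sketch of the procedure above (names are illustrative; 0-based indexing replaces the 1-based bounds of the pseudocode):

```python
def binary_search(a, x):
    """Return the index of x in the sorted list a, or -1 if x does not exist."""
    lower, upper = 0, len(a) - 1
    while lower <= upper:
        # Written this way to avoid integer overflow in fixed-width languages.
        mid = lower + (upper - lower) // 2
        if a[mid] < x:
            lower = mid + 1   # x can only lie in the upper half
        elif a[mid] > x:
            upper = mid - 1   # x can only lie in the lower half
        else:
            return mid        # x found at location mid
    return -1                 # interval became empty: unsuccessful search

sorted_data = [3, 9, 15, 21, 40, 58, 73]
print(binary_search(sorted_data, 40))  # → 4
print(binary_search(sorted_data, 10))  # → -1
```

Each iteration halves the interval [lower, upper], which is where the O(log n) bound comes from.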

Complexity Analysis:
The complexity is measured as f(n) for n elements. Each comparison reduces the sample
size to half, so after k comparisons the remaining size is n/2^k; the worst case is therefore O(log n)
comparisons. For the binary search algorithm, the worst case is about log2 n comparisons, and the
average case running time is approximately equal to that of the worst case.

Limitation of Binary search Algorithm:


The limitations of the binary search algorithm follow from the two conditions it relies on:
The list must be sorted, and maintaining a sorted array of elements when there are many
continuous insertions and deletions is very expensive.
Direct access is required to the middle element of any sub list. In such situations the data
needs to be represented in other structures such as a linked list or a binary search tree, even though
the algorithm itself is fast in execution.
E.g.: only about 17 comparisons are required with 100 000 elements.

5 COMPARISON OF DIFFERENT METHODS

Searching Algorithm: Linear Search
Implementation summary: The simplest search technique is the sequential search. We start at the
beginning of a list or a table and search for the desired record by examining each subsequent record
until either the desired record is found or the list is exhausted. This technique is suitable for a table
organized either as an array or as a linked list. It can be applied to an unordered table, but the
efficiency of the search may be improved if the list is ordered.
Comments: Effective if the list is small.
Type: Linear comparison.
Asymptotic complexities: The complexity is measured in terms of f(n), the number of comparisons
required to find ITEM in DATA with n elements. The cases in the complexity analysis are the
average case and the worst case. The worst case occurs when ITEM does not appear in DATA and
the complete set of positions from 1 to n is searched, i.e., f(n) = n + 1 comparisons. In the average
case, ITEM occurs at some position i (a successful search), and the average number of key
comparisons is (n + 1) / 2.

Searching Algorithm: Binary Search
Implementation summary: The approximate middle entry of the sorted table is located and its key
value is examined. If the middle key is higher than the target, the procedure is repeated on the first
half of the list; if it is lower, on the second half. The process continues until the required key is
found or the search interval becomes empty. It can be applied only to an array sorted in
non-decreasing order. Based on the comparison with the middle key Km: K < Km means the item
lies in the lower half, K = Km means the item is the middle item, and K > Km means the item lies
in the upper half.
Comments: Efficient on sorted lists; far fewer comparisons than linear search for large n.
Type: Computes the mid value, splits the list into two about the median, then compares.
Asymptotic complexities: The complexity is measured as f(n) with n elements. Each comparison
halves the sample size, so after k comparisons the size is n/2^k; the worst case is O(log n)
comparisons, and the average case is approximately equal to the running time of the worst case.

5.7 SORTING
Sorting is defined as the operation of arranging data in some given order, such as increasing
or decreasing order for numerical data, or alphabetical order for character data. Let A be a list of n
elements A1, A2, ..., An. Sorting refers to the operation of rearranging the contents of A, in
increasing order as A1 <= A2 <= A3 <= ... <= An, or in decreasing order as
A1 >= A2 >= A3 >= ... >= An.

The complexity analysis of the various sorting algorithms is done by measuring the running
time as a function of the number n of items to be sorted. Any sort algorithm S contains the
following operations:
Comparisons, which test whether Ai < Aj or Ai < B, where B refers to an auxiliary location.
Interchanges, which switch the contents of Ai and Aj, or of Ai and B.
Assignments, which set B = Ai and then set Aj = B, or Aj = Ai.
Sorting can be classified as,
Internal sorting and
External sorting.

5.8 INTERNAL SORTING


Sorting methods in which the records are sorted in the main (internal) memory are referred
to as internal sorting techniques. The various internal sorting techniques are:
Insertion sort
Bubble sort


Selection sort
Quick sort
Heap sort
Radix sort
Merge sort etc

In the above mentioned sorting techniques, the file size is small and could be accommodated in the
main memory.

5.9 INSERTION SORT


The insertion sort makes a single pass through the array, inserting each element in turn. It is
a fast and efficient sorting algorithm for small arrays, though the efficiency is lost with large
amounts of data. The sort works as follows: the array is split into two virtual sub-arrays (virtual in
the sense that the array is not really split). The first sub-array is considered to be the "sorted array".
The elements of the second sub-array are inserted into the first sub-array at the right position.

Logic: Insertion sort is often used with a set of elements when the size of the array ‘n’ is small.
Here, sorting takes place by inserting a particular element at the appropriate position; that is why it
is named insertion sort. Consider an array with n elements A[1], A[2], A[3], ..., A[N]. The insertion
sort algorithm scans the array A from A[1] to A[N], inserting each element A[K] into its proper
position in the sorted sub array A[1], A[2], A[3], ..., A[K-1].

Steps:

Pass 1: A [1] is itself sorted.

Pass 2: A [2] is inserted either before or after, such that A [1], A [2] is sorted.

Pass 3: A[3] is inserted in its proper place: either before A[1], between A[1] and A[2], or after A[2],

Such that A [1], A [2], A [3] remains in sorted order.

Pass N: A[N] is inserted into its proper place in A[1], A[2], A[3], ..., A[N-1], such that A[1],
A[2], A[3], ..., A[N] is sorted.

In the first iteration, the second element A[2] is compared with the first element A[1]. In the
second iteration the third element is compared with the first and second elements. In general, in
every iteration an element is compared with all the elements before it. While comparing, if it is
found that the element can be inserted at a suitable position, space is created for it by shifting the
other elements one position up, and the desired element is inserted at the suitable position. This
procedure is repeated for all the elements in the list. To avoid a constant bounds check in the
comparison of A[K] with A[1], a sentinel element A[0] = -∞ is used.

Algorithm: 5.4 (To perform sort using insertion sort method)


Algorithm INSERTION_SORT ( A, N )
// Sorts an array A with N elements //
Set A[0] = -∞

For K = 2 to N

    Set TEMP = A[K] and PTR = K - 1

    Repeat while TEMP < A[PTR]

        Set A[PTR + 1] = A[PTR]

        Set PTR = PTR - 1

    EndWhile

    Set A[PTR + 1] = TEMP

EndFor.

Return.
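The algorithm above can be sketched in Python (an illustrative translation; Python's bounds check on the index replaces the A[0] = -∞ sentinel):

```python
def insertion_sort(a):
    """Sort list a in place by inserting each element into the sorted prefix."""
    for k in range(1, len(a)):
        temp = a[k]          # element to insert into the sorted prefix a[0..k-1]
        ptr = k - 1
        while ptr >= 0 and temp < a[ptr]:
            a[ptr + 1] = a[ptr]  # shift larger elements one position up
            ptr -= 1
        a[ptr + 1] = temp
    return a

print(insertion_sort([77, 33, 44, 11, 88]))  # → [11, 33, 44, 77, 88]
```

The sample data matches Example 1 below, so each outer iteration corresponds to one row of Table 5.1.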

Complexity of Insertion sort


The complexity is computed using the number of comparisons f(n). The worst case occurs
when the array A is in reverse order and the inner while loop performs the maximum number of
comparisons, i.e., f(n) = 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n2).

In the average case, approximately (k-1)/2 comparisons are made by the inner loop in pass k,

i.e., f(n) = 1/2 + 2/2 + ... + (n-1)/2 = n(n-1)/4 = O(n2).

Example 1: 77 33 44 11 88

Pass A [0] A [1] A [2] A [3] A [4] A [5]

K=1 -∞ 77 33 44 11 88


K=2 -∞ 33 77 44 11 88

K=3 -∞ 33 44 77 11 88

K=4 -∞ 11 33 44 77 88

K=5 -∞ 11 33 44 77 88
Table: 5.1 various passes in insertion sort

SELECTION SORT
Selection sort is a simple sorting algorithm. It works using a selection mechanism, as its
name suggests. The basic steps of the selection sort algorithm are:

1. Find the minimum element in the list.

2. Swap it with the element in the first position of the list.

3. Repeat the steps above for all remaining elements of the list, starting at the second position.

Procedure:

Considering an array A of n elements A[1], A[2], A[3], ..., A[N], the selection sort
algorithm finds the smallest element in the list and interchanges it with the element in the first
position in step 1. In the second step, from the remaining N-1 elements, it finds the next smallest
element and interchanges it with the element in position 2, and so on.

Pass 1 Finds the smallest element in the list 1 to N, LOC. An interchange operation of A[LOC] and
A[1] is done resulting in sorted A[1].

Pass 2 Finds the location LOC of the second smallest element in the sub list of N-1 elements A[2],
A[3], ..., A[N], and interchanges A[LOC] and A[2]; i.e., A[1] and A[2] are sorted at the end of pass
2, with A[1] <= A[2].

Pass N-1 Finds the location LOC of the smaller of the elements A[N-1], A[N] and interchanges
A[LOC] and A[N-1], so that A[N-1] <= A[N]. The selection sort algorithm thus sorts an array A
after N-1 passes.

Algorithm: 5.6 (To perform sort using Selection sort method)


Algorithm SELECTION_SORT (A, N)
// Algorithm sorts the array A with N elements //


For k=1 to N-1


Call MIN ( A, K, N, LOC)
//Interchange A [K] and A [LOC] //
Set TEMP = A [K]
A [K] = A [LOC]
A [LOC] = TEMP
End for
Exit.

Sub-Algorithm: 5.6.1 (To find the Minimum element in the sort using Selection sort method)
//In an array A, the procedure MIN finds location LOC of the smallest element in the array A
[K], A[K+1]………A[N].//

Set MIN = A [K]
Set LOC = K
Repeat for J = K+1 to N
    If MIN > A [J] then
        Set MIN = A [J]
        LOC = J
    Endif
End for
Return.
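The two routines above can be sketched in Python (names are illustrative; the sample array is chosen to be consistent with the values mentioned in the worked example below):

```python
def find_min(a, k):
    """Return the location LOC of the smallest element in a[k:] (the MIN procedure)."""
    loc = k
    for j in range(k + 1, len(a)):
        if a[j] < a[loc]:
            loc = j
    return loc

def selection_sort(a):
    """Sort list a in place using N-1 passes of select-and-swap."""
    for k in range(len(a) - 1):
        loc = find_min(a, k)
        a[k], a[loc] = a[loc], a[k]  # interchange A[K] and A[LOC]
    return a

print(selection_sort([7, 4, 7, 9, 1, 2]))  # → [1, 2, 4, 7, 7, 9]
```

Note that find_min always scans the whole remaining sub list, which is why the comparison count is independent of the input order.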

Complexity of Selection sort

The number of comparisons f(n) of the selection sort algorithm is independent of the original
order of the elements. Procedure MIN (A, K, N, LOC) requires N-K comparisons in each pass: N-1
comparisons in pass 1, N-2 comparisons in pass 2, and so on.

i.e., f (n) = (n-1) + (n-2) +………………2 + 1 = n (n-1)/2 = O (n2)

For both worst case and Average case, in the selection sort procedure, the complexity is O (n2)

Example: An example is illustrated below. The selection sort marks the first element (7). It then
goes through the remaining data to find the smallest number (1). It swaps the first element (7) with
the smallest element (1), which is thereby placed in its correct position.


It then marks the second element (4) and looks through the remaining data for the next smallest
number (2). These two numbers are then swapped.

Marking the third element (7) and looking through the remaining data for the next smallest number
(4). These two numbers are now swapped.

Lastly it marks the fourth element (9) and looks through the remaining data for the next smallest
number (7). These two numbers are then swapped.

If we were not finished at this point, this sort would continue for up to N-1 passes.

5.12 MERGE SORT


The merge sort algorithm divides an array into two equal pieces, sorts them recursively, and
then merges them back together. It is therefore an example of the divide and conquer algorithmic
paradigm. Merge sort is an O(n log n) comparison-based sorting algorithm. In most
implementations it is stable, meaning that it preserves the input order of equal elements in the sorted
output.

Conceptually, a merge sort works as follows:

If the list is of length 0 or 1, then it is already sorted. Otherwise:


Divide the unsorted list into two sub lists of about half the size.
Sort each sub list recursively by re-applying merge sort.
Merge the two sub lists back into one sorted list.

Merge sort incorporates two main ideas to improve its runtime:

A small list will take fewer steps to sort than a large list.


Fewer steps are required to construct a sorted list from two sorted lists than two unsorted lists. For
example, you only have to traverse each list once if they're already sorted (see the merge function
below for an example implementation).

Example:

Using merge sort to sort a list of integers contained in an array: the basic idea behind merge
sort is splitting the given array into two equal-sized sets using the divide and conquer method; after
sorting, a combining step, called merging, joins these two sets.

Suppose we have a sequence of n elements A[1], ..., A[n]. It is split into the subsequences
A[1], ..., A[n/2] and A[(n/2)+1], ..., A[n], and each set is individually sorted by again splitting it
into smaller subsequences.

The merge sort Algorithm proceeds using recursion and function Merge, which merges two
sorted sets to form a single set. The algorithm Merge Sort (1, n) rearranges the values in the array
‘A’ in non-decreasing order.

Algorithm: 5.7 (To perform sort using Merge sort method)


Algorithm MERGESORT (Low, High)
// A [Low : High] is a global array to be sorted //
{
If ( Low < High )
{
Mid = [ ( Low + High ) / 2 ];
// finding the position to split the array //
MERGESORT ( Low, Mid );
MERGESORT ( Mid + 1, High );
// solving the sub problems //
MERGE ( Low, Mid, High );
// combine the solutions //
}
}

Sub-Algorithm: 5.7.1 (To merge the sub lists generated in the Merge sort method)
Algorithm MERGE (Low, Mid, High)
// A [Low : High] is a global array with two sorted sublists A [Low : Mid] and A [Mid + 1 :
High]. B [ ] is an auxiliary global array used to temporarily store the intermediate values //
{
h = Low; i = Low; j = Mid + 1;
while ( ( h <= Mid ) and ( j <= High ) ) do
{
if ( A [ h ] <= A [ j ] ) then
{
B [ i ] = A [ h ]; h = h + 1;
}
else
{
B [ i ] = A [ j ]; j = j + 1;
}
i = i + 1;
}
If ( h > Mid ) then
For k = j to High do
{
B [ i ] = A [ k ]; i = i + 1;
}
Else
For k = h to Mid do
{
B [ i ] = A [ k ]; i = i + 1;
}
For k = Low to High do A [ k ] = B [ k ];
}
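The two routines can be sketched in Python (names and the 0-based index bounds are illustrative; the auxiliary list b plays the role of the global array B):

```python
def merge_sort(a, low=0, high=None):
    """Recursively sort a[low..high] in place."""
    if high is None:
        high = len(a) - 1
    if low < high:
        mid = (low + high) // 2          # position to split the array
        merge_sort(a, low, mid)          # solve the two sub problems
        merge_sort(a, mid + 1, high)
        merge(a, low, mid, high)         # combine the solutions
    return a

def merge(a, low, mid, high):
    """Merge the sorted sublists a[low..mid] and a[mid+1..high]."""
    b = []                               # auxiliary storage for the merged run
    h, j = low, mid + 1
    while h <= mid and j <= high:
        if a[h] <= a[j]:
            b.append(a[h]); h += 1
        else:
            b.append(a[j]); j += 1
    b.extend(a[h:mid + 1])               # copy whichever sublist is not exhausted
    b.extend(a[j:high + 1])
    a[low:high + 1] = b                  # copy the merged result back into A

print(merge_sort([310, 285, 179, 652, 351, 423, 861, 254, 450, 520]))
```

The sample data is the ten-element array used in the explanation below, so the recursion follows the same tree of calls.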

Example 2:
The recursive steps in the merge sort is illustrated in the Figure: 5.1

Figure: 5.1 Representation of steps in the Merge sort


Explanation:
Consider an array A [ 1 : 10 ] as, 310 285 179 652 351 423 861 254 450 520

After the execution of the algorithm, Merge sort (A [1 : 10]) splits the array A [1 : 10] into
A [1 : 5] and A [6 : 10]. The elements in the array A [1 : 5] are again split into A [1 : 3] and
A [4 : 5]. The array A [1 : 3] is again split into the sub arrays A [1 : 2] and A [3 : 3], where the array
A [3 : 3] is already sorted. The array A [1 : 2] is again split into one-element sub arrays.

The array A [ ] now becomes,

310 │ 285 │ 179 │ 652 351 │423 861 254 450 520.

Similarly, the array A [6 : 10] is split in the same way as the first array, and the sub arrays
are merged by sorting each of the smaller sub arrays to get independently sorted files, which at the
end are merged to obtain a completely sorted array of n elements. The merge sort uses 2n locations,
i.e., an additional n locations are needed, because the algorithm uses the array A[i] together with an
auxiliary array B[i] to execute the sorting operation. After the completion of the recursive procedure
calls by the merge sort algorithm applied to the ten elements of the array A[i], the pairs of
parameters (Low, High) of the function Mergesort can be represented as a tree of calls, rooted at
(1, 10), as shown in Figure 5.2 below.

Figure: 5.2 Representation of procedure calls in the form of a tree structure in a merge sort.

After the completion of the recursive calls of the Merge function, the (Low, Mid, High)
values are as represented in Figure 5.3.

Figure: 5.3 Return result values after the execution of each procedure call in the completion of Merge.

Complexity of Merge sort

The complexity of merge sort is O(n log n). The disadvantage of merge sort is the use of
2n locations, i.e., additional n locations are needed, because the algorithm uses the array A[i] and an
auxiliary array B[i] to execute the sorting operation.

5.13 QUICK SORT


Quick sort is a well-known sorting algorithm developed by C. A. R. Hoare that, on average,
makes Θ(n log n) comparisons to sort n items. However, in the worst case, it makes Θ(n2)
comparisons. Typically, quick sort is significantly faster in practice than other Θ(n log n)
algorithms, because its inner loop can be efficiently implemented on most architectures, and for
most real-world data it is possible to make design choices which minimize the probability of
requiring quadratic time. Like binary search, quick sort sorts by employing a divide and conquer
strategy to divide a list into two sub-lists.

Procedure:
In quick sort, the array A [1 : n] is divided into two sub arrays such that the sorted sub
arrays need not be merged at the end, as they must be in the merge sort algorithm. This is done by
rearranging the elements of A[1 : n] such that A[i] <= A[j] for all i between 1 and m and all j
between m+1 and n, for some value of m with 1 <= m <= n. The elements A [1 : m] and
A [m+1 : n] can then be sorted independently, without any merging. The rearrangement is achieved
by selecting some element t = A[s]: all elements appearing before t in A[1 : n] are made less than or
equal to t, and all elements after t are made greater than or equal to t. This rearrangement is referred
to as partitioning, and the element A[s] is referred to as the pivot or partition element. Partitioning
the n elements around the chosen element (selected by the divide and conquer method) divides the
set into two subsets S1 and S2, where every element of S1 is <= t and every element of S2 is >= t.

The steps are:

Pick an element, called a pivot, from the list.

Reorder the list so that all elements which are less than the pivot come before the pivot and so that
all elements greater than the pivot come after it (equal values can go either way). After this
partitioning, the pivot is in its final position. This is called the partition operation.

Recursively sort the sub-list of lesser elements and the sub-list of greater elements.

Example 1:
The base cases of the recursion are lists of size zero or one, which are always sorted.
Consider an array A [1 : n] where m = 1 and p - 1 = n (i.e., p = n + 1); A[n+1] must be defined, and
must be greater than or equal to all elements in A [1 : n]. Consider the unsorted list given below:

A [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

10 16 8 12 15 6 3 9 5 +∞
10 5 8 12 15 6 3 9 16 +∞
10 5 8 9 15 6 3 12 16 +∞
10 5 8 9 3 6 15 12 16 +∞
6 5 8 9 3 10 15 12 16 +∞

Figure: 5.4 Representation of steps in the Quick sort

Algorithm: 5.8 (To perform sort using Quick sort method)


Algorithm QUICKSORT (p, q)
// The algorithm sorts the elements A[p] ... A[q] of the global array A [1 : n] in ascending order;
A [n+1] is defined and must be >= all elements in A [1 : n] //
{
If ( p < q ) then
{
j = PARTITION (A, p, q + 1); // j refers to the position of the partition element //
QUICKSORT ( p, j - 1 );
QUICKSORT ( j + 1, q );
// the sub problems need not be combined for the solution //
}
}

Sub-Algorithm: 5.8.1 (To solve the sub lists, partitioned using a partition element in the Selection
sort method)
Algorithm PARTITION (A, m, p)
// With A[p] = ∞, the partition element is t = A[m]; the elements A[m], A[m+1], ..., A[p-1]
are to be rearranged //
{
V = A[m]; i = m; j = p;
Repeat
{
Repeat
i = i + 1;
until ( A[i] >= V );
Repeat
j = j - 1;
until ( A[j] <= V );
if ( i < j ) then INTERCHANGE ( A, i, j );
} until ( i >= j );
A[m] = A[j]; A[j] = V; return j;
}
Sub-Algorithm: 5.8.1.1 (To exchange or interchange A[i] with A[j])
Algorithm INTERCHANGE (A, i, j)
// To interchange A[i] with A[j] //
{
P = A[i];
A[i] = A[j];
A[j] = P;
}

Complexity of Quick sort

For the complexity analysis of quick sort, the number of element comparisons C(n) is
considered. The average time required is of the order O(n log n). The worst case time is O(n2).
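A Python sketch of quick sort (illustrative names; this uses a Lomuto-style partition with the first element as pivot, which is simpler than the sentinel-based two-pointer scan in the pseudocode but produces the same kind of rearrangement, so no A[n+1] = ∞ sentinel is needed):

```python
def quick_sort(a, p=0, q=None):
    """Sort a[p..q] in place by partitioning around a pivot."""
    if q is None:
        q = len(a) - 1
    if p < q:
        j = partition(a, p, q)   # j is the final position of the pivot
        quick_sort(a, p, j - 1)  # the two sub problems need no merging step
        quick_sort(a, j + 1, q)
    return a

def partition(a, p, q):
    """Rearrange a[p..q] around the pivot a[p]; return the pivot's final index."""
    v = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] < v:             # keep elements smaller than the pivot on the left
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]      # place the pivot between the two partitions
    return i

print(quick_sort([10, 16, 8, 12, 15, 6, 3, 9, 5]))
# → [3, 5, 6, 8, 9, 10, 12, 15, 16]
```

With pivot 10, the first partition pass leaves 10 at index 5 with smaller elements to its left, matching the final row of the worked example above.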

5.14 HEAP SORT


Heaps are based on the notion of a complete tree, for which it is necessary to know the
definition of tree, Binary tree and complete binary tree.

Tree is a data structure which represents data items in a hierarchical order.

Binary Tree In a tree structure every node holds a data value and the addresses of its left and right
children. The starting node of the tree is referred to as the root. A binary tree is defined as a finite
set of nodes which (i) is either empty, or (ii) consists of a node called the root (N) with two disjoint
binary trees called the left subtree (2N) and the right subtree (2N + 1).

Figure: 5.5 Binary tree

A binary tree is completely full if it is of height h and has 2^(h+1) - 1 nodes.

A binary tree of height h is complete iff
it is empty, or
its left subtree is complete of height h-1 and its right subtree is completely full of height h-2, or
its left subtree is completely full of height h-1 and its right subtree is complete of height h-1.

A complete tree is filled from the left:
all the leaves are on the same level or on two adjacent levels, and
all nodes at the lowest level are as far to the left as possible.

A binary tree T is said to be a complete binary tree if every level except the last has the
maximum number of nodes (2^l nodes at level l), and all nodes at the last level appear as far to the
left as possible.

Figure: 5.6 Complete Binary tree

Heaps

A binary tree has the heap property iff

it is empty, or
the key in the root is larger than that in either child and both subtrees have the heap property.

A heap can be used as a priority queue: the highest priority item is at the root and is trivially
extracted. But if the root is deleted, we are left with two sub-trees and we must efficiently re-create
a single tree with the heap property.

Heap trees The heap structure H is a complete binary tree; such a structure is termed a heap tree
when it satisfies the following properties:

For each node N in H, the value of N is greater than or equal to the value of each of the children of
N (or successors of N); such a heap tree is called a max heap.
If the root node contains the smallest data compared to each of its successor nodes, it is referred to
as a min heap.

Figure: 5.7 (a) Max heap tree. (b) Min heap tree.

Operations on a Heap tree

The two main operations are insertion and deletion. In insertion, a node is inserted into the
existing heap tree so that the properties of the heap tree are preserved (depending on whether it is a
max / min heap structure). Consider a max heap, and a node to be inserted with the value 111. After
the insertion, the heap is restructured to maintain the max heap property.

Figure: 5.8 (a) Max heap tree. (b) Insertion of node = 111 into the max heap structure.

Figure: 5.9 (a) Max heap tree re-structuring itself to maintain the max heap property.
(b) Max heap structure after restructuring and maintaining the max heap property.

Deletion of a node from a heap tree

The principle for deletion in a max heap structure is:
Read the root node into a temporary storage, e.g., ITEM.
Replace the root node by the last node in the heap tree. Then reconstruct (or reheap) the tree by
comparing the root node value with its two children, interchanging the content of the root node with
the largest of the child values. Repeat this interchange until the max heap structure is
re-established.


Figure: 5.8 (a) Max heap tree with deletion of the node having data = 99.
(b) The max heap structure after deletion of node 99.

The heap sort is an application of the heap tree, where the records to be sorted,
R = R1, ..., Rn, are represented as a complete binary tree. The sorting can be implemented in two
ways, depending on whether the sorted sequence required is in ascending or descending order, using
a max-heap or a min-heap structure.

Steps in Heap Sort

Step 1: Build a heap tree with the given set of data (X1, ..., Xn).

Step 2:
Delete the root node from the heap.
Rebuild the heap after the deletion.
Place the deleted node in the output.

Step 3: Repeat and continue until the heap tree is empty.

Figure: 5.9 Converting the data set into its corresponding heap tree.
Consider the input File,

26, 5, 77, 1, 61, 11, 59, 15, 48, 19

Input Heap

Figure: 5.10 (a) Input heap tree (b) Corresponding Max heap tree

Figure: 5.11 (a) sorted: 77 (b) sorted: 61, 77

Figure: 5.12 (a) sorted: 59, 61, 77 (b) sorted: 48, 59, 61, 77

Figure: 5.13 (a) sorted: 26, 48, 59, 61, 77 (b) sorted: 19, 26, 48, 59, 61, 77

Figure: 5.14 (a) sorted: 15, 19, 26, 48, 59, 61, 77 (b) sorted: 11, 15, 19, 26, 48, 59, 61, 77

Result of the heap sort algorithm: 1, 5, 11, 15, 19, 26, 48, 59, 61, 77

The complexity is O(n log n), where log n is the depth of the heap. The value of the heap
structure is that we can both extract the highest priority item and insert a new one in O(log n) time.

Algorithm: 5.9 (To perform sort using Heap sort method)

Algorithm HeapSort (A, n)

// A [1: n] contains n elements to be sorted. This algorithm
rearranges it in place into non-decreasing order //
{
Heapify ( A, n ); // Transform array into heap //

For i = n to 2 step -1 do
{
t = A [i];
A [i] = A [ 1];
A [ 1] = t ;
Adjust ( A, 1, i – 1 );
}
}

Sub-Algorithm: 5.9.1 (To create a heap structure)

Algorithm Heapify (A, n)


// Readjust the elements A [1 : n ] to form a heap //
{
For i = [ n/2 ] to 1 step -1 do
Adjust ( A, i, n);
}

Sub-Algorithm: 5.9.1.1 (To adjust the data items in such a way to maintain a heap structure)

Algorithm Adjust (A, i, n)


// Combine the heaps rooted at A[2i] and A[2i+1] with node A[i] to form a single heap rooted at A[i] //
{
j = 2i;
item = A [i];


while (j <=n ) do
{
If ( ( j < n ) and ( A [ j ] < A [ j + 1 ]) )
Then j = j + 1;
If ( item >= A [ j ] ) then break;
A [ ( j / 2) ] = A [j];
j = 2j;
}
A [ ( j /2) ] = item;
}

5.15 RADIX SORT

This is an extremely simplified implementation of the bucket sort algorithm. Here, several
bucket-like sorts are performed (one for each digit), but instead of having a counts array representing
the range of all possible values for the data, it represents all of the possible values for each individual
digit, which in decimal numbering is only 10. First a bucket sort is performed using only the least
significant digit as the sort key, then another is done using the next least significant digit, and so on,
until the number of bucket sorts performed equals the maximum number of digits of the biggest
number. Because each bucket sort needs only 10 buckets (the counts array is of size 10), each pass
runs in O(n) time, so for keys with a fixed number of digits the whole sort is O(n).

Radix sort is a method mainly used for alphabetizing large lists of names based on the
sequence of letters in the names, where the radix is 26, i.e., the 26 letters of the alphabet. For
numerical data the radix is 10 (digits 0 - 9), and for binary numbers it is 2 (digits '0' and '1').
Similar to the bucket sort algorithm, the sorting is based on the digit in the units position first, then
the tens position, the hundreds position, and so on. If the largest number to be sorted has 'd' digits, it
takes 'd' passes to sort the list of numbers. An example of a radix sort is given below. On each of
the adapted bucket sorts it does, the count array stores the number of keys with each digit. Then the offsets are
created using the counts, and the sorted array is regenerated using the offsets and the original data.

Consider the list of numbers: 42, 23, 74, 11, 65, 57, 94, 36, 99, 87, 70, 81, 61.

After the first pass, based on the units digit of each number, the radix sort distributes the list into pockets as follows:

Pocket 0: 70
Pocket 1: 11, 81, 61
Pocket 2: 42
Pocket 3: 23
Pocket 4: 74, 94
Pocket 5: 65
Pocket 6: 36
Pocket 7: 57, 87
Pocket 8: (empty)
Pocket 9: 99

Figure: 5.15 Radix sort using the units-position digits.

Combining the contents of pockets from 0 to 9, the list becomes: 70, 11, 81, 61, 42, 23, 74, 94, 65,
36, 57, 87, and 99.

In the second pass, based on the next higher-order (tens) digit, the distribution is repeated:

Pocket 0: (empty)
Pocket 1: 11
Pocket 2: 23
Pocket 3: 36
Pocket 4: 42
Pocket 5: 57
Pocket 6: 61, 65
Pocket 7: 70, 74
Pocket 8: 81, 87
Pocket 9: 94, 99

Figure: 5.16 Radix sort using the tens-position digits.

This type of sorting is called Radix sort; the list becomes sorted by combining the elements from
pockets 0 to 9 at the end of the second pass, i.e. 11, 23, 36, 42, 57, 61, 65, 70, 74, 81, 87, 94, 99.

Disadvantages:

The sequential allocation technique is not practical for representing the pockets.

There is no way to predict the number of records per pocket during a given pass.

The inability to predict the number of records per pocket can be solved by using linked allocation,
where each pocket is represented as a linked FIFO queue; at the end of each pass, the queues can
be combined to get an ordered list.

Program:

#include <stdio.h>
#define MAX 5
#define SHOWPASS

void print(int *a,int n)


{
int i;
for(i=0;i<n;i++)
printf("%d\t",a[i]);
}

void radixsort(int *a,int n)


{
int i,b[MAX],m=0,exp=1;

for(i=0;i<n;i++) // find the maximum element


{
if(a[i]>m)
m=a[i];
}
while(m/exp>0)
{
int bucket[10]={0};
for(i=0;i<n;i++)
bucket[a[i]/exp%10]++;
for(i=1;i<10;i++)
bucket[i]+=bucket[i-1];
for(i=n-1;i>=0;i--)
b[--bucket[a[i]/exp%10]]=a[i];
for(i=0;i<n;i++)
a[i]=b[i];
exp*=10;
#ifdef SHOWPASS
printf("\nPASS : ");
print(a,n);
#endif
}
}
int main()
{
int arr[MAX];
int i,n;
printf("Enter total elements (n < %d) : ",MAX);
scanf("%d",&n);
printf("Enter %d Elements : ",n);
for(i=0;i<n;i++)
scanf("%d",&arr[i]);
printf("\nARRAY : ");
print(&arr[0],n);
radixsort(&arr[0],n);
printf("\nSORTED : ");


print(&arr[0],n);
printf("\n");
return 0;
}

5.16 COMPARISON OF SORTING TECHNIQUES (VARIOUS SORTING ALGORITHMS)

Quick sort (Type: Exchange)
Implementation: First a pivot is selected and removed from the list (hidden at the end). The
remaining elements are then partitioned into two sections: one with values less than the pivot and one
with values greater. This partitioning is achieved by exchanging values. The pivot is then restored in
the middle, and those two sections are recursively quick sorted.
Comments: A complicated but effective sorting algorithm.
Complexity: Best case O(n log n); worst case O(n^2).

Selection sort (Type: Selection)
Implementation: The selection sort algorithm, although not very efficient, is very simple. It makes n
linear passes on the list, and on each pass it selects the largest value and swaps it with the last
unsorted element. This means that it is not stable, because, for example, a 3 could be swapped with a
5 that is to the left of a different 3.
Comments: A very simple algorithm to code and to explain, but a little slow. It could equally be
performed by selecting the smallest value and swapping it with the first unsorted element.
Complexity: Unlike bubble sort, the best case and worst case are the same, O(n^2), because even if
the list is already sorted, the same number of selections is still performed.

Merge sort (Type: Merge)
Implementation: Two sorted lists are combined into another sorted list by traversing them, comparing
the heads of each list and removing the smaller to join the new sorted list; this merge is an O(n)
operation. With two-way sorting, this method is applied to a single unsorted list. In brief, the
algorithm recursively splits up the array until it is fragmented into single-element arrays. Each of
those single elements is then merged with its pair, those pairs are merged with their pairs, and so on,
until the entire list is united in sorted order. If there is ever an odd number at some level, an extra
operation is added: the leftover piece joins one of the pairs, so that particular pair has one more
element than the others, without affecting the sorting.
Comments: The sorting method is of the natural and intrinsic type.
Complexity: Best and worst case O(n log n).

Insertion sort (Type: Insertion)
Implementation: On each pass the current item is inserted into the sorted section of the list. It starts
with the last position of the sorted list and moves backwards until it finds the proper place for the
current item. The item is then inserted into that place, and all items after it are shuffled along to
accommodate it. For this reason, if the list is already sorted, the sort is O(n), because every element
is already in its sorted position. If, however, the list is sorted in reverse, it takes O(n^2) time, as it
searches through the entire sorted section of the list each time it does an insertion, and shuffles all
other elements down the list.
Comments: Good for nearly sorted lists; very bad for out-of-order lists, due to the large amount of
shuffling.
Complexity: Best case O(n); worst case O(n^2).

Heap sort (Type: Selection)
Implementation: This is similar to straight selection sorting, except that instead of using a linear
search to find the maximum value in the list, a heap is constructed, from which the maximum value
can easily be removed (and the heap reformed) in log n time. This means n passes, each doing a
log n remove-maximum, so the algorithm always runs in O(n log n) time; the original order of the
list makes no difference.
Comments: This utilizes just about the only good use of heaps, that is, finding the maximum element
in a max heap (or the minimum of a min heap). It is in every way as good as the straight selection
sort, but faster.
Complexity: Best and worst case O(n log n).

Radix sort (Type: Distribution)
Implementation: This is an extremely smart implementation of the bucket sort algorithm. Several
bucket sorts are performed (one for each digit), but instead of having a counts array representing the
range of all possible values for the data, it represents all of the possible values for each individual
digit, which in decimal numbering is only 10. First a bucket sort is performed using only the least
significant digit, then another using the next least significant digit, and so on, until the number of
bucket sorts performed equals the maximum number of digits of the biggest number. Because each
bucket sort uses only 10 buckets (the counts array is of size 10), each pass is O(n).
Comments: This is one of the good sorting algorithms; it handles large lists with big numbers well.
Complexity: Best case O(n); worst case O(n^2).

Bubble sort (Type: Exchange)
Implementation: On each pass over the data, adjacent elements are compared and switched if they
are out of order, e.g. e1 with e2, then e2 with e3, and so on. This means that on each pass, the largest
element left unsorted has been "bubbled" to its rightful place at the end of the array. However,
because all adjacent out-of-order pairs are swapped, the algorithm could finish sooner. A naive
version takes O(n^2) time because it keeps sorting even if the list is already in order; the algorithm
can instead be ended when a pass makes no swaps, making the best case O(n) (when the list is
already sorted) and the worst case still O(n^2).
Comments: In general this can do better than insertion sort, because it has a good chance of
finishing in much less than O(n^2) time.
Complexity: Best case O(n); worst case O(n^2).

Table: 5.3 Comparison of sorting algorithms

Hashing

We need search techniques in which there are no unnecessary comparisons. Therefore, the objective
here is to minimize the number of comparisons in order to find the desired record efficiently. There is an
approach in which we compute the location of the desired record in order to retrieve it in a single
access, which avoids the unnecessary comparisons. In this method, the location of a record in
the search table depends only on the given key and not on the other keys.
Hashing is the process of mapping a large amount of data items to a smaller table with the help of a
hashing function.
Hashing is also known as a Hashing Algorithm or Message Digest Function.
It is a technique to convert a range of key values into a range of indexes of an array.
It is used to facilitate faster searching when compared with linear or binary search.
Hashing allows us to update and retrieve any data entry in constant time O(1).
Constant time O(1) means the operation does not depend on the size of the data.
Hashing is used with a database to enable items to be retrieved more quickly.


It is used in the encryption and decryption of digital signatures.

Hash Function

The basic idea in hashing is the transformation of a key into the corresponding location in the hash
table. This is done by a hash function.
It is usually denoted by H.
There are basically two types of hash functions:
Distribution-independent functions
Distribution-dependent functions
A fixed process that converts a key to a hash key is known as a Hash Function.
This function takes a key and maps it to a value of a certain length, which is called a Hash
value or Hash.
The hash value represents the original string of characters, but it is normally smaller than the original.
If the size of the hash table is m, then we need a hash function that can generate addresses in the range 0
to m-1.
Basically we have two criteria for choosing a good hash function:
It should be easy to compute.
It should generate addresses with minimum collisions.
The following are the most popular distribution-independent hash functions:
Truncation
Division method
Mid-square method
Folding method
Digit analysis method

Division Remainder Method
The division remainder method is the simplest and most commonly used. In this method, the key k is
divided by the number of slots N in the hash table, and the remainder obtained after the division is used
as an index into the hash table. The hash function is
H(k) = k mod N
This function works well if the index ranges from 0 to N-1. If the index ranges from 1 to N, the function
will be
H(k) = (k mod N) + 1
Mid-square Method

This method operates in two steps. First the square of the key k is calculated, and then some of
the digits from the left and right ends of k^2 are removed. The number obtained after removing the digits
is used as the hash value.
Folding Method

The folding method is also a two-step process. In the first step, the key k is divided into several groups
starting from the leftmost digits, where each group contains n digits, except the last one, which may
contain fewer. In the next step, these groups are added together, and the hash value is obtained by
ignoring the last carry (if any); this technique is called shift folding. The technique can be modified
to get another folding technique called boundary folding: the key is imagined to be written on a strip
of paper that is folded at the boundaries between the parts of the key, so alternate groups are
reversed before the addition.

What is Hash Table?

Hash table or hash map is a data structure used to store key-value pairs.
It is a collection of items stored to make it easy to find them later.
It uses a hash function to compute an index into an array of buckets or slots from which the desired
value can be found.
It is an array of list where each list is known as bucket.
It contains value based on the key.
In Java, the Hashtable class is used to implement the Map interface and extends the Dictionary class.
A Hashtable is synchronized and contains only unique keys.

Consider a hash table of size n = 10. Each position of the hash table is
called a slot. There are n slots in the table, named {0, 1, 2, 3, 4, 5, 6, 7, 8,
9}: slot 0, slot 1, slot 2 and so on. Initially the hash table contains no items, so every slot is empty.
As we know the mapping between an item and the slot where item belongs in the hash table is
called the hash function. The hash function takes any item in the collection and returns an integer in
the range of slot names between 0 to n-1.
Suppose we have integer items {26, 70, 18, 31, 54, 93}. One common method of determining a
hash key is the division method of hashing and the formula is :


Hash Key = Key Value % Number of Slots in the Table

The division method, or remainder method, takes an item, divides it by the table size, and returns the
remainder as its hash value.

Data Item Value % No. of Slots Hash Value


26 26 % 10 = 6 6
70 70 % 10 = 0 0
18 18 % 10 = 8 8
31 31 % 10 = 1 1
54 54 % 10 = 4 4
93 93 % 10 = 3 3

After computing the hash values, we can insert each item into the hash table at the designated
position. In the hash table above, 6 of the 10 slots are occupied; the fraction of occupied slots is referred
to as the load factor and denoted by λ = No. of items / table size. Here, λ = 6/10.
It is easy to search for an item using hash function where it computes the slot name for the item and
then checks the hash table to see if it is present.
Constant amount of time O(1) is required to compute the hash value and index of the hash table at
that location.


Collision Resolution Techniques

An ideal hash function should perform a one-to-one mapping between the set of all possible keys and the
hash table addresses, but this is almost impossible, and no hash function can totally prevent collisions.
A collision occurs whenever a key is mapped to an address that is already occupied, and the different
collision resolution techniques suggest an alternate place where this key can be placed. The two
most commonly used techniques are:

Separate chaining (open hashing)

Open addressing (closed hashing)

Open Addressing

In open addressing, the key which caused the collision is placed inside the hash table itself, but at a
location other than its hash address. Initially a key value is mapped to a particular address in the hash
table. If that address is already occupied, then we try to insert the key at some other empty
location inside the table. The array is assumed to be closed, and hence this method is named closed
hashing.

Three methods to search for an empty location inside the table

Linear probing

Quadratic probing

Double Hashing

Linear Probing

Take the above example: if we insert the next item, 40, into our collection, it would have a hash value
of 0 (40 % 10 = 0). But 70 also had a hash value of 0, so this becomes a problem. This problem is
called a Collision or Clash. Collisions create a problem for the hashing technique.
Linear probing is used for resolving collisions in hash tables, data structures for maintaining
a collection of key-value pairs.
Linear probing was invented by Gene Amdahl, Elaine M. McGraw and Arthur Samuel in 1954 and
analyzed by Donald Knuth in 1963.
It is a component of the open addressing scheme for using a hash table to solve the dictionary problem.


The simplest method is called Linear Probing. The formula to compute linear probing is:
h(k, i) = (h(k) + i) mod N
where h(k) = k mod N, N is the table size, and i = 0 to N-1.

Quadratic Probing
In quadratic probing, when a collision occurs, we probe the (i^2)-th bucket in the i-th iteration:
h(k, i) = (h(k) + i^2) mod N
We keep probing until an empty bucket is found.

DOUBLE HASHING

H(k, i) = (H1(k) + i * H2(k)) mod N

where H1(k) = k mod N, H2(k) = k mod (N - 1) or k mod (N - 2), N is the table size, and i = 0 to N-1.
