DSD Unit 3 Sorting and Searching
Unit – 3
SORTING AND SEARCHING
SORTING TECHNIQUES
Sorting is the process of arranging the elements of a list into either ascending or descending
order. There are two types of sorting techniques. They are,
• Internal sorting
• External sorting
Internal Sorting
If the sorting operation is carried out in the main memory of the computer, then it is known as internal sorting.
Example: Bubble sort, insertion sort, selection sort, quick sort
External Sorting
If sorting is carried out in the secondary memory of the computer, because the data is too large to fit in main memory, then it is known as external sorting.
Example: external merge sort
BUBBLE SORT
Bubble sort repeatedly compares adjacent elements and swaps them if they are out of order. After each pass, the largest remaining element "bubbles up" to its final position at the end of the list.
A = [56,91,35,75,48]
bubbleSort(A)
print(A)
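The driver above calls bubbleSort, which is not defined in these notes; a minimal sketch of a standard implementation (the function name follows the driver's spelling):

```python
def bubbleSort(A):
    # After pass i, the last i elements are in their final positions.
    for i in range(len(A) - 1):
        for j in range(len(A) - 1 - i):
            if A[j] > A[j + 1]:                 # adjacent pair out of order
                A[j], A[j + 1] = A[j + 1], A[j]

A = [56, 91, 35, 75, 48]
bubbleSort(A)
print(A)  # [35, 48, 56, 75, 91]
```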
Advantages
Besides the memory the list occupies, the bubble sort requires very little
memory.
The bubble sort is made up of only a few lines of code.
Useful for small data sets.
Disadvantage
It is highly inefficient for large data sets, with a running time of O(n²).
SELECTION SORT
Selection sort is an in-place sorting algorithm. This sorting technique improves over bubble
sort by making only one exchange in each pass.
The selection sort finds the smallest element in the list and swaps it with the first element in the list.
Then the second smallest element is found and swapped with the second element in the list.
This selection and exchange process continues, until all the elements in the list have been
sorted in ascending order.
This algorithm is called selection sort since it repeatedly selects the smallest element.
Algorithm
1. Find the minimum value in the unsorted portion of the list
2. Swap it with the value in the current position
3. Advance the current position and repeat until the whole list is sorted
Implementation
def selectionsort(A):
    for i in range(len(A)):
        minimum = i
        for j in range(i + 1, len(A)):
            if A[j] < A[minimum]:
                minimum = j
        swap(A, minimum, i)

def swap(A, x, y):
    temp = A[x]
    A[x] = A[y]
    A[y] = temp

A = [54, 26, 93, 17, 77, 31, 44, 55, 20]
selectionsort(A)
print(A)
INSERTION SORT
Insertion sort builds the sorted list one element at a time: each element is taken from the unsorted portion and inserted into its correct position among the elements already sorted.
A=[54,26,93,17,77,31,44,55,20]
insertionsort(A)
print(A)
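The driver above calls insertionsort, which is not defined in these notes; a minimal sketch of a standard implementation:

```python
def insertionsort(A):
    for i in range(1, len(A)):
        key = A[i]                  # next element to insert
        j = i - 1
        # Shift larger sorted elements one position to the right.
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key              # insert into its correct slot

A = [54, 26, 93, 17, 77, 31, 44, 55, 20]
insertionsort(A)
print(A)  # [17, 20, 26, 31, 44, 54, 55, 77, 93]
```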
Advantages
Easy to implement.
Efficient for small data sets and already substantially sorted data sets.
Disadvantages
Insertion sort is inefficient for large data sets.
The insertion sort exhibits a worst-case time complexity of O(n²)
QUICK SORT
• Quick sort is an example of a divide-and-conquer algorithm: the list is divided into sub-sequences, each subsequence is sorted recursively, and the sorted sub-sequences are combined by simple concatenation. It is also called partition exchange sort.
Algorithm:
• 1. Divide: If the sequence S has at least two elements, select a specific element x from S, which is called the pivot. In common practice, the pivot x is chosen to be the last element in S. Remove all the elements from S and put them into three sequences:
• L, storing the elements in S less than x
• E, storing the elements in S equal to x (the pivot element)
• G, storing the elements in S greater than x
If the elements of S are distinct, then E holds just one element, i.e., the pivot itself.
• 2. Conquer: Recursively sort sequences L and G.
• 3. Combine: Put the elements back into S in order by first inserting the elements of L, then those of E, and finally those of G.
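The divide, conquer, and combine steps above can be sketched directly in Python using the three sequences L, E, and G (a sketch, with the last element as pivot):

```python
def quicksort(S):
    if len(S) < 2:                       # base case: 0 or 1 element
        return S
    x = S[-1]                            # pivot: last element of S
    L = [e for e in S if e < x]          # elements less than the pivot
    E = [e for e in S if e == x]         # elements equal to the pivot
    G = [e for e in S if e > x]          # elements greater than the pivot
    # combine: sorted L, then E, then sorted G
    return quicksort(L) + E + quicksort(G)

print(quicksort([85, 24, 63, 45, 17, 31, 96, 50]))  # [17, 24, 31, 45, 50, 63, 85, 96]
```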
Quick Sort Tree
• The execution of quick-sort can be visualized by means of a binary recursion tree, called the quick-sort tree. The step-by-step evolution of the quick-sort tree is shown in Figure. In this example, the pivot element is chosen as the last element in the sequence.
Pivot Selection
Pivot at Random
Instead of picking the pivot as the first or last element of S, it can be a
random element of S. This variation of quick-sort is called randomized
quick-sort.
The expected running time of randomized quick-sort on a sequence with
n elements is O(nlog n).
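The randomized variant described above can be sketched by picking the pivot with Python's random module (a sketch; the function name is ours, and the partition reuses the L/E/G scheme from the algorithm above):

```python
import random

def randomized_quicksort(S):
    if len(S) < 2:
        return S
    x = random.choice(S)                 # pivot chosen uniformly at random
    L = [e for e in S if e < x]
    E = [e for e in S if e == x]
    G = [e for e in S if e > x]
    return randomized_quicksort(L) + E + randomized_quicksort(G)

print(randomized_quicksort([56, 91, 35, 75, 48]))  # [35, 48, 56, 75, 91]
```

The output is the same regardless of which pivots are drawn; only the running time varies, with expectation O(n log n).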
Pivot at Median of three
Another common technique for choosing a pivot is to use the median of
three values taken from the front, middle, and tail of the array.
This median-of-three heuristic more often chooses a good pivot.
Computing a median of three may also incur lower overhead than selecting
a pivot with a random number generator.
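A small helper sketching the median-of-three rule (a hypothetical function, not part of the original notes):

```python
def median_of_three(S, low, high):
    # Return the index of the median of S[low], S[mid], S[high].
    mid = (low + high) // 2
    a, b, c = S[low], S[mid], S[high]
    if (a <= b <= c) or (c <= b <= a):
        return mid
    if (b <= a <= c) or (c <= a <= b):
        return low
    return high

S = [54, 26, 93, 17, 77]
print(median_of_three(S, 0, len(S) - 1))  # 4: the median of 54, 93, 77 is 77
```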
Optimizations for quick sort (In-Place Quick Sort)
• An algorithm is in-place if it uses only a small amount of memory in addition to that
needed for the original input.
In-place quick-sort uses the input sequence itself to store the sub-sequences for all the recursive calls.
In-place quick-sort modifies the input sequence using element swapping and does not
explicitly create sub-sequences.
Instead, a subsequence of the input sequence is implicitly represented by a range of
positions specified by a leftmost index l and a rightmost index r.
Index l scans the sequence from left to right, and index r scans the sequence from right
to left.
A swap is performed when l is at an element as large as the pivot and r is at an element
as small as the pivot.
A final swap with the pivot, completes the divide step.
Implementation
def inplacequicksort(S, a, b):
    if a >= b:
        return
    pivot = S[b]
    left = a
    right = b - 1
    while left <= right:
        while left <= right and S[left] < pivot:
            left += 1
        while left <= right and pivot < S[right]:
            right -= 1
        if left <= right:
            S[left], S[right] = S[right], S[left]
            left, right = left + 1, right - 1
    S[left], S[b] = S[b], S[left]
    inplacequicksort(S, a, left - 1)
    inplacequicksort(S, left + 1, b)
• Complexity
The way to get close to the best-case running time is to choose pivots that
divide the input sequence S almost equally.
That is, having pivots close to the "middle" of the set of elements
leads to an O(n log n) best-case running time for quick-sort.
COMPARISON OF SORTING ALGORITHMS
Algorithm        Best case     Average case   Worst case
Bubble sort      O(n)          O(n²)          O(n²)
Selection sort   O(n²)         O(n²)          O(n²)
Insertion sort   O(n)          O(n²)          O(n²)
Quick sort       O(n log n)    O(n log n)     O(n²)
SEARCHING TECHNIQUES
Searching is the process of determining whether an element is present in a given list of elements. If the element is found, the search is successful; otherwise it is considered an unsuccessful search. Efficient searching makes it possible to retrieve information quickly, for example from a database.
LINEAR SEARCH
The simplest search problem is the sequential or linear search algorithm.
When the sequence is unsorted the standard approach for searching a target value is sequential
search.
This technique iterates over the sequence, one item at a time, until the specific item is found or
all items have been examined.
In Python, a target item can be found in a sequence using the in operator:
if key in A:
    print("The key is in the array.")
else:
    print("The key is not in the array.")
The in operator keeps the code simple and easy to read, but it hides the inner workings of the search.
Example
To search element 31 in the array, the search begins with the value in the first position.
Since the first element does not contain the target value, the next element in sequential
order is compared to value 31. This process is repeated until the item is found in the sixth
position.
Similarly, to search element 8 in the same array, then the search begins in the same
manner, starting with the first element until the desired element is found or end of the
sequence. Here it is an unsuccessful search.
Implementation
def linearsearch(A, key):
    found = 0
    for i in range(len(A)):
        if A[i] == key:
            found += 1
    if found == 0:
        print("Key not found")
    else:
        print("Key found")

num = [10, 51, 2, 18, 4, 31, 13, 5, 23, 64, 29]
linearsearch(num, 31)
• Complexity
• An algorithm is analyzed based on its basic unit of computation. For linear search, this is the comparison: we count the number of comparisons performed, and each comparison may or may not locate the desired item. In the worst case the target is compared against all n elements, so linear search runs in O(n) time.
• Linear search is a simple search algorithm, but it is inefficient for large lists.
BINARY SEARCH
Binary search efficiently locates a target value within a sorted sequence of n elements. Here the sequence is sorted and indexable.
• For any index j, all the values stored at indices 0 to j−1 are less than or equal to the value at index j, and all the values stored at indices j+1 to n−1 are greater than or equal to that value. This allows the target value to be located quickly.
• The algorithm maintains two parameters, low and high, such that all the candidate entries have index at least low and at most high. Initially, low = 0 and high = n−1. Then the target value is compared to the median candidate, that is, the item data[mid] with index
mid = (low + high) // 2 (integer division).
Consider three cases:
If the target equals data[mid], then the item is found, and the search terminates successfully.
If target < data[mid], then we recur on the first half of the sequence, that is, on the interval of indices from low to
mid−1.
If target > data[mid], then we recur on the second half of the sequence, that is, on the interval of indices from
mid+1 to high.
An unsuccessful search occurs if low > high, as the interval [low, high] is empty. This algorithm is known as
binary search.
Example
• To search 10 in a sorted list of elements, first determine the middle element of
the list. As the middle item contains 18, which is greater than the target value
10, so discard the second half of the list and repeat the process to first half of
the list. This process is repeated until the desired target item is located in the list.
If the item is found then it returns True, otherwise it returns False.
Figure: Searching for 10 in a sorted array using the binary search.
Implementation
def binarysearch(data, target, low, high):
    if low > high:
        return False               # interval is empty: unsuccessful search
    else:
        mid = (low + high) // 2
        if target == data[mid]:
            return True
        elif target < data[mid]:
            return binarysearch(data, target, low, mid - 1)
        else:
            return binarysearch(data, target, mid + 1, high)

data = [2, 4, 10, 18, 21, 29, 33]
print(binarysearch(data, 10, 0, len(data) - 1))
This binary search algorithm requires O(log n) time, whereas the sequential search algorithm uses O(n) time.