Chapter 3 Sorting and Searching
3.1. Sorting
Introduction
We sort many things in our everyday lives: A handful of cards when playing Bridge; bills and other
piles of paper; jars of spices; and so on. And we have many intuitive strategies that we can use to do the
sorting, depending on how many objects we have to sort and how hard they are to move around.
Sorting is also one of the most frequently performed computing tasks. We might sort the records in a
database so that we can search the collection efficiently.
Because sorting is so important, naturally it has been studied intensively and many algorithms have
been devised. Some of these algorithms are straightforward adaptations of schemes we use in everyday
life. Others are totally alien to how humans do things, having been invented to sort thousands or even
millions of records stored on the computer. After years of study, there are still unsolved problems
related to sorting. New algorithms are still being developed and refined for special purpose
applications.
While introducing this central problem in computer science, this chapter has a secondary purpose of
illustrating many important issues in algorithm design and analysis. The collection of sorting
algorithms presented will illustrate the different classifications of sorting algorithms.
Sorting algorithms will be used to illustrate a wide variety of analysis techniques in this chapter. We’ll
see how it is possible to speed up sorting algorithms by taking advantage of the best case behavior of
another algorithm. We’ll see several examples of how we can tune an algorithm for better performance.
Input to the sorting algorithms presented in this chapter is a collection of records stored in an array.
Records are compared to one another by means of a comparator.
The Sorting Problem allows input with two or more records that have the same key value. Certain
applications require that input not contain duplicate key values. The sorting algorithms presented in this
chapter and in Chapter 8 can handle duplicate key values unless noted otherwise.
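Formally, given records $r_1, r_2, \ldots, r_n$ with key values $k_1, k_2, \ldots, k_n$, the Sorting Problem is to arrange the records into an order $s$ such that the records $r_{s_1}, r_{s_2}, \ldots, r_{s_n}$ have keys obeying $k_{s_1} \le k_{s_2} \le \cdots \le k_{s_n}$.
This is essentially the process of converting an abstract list into an abstract sorted list.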
In this chapter we cover three basic sorting algorithms:
1. Bubble Sort
2. Insertion sort
3. Selection Sort
The first sorting algorithm we look at is called Bubble Sort; the way it operates is supposedly
reminiscent of bubbles floating up in a glass of sparkling mineral water.
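[Figure: Bubble Sort passes for a 5-element list, from pass i = 1 (comparisons at j = 0, 1, 2, 3) to pass i = 4 (comparison at j = 0).]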
• The number of comparisons between elements and the number of exchanges between elements
determine the efficiency of the Bubble Sort algorithm.
• The number of comparisons between elements in Bubble Sort is (n-1) + (n-2) + ... + 1 = n(n-1)/2
for a list of n elements.
In every case (worst case, best case or average case), the number of comparisons needed to sort the
list in ascending order is the same.
– Worst case: 75 74 54 4 2
– Average case: 74 75 4 2 54
– Best case: 2 4 54 74 75
All lists with 5 elements need 10 comparisons to sort all the data.
In the example given, the number of comparisons for the worst case and the best case is the same:
10 comparisons.
• The difference can be seen in the number of element swaps. The worst case has the maximum number of
swaps, 10, while the best case has no swaps, since all data is already in the right position.
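The counting argument above can be checked with a short Python sketch. The function name bubble_sort and its (comparisons, swaps) return value are illustrative choices, not part of the original notes; the loop bounds are assumed to follow the pass/index layout used in this section (pass i = 1 to n-1, inner index j = 0 to n-1-i).

```python
def bubble_sort(data):
    """Sort data in ascending order in place; return (comparisons, swaps)."""
    n = len(data)
    comparisons = swaps = 0
    # Pass i = 1 .. n-1; after pass i the i largest values are in final position.
    for i in range(1, n):
        # Inner index j = 0 .. n-1-i compares each adjacent pair data[j], data[j+1].
        for j in range(n - i):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swaps += 1
    return comparisons, swaps


for name, values in [("worst", [75, 74, 54, 4, 2]),
                     ("average", [74, 75, 4, 2, 54]),
                     ("best", [2, 4, 54, 74, 75])]:
    data = list(values)                 # copy so the original list is kept
    comparisons, swaps = bubble_sort(data)
    print(name, data, "comparisons:", comparisons, "swaps:", swaps)
```

With these three lists the sketch reports 10 comparisons each, and 10, 7 and 0 swaps respectively; the 7 swaps in the average case equal the number of out-of-order pairs in 74 75 4 2 54.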
Ins: Fikrezgy Yohannes CoSC2083-Data Structures and Algorithms 4|Page
• In the best case, no exchange of data occurs, starting from pass one.
• From the example, it can be concluded that if no exchange of data occurs in a pass, the list is
already sorted. The next pass is unnecessary and the sorting process should stop.
Bubble sort is not very fast. Various suggestions have been made to improve it.
For example, a Boolean variable can be set to false at the beginning of each pass through the list and set to
true whenever a swap is made. If the flag is false when the pass is completed, then no swaps were done and
the array is sorted, so the algorithm can halt. This gives exactly the same worst case complexity, but a best
case complexity of only n-1 comparisons, that is, O(n). The average case complexity is still O(n^2), however,
so this is not much of an improvement.
To improve the efficiency of Bubble Sort, a condition that checks whether the list is already sorted should
be added to the outer loop. A Boolean variable, sorted, is added to the algorithm to signal whether any
exchange of elements occurs in a given pass.
At the start of each pass of the outer loop, sorted is set to true. If an exchange of data occurs inside the
inner loop, sorted is set to false. Another pass is made if sorted is false, and the sorting stops if sorted is
true.
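A minimal Python sketch of the improved algorithm, assuming the same pass and index bounds as the basic version. The flag is called is_sorted here (a hypothetical name) because sorted is a built-in function in Python; it plays the role of the sorted variable described above.

```python
def improved_bubble_sort(data):
    """Bubble Sort that stops as soon as a pass makes no exchange.
    Sorts data in place and returns (passes, comparisons, swaps)."""
    n = len(data)
    passes = comparisons = swaps = 0
    is_sorted = False              # the "sorted" flag from the text
    i = 1
    # Outer loop: start another pass only while the list may be unsorted.
    while i < n and not is_sorted:
        is_sorted = True           # assume sorted until a swap proves otherwise
        passes += 1
        for j in range(n - i):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swaps += 1
                is_sorted = False  # an exchange happened, so another pass is needed
        i += 1
    return passes, comparisons, swaps


print(improved_bubble_sort([2, 4, 54, 74, 75]))
print(improved_bubble_sort([75, 74, 54, 4, 2]))
```

On the best-case array 2 4 54 74 75 it stops after one pass with 4 comparisons and no swaps; on 75 74 54 4 2 it still needs 4 passes, 10 comparisons and 10 swaps.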
Consider the best case for the improved Bubble Sort with the array 2 4 54 74 75:
Pass i = 1
j = 0: 2 4 54 74 75
j = 1: 2 4 54 74 75
j = 2: 2 4 54 74 75
j = 3: 2 4 54 74 75
No swaps are made on an already sorted array: (n-1) = 4 comparisons and zero (0) exchanges (swaps).
In pass 1 (i = 1), no exchange of data occurs, so the variable sorted remains true. Therefore, the
condition of the outer loop becomes false and the loop stops. In this example, pass 2 is never
executed.
Analysis: in the best case only one pass is made (pass = 1), and the number of comparisons between
elements is n-1 = 4, which is O(n).
We continue with insertion sort, which is an efficient algorithm for sorting a small number of elements.
Insertion sort works the way many people sort a hand of playing cards. We start with an empty left
hand and the cards face down on the table. We then remove one card at a time from the table and insert
it into the correct position in the left hand. To find the correct position for a card, we compare it with
each of the cards already in the hand, from right to left, as illustrated in Figure 3.1. At all times, the
cards held in the left hand are sorted, and these cards were originally the top cards of the pile on the
table.
Insertion sort is a simple sorting algorithm that sorts the array by shifting elements one by one. The
following are some of the important characteristics of insertion sort.
A list with only one element is already sorted, so the elements inserted begin with the second element
in the array. The inserted element is held in the key variable and values in the sorted portion of the array
are moved up to make room for the inserted element in the same loop where the search is done to find
the right place to make the insertion. Once it is found, the loop ends and the inserted element is placed
into the sorted portion of the array.
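The procedure just described can be sketched in Python as follows. The function name insertion_sort is illustrative, and the sketch assumes the list holds mutually comparable values; key and the right-to-left shifting loop mirror the description above.

```python
def insertion_sort(data):
    """Sort data in ascending order in place using insertion sort."""
    # A one-element prefix is already sorted, so insertion starts at index 1.
    for i in range(1, len(data)):
        key = data[i]              # the element being inserted
        j = i - 1
        # Search right to left through the sorted portion, shifting larger
        # values up one place to make room for key.
        while j >= 0 and data[j] > key:
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = key          # place key into the gap left by the shifts
    return data


print(insertion_sort([5, 2, 4, 6, 1, 3]))
```

Because the search and the shifting happen in the same loop, no separate pass is needed to open a gap before the element is placed.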
Though Insertion sort works in the way depicted above, each pass contains several steps to shift
elements. For example:
The time taken by the INSERTION-SORT procedure depends on the input: sorting a thousand numbers
takes longer than sorting three numbers. Moreover, INSERTION-SORT can take different amounts of
time to sort two input sequences of the same size depending on how nearly sorted they already are. In
general, the time taken by an algorithm grows with the size of the input, so it is traditional to describe
the running time of a program as a function of the size of its input. To do so, we need to define the
terms “running time” and “size of input” more carefully.
Insertion sort does different things depending on the contents of the list, so we must consider its worst, best,
and average case behavior. If the list is already sorted, one comparison is made for each of n-1 elements as
they are “inserted” into their current locations. So the best case behavior of Insertion sort is
$BC(n) = n - 1 = O(n)$
The worst case occurs when every inserted element must be placed at the beginning of the already sorted
portion of the list; this happens when the list is in reverse order. In this case, the first element inserted
requires one comparison, the second two, the third three, and so forth, and n-1 elements must be inserted.
Hence
$WC(n) = \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = O(n^2)$
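To see where BC(n) and WC(n) come from, the following sketch counts one comparison each time an element of the sorted portion is compared with the key. The function name insertion_sort_comparisons is hypothetical, and the counting convention is an assumption chosen to match the analysis above.

```python
def insertion_sort_comparisons(data):
    """Insertion-sort a copy of data; return the number of key comparisons."""
    a = list(data)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1           # one comparison of a[j] with key
            if a[j] <= key:            # insertion point found
                break
            a[j + 1] = a[j]            # shift the larger value up
            j -= 1
        a[j + 1] = key
    return comparisons


print(insertion_sort_comparisons(list(range(10))))         # already sorted
print(insertion_sort_comparisons(list(range(10, 0, -1))))  # reverse order
```

For n = 10 it returns 9 (that is, n-1) on an already sorted list and 45 (that is, 10 × 9 / 2) on a reversed list.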
The idea behind Selection sort is to make repeated passes through the list, each time finding the largest
(or smallest) value in the unsorted portion of the list, and placing it at the end (or beginning) of the
unsorted portion, thus shrinking the unsorted portion and growing the sorted portion. Thus, the
algorithm works by repeatedly “selecting” the item that goes at the end of the unsorted portion of the
list.
Selection Sort is essentially a Bubble Sort, except that rather than repeatedly swapping adjacent values
to get the next smallest record into place, we instead remember the position of the element to be
selected and do one swap at the end.
Although the number of comparisons that Selection sort makes is identical to the number that Bubble sort
makes, Selection sort usually runs considerably faster. This is because Bubble sort typically makes many
swaps on every pass through the list, while Selection sort makes only one. Nevertheless, neither of these
sorts is particularly fast.
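The following Python sketch selects the smallest value on each pass (the text allows either the smallest or the largest) and remembers its position, making at most one swap per pass. The name selection_sort and the returned counters are illustrative assumptions.

```python
def selection_sort(data):
    """Sort data in place; return (comparisons, swaps).
    Each pass selects the smallest value in the unsorted portion data[i:]
    and swaps it once into position i."""
    n = len(data)
    comparisons = swaps = 0
    for i in range(n - 1):
        smallest = i                   # remember the position, do not swap yet
        for j in range(i + 1, n):
            comparisons += 1
            if data[j] < data[smallest]:
                smallest = j
        if smallest != i:              # at most one swap per pass
            data[i], data[smallest] = data[smallest], data[i]
            swaps += 1
    return comparisons, swaps


print(selection_sort([75, 74, 54, 4, 2]))
```

On 75 74 54 4 2 it makes the same 10 comparisons as Bubble Sort but only 2 swaps, versus 10 swaps for Bubble Sort on the same list.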
[Figure: An example of Selection Sort on an array with indices [0] to [7]. Each column shows the array after the iteration with the indicated value of i in the outer for loop. Numbers above the line in each column have been sorted and are in their final positions.]
On the first pass through the list, Selection sort makes n-1 comparisons; on the next pass, it makes n-2
comparisons; on the third, it makes n-3 comparisons, and so forth. It makes n-1 passes altogether, for a
total of (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons, so its complexity is O(n^2).
Summary
Organizing and retrieving information is at the heart of most computer applications, and searching is
surely the most frequently performed of all computing tasks. Search can be viewed abstractly as a
process to determine if an element with a particular value is a member of a particular set. The more
common view of searching is an attempt to find the record within a collection of records that has a
particular key value or those records in a collection whose key values meet some criterion such as
falling within a range of values.
Objective:
This section allows students to grasp the following:
· Understand the searching technique concept and the purpose of searching operation.
· Understand the implementation of basic searching algorithm:
1. Sequential search
§ Sequential search on unsorted data.
§ Sequential search on sorted data.
2. Binary search
· Able to analyze the efficiency of the searching technique.
· Able to implement searching technique in problem solving.
Searching Definition
1. Clifford A. Shaffer [1997] defines searching as a process to determine whether an element is a
member of a certain data set.
2. The process of finding the location of an element with a specific value (key) within a collection
of elements.
3. The process can also be seen as an attempt to search for a certain record in a file.
a. Each record contains a data field and a key field.
b. The key field is a group of characters or numbers used as an identifier for each record.
c. Searching can be done based on the key field.
The sequential search (also called linear search) on an array-based list always starts at the first element
in the list and continues until either the item is found or the entire list has been searched.
If the search item is found, its index (that is, its location in the array) is returned. If the search is
unsuccessful, -1 is returned.
· Basic sequential search is usually used to search for an item in an unsorted list/array.
· The technique is suitable for small lists, because the efficiency of sequential search is low compared to
other searching techniques.
· In a sequential search:
1. Every element in the array will be examined sequentially, starting from the first element.
2. The process will be repeated until the last element of the array or until the searched data is
found.
· Used for searching that involves records stored in the main memory (RAM)
· Searching strategy:
1. Examines each element in the array one by one (sequentially) and compares its value with
the one being looked for – the search key
2. The search is successful if the search key matches the value being compared in the array. The
searching process is then terminated.
3. Otherwise, if no match is found, the search continues to the last element of the array.
The search fails if no match is found in the array.
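A minimal Python version of this strategy, assuming a Python list as the array; the name sequential_search is illustrative. Only the first three values of the sample list (11, 33, 22) come from the example in these notes; the remaining values are hypothetical.

```python
def sequential_search(data, key):
    """Return the index of the first element equal to key, or -1."""
    for index, value in enumerate(data):
        if value == key:           # successful search: stop immediately
            return index
    return -1                      # every element examined, no match


sample = [11, 33, 22, 44, 55]      # values after 22 are hypothetical
print(sequential_search(sample, 22))
print(sequential_search(sample, 10))
```

Searching for 22 returns index 2 after three comparisons; searching for 10 compares the key with every element and returns -1.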
Suppose that you want to determine whether key = 22 is in the list. The sequential search works as
follows:
First, you compare 22 with data[0], that is, compare 22 with 11. Because data[0] ≠ 22, you then
compare 22 with data[1] (that is, 33, the second element in the array). Because data[1] ≠ 22, you
compare 22 with the next element in the array, data[2]. Because data[2] = 22, return the index of
data[2], which is 2, and stop searching. This is a successful search.
Let us now search for key = 10. As before, the search starts with the first element in the array, that is,
at data[0]. This time the search key, 10, is compared with every element in the array.
Eventually, no more data is left in the array to compare with the search key. This is an unsuccessful
search: the index -1 is returned to indicate that the search key was not found.
This section analyzes the performance of the sequential search algorithm in both the worst case and the
average case.
The statements before and after the loop are executed only once and, hence, require very little computer
time. The statements in the for loop are the ones that are repeated several times. For each iteration of the
loop, the search item is compared with an element in the list, and a few other statements are executed,
including some other comparisons. Clearly, the loop terminates as soon as the search item is found in the
list. Therefore, the execution of the other statements in the loop is directly related to the outcome of the key
comparison. Also, different programmers might implement the same algorithm differently, although the
number of key comparisons would typically be the same. The speed of a computer can also easily affect the
time an algorithm takes to perform, but not the number of key comparisons.
Therefore, when analyzing a search algorithm, we count the number of key comparisons because this
number gives us the most useful information. Furthermore, the criteria for counting the number of key
comparisons can be applied equally well to other search algorithms.
· If the search key is located at the end of the list, or is not found at all, the loop is repeated once for
every element in the list, so the search takes O(n) time. (This is the worst case.)
· If the search key is found at index 0, the search takes O(1) time. (This is the best case.)
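To make the best and worst cases concrete, this sketch (hypothetical name sequential_search_count) also returns the number of key comparisons, the measure chosen above for analyzing search algorithms.

```python
def sequential_search_count(data, key):
    """Sequential search that also reports the number of key comparisons.
    Returns (index, comparisons); index is -1 when key is absent."""
    comparisons = 0
    for index, value in enumerate(data):
        comparisons += 1
        if value == key:
            return index, comparisons
    return -1, comparisons


data = list(range(1, 101))             # n = 100
print(sequential_search_count(data, 1))    # best case
print(sequential_search_count(data, 100))  # key at the end
print(sequential_search_count(data, 0))    # key not present
```

On the list 1, 2, ..., 100, searching for 1 takes 1 comparison, searching for 100 takes 100, and searching for the absent value 0 also takes 100.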
Problem:
In the worst case the search key is compared with every element in the list; this O(n) cost is expensive for
large data sets.
Solution:
The efficiency of basic search technique can be improved by searching on a sorted list.
When searching a list in ascending order, the search key is compared with the elements one by one until:
1. The search key is found, or
2. The search key is smaller than the list item being compared (the key cannot appear later in the list).
=> This shortens unsuccessful searches, because the search can stop before reaching the end of the list.
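A possible Python sketch of sequential search on ascending data, with the two stopping conditions above; the function name sequential_search_sorted is an assumption.

```python
def sequential_search_sorted(data, key):
    """Sequential search on a list sorted in ascending order.
    Returns the index of key, or -1. Stops early once an element larger
    than key is seen, because key cannot appear after that point."""
    for index, value in enumerate(data):
        if value == key:
            return index           # condition 1: key found
        if key < value:
            return -1              # condition 2: key smaller than current item
    return -1                      # reached the end without a match


print(sequential_search_sorted([2, 4, 54, 74, 75], 54))
print(sequential_search_sorted([2, 4, 54, 74, 75], 5))
```

With the list 2 4 54 74 75, searching for 5 stops at 54 after examining three elements instead of scanning all five.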
Example:
As you can see, the sequential search is not efficient for large lists because, on average, it must examine
about half of the list (and all of it for an unsuccessful search on unsorted data), which is O(n). We therefore
describe another search algorithm, called binary search, which is very fast. However, a binary search can be
performed only on an ordered list. We therefore assume that the list is ordered. Sorting the list (array)
already reduces the cost of unsuccessful sequential searches, but we can improve the searching efficiency
much further by using the binary search algorithm.
When people search for something in an ordered list (like a dictionary or a phone book) they do not start at
the first element and march through the list one element at a time. They jump into the middle of the list, see
where they are relative to what they are looking for, and then jump either forward or backward and look
again, continuing in this way until they find what they are looking for, or determine that it is not in the list.
The binary search algorithm uses the divide-and-conquer technique to search the list. Binary search
takes the same tack in searching for a key value in a sorted list: the key is compared with the middle
element in the list. If it is the key, the search is done; if the key is less than the middle element, then the
process is repeated for the first half of the list; if the key is greater than the middle element, then the process
is repeated for the second half of the list. Eventually, either the key is found in the list, or the list is reduced
to nothing (the empty list), at which point we know that the key is not present in the list.
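The divide-and-conquer description translates directly into a recursive function. This is a sketch in Python, assuming an ascending list; the name binary_search_recursive and the default-argument style are illustrative choices.

```python
def binary_search_recursive(data, key, left=0, right=None):
    """Recursive binary search on an ascending list; return index or -1."""
    if right is None:
        right = len(data) - 1
    if left > right:                   # empty sub-list: key is not present
        return -1
    middle = (left + right) // 2
    if data[middle] == key:
        return middle
    if key < data[middle]:             # repeat on the first half
        return binary_search_recursive(data, key, left, middle - 1)
    return binary_search_recursive(data, key, middle + 1, right)  # second half


print(binary_search_recursive([2, 4, 54, 74, 75], 74))
print(binary_search_recursive([2, 4, 54, 74, 75], 5))
```

Each call discards half of the remaining sub-list, so the recursion depth is at most about log2 n.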
Strategy
[Figure: a sorted array with indices 0 to 9.]
Binary search starts by comparing the search key with the element at the middle.
i) If the value matches, return the index to the calling function and stop searching (index = middle).
ii) If the search key < the middle element, the search continues only on the elements from left to the
element just before the middle (middle - 1): set right = middle - 1 and calculate the new
middle = (left + right)/2, using integer division.
[Figure: indices 0 to 9 with Left, Middle, the new Middle and the new Right (= Middle - 1) marked.]
iii) If the search key > the middle element, the search continues only on the elements from the element
after the middle to the last element (right): set left = middle + 1 and calculate the new
middle = (left + right)/2.
[Figure: indices 0 to 9 with Middle, Right, the new Left (= Middle + 1) and the new Middle marked.]
iv) The search is repeated until the search key is found or the sub-list being searched becomes empty
(left > right).
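Steps i to iv correspond to the loop below, an iterative Python sketch using the same left, right and middle names; the function name binary_search is an assumption. Integer division (//) plays the role of (left + right)/2.

```python
def binary_search(data, key):
    """Iterative binary search on an ascending list; return index or -1."""
    left, right = 0, len(data) - 1
    while left <= right:               # step iv: stop when left > right
        middle = (left + right) // 2
        if data[middle] == key:        # step i: match found
            return middle
        if key < data[middle]:         # step ii: continue in the left part
            right = middle - 1
        else:                          # step iii: continue in the right part
            left = middle + 1
    return -1


data = list(range(0, 20, 2))           # 10 elements at indices 0 to 9
print(binary_search(data, 14))
print(binary_search(data, 5))
```

Python integers do not overflow, so (left + right) // 2 is safe here; in languages with fixed-width integers such as C or Java, left + (right - left) / 2 is the usual way to avoid overflow when left + right is very large.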
· Binary search starts by comparing the search key with the middle element. Thus, for a list of size n,
the search starts at position n/2.
· If the middle value does not match with the search key, then the searching area will be reduced
to the left or right sub-list only. This will reduce the searching area to ½ n.
· From the half-list, the second middle value is identified. Again, if the middle value does not match
the search key, the search area is reduced to the left or right sub-list only. The search area is now
½(½ n) = n/4.
· The process of finding the middle point and reducing the search area to the left or right sub-list is
repeated until the middle value equals the search key (the key is found) or the sub-list has been
exhausted.
· If the repetition occurs k times, then at iteration k the search area is reduced to $(1/2)^k n$.
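Setting the remaining search area $(1/2)^k n$ equal to a single element gives the number of halvings needed in the worst case:

```latex
\left(\tfrac{1}{2}\right)^{k} n = 1
\quad\Longrightarrow\quad 2^{k} = n
\quad\Longrightarrow\quad k = \log_2 n
```

For example, a list of 1024 elements needs about 10 halvings, since $2^{10} = 1024$.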
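The best case for binary search occurs when the search key is found at the middle of the array on the first
comparison: O(1).
The worst case occurs when the search key is not in the list, or is found only when the search area has shrunk
to a single element. About log2 n halvings are then needed, so the worst case is O(log n).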
Search Conclusion
· Searching is the process of locating an element in a list and returning the index of the searched
element.
· Basic searching techniques: sequential search and binary search.
· Sequential search can be implemented on sorted and unsorted list, while binary search can be
implemented only on sorted list.
· Sequential search on sorted data is more efficient than sequential search on unsorted data.
· Binary search is more efficient than sequential search.
· Basic searching techniques explained in this class are only suitable for small sets of data.
Hashing and indexing are suitable for searching large sets of data.