Module 5 - Chapter 10 - Searching - MSDS 6203 Data Systems and Algorithms N2A
Lesson
Data elements can be stored in any kind of data structure, such as an array, linked list, tree, or graph. The search operation is important for many applications, especially whenever we want to know whether a particular data element is present in an existing list of data items. To retrieve this information efficiently, we need an efficient search algorithm.
A search operation is carried out to find the location of a desired data item in a collection of data items. The search algorithm returns the location of the searched value if it is present in the list of items; if the data item is not present, it returns None.
Efficient searching lets us quickly retrieve the location of a desired data item from a list of stored data items. For example, given a list of data values such as {1, 45, 65, 23, 65, 75, 23}, we may want to check whether 75 is present in the list. An efficient search algorithm becomes important as the list of data items grows large.
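In Python, a quick check like this does not need a hand-written loop: the in operator and the list.index() method both scan the list from the front, i.e., they perform a linear search. A minimal sketch follows; find() is a hypothetical helper that mirrors the convention above of returning None when the item is absent.

```python
data = [1, 45, 65, 23, 65, 75, 23]

# Membership test: Python scans the list item by item
print(75 in data)        # True

# list.index() returns the position of the first match
print(data.index(75))    # 5


def find(items, value):
    """Hypothetical helper: first index of value in items, or None if absent."""
    try:
        return items.index(value)
    except ValueError:   # list.index() raises ValueError when value is missing
        return None


print(find(data, 99))    # None
```

Note that find(data, 65) returns 2, the first of the two positions holding 65.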
There are two different ways in which data can be organized, which can affect how a search algorithm works:
• First, the search algorithm can be applied to a list of items that is already sorted, that is, to an ordered set of items. For example, [1, 3, 5, 7, 9, 11, 13, 15, 17].
• Second, the search algorithm can be applied to an unordered set of items, which is not sorted. For example, [11, 3, 45, 76, 99, 11, 13, 35, 37].
Let us start with an introduction to searching and a definition and then look at the linear search algorithm.
Linear Search
The search operation is used to find the index position of a given data item in a list of data items. If the searched item is available in the given list of data items, the search algorithm returns the index position where it is located; otherwise, it reports that the item was not found. Here, the index position is the location of the desired item in the given list.
The simplest approach to search for an item in a list is to search linearly, in which we look for items one by one in the whole list. Let’s take an
example of six list items {60, 1, 88, 10, 11, 100} to understand the linear search algorithm, as shown in Figure 10.1:
Figure 10.1: An example of linear search
The preceding list has elements that can be accessed through the index. To find an element in the list, we can search for the given element
linearly one by one. This technique traverses the list of elements by using the index to move from the beginning of the list to the end. Each
element is checked, and if it does not match the search item, the next item is examined. By hopping from one item to the next, the list is
traversed sequentially. We use list items with integer values in this chapter to help you understand the concept, since integers can be
compared easily; however, a list item can hold any other data type as well.
The linear search approach depends on how the list items are arranged, that is, whether they are already sorted or not. Let’s first see how the linear search algorithm works when the given list of items is not sorted.
The unordered linear search is a linear search algorithm in which the given list of data items is not sorted. We compare the desired data item with the data items of the list one by one, until either the desired data item is found or the end of the list is reached. Consider an example list that contains the elements 60, 1, 88, 10, and 100, an unordered list. To perform a search operation on such a list, we start with the first item and compare it with the search item. If they do not match, the next element in the list is checked. This continues until we reach the last element in the list or a match is found.
In an unordered list of items, the search for the term 10 starts from the first element and moves on to the next element in the list. Thus, 60 is first compared with 10, and since they are not equal, we compare 10 with the next element, 1, then with 88, and so on until we find the search term in the list. Once the item is found, we return the index position where we found it. This process is shown in Figure 10.2:
Here is the implementation in Python for the linear search on an unordered list of items:
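def search(unordered_list, term):
    for i in range(len(unordered_list)):
        if term == unordered_list[i]:
            return i    # index of the first match
    return None         # reached the end of the list without a match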
The search function takes two parameters: the first is the list that holds the data, and the second is the item that we are looking for, called the search term. On every pass of the for loop, we check whether the search term is equal to the indexed item. If it is, there is a match and no need to continue the search, so we return the index position where the item was found. If the loop runs to the end of the list without finding a match, None is returned to signify that the item is not in the list.
We can use the following code snippet to check if a desired data element is present in the given list of data items:
list1 = [60, 1, 88, 10, 11, 100]
search_term = 10
index_position = search(list1, search_term)
print(index_position)    # 3

list2 = ['packt', 'publish', 'data']
search_term2 = 'data'
index_position2 = search(list2, search_term2)
print(index_position2)   # 2
In the output of the above code, the index position 3 is returned first, when we search for data element 10 in list1. Next, index position 2 is returned when the data item 'data' is searched for in list2. The same algorithm therefore works for searching non-numeric data items in Python, since string elements can be compared for equality just like numeric data.
When searching for an element in an unordered list of items, in the worst case the desired item is in the last position or is not present in the list at all. In this situation, we have to compare the search item with every element of the list, i.e., n times if the list contains n data items. Thus, the unordered linear search has a worst-case running time of O(n), since all the elements may need to be visited before the search term is found or its absence is confirmed.
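To see the linear cost concretely, the following sketch instruments the search described above with a comparison counter. The function name search_with_count() is hypothetical; it returns a pair (index or None, number of comparisons) for the six-item example list used earlier in this section.

```python
def search_with_count(unordered_list, term):
    """Hypothetical instrumented linear search.

    Returns (index of the first match or None, number of comparisons made).
    """
    comparisons = 0
    for i, item in enumerate(unordered_list):
        comparisons += 1          # one comparison per visited element
        if item == term:
            return i, comparisons
    return None, comparisons


values = [60, 1, 88, 10, 11, 100]
print(search_with_count(values, 60))    # (0, 1)    best case: first element
print(search_with_count(values, 100))   # (5, 6)    last element: n comparisons
print(search_with_count(values, 7))     # (None, 6) absent: n comparisons
```

With n = 6, both the last-position case and the not-present case take 6 comparisons, matching the O(n) worst case.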
Binary Search
• Jump Search
Divides the list into blocks and performs linear searches within these blocks, balancing between linear and binary search.
The jump search algorithm is an improvement over linear search for finding a given element in an ordered (or sorted) list of elements. It uses a divide-and-conquer strategy to search for the required element. In linear search, we compare the search value with each element of the list, whereas in jump search, we compare the search value with elements at regular intervals in the list, which reduces the number of comparisons.
In this algorithm, we first divide the sorted list of data into subsets of data elements called blocks. Since the list is sorted, the highest value within each block is its last element. We then compare the search value with the last element of each block in turn. There are three possible conditions:
1. If the search value is greater than the last element of the block, the value cannot be in this block, so we move on and compare it with the next block.
2. If the search value is less than the last element of the block, the desired value, if present, must be in the current block. So, we apply linear search within this block and return the index position.
3. If the search value is equal to the last element of the block, we return the index position of that element.
Generally, the block size is taken as √n for a list of length n, since this gives the best performance. In the worst case, we have to make n/m jumps (here, n is the total number of elements and m is the block size) if the last element of the last block is greater than the item being searched for, and we then need m - 1 comparisons for the linear search in the last block. Therefore, the total number of comparisons is (n/m) + m - 1. Setting its derivative with respect to m, -n/m² + 1, to zero shows that this total is minimized when m = √n, giving about 2√n - 1 comparisons, that is, O(√n). So the block size is taken as √n.
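The minimization argument can also be checked numerically. This sketch uses a hypothetical helper, worst_case_comparisons(), to evaluate the worst-case count for every block size on a list of n = 100 items. It rounds n/m up with math.ceil() to account for a partial final block, which is an assumption beyond the formula in the text.

```python
import math


def worst_case_comparisons(n, m):
    """Hypothetical helper: worst-case comparisons for jump search.

    ceil(n / m) jumps over the blocks, plus m - 1 linear comparisons
    inside the last block; ceil() covers a partial final block.
    """
    return math.ceil(n / m) + m - 1


n = 100
# Try every block size from 1 to n and keep the cheapest one
best_m = min(range(1, n + 1), key=lambda m: worst_case_comparisons(n, m))
print(best_m, worst_case_comparisons(n, best_m))   # 10 19
print(math.isqrt(n))                               # 10, i.e. √n
```

For n = 100 the cheapest block size is exactly √n = 10, giving 2√n - 1 = 19 comparisons; neighbouring sizes such as 9 or 11 already cost 20.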
Let’s take an example list {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11} and search for a given element (say, 10):
In the above example, we find the desired element 10 in 5 comparisons. First, we check that the first value of the array is not greater than the desired item (A[0] <= item); if this is true, we increase the index by the block size (shown as step 1 in Figure 10.4). Next, we compare the desired item with the last element of each block. If the item is greater, we move on to the next block, for example, from block 1 through to block 3 (shown as steps 2, 3, and 4 in Figure 10.4).
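To make the block structure of this example concrete, the sketch below splits the list using the same block size as the jump_search() code in this section, int(√11) = 3, and then locates the block that must contain 10. The variable names are illustrative only.

```python
import math

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
block_size = int(math.sqrt(len(values)))   # int(sqrt(11)) = 3

# Split the sorted list into consecutive blocks of block_size items
blocks = [values[k:k + block_size] for k in range(0, len(values), block_size)]
print(blocks)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11]]

# The target lies in the first block whose last element is >= the target
target = 10
for number, block in enumerate(blocks, start=1):
    if block[-1] >= target:
        print(number, block)   # 4 [10, 11]
        break
```

The last elements checked along the way are 3, 6, and 9 (blocks 1 to 3), after which block 4 is searched linearly; the final block is shorter because 11 is not a multiple of 3.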
Further, when the desired search element becomes smaller than the last element of a block, we stop incrementing the index position and do a linear search in the current block. Now, let us discuss the implementation of the jump search algorithm. First, we implement a linear search for an ordered list, which is similar to the one discussed in the previous section.
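def search_ordered(ordered_list, term):
    ordered_list_size = len(ordered_list)
    # Plain linear scan; returns -1 (rather than None) when term is absent
    for i in range(ordered_list_size):
        if term == ordered_list[i]:
            return i
    return -1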
In the above code, given an ordered list of elements, the function returns the index of the location where a given data element is found in the list. It returns -1 if the desired element is not found in the list. Next, we implement the jump_search() method as follows:
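import math

def jump_search(ordered_list, item):
    list_size = len(ordered_list)
    block_size = max(1, int(math.sqrt(list_size)))  # at least 1, to avoid an empty block
    i = 0
    # Keep jumping while the current block starts at or below the item
    while i < list_size and ordered_list[i] <= item:
        print("Block under consideration - {}".format(ordered_list[i: i+block_size]))
        # Final, possibly shorter, block: search it linearly
        if i + block_size > list_size:
            block_size = list_size - i
            block_list = ordered_list[i: i+block_size]
            j = search_ordered(block_list, item)
            if j == -1:
                print("Element not found")
                return None
            return i + j
        # The item matches the last element of the block
        if ordered_list[i + block_size - 1] == item:
            return i + block_size - 1
        # The item lies inside this block, before its last element
        elif ordered_list[i + block_size - 1] > item:
            block_array = ordered_list[i: i + block_size - 1]
            j = search_ordered(block_array, item)
            if j == -1:
                print("Element not found")
                return None
            return i + j
        # The item is larger than everything in this block: jump ahead
        i += block_size
    print("Element not found")
    return None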
In the above code, we first assign the length of the list to the variable list_size, and then compute the block size as int(math.sqrt(list_size)), that is, √n rounded down. Next, we start with the first element, at index 0, and continue searching until we reach the end of the list.
We start with index i = 0 and a block of size m (block_size in the code), and we keep moving forward one block at a time until the window reaches the end of the list. We check whether ordered_list[i + block_size - 1] == item. If they match, the index position i + block_size - 1 is returned. The code snippet for this is as follows:
if ordered_list[i + block_size - 1] == item:
    return i + block_size - 1
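If ordered_list[i + block_size - 1] > item, we carry out the linear search algorithm inside the current block, block_array, which holds the elements of the block before its last element:
elif ordered_list[i + block_size - 1] > item:
    block_array = ordered_list[i: i + block_size - 1]
    j = search_ordered(block_array, item)
    if j == -1:
        print("Element not found")
        return None
    return i + j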
In the above code, we use the linear search algorithm on the subarray. The search_ordered() function returns -1 if the desired element is not found in the block, in which case jump_search() reports that the element was not found; otherwise, the index position i + j is returned. Here, i is the index position up to the previous block, and j is the position of the desired element within the block where it is matched. This process is also depicted in Figure 10.5.
In this figure, we can see that i is at index position 5, and j is the position within the final block at which we find the desired element, i.e., 2, so the final returned index is 5 + 2 = 7:
Figure 10.5: Demonstration of index position i and j for the search value 8
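The jump_search() function above prints progress messages and signals failure with None, while search_ordered() uses -1. As a hedged alternative, here is a hypothetical print-free variant, jump_search_compact(), that returns -1 throughout, uses math.isqrt() for the block size, and clips the final partial block with min() instead of handling it in a separate branch. It follows the same idea: jump by blocks while the block's last element is smaller than the item, then scan that one block linearly.

```python
import math


def jump_search_compact(ordered_list, item):
    """Hypothetical print-free jump search; returns an index or -1."""
    n = len(ordered_list)
    if n == 0:
        return -1
    m = math.isqrt(n)            # block size, about √n (at least 1 here)
    start = 0
    # Jump while the last element of the current block is smaller than item
    while start < n and ordered_list[min(start + m, n) - 1] < item:
        start += m
    if start >= n:
        return -1                # item is larger than every element
    end = min(start + m, n)      # clip the final, possibly shorter, block
    # Linear search inside ordered_list[start:end]
    for j in range(start, end):
        if ordered_list[j] == item:
            return j
        if ordered_list[j] > item:
            break                # sorted list: item cannot appear later
    return -1


values = list(range(1, 12))      # [1, 2, ..., 11], as in the example above
print(jump_search_compact(values, 8))    # 7
print(jump_search_compact(values, 10))   # 9
print(jump_search_compact(values, 12))   # -1
```

Returning -1 in every failure case keeps the caller's check uniform, and the early break inside the block takes advantage of the list being sorted.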
• Interpolation Search
Uses the value of the target element to estimate its position in a sorted list, improving efficiency for uniformly distributed data.
• Exponential Search
Grows a search bound exponentially (1, 2, 4, 8, ...) until it passes the target, then runs binary search within that range; suitable for unbounded or infinite sorted lists.
• Database Queries
Efficiently retrieves records based on specific criteria.
• Search Engines
Indexes and searches web content rapidly and accurately.
• Data Retrieval Systems
Manages large volumes of data, ensuring quick access and updates.
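The interpolation and exponential search entries above are only summarized. As a hedged illustration, the sketch below shows one common way to write each, assuming a sorted Python list of integers. The names binary_search(), interpolation_search(), and exponential_search() are hypothetical, and the iterative binary search is included because exponential search hands its final range to it. Since a Python list always has a known length, this exponential search stops doubling at the end of the list rather than probing an unbounded sequence.

```python
def binary_search(ordered_list, item, low, high):
    """Iterative binary search on ordered_list[low..high] (inclusive); -1 if absent."""
    while low <= high:
        mid = (low + high) // 2
        if ordered_list[mid] == item:
            return mid
        if ordered_list[mid] < item:
            low = mid + 1
        else:
            high = mid - 1
    return -1


def interpolation_search(ordered_list, item):
    """Probe where item 'should' be by value (assumes sorted integers)."""
    low, high = 0, len(ordered_list) - 1
    while low <= high and ordered_list[low] <= item <= ordered_list[high]:
        if ordered_list[high] == ordered_list[low]:
            return low    # every value in the range equals item
        # Linear interpolation between the values at low and high
        pos = low + (item - ordered_list[low]) * (high - low) // (
            ordered_list[high] - ordered_list[low])
        if ordered_list[pos] == item:
            return pos
        if ordered_list[pos] < item:
            low = pos + 1
        else:
            high = pos - 1
    return -1


def exponential_search(ordered_list, item):
    """Double the bound until it passes item, then binary-search that range."""
    n = len(ordered_list)
    if n == 0:
        return -1
    if ordered_list[0] == item:
        return 0
    bound = 1
    while bound < n and ordered_list[bound] < item:
        bound *= 2
    # If present, item lies in ordered_list[bound // 2 .. min(bound, n - 1)]
    return binary_search(ordered_list, item, bound // 2, min(bound, n - 1))


sorted_values = [1, 3, 5, 7, 9, 11, 13, 15, 17]
print(interpolation_search(sorted_values, 13))   # 6
print(exponential_search(sorted_values, 13))     # 6
print(exponential_search(sorted_values, 4))      # -1
```

On this evenly spaced list, interpolation search lands on 13 with its very first probe, while exponential search tries bounds 1, 2, 4, and 8 before binary-searching indexes 4 to 8.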
Performance Analysis
Summary
Lessons Learned from This Module:
Searching algorithms are fundamental for retrieving data efficiently. This chapter explores basic search techniques like linear and binary
search, along with advanced algorithms such as jump, interpolation, and exponential search. It discusses searching in various data structures
and real-world applications like database queries and search engines. Performance analysis, practical examples, and best practices are
provided to optimize search operations.