Module 5 - Chapter 10 - Searching - MSDS 6203 Data Systems and Algorithms N2A

Uploaded by

izzieapptest
Module 5: Chapter 10 - Searching

Lesson

Chapter 10: Searching Algorithms


An important operation for all data structures is searching for elements from a collection of data. There are various methods to search for an
element in data structures; in this chapter, we shall explore the different strategies that can be used to find elements in a collection of items.

• Importance of Searching in Data Structures


Searching is critical for many applications, ensuring quick access to data stored in various structures.
• Overview of Basic Search Techniques
Search techniques include linear and binary searches, among others, each suitable for different types of data and scenarios.

Data elements can be stored in any kind of data structure, such as an array, linked list, tree, or graph. The search operation matters in many
applications, most often when we want to know whether a particular data element is present in an existing list of data items. To retrieve
the information efficiently, we need an efficient search algorithm.

In this chapter, we will learn about the following:

• Various search algorithms


• Linear search algorithm
• Jump search algorithm
• Binary search algorithm
• Interpolation search algorithm
• Exponential search algorithm

Introduction to Searching Algorithms


Searching algorithms are fundamental to retrieving information efficiently from data structures.

A search operation is carried out to find the location of the desired data item from a collection of data items. The search algorithm returns the
location of the searched value where it is present in the list of items and if the data item is not present, it returns None .

An efficient search algorithm matters most when the list of stored data items becomes large. For example, given a list of data values
such as {1, 45, 65, 23, 65, 75, 23} , we may want to know whether 75 is present in the list. For a handful of values any method will do,
but the cost of a poor search strategy grows with the size of the list.

There are two ways in which data can be organized, and the choice affects how a search algorithm works:

• The search algorithm is applied to a list of items that is already sorted; that is, it is applied to an ordered set of items. For
example, [1, 3, 5, 7, 9, 11, 13, 15, 17] .
• The search algorithm is applied to an unordered set of items. For example, [11, 3, 45, 76, 99, 11, 13, 35, 37] .

Let us start with a definition of the search operation and then look at the linear search algorithm.

Linear Search
The search operation is used to find out the index position of a given data item in a list of data items. If the searched item is available in the
given list of data items, then the search algorithm returns the index position where it is located; otherwise, it returns that the item is not found.
Here, the index position is the location of the desired item in the given list.

The simplest approach to search for an item in a list is to search linearly, in which we look for items one by one in the whole list. Let’s take an
example of six list items {60, 1, 88, 10, 11, 100} to understand the linear search algorithm, as shown in Figure 10.1:
Figure 10.1: An example of linear search

The preceding list has elements that can be accessed through the index. To find an element in the list, we can search for the given element
linearly one by one. This technique traverses the list of elements by using the index to move from the beginning of the list to the end. Each
element is checked, and if it does not match the search item, the next item is examined. By hopping from one item to the next, the list is
traversed sequentially. We use list items with integer values in this chapter to help you understand the concept, since integers can be
compared easily; however, a list item can hold any other data type as well.

The linear search approach depends on how the list items are stored in memory—whether they are already sorted in order or they are not
sorted. Let’s first see how the linear search algorithm works if the given list of items is not sorted.

Unordered linear search

The unordered linear search is a linear search algorithm in which the given list of data items is not sorted. We linearly match the desired data
item with the data items of the list one by one till the end of the list or until the desired data item is found. Consider an example list that
contains the elements 60 , 1 , 88 , 10 , 11 , and 100 , which form an unordered list. To perform a search operation on such a list, one starts with
the first item and compares it with the search item. If the search item is not matched, then the next element in the list is checked. This continues
till we reach the last element in the list or until a match is found.

In an unordered list of items, the search for the term 10 starts from the first element and moves to the next element in the list. Thus,
firstly 60 is compared with 10 , and since they are not equal, we compare 10 with the next element 1 , then 88 , and so on till we find the search
term in the list. Once the item is found, we return the index position of where we have found the desired item. This process is shown in Figure
10.2:

Figure 10.2: Unordered linear search

Here is the implementation in Python for the linear search on an unordered list of items:

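def search(unordered_list, term):
    # Compare the search term with each item in turn.
    for i, item in enumerate(unordered_list):
        if term == item:
            return i  # index position of the first match
    # The loop reached the end of the list without a match.
    return None
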
The search function takes two parameters; the first is the list that holds the data, and the second parameter is the item that we are looking
for, called the search term. On every pass of the for loop, we check if the search term is equal to the indexed item. If this is true, then there
is a match, and there is no need to proceed further with the search. We return the index position where the searched item is found in the list. If
the loops run to the end of the list with no match found, then None is returned to signify that there is no such item in the list.

We can use the following code snippet to check if a desired data element is present in the given list of data items:

list1 = [60, 1, 88, 10, 11, 600]
search_term = 10
index_position = search(list1, search_term)
print(index_position)

list2 = ['packt', 'publish', 'data']
search_term2 = 'data'
index_position2 = search(list2, search_term2)
print(index_position2)

The output of the above code is as follows:

3
2

In the output of the above code, firstly, the index position 3 is returned when we search for data element 10 in list1 . And secondly, index
position 2 is returned when data item 'data' is searched for in list2 . We can use the same algorithm for searching a non-numeric data
item from a list of non-numeric data items in Python, since string elements can also be compared similarly to numeric data in Python.

When searching for any element in an unordered list of items, in the worst case the desired item is in the last position or is not
present in the list at all. In either situation we have to compare the search item with every element of the list, i.e. n times if the total
number of data items in the list is n . Thus, the unordered linear search has a worst-case running time of O(n) .

• Concept and Implementation


Linear search sequentially checks each element of a list until the desired element is found or the list ends.
• Use Cases and Performance Analysis
Useful for small or unsorted datasets, but with a time complexity of O(n), it is less efficient for large datasets.

Binary Search

• Concept and Implementation


Binary search efficiently finds an element in a sorted list by repeatedly dividing the search interval in half.
• Pre-Requisites (Sorted Data)
Binary search requires the data to be sorted beforehand.
• Performance Analysis
With a time complexity of O(log n), binary search is significantly faster than linear search for large datasets.
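
The bullets above describe binary search only in words. A minimal iterative sketch follows; the function name binary_search and the choice to return None for a missing item (mirroring the linear search shown earlier) are assumptions made for illustration, not taken from the original chapter.

```python
def binary_search(ordered_list, term):
    """Return the index of term in ordered_list, or None if it is absent.

    Assumes ordered_list is sorted in ascending order.
    """
    low, high = 0, len(ordered_list) - 1
    while low <= high:
        mid = (low + high) // 2           # middle of the current interval
        if ordered_list[mid] == term:
            return mid
        if ordered_list[mid] < term:
            low = mid + 1                 # discard the left half
        else:
            high = mid - 1                # discard the right half
    return None


print(binary_search([1, 3, 5, 7, 9, 11, 13, 15, 17], 13))  # 6
print(binary_search([1, 3, 5, 7, 9, 11, 13, 15, 17], 4))   # None
```

Each pass halves the interval between low and high, which is where the O(log n) bound comes from.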

Advanced Search Algorithms

• Jump Search
Divides the list into blocks and performs linear searches within these blocks, balancing between linear and binary search.

The jump search algorithm is an improvement over linear search for searching for a given element from an ordered (or sorted) list of
elements. This uses the divide-and-conquer strategy in order to search for the required element. In linear search, we compare the search
value with each element of the list, whereas in jump search, we compare the search value at different intervals in the list, which reduces
the number of comparisons.

In this algorithm, firstly, we divide the sorted list of data into subsets of data elements called blocks. Within each block, the highest value
is the last element, as the array is sorted. Next, we compare the search value with the last element of each block. There can be three
conditions:
1. If the search value is greater than the last element of the block, we move on to the next block.
2. If the search value is less than the last element of the block, it means the desired search value, if present, must be in the current block.
So, we apply linear search in this block and return the index position.
3. If the search value is the same as the last element of the block, we return the index position of that element.

Generally, the size of the block is taken as √n , since it gives the best performance for a given array of length n .

In the worst-case situation, we will have to make n/m jumps (here, n is the total number of elements, and m is the block size) if
the last element of the last block is greater than the item to be searched, and we will need m - 1 comparisons for linear search in the last
block. Therefore, the total number of comparisons will be ((n/m) + m - 1), which is minimized when m = √n . This is why the size of the block
is taken as √n .
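
The minimization step can be spelled out. The sketch below treats the block size m as a continuous variable, which is an assumption made only so that calculus applies, and writes the comparison count as f(m):

```latex
f(m) = \frac{n}{m} + m - 1, \qquad
f'(m) = -\frac{n}{m^{2}} + 1 = 0
\;\Longrightarrow\; m^{2} = n
\;\Longrightarrow\; m = \sqrt{n}, \qquad
f''(m) = \frac{2n}{m^{3}} > 0, \qquad
f(\sqrt{n}) = 2\sqrt{n} - 1 .
```

Since the second derivative is positive for m > 0, this stationary point is a minimum, and the worst case of jump search is therefore O(√n) comparisons. In practice the block size has to be an integer, so √n is rounded down.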

Let’s take an example list {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11} to search for a given element (say 10 ):

Figure 10.4: Illustration of the jump search algorithm

In the above example, we find the desired element 10 in 5 comparisons. Firstly, we compare the first value of the array with the desired
item A[0] <= item ; if it is true, then we increase the index by the block size (this is shown in step 1 in Figure 10.4). Next, we compare the
desired item with the last element of each block. If it is greater, then we move to the next block, such as from block 1 to block 3 (this is
shown in steps 2, 3, and 4 in Figure 10.4).

Further, when the desired search element becomes smaller than the last element of a block, we stop incrementing the index position and
then we do the linear search in the current block. Now, let us discuss the implementation of the jump search algorithm. Firstly, we
implement a linear search for ordered lists, which is similar to what we discussed in the previous section, except that it stops early once
it meets an element larger than the search term.

It is given here for the sake of completeness:

def search_ordered(ordered_list, term):
    print("Entering Linear Search")
    ordered_list_size = len(ordered_list)
    for i in range(ordered_list_size):
        if term == ordered_list[i]:
            return i
        elif ordered_list[i] > term:
            # Every later item is larger still, so the term cannot be present.
            return -1
    return -1

In the above code, given an ordered list of elements, it returns the index of the location where a given data element is found in the list. It
returns –1 if the desired element is not found in the list. Next, we implement the jump_search() method as follows:

import math


def jump_search(ordered_list, item):
    print("Entering Jump Search")
    list_size = len(ordered_list)
    block_size = int(math.sqrt(list_size))
    i = 0
    # i is the start of the current block; i < list_size keeps it in range.
    while i < list_size and ordered_list[i] <= item:
        print("Block under consideration - {}".format(ordered_list[i: i + block_size]))
        if i + block_size > list_size:
            # Final, shorter block: search whatever is left.
            block_size = list_size - i
            block_list = ordered_list[i: i + block_size]
            j = search_ordered(block_list, item)
            if j == -1:
                print("Element not found")
                return
            return i + j
        if ordered_list[i + block_size - 1] == item:
            return i + block_size - 1
        elif ordered_list[i + block_size - 1] > item:
            # The item can only lie in this block, before its last element.
            block_array = ordered_list[i: i + block_size - 1]
            j = search_ordered(block_array, item)
            if j == -1:
                print("Element not found")
                return
            return i + j
        i += block_size

In the above code, firstly we assign the length of the list to the variable list_size , and then we compute the block size as the integer
part of √n , where n is the length of the list. Next, we start with the first element, index 0, and then continue searching until we reach
the end of the list.

We start with the index i = 0 and a block of size m, and we keep incrementing i by the block size until the window reaches the end of the
list. In each step, we check whether ordered_list[i + block_size - 1] == item . If they match, the index position (i + block_size - 1) is
returned. The code snippet for this is as follows:

if ordered_list[i + block_size - 1] == item:
    return i + block_size - 1

If ordered_list[i + block_size - 1] > item , we proceed to carry out the linear search algorithm inside the current block,
block_array = ordered_list[i: i + block_size - 1] , as follows:

elif ordered_list[i + block_size - 1] > item:
    block_array = ordered_list[i: i + block_size - 1]
    j = search_ordered(block_array, item)
    if j == -1:
        print("Element not found")
        return
    return i + j

In the above code, we use the linear search algorithm in the subarray. If the desired element is not found there, search_ordered returns -1
and jump_search prints a message and returns None ; otherwise, the index position (i + j) is returned. Here, i is the index position at
which the current block starts, and j is the position within that block where the desired element is matched. This process is also
depicted in Figure 10.5.

In this figure, we can see that i is in index position 5, and j , the position within the final block at which we find the desired
element, is 2 , so the final returned index will be 5 + 2 = 7 :
Figure 10.5: Demonstration of index position i and j for the search value 8
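
As a cross-check on the walkthrough, here is a compact alternative sketch of jump search that works on indices instead of list slices, so no sublists are copied. The name jump_search_indexed and the use of math.isqrt (Python 3.8+) are choices made for this illustration and are not part of the original listing.

```python
import math


def jump_search_indexed(ordered_list, item):
    """Jump search without slicing; returns the index of item or None."""
    n = len(ordered_list)
    if n == 0:
        return None
    step = max(1, math.isqrt(n))          # block size, floor(sqrt(n))
    start = 0
    # Jump block by block while the last element of the block is too small.
    while start < n and ordered_list[min(start + step, n) - 1] < item:
        start += step
    # Linear scan inside the candidate block, if one is left.
    for idx in range(start, min(start + step, n)):
        if ordered_list[idx] == item:
            return idx
        if ordered_list[idx] > item:
            break                          # passed the place where item would be
    return None


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(jump_search_indexed(data, 10))  # 9
print(jump_search_indexed(data, 8))   # 7
print(jump_search_indexed(data, 12))  # None
```

For the search value 8 it returns index 7, the same final index as in Figure 10.5.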

• Interpolation Search
Uses the value of the target element to estimate its position, improving efficiency for uniformly distributed data.
• Exponential Search
Finds a range that contains the target by repeatedly doubling an index, then runs binary search within that range; suitable for unbounded or very large sorted lists.
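
The two algorithms named in the last two bullets can be sketched as follows. Both functions are illustrative and assume a list sorted in ascending order; interpolation_search additionally assumes numeric values, and exponential_search leans on the standard library's bisect_left for its binary search step rather than a hand-written one. The function names are assumptions made for this sketch.

```python
from bisect import bisect_left


def interpolation_search(ordered_list, term):
    """Estimate where term should sit from its value, then narrow the range."""
    low, high = 0, len(ordered_list) - 1
    while low <= high and ordered_list[low] <= term <= ordered_list[high]:
        if ordered_list[low] == ordered_list[high]:
            # All remaining values are equal; avoid dividing by zero.
            return low if ordered_list[low] == term else None
        # Linear interpolation between the two ends of the current range.
        pos = low + (term - ordered_list[low]) * (high - low) // (
            ordered_list[high] - ordered_list[low])
        if ordered_list[pos] == term:
            return pos
        if ordered_list[pos] < term:
            low = pos + 1
        else:
            high = pos - 1
    return None


def exponential_search(ordered_list, term):
    """Double an upper bound until it passes term, then binary-search below it."""
    n = len(ordered_list)
    if n == 0:
        return None
    bound = 1
    while bound < n and ordered_list[bound] < term:
        bound *= 2
    low, high = bound // 2, min(bound + 1, n)   # term can only lie in [low, high)
    idx = bisect_left(ordered_list, term, low, high)
    if idx < high and ordered_list[idx] == term:
        return idx
    return None


data = [1, 3, 5, 7, 9, 11, 13, 15, 17]
print(interpolation_search(data, 13))  # 6
print(exponential_search(data, 13))    # 6
print(interpolation_search(data, 4))   # None
print(exponential_search(data, 4))     # None
```

On evenly spaced data such as this list, the interpolation estimate lands on the target in a single probe; on skewed data it can degrade towards a linear scan.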

Search in Data Structures

• Search in Linked Lists


Typically uses linear search due to the sequential nature of linked lists.
• Search in Trees (Binary Search Trees)
Utilizes the tree structure for efficient searching, insertion, and deletion operations.
• Search in Hash Tables
Uses hash functions for constant time complexity in the average case.
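
To make two of these bullets concrete, the sketch below searches a small binary search tree and then uses Python's built-in dict as a hash table. The Node class and the bst_search function are hypothetical helpers written for this illustration only.

```python
class Node:
    """Hypothetical minimal binary search tree node, for illustration only."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def bst_search(root, key):
    """Walk down from root, going left or right by comparison; return the node or None."""
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


# A small hand-built tree:   8
#                           / \
#                          3   10
root = Node(8, Node(3), Node(10))
print(bst_search(root, 10).key)   # 10
print(bst_search(root, 7))        # None

# Hash-table lookup with a dict that maps each value to its list index.
positions = {value: index for index, value in enumerate([60, 1, 88, 10, 11, 100])}
print(positions.get(10))          # 3
print(positions.get(42))          # None
```

The tree search takes time proportional to the height of the tree, so it is only logarithmic when the tree is reasonably balanced.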

Real-World Applications of Search Algorithms

• Database Queries
Efficiently retrieves records based on specific criteria.
• Search Engines
Indexes and searches web content rapidly and accurately.
• Data Retrieval Systems
Manages large volumes of data, ensuring quick access and updates.

Performance Analysis

• Time Complexity of Various Search Algorithms


Analyzing the time complexity helps in choosing the appropriate search algorithm for specific use cases.
• Space Complexity Considerations
Space complexity also plays a crucial role, particularly for large datasets.
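
One hedged way to see the difference in time complexity is to count how many elements each strategy examines on the same sorted data. The helper names below are assumptions made for this sketch; both helpers use only a constant amount of extra memory.

```python
def linear_comparisons(ordered_list, term):
    """Number of elements examined by a linear scan before it stops."""
    for count, value in enumerate(ordered_list, start=1):
        if value == term:
            return count
    return len(ordered_list)


def binary_comparisons(ordered_list, term):
    """Number of middle elements examined by a binary search before it stops."""
    low, high, count = 0, len(ordered_list) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        count += 1
        if ordered_list[mid] == term:
            break
        if ordered_list[mid] < term:
            low = mid + 1
        else:
            high = mid - 1
    return count


data = list(range(1_000_000))
target = data[-1]                        # worst case for the linear scan
print(linear_comparisons(data, target))  # 1000000
print(binary_comparisons(data, target))  # no more than 20 for one million items
```

The linear scan grows in step with n, while the binary search count grows with log2(n), which is about 20 for one million items.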

Case Studies and Practical Examples

• Implementing a Search Feature in a Database


Designing a search algorithm to efficiently query large databases.
• Optimizing Search Operations in a Large Dataset
Applying advanced search techniques to handle extensive data efficiently.

Best Practices and Common Pitfalls


• Ensuring Efficient Search Implementation
Using the right algorithm for the data structure and dataset size to optimize performance.
• Avoiding Common Mistakes in Search Algorithms
Common errors include not considering the sorted state of data or the nature of the dataset.
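
The sorted-state pitfall is easy to reproduce. The sketch below runs a binary-search membership test, built on the standard library's bisect_left, against the unordered example list from earlier in the chapter; contains_sorted is a hypothetical helper name.

```python
from bisect import bisect_left


def contains_sorted(values, term):
    """Binary-search membership test; only valid when values is sorted ascending."""
    idx = bisect_left(values, term)
    return idx < len(values) and values[idx] == term


unsorted = [11, 3, 45, 76, 99, 11, 13, 35, 37]
print(contains_sorted(unsorted, 3))          # False: 3 is present but missed
print(contains_sorted(sorted(unsorted), 3))  # True
print(3 in unsorted)                         # True: a linear scan needs no ordering
```

No error is raised in the first call; the answer is simply wrong, which is why the precondition has to be checked by the caller.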

Summary
Lessons Learned from This Module:
Searching algorithms are fundamental for retrieving data efficiently. This chapter explores basic search techniques like linear and binary
search, along with advanced algorithms such as jump, interpolation, and exponential search. It discusses searching in various data structures
and real-world applications like database queries and search engines. Performance analysis, practical examples, and best practices are
provided to optimize search operations.
