Ch-4: Sorting Methods

Advanced Computing Concepts [COMP 8547]. University of Windsor. My notes.

MergeSort

The height h of the merge-sort tree is O(log n)
▪ at each recursive call we split the sequence into two halves
• The overall amount of work done at the nodes of depth i is O(n)
▪ we partition and merge 2^i sequences of size n/2^i each
▪ we make 2^(i+1) recursive calls
• Thus, the total worst-case running time of merge-sort is O(n log n)
• The complexity of mergesort can also be analyzed using recurrences (studied later)
• In-place mergesort is rather complex and not suitable for practical applications
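
A minimal top-down merge sort sketch in Java, assuming integer keys (class and method names are illustrative, not from the course):

```java
import java.util.Arrays;

public class MergeSortDemo {
    // Sort a[lo..hi) by splitting in half, sorting each half, and merging.
    static void mergeSort(int[] a, int lo, int hi) {
        if (hi - lo <= 1) return;          // sequences of size 0 or 1 are sorted
        int mid = lo + (hi - lo) / 2;
        mergeSort(a, lo, mid);             // sort left half
        mergeSort(a, mid, hi);             // sort right half
        merge(a, lo, mid, hi);             // merge the two sorted halves
    }

    // Merge the sorted runs a[lo..mid) and a[mid..hi) via a temporary array:
    // this temporary array is the O(n) extra space merge sort needs.
    static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi)
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++]; // <= keeps the sort stable
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 5, 6};
        mergeSort(a, 0, a.length);
        System.out.println(Arrays.toString(a)); // [1, 2, 5, 5, 6, 9]
    }
}
```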
Quicksort
● not always the fastest, despite the name (lol)

worst case
The worst case for Quicksort occurs when the pivot is the unique minimum
or maximum element
• One of L and G has size n - 1 and the other has size 0
• The running time is proportional to the sum of the comparisons,
n + (n - 1) + ... + 2 + 1, which is O(n^2)

avg case
• On average the pivot splits the sequence into reasonably balanced parts,
so the expected running time is O(n log n)

In-place quicksort
• Partitioning rearranges the elements within the array itself using swaps,
so the only extra space is the O(log n) recursion stack
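
A sketch of in-place quicksort in Java, using the Lomuto partition scheme with the last element as pivot (one common choice; the notes do not fix a scheme):

```java
import java.util.Arrays;

public class QuickSortDemo {
    // Sort a[lo..hi] in place; only the recursion stack uses extra space.
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);   // pivot ends up at its final index p
        quickSort(a, lo, p - 1);        // sort elements less than the pivot (L)
        quickSort(a, p + 1, hi);        // sort elements greater than the pivot (G)
    }

    // Lomuto partition with a[hi] as pivot. A minimum or maximum pivot gives
    // the worst case described above: one side of size n - 1, the other empty.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) swap(a, i++, j);
        swap(a, i, hi);
        return i;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 5, 6};
        quickSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // [1, 2, 5, 5, 6, 9]
    }
}
```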
Heapsort
Heapsort is a comparison-based sorting algorithm that uses a binary heap
data structure to sort elements. It operates in two main phases:

1. Building a Max-Heap: In this phase, the array is transformed into a
max-heap, where the largest element is at the root (top of the heap).
This is done by starting from the last non-leaf node and performing a
"heapify" operation, ensuring that each subtree satisfies the heap
property.
2. Extracting Elements: Once the max-heap is constructed, the root
(largest element) is swapped with the last element in the heap, and
the size of the heap is reduced by one. The heapify process is
repeated to maintain the max-heap property for the remaining
elements. This extraction is repeated until the entire array is sorted.
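
A compact Java sketch of both phases, assuming integer keys (names are illustrative):

```java
import java.util.Arrays;

public class HeapSortDemo {
    static void heapSort(int[] a) {
        int n = a.length;
        // Phase 1: build a max-heap, starting from the last non-leaf node.
        for (int i = n / 2 - 1; i >= 0; i--)
            heapify(a, n, i);
        // Phase 2: repeatedly swap the root (maximum) with the last heap
        // element, shrink the heap by one, and restore the heap property.
        for (int end = n - 1; end > 0; end--) {
            swap(a, 0, end);
            heapify(a, end, 0);
        }
    }

    // Sift a[i] down until the subtree rooted at i satisfies the max-heap
    // property; heapSize limits the portion of the array still in the heap.
    static void heapify(int[] a, int heapSize, int i) {
        while (true) {
            int largest = i, left = 2 * i + 1, right = 2 * i + 2;
            if (left < heapSize && a[left] > a[largest]) largest = left;
            if (right < heapSize && a[right] > a[largest]) largest = right;
            if (largest == i) return;
            swap(a, i, largest);
            i = largest;
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 5, 6};
        heapSort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 5, 5, 6, 9]
    }
}
```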

Key Characteristics:
● Time complexity: Heapsort has a time complexity of O(n log n) in all
cases (best, worst, and average).
● In-place sorting: It sorts the array without requiring extra space,
aside from a constant amount for variables.
● Not stable: Heapsort does not preserve the relative order of equal
elements.

Heapsort is useful when in-place sorting is required with guaranteed
O(n log n) performance, but its lack of stability can be a drawback for some
applications.
Radix Sort
• Radix sort sorts keys digit by digit, applying a stable sort (typically
counting sort) to each digit; the LSD variant goes from the least
significant digit to the most significant
• For n keys of d digits, each digit in the range [0, N), the running time
is O(d(n + N))
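
A sketch of LSD radix sort for non-negative integers in base 10, using a stable counting sort on each digit (the helper name countingSortByDigit is illustrative):

```java
import java.util.Arrays;

public class RadixSortDemo {
    // LSD radix sort: apply a stable counting sort to each decimal digit,
    // least significant first, until the largest key runs out of digits.
    static void radixSort(int[] a) {
        int max = 0;
        for (int v : a) max = Math.max(max, v);
        for (int exp = 1; max / exp > 0; exp *= 10)
            countingSortByDigit(a, exp);
    }

    // Stable counting sort keyed on the digit selected by exp (1, 10, 100, ...).
    static void countingSortByDigit(int[] a, int exp) {
        int[] count = new int[10];
        int[] out = new int[a.length];
        for (int v : a) count[(v / exp) % 10]++;
        for (int d = 1; d < 10; d++) count[d] += count[d - 1]; // prefix sums -> positions
        for (int i = a.length - 1; i >= 0; i--)                // backwards pass keeps it stable
            out[--count[(a[i] / exp) % 10]] = a[i];
        System.arraycopy(out, 0, a, 0, a.length);
    }

    public static void main(String[] args) {
        int[] a = {170, 45, 75, 90, 802, 24, 2, 66};
        radixSort(a);
        System.out.println(Arrays.toString(a)); // [2, 24, 45, 66, 75, 90, 170, 802]
    }
}
```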
Counting Sort
• Counting sort is a simplification (particular case) of Radix sort
• Instead of using buckets, it uses N "counters"
• The sorted list is obtained directly from the counters
• The worst-case running time is O(n + N)
• If N = O(n), the running time is O(n)
• Counting sort can be applied to arrays of positive integers
• Strings and other types have to be converted to integers
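
A minimal counting sort sketch in Java, assuming keys are integers in the range [0, N):

```java
import java.util.Arrays;

public class CountingSortDemo {
    // Counting sort for integer keys in [0, N): one counter per possible key
    // value, then the sorted list is read directly off the counters.
    static int[] countingSort(int[] a, int N) {
        int[] count = new int[N];          // the N "counters"
        for (int v : a) count[v]++;        // tally each key
        int[] out = new int[a.length];
        int k = 0;
        for (int v = 0; v < N; v++)        // emit each value count[v] times
            for (int c = 0; c < count[v]; c++)
                out[k++] = v;
        return out;                        // O(n + N) time overall
    }

    public static void main(String[] args) {
        int[] scores = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(Arrays.toString(countingSort(scores, 10)));
        // [1, 1, 2, 3, 4, 5, 6, 9]
    }
}
```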
Summary
Here’s a comparison of MergeSort, QuickSort, HeapSort, RadixSort, and CountingSort
based on various characteristics, along with cases where they are best used:

1. MergeSort

● Time Complexity:
○ Best: O(n log n)
○ Average: O(n log n)
○ Worst: O(n log n)
● Space Complexity: O(n) (requires additional memory for temporary arrays)
● Stability: Stable (preserves the order of equal elements)
● In-place: No (requires extra space)
● Divide-and-Conquer: Yes
● Recursive: Yes

Use Cases:

● When stability is important (e.g., sorting records by multiple fields).
● For sorting linked lists (merging can be done by relinking nodes, so no
extra arrays are needed).
● When working with very large datasets that don’t fit into memory, as it can be
implemented as an external sorting algorithm.
● It has consistent O(n log n) time complexity in all cases.

2. QuickSort

● Time Complexity:
○ Best: O(n log n)
○ Average: O(n log n)
○ Worst: O(n²) (when the pivot is always the smallest or largest element)
● Space Complexity: O(log n) (in-place, except for recursive stack space)
● Stability: Not stable (unless specifically modified)
● In-place: Yes
● Divide-and-Conquer: Yes
● Recursive: Yes

Use Cases:

● When average-case performance is prioritized over worst-case (it typically
performs better than merge sort due to lower memory overhead).
● In scenarios where in-place sorting is necessary.
● If space efficiency is important (less space overhead compared to merge sort).
● Optimizations like choosing a good pivot (e.g., median-of-three) can mitigate the
worst-case behavior (see the sketch after this list).
● Used in practical applications (standard libraries) due to its average-case speed and
in-place nature.
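
One possible median-of-three helper (the method name is illustrative); it could run just before the partition step of the quicksort sketch shown earlier:

```java
public class MedianOfThreeDemo {
    // Move the median of a[lo], a[mid], a[hi] into a[hi], so a plain
    // last-element partition sees a better pivot. On already-sorted input
    // this yields balanced splits instead of the O(n^2) worst case.
    static void medianOfThreeToEnd(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo]) swap(a, mid, lo);
        if (a[hi] < a[lo]) swap(a, hi, lo);
        if (a[hi] < a[mid]) swap(a, hi, mid); // now a[lo] <= a[mid] <= a[hi]
        swap(a, mid, hi);                     // median moves to the pivot slot
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```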

3. HeapSort

● Time Complexity:
○ Best: O(n log n)
○ Average: O(n log n)
○ Worst: O(n log n)
● Space Complexity: O(1) (in-place)
● Stability: Not stable
● In-place: Yes
● Divide-and-Conquer: No
● Recursive: No

Use Cases:

● When in-place sorting is required with guaranteed O(n log n) performance.
● Suitable for situations where stability is not important.
● Often used in embedded systems or performance-critical applications where memory is
limited.
● It is generally slower than QuickSort in practice due to cache inefficiencies, but provides
a worst-case guarantee.

4. RadixSort

● Time Complexity:
○ Best: O(d*(n+k))
○ Average: O(d*(n+k))
○ Worst: O(d*(n+k))
Where d is the number of digits (or the length of the key), n is the number of
elements, and k is the range of digit values.
● Space Complexity: O(n + k) (requires additional space for digit buckets)
● Stability: Stable
● In-place: No
● Divide-and-Conquer: No
● Recursive: No
Use Cases:

● For sorting integers, fixed-length strings, or other structures where the comparison of
individual components (digits/characters) is easy and bounded (like in IP addresses).
● When the input size is large, but the range of values is limited (such as phone numbers
or dates).
● Best suited for datasets with known, fixed-size ranges, and where key length is small
compared to the number of elements.
● Often used in sorting large datasets where comparison-based sorts are too slow.

5. CountingSort

● Time Complexity:
○ Best: O(n + k)
○ Average: O(n + k)
○ Worst: O(n + k)
Where n is the number of elements, and k is the range of the input.
● Space Complexity: O(n + k)
● Stability: Stable
● In-place: No
● Divide-and-Conquer: No
● Recursive: No

Use Cases:

● Ideal for sorting when the range of input values (k) is not significantly larger than the
number of elements.
● Commonly used when sorting non-comparative items like integers or objects mapped to
small integers.
● It’s efficient when the range of the data is much smaller than the number of elements
(e.g., test scores, age groups).
● Not suitable when k is much larger than n because the space and time complexity grow
with k.
When to Use:

● MergeSort: When stable sorting is needed or when sorting linked lists.
● QuickSort: For general-purpose sorting where average performance is more important
than worst-case, and memory is limited.
● HeapSort: When guaranteed O(n log n) performance is needed with minimal memory
overhead.
● RadixSort: For sorting large datasets with fixed-length numeric or string data.
● CountingSort: For small integer ranges or categorical data where the number of distinct
values is relatively small compared to the number of elements.

Each sorting algorithm has strengths in specific scenarios, making them well-suited for different
types of data and constraints.

compare sorting algorithms based on dataset size

The size of the dataset plays a key role in deciding which sorting algorithm to use, as different
algorithms perform better or worse depending on whether the dataset is small, medium, or
large. Here’s a comparison of which sorting algorithm to use based on the size of the dataset:

1. Small Datasets (< 1,000 elements)

For small datasets, the constant factors in the algorithms’ time complexities matter more than
their asymptotic behavior. Therefore, algorithms with low overhead and fast execution on small
inputs tend to perform better.
● QuickSort: QuickSort is often the fastest for small datasets because it has a low
overhead and typically performs very well in practice. The average-case time complexity
of O(n log n) and the fact that it sorts in-place make it ideal for small arrays.
● MergeSort: If stability is required and the space overhead is acceptable, MergeSort can
be used. However, it’s generally slower than QuickSort for smaller datasets due to the
overhead of merging and additional memory usage.
● HeapSort: HeapSort is less efficient for small datasets due to the overhead of
maintaining the heap structure. It's generally not recommended unless the dataset is
particularly sensitive to worst-case performance or if space constraints make it
necessary.
● RadixSort and CountingSort: These non-comparative algorithms are overkill for very
small datasets unless the range of values is very small and simple (like integers in a
narrow range). CountingSort may be efficient if the dataset consists of small integers
with limited range, but it introduces extra space overhead.

2. Medium Datasets (1,000 - 100,000 elements)

For medium-sized datasets, algorithms with better average-case time complexities (O(n log n))
become more relevant.

● QuickSort: QuickSort remains a strong choice for medium datasets, due to its excellent
average-case performance and in-place sorting. It is widely used in practical applications
and libraries, making it highly optimized for these sizes.
● MergeSort: MergeSort is also effective for medium datasets, especially if stability is
important. The overhead of recursion and extra memory is more manageable here, and
it provides consistent O(n log n) performance.
● HeapSort: HeapSort could be considered for medium datasets if you need to guarantee
O(n log n) time complexity. However, it is generally slower than QuickSort and
MergeSort due to its inefficiency in cache usage.
● RadixSort: RadixSort can become a good option for medium datasets if the range of
values (or key length) is not too large. For example, sorting large numbers of integers or
fixed-length strings can be done efficiently with RadixSort as its time complexity can be
linear in terms of the dataset size.
● CountingSort: This can be a good choice if the range of values (k) is still small relative
to n. If you're sorting integers or other discrete values within a narrow range,
CountingSort can be faster than comparison-based sorts.

3. Large Datasets (> 100,000 elements)

For large datasets, time complexity and space efficiency become even more important.
Algorithms that scale well with the dataset size are preferred.

● QuickSort: While QuickSort can still be fast for large datasets, care needs to be taken to
avoid the worst-case O(n²) behavior, which can happen with poorly chosen pivots.
Randomized or median-of-three pivot selection can help mitigate this risk, making
QuickSort suitable for many large datasets.
● MergeSort: MergeSort is a great option for large datasets, especially if external sorting
is needed (i.e., when the dataset does not fit in memory). Its consistent O(n log n)
performance and stability make it suitable for large-scale sorting operations. The only
downside is the extra space overhead.
● HeapSort: HeapSort is an option for large datasets if in-place sorting is critical and you
cannot afford the additional memory overhead of MergeSort. However, it is usually
slower than QuickSort and MergeSort due to poor cache performance.
● RadixSort: RadixSort becomes very efficient for large datasets where the key size
(number of digits or characters) is small compared to the dataset size. Sorting large
numbers of integers, or strings with fixed lengths, can be very fast with RadixSort. Its
linear time complexity (in terms of the number of elements) makes it ideal for large-scale
sorting, provided the range of values is manageable.
● CountingSort: CountingSort is efficient for large datasets only if the range of values k is
small relative to n. For large datasets with a large range of values, CountingSort can
become impractical due to high space requirements.
