This document summarizes a lecture on sorting algorithms and data structures. It discusses quicksort and heapsort, describing their running times, worst cases, and average cases. Quicksort uses a divide and conquer approach, partitioning the array and recursively sorting subarrays. Its average case is O(n log n) but worst case is O(n^2). Heapsort uses a binary heap data structure to sort in O(n log n) time while sorting in place, like selection sort. The document also covers building heaps, priority queues, and heap applications.
Algorithms and Data Structures: Simonas Šaltenis
Algorithms and Data Structures
Lecture IV

Simonas Šaltenis
Nykredit Center for Database Research
Aalborg University
[email protected]

This Lecture
- Sorting algorithms
  - Quicksort: a popular algorithm, very fast on average
  - Heapsort
- Heap data structure and the priority queue ADT

Why Sorting?
- "When in doubt, sort": one of the principles of algorithm design.
- Sorting is used as a subroutine in many algorithms:
  - searching in databases: we can do binary search on sorted data
  - many computer graphics and computational geometry problems: closest pair, element uniqueness

Why Sorting? (2)
- A large number of sorting algorithms have been developed, representing different algorithm design techniques.
- The Ω(n log n) lower bound for sorting is used to prove lower bounds for other problems.

Sorting Algorithms so far
- Insertion sort, selection sort: worst-case running time O(n^2); in place.
- Merge sort: worst-case running time O(n log n), but requires O(n) additional memory.

Quick Sort: Characteristics
- Sorts almost in "place": it does not require an additional array, like insertion sort and unlike merge sort.
- Very practical: average performance O(n log n) with small constant factors, but worst case O(n^2).

Quick Sort: the Principle
- To understand quicksort, let's look at a high-level description of the algorithm.
- It is a divide-and-conquer algorithm:
  - Divide: partition the array into two subarrays such that every element in the lower part is ≤ every element in the higher part.
  - Conquer: recursively sort the two subarrays.
  - Combine: trivial, since the sorting is done in place.

Partitioning
- Linear-time partitioning procedure:

Partition(A, p, r)
01  x ← A[r]
02  i ← p − 1
03  j ← r + 1
04  while TRUE
05      repeat j ← j − 1
06      until A[j] ≤ x
07      repeat i ← i + 1
08      until A[i] ≥ x
09      if i < j
10      then exchange A[i] ↔ A[j]
11      else return j

- Example trace, with pivot x = A[r] = 10 (each line shows the array before an exchange):

    17 12  6 19 23  8  5 10    j stops at 10, i at 17: exchange
    10 12  6 19 23  8  5 17    j stops at 5,  i at 12: exchange
    10  5  6 19 23  8 12 17    j stops at 8,  i at 19: exchange
    10  5  6  8 23 19 12 17    j stops at 8 (position 4), i at 23 (position 5): i ≥ j, return j = 4

  The result is the split 10 5 6 8 | 23 19 12 17.

Quick Sort Algorithm

Quicksort(A, p, r)
01  if p < r
02  then q ← Partition(A, p, r)
03       Quicksort(A, p, q)
04       Quicksort(A, q + 1, r)

- Initial call: Quicksort(A, 1, length[A]).

Analysis of Quicksort
- Assume that all input elements are distinct.
- The running time depends on the distribution of splits.

Best Case
- If we are lucky, Partition splits the array evenly:

    T(n) = 2T(n/2) + Θ(n) = Θ(n log n)

Worst Case
- What is the worst case? One side of the partition has only one element:

    T(n) = T(1) + T(n − 1) + Θ(n)
         = T(n − 1) + Θ(n)
         = Σ_{k=1}^{n} Θ(k)
         = Θ( Σ_{k=1}^{n} k )
         = Θ(n^2)
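As a concrete rendering of the two procedures above, here is a minimal Python sketch (our translation, with 0-based indexing instead of the slides' 1-based). One deliberate deviation: the pivot is x = A[p], as in the classic CLRS formulation, because pivoting on A[r] in this scheme can return j = r (try it on [1, 2]), which would make the recursion below loop forever.

def partition(a, p, r):
    """Hoare-style partition of a[p..r]: returns j with p <= j < r such that
    every element of a[p..j] is <= every element of a[j+1..r]."""
    x = a[p]                 # pivot (the slides use a[r]; see the note above)
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:      # repeat j <- j - 1 until a[j] <= x
            j -= 1
        i += 1
        while a[i] < x:      # repeat i <- i + 1 until a[i] >= x
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]   # exchange a[i] <-> a[j]
        else:
            return j

def quicksort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1       # initial call: quicksort(A) sorts the whole array
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q)
        quicksort(a, q + 1, r)

data = [17, 12, 6, 19, 23, 8, 5, 10]
quicksort(data)
print(data)                  # [5, 6, 8, 10, 12, 17, 19, 23]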
Worst Case (2)
- (The original slide illustrates the worst case with a figure.)

Worst Case (3)
- When does the worst case appear?
  - the input is sorted
  - the input is reverse sorted
- This is the same recurrence as for the worst case of insertion sort.
- However, sorted input yields the best case for insertion sort!

Analysis of Quicksort
- Suppose the split is always 1/10 : 9/10:

    T(n) = T(n/10) + T(9n/10) + Θ(n) = Θ(n log n)!
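The Θ(n log n) claim for such an uneven split is easy to check numerically; the following bottom-up tabulation (our own illustration, not from the lecture) shows T(n)/(n log2 n) settling toward a constant:

import math

N = 1_000_000
T = [0] * (N + 1)                 # T[0] = T[1] = 0: constant base cases ignored
for n in range(2, N + 1):
    T[n] = T[n // 10] + T[(9 * n) // 10] + n    # T(n) = T(n/10) + T(9n/10) + n

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, round(T[n] / (n * math.log2(n)), 3))   # ratio approaches a constant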
An Average Case Scenario
- Suppose we alternate lucky and unlucky cases, to get an average behavior:

    L(n) = 2U(n/2) + Θ(n)     (lucky)
    U(n) = L(n − 1) + Θ(n)    (unlucky)

  Substituting, we consequently get:

    L(n) = 2( L(n/2 − 1) + Θ(n/2) ) + Θ(n)
         = 2L(n/2 − 1) + Θ(n)
         = Θ(n log n)

- (Figure on the original slide: an unlucky split of n into 1 and n − 1, followed by a lucky split of n − 1 into two halves, costs Θ(n) per level, just like a single lucky split of n into (n − 1)/2 + 1 and (n − 1)/2.)

An Average Case Scenario (2)
- How can we make sure that we are usually lucky?
  - Partition around the middle (n/2-th) element?
  - Partition around a random element (works well in practice).
- Randomized algorithm:
  - the running time is independent of the input ordering
  - no specific input triggers worst-case behavior
  - the worst case is determined only by the output of the random-number generator

Randomized Quicksort
- Assume all elements are distinct.
- Partition around a random element.
- Consequently, all splits (1 : n−1, 2 : n−2, ..., n−1 : 1) are equally likely, each with probability 1/n.
- Randomization is a general tool to improve algorithms with a bad worst case but a good average-case complexity.

Randomized Quicksort (2)

Randomized-Partition(A, p, r)
01  i ← Random(p, r)
02  exchange A[r] ↔ A[i]
03  return Partition(A, p, r)

Randomized-Quicksort(A, p, r)
01  if p < r then
02      q ← Randomized-Partition(A, p, r)
03      Randomized-Quicksort(A, p, q)
04      Randomized-Quicksort(A, q + 1, r)
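A Python sketch of the randomized variant, reusing partition from the earlier sketch; since that partition pivots on A[p], the random element is moved to the front rather than to A[r] as on the slide:

import random

def randomized_partition(a, p, r):
    i = random.randint(p, r)     # Random(p, r): uniform random index
    a[p], a[i] = a[i], a[p]      # move the random pivot to the front
    return partition(a, p, r)

def randomized_quicksort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q)
        randomized_quicksort(a, q + 1, r)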
Selection Sort

Selection-Sort(A[1..n])
    for i ← n downto 2
        A: find the largest element among A[1..i]
        B: exchange it with A[i]

- Step A takes O(n) and step B takes O(1): O(n^2) in total.
- Idea for improvement: use a data structure to do both A and B in O(lg n) time, balancing the work and achieving a better trade-off, for a total running time of O(n log n).

Heap Sort
- Binary heap data structure:
  - an array A that can be viewed as a nearly complete binary tree
  - all levels, except the lowest one, are completely filled
  - the key in the root is greater than or equal to the keys of its children, and the left and right subtrees are again binary heaps
- Two attributes: length[A] and heap-size[A].

Heap Sort (3)
- Example heap stored in an array (drawn as a tree on the original slide):

    i:    1  2  3  4  5  6  7  8  9 10
    A[i]: 16 15 10  8  7  9  3  2  4  1

- Parent(i): return ⌊i/2⌋
- Left(i): return 2i
- Right(i): return 2i + 1
- Heap property: A[Parent(i)] ≥ A[i]

Heap Sort (4)
- Notice the implicit tree links: the children of node i are nodes 2i and 2i + 1.
- Why is this useful? In a binary representation, a multiplication/division by two is a left/right shift, and adding 1 just sets the lowest bit.

Heapify
- i is an index into the array A.
- The binary trees rooted at Left(i) and Right(i) are heaps.
- But A[i] might be smaller than its children, violating the heap property.
- The method Heapify makes A a heap once more by moving A[i] down the heap until the heap property is satisfied again.

Heapify (2), Heapify Example
- (The Heapify pseudocode and a worked example were figures on the original slides; a Python sketch follows.)
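A minimal Python sketch of Heapify (our translation of the standard max-heapify procedure, since the slide's pseudocode was a figure). With 0-based indexing the children of node i are 2i + 1 and 2i + 2:

def heapify(a, i, heap_size):
    """Float a[i] down until the max-heap property holds again, assuming the
    subtrees rooted at its children are already max-heaps."""
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, heap_size)   # recurse into the subtree we disturbed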
Heapify: Running Time
- The running time of Heapify on a subtree of size n rooted at node i is:
  - O(1) to determine the relationship between the elements, plus
  - the time to run Heapify on a subtree rooted at one of the children of i, where 2n/3 is the worst-case size of this subtree.
- Alternatively: the running time on a node of height h is O(h).
    T(n) ≤ T(2n/3) + Θ(1)  ⟹  T(n) = O(log n)

Building a Heap
- Convert an array A[1..n], where n = length[A], into a heap.
- Notice that the elements in the subarray A[(⌊n/2⌋ + 1)..n] are already 1-element heaps to begin with!
- (The Build-Heap pseudocode and its worked example were figures on the original slides; a Python sketch follows.)
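A matching Build-Heap sketch on top of the heapify above (again our translation of the standard procedure):

def build_heap(a):
    """Make the whole array a max-heap. In 0-based terms, nodes n//2 .. n-1
    are leaves (1-element heaps), so heapify the internal nodes bottom-up."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # last internal node down to the root
        heapify(a, i, n)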
Building a Heap: Analysis
- Correctness: by induction on i; all trees rooted at m > i are heaps.
- Running time: n calls to Heapify = n · O(lg n) = O(n lg n).
- This is good enough for an O(n lg n) bound on Heapsort, but sometimes we build heaps for other reasons, so it would be nice to have a tight bound.
- Intuition: most of the time Heapify works on heaps much smaller than n.

Building a Heap: Analysis (2)
- Definitions:
  - height of a node: the longest path from the node to a leaf
  - height of a tree: the height of its root
- The time to run Heapify on a node is O(height of the subtree rooted at that node).
- Assume n = 2^k − 1 (a complete binary tree, k = ⌈lg n⌉). There are about n/2^{i+1} nodes of height i, so:

    T(n) = O( n/2 · 1 + n/4 · 2 + n/8 · 3 + ... + 1 · k )
         = O( n · Σ_{i=1}^{lg n} i/2^i )
         ≤ O( n · Σ_{i=1}^{∞} i (1/2)^i )
         = O(2n) = O(n)

  since Σ_{i=1}^{∞} i (1/2)^i = (1/2)/(1 − 1/2)^2 = 2, as shown on the next slide.
Building a Heap: Analysis (3)
- How? By using the following "trick":
    Σ_{i=0}^{∞} x^i = 1/(1 − x)              if |x| < 1
    Σ_{i=1}^{∞} i x^{i−1} = 1/(1 − x)^2      (differentiate)
    Σ_{i=1}^{∞} i x^i = x/(1 − x)^2          (multiply by x)
    Σ_{i=1}^{∞} i (1/2)^i = (1/2)/(1/4) = 2  (plug in x = 1/2)

- Therefore, Build-Heap time is O(n).
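A one-line numerical check of the series (our own illustration):

print(sum(i / 2**i for i in range(1, 60)))   # partial sums approach 2.0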
Heap Sort
- The total running time of heap sort is O(n lg n) plus the Build-Heap(A) time, which is O(n).
- (The Heapsort pseudocode and a step-by-step example were figures on the original slides; a Python sketch follows after the priority-queue slides.)

Heap Sort: Summary
- Heap sort uses a heap data structure to improve selection sort and make the running time asymptotically optimal.
- The running time is O(n log n), like merge sort, but unlike selection, insertion, or bubble sort.
- It sorts in place, like insertion, selection, or bubble sort, but unlike merge sort.

Priority Queues
- A priority queue is an ADT (abstract data type) for maintaining a set S of elements, each with an associated value called a key.
- A PQ supports the following operations:
  - Insert(S, x): insert element x into set S (S ← S ∪ {x})
  - Maximum(S): return the element of S with the largest key
  - Extract-Max(S): return and remove the element of S with the largest key

Priority Queues (2)
- Applications:
  - job scheduling on shared computing resources (Unix)
  - event simulation
  - as a building block for other algorithms
- A heap can be used to implement a PQ.

Priority Queues (3)
- Removal of the max takes constant time on top of Heapify: O(lg n).

Priority Queues (4)
- Insertion of a new element: enlarge the PQ and propagate the new element from the last place up the PQ tree.
- The tree is of height lg n, so the running time is O(lg n).

Priority Queues (5)
- (The content of this slide was a figure on the original; the sketch below covers the operations.)
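Putting the pieces together: Python sketches of Heapsort and the heap-based priority-queue operations described above, reusing heapify and build_heap from the earlier sketches (names and 0-based indexing are ours):

def heapsort(a):
    """Build-Heap is O(n); each of the n - 1 extractions costs O(lg n)."""
    build_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current maximum past the heap
        heapify(a, 0, end)            # restore the heap property on a[0..end-1]

def maximum(heap):
    return heap[0]                    # Maximum(S): O(1)

def extract_max(heap):
    """Extract-Max(S): constant work on top of one Heapify call, O(lg n)."""
    top = heap[0]
    heap[0] = heap[-1]                # move the last leaf to the root
    heap.pop()
    if heap:
        heapify(heap, 0, len(heap))
    return top

def insert(heap, key):
    """Insert(S, x): enlarge the heap, then propagate the new key up, O(lg n)."""
    heap.append(key)
    i = len(heap) - 1
    while i > 0 and heap[(i - 1) // 2] < heap[i]:   # parent of i is (i - 1) // 2
        heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]
        i = (i - 1) // 2

data = [16, 15, 10, 8, 7, 9, 3, 2, 4, 1]
heapsort(data)
print(data)   # [1, 2, 3, 4, 7, 8, 9, 10, 15, 16]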
Next Week
- ADTs and Data Structures:
  - Definition of ADTs
  - Elementary data structures
  - Trees