Unit 2: Sorting and Order Statistics: Course Contents

Author: 張耀文 Yao-Wen Chang [email protected] http://cc.ee.ntu.edu.tw/~ywchang Graduate Institute of Electronics Engineering Department of Electrical Engineering National Taiwan University Fall 2004

Unit 2: Sorting and Order Statistics

․Course contents:
⎯ Heapsort
⎯ Quicksort
⎯ Sorting in linear time
⎯ Order statistics
․Readings:
⎯ Chapters 6, 7, 8, 9

Unit 2 Y.-W. Chang 1


Types of Sorting Algorithms
․A sorter is in-place if only a constant # of elements of the
input are ever stored outside the array.
․A sorter is comparison-based if the only operation on
keys is to compare two keys.
⎯ Insertion sort, merge sort, heapsort, quicksort
․The non-comparison-based sorters sort keys by looking
at the values of individual elements.
⎯ Counting sort: Assumes keys are in [1..k] and uses array
indexing to count the # of elements of each value.
⎯ Radix sort: Assumes each integer contains d digits, and
each digit is in [1..k'].
⎯ Bucket sort: Requires information about the input distribution.



Sorting Algorithms



Binary Heap
․Binary heap data structure: represented by an array A
⎯ Complete binary tree, except that some rightmost leaves on
the bottom level may be missing.
⎯ Max-Heap property: A node's key ≥ its children's keys.
⎯ Min-Heap property: A node's key ≤ its children's keys.
․Implementation
⎯ Root: A[1].
⎯ For A[i], LEFT child is A[2i], RIGHT child is A[2i+1], and
PARENT is A[⎣i/2⎦].
⎯ heap-size[A] (# of elements in the heap stored within A) ≤
length[A] (# of elements in A).



MAX-HEAPIFY: Maintaining the Heap Property
․ Assume that the subtrees rooted at LEFT(i) and RIGHT(i) are heaps, but
A[i] may be smaller than its children.
․ MAX-HEAPIFY(A, i) will “float down” the value at A[i] so that
the subtree rooted at A[i] becomes a heap.



MAX-HEAPIFY: Complexity
MAX-HEAPIFY(A, i)
1. l ← LEFT(i)
2. r ← RIGHT(i)
3. if l ≤ heap-size[A] and A[l] > A[i]
4. then largest ← l
5. else largest ← i
6. if r ≤ heap-size[A] and A[r] > A[largest]
7. then largest ← r
8. if largest ≠ i
9. then exchange A[i] ↔ A[largest]
10. MAX-HEAPIFY(A, largest)
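The pseudocode above translates directly into Python; the sketch below is an illustration of this writeup (not from the slides) and uses 0-indexed lists, so the children of node i sit at 2i+1 and 2i+2 rather than 2i and 2i+1:

```python
def max_heapify(a, i, heap_size):
    """Float a[i] down until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at i's children already satisfy the
    max-heap property, exactly as MAX-HEAPIFY on the slide assumes.
    """
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]   # exchange A[i] <-> A[largest]
        max_heapify(a, largest, heap_size)    # continue floating down
```

Passing heap_size explicitly mirrors the slide's heap-size[A] ≤ length[A] distinction, which HEAPSORT relies on later.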

․Worst case: the last row of the binary tree is half empty ⇒ a child's subtree has size ≤ 2n/3.
․Recurrence: T(n) ≤ T(2n/3) + Θ(1) ⇒ T(n) = O(lg n)
BUILD-MAX-HEAP: Building a Max-Heap
․Intuition: Use MAX-HEAPIFY in a bottom-up manner to
convert A into a heap.
⎯ Leaves are already heaps, start at parents of leaves, and
work upward till the root.



BUILD-MAX-HEAP: Complexity
BUILD-MAX-HEAP(A)
1. heap-size[A] ← length[A]
2. for i ← ⎣length[A]/2 ⎦ downto 1
3. do MAX-HEAPIFY(A,i)
․Naive analysis: O(n lg n) time in total.
⎯ About n/2 calls to MAX-HEAPIFY.
⎯ Each takes O(lg n) time.
․Careful analysis: O(n) time in total.
⎯ Each MAX-HEAPIFY takes O(h) time (h: tree height).
⎯ At most ⌈n/2^(h+1)⌉ nodes of height h in an n-element array.
⎯ T(n) = Σ_{h=0}^{⌊lg n⌋} ⌈n/2^(h+1)⌉ O(h) = O(n Σ_{h=0}^{⌊lg n⌋} h/2^h) = O(n)
⎯ Note: (1) cf. height & depth, (2) Won't improve the overall
complexity of the heap sort.
Tree Height and Depth
․Height of a node: # of edges on the longest simple
downward path from the node to a leaf
․Depth: Length of the path from the root to a node

height = 3 depth = 0

height = 2 depth = 1

height = 1 depth = 2

height = 0 depth = 3



HEAPSORT
HEAPSORT(A)
1. BUILD-MAX-HEAP(A)
2. for i ← length[A] downto 2
3. do exchange A[1] ↔ A[i]
4. heap-size[A] ← heap-size[A] - 1
5. MAX-HEAPIFY(A,1)
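Putting BUILD-MAX-HEAP and the loop above together gives a runnable sketch (an assumption of this writeup: 0-indexed Python lists, so the loop bounds shift by one from the 1-indexed pseudocode):

```python
def max_heapify(a, i, heap_size):
    # Float a[i] down; with 0-indexing the children are 2i+1 and 2i+2.
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)

def build_max_heap(a):
    # Leaves are already heaps; heapify parents bottom-up toward the root.
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))

def heapsort(a):
    # In-place O(n lg n) sort, mirroring HEAPSORT on the slide.
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current max to its final slot
        max_heapify(a, 0, end)        # shrink the heap and restore the property
```

The shrinking `end` plays the role of heap-size[A]: positions end..n-1 hold the already-sorted suffix and are never touched by max_heapify again.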

․ Time complexity: O(n lg n).


․ Space complexity: O(n) for array, in-place. (Stable??)
Priority Queues
․A priority queue is a data structure on sets of keys; a
max-priority queue supports the following operations:
⎯ INSERT(S, x): insert x into set S.
⎯ MAXIMUM(S): return the largest key in S.
⎯ EXTRACT-MAX(S): return and remove the largest key in S.
⎯ INCREASE-KEY(S, x, k): increase the value of element x’s key
to the new value k.
․These operations can be easily supported using a heap.
⎯ INSERT: Insert the node at the end and fix heap in O(lg n) time.
⎯ MAXIMUM: read the first element in O(1) time.
⎯ INCREASE-KEY: traverse a path from the target node toward
the root to find a proper place for the new key in O(lg n) time.
⎯ EXTRACT-MAX: delete the 1st element, replace it with the last,
decrement the element counter, then heapify in O(lg n) time.
․Compare with an array?
Heap: EXTRACT-MAX and INSERT
HEAP-EXTRACT-MAX(A)
1. if heap-size[A] < 1
2. then error “heap underflow”
3. max ← A[1]
4. A[1] ← A[heap-size[A]]
5. heap-size[A] ← heap-size[A] -1
6. MAX-HEAPIFY(A,1)
7. return max

MAX-HEAP-INSERT(A,key)
1. heap-size[A] ← heap-size[A] + 1
2. i ← heap-size[A]
3. while i > 1 and A[PARENT(i)] < key
4. do A[i] ← A[PARENT(i)]
5. i ← PARENT(i)
6. A[i] ← key
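Both operations can be sketched on a plain Python list; max_heapify is repeated here (a 0-indexed translation, an assumption of this writeup) so the fragment stands alone:

```python
def max_heapify(a, i, heap_size):
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)

def heap_extract_max(a):
    # Delete the root, replace it with the last element, then heapify: O(lg n).
    if not a:
        raise IndexError("heap underflow")
    max_key = a[0]
    a[0] = a[-1]
    a.pop()
    if a:
        max_heapify(a, 0, len(a))
    return max_key

def max_heap_insert(a, key):
    # Walk from the new slot toward the root to find key's place: O(lg n).
    a.append(key)
    i = len(a) - 1
    while i > 0 and a[(i - 1) // 2] < key:
        a[i] = a[(i - 1) // 2]   # shift the smaller parent down
        i = (i - 1) // 2
    a[i] = key
```

As in the pseudocode, insertion shifts smaller parents down rather than repeatedly swapping, which halves the number of writes on the path to the root.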



Quicksort

․A divide-and-conquer algorithm
⎯ Divide: Partition A[p..r] into A[p..q] and A[q+1..r]; each key in
A[p..q] ≤ each key in A[q+1..r].
⎯ Conquer: Recursively sort two subarrays.
⎯ Combine: Do nothing; quicksort is an in-place algorithm.

QUICKSORT(A, p, r)
/* Call QUICKSORT(A, 1, length[A]) to sort an entire array */
1. if p < r then
2. q ← PARTITION(A, p, r)
3. QUICKSORT(A, p, q)
4. QUICKSORT(A, q+1, r)



Quicksort: Partition
PARTITION(A, p, r)
1. x ← A[p] /* break up A wrt x */
2. i ← p -1
3. j ← r +1
4. while TRUE do
5. repeat j ← j -1
6. until A[j] ≤ x
7. repeat i ← i +1
8. until A[i] ≥ x
9. if i < j
10. then exchange A[i] ↔ A[j]
11. else return j

․ Partition A into two subarrays A[j] ≤ x and A[i] ≥ x.


․ PARTITION runs in Θ(n) time, where n = r - p + 1.
․ Ways to pick x: always pick A[p], pick a key at random, pick the median
of several keys, etc.
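The Hoare-style PARTITION and QUICKSORT above can be sketched as follows (0-indexing and the function names are assumptions of this writeup, not the slides' notation):

```python
def partition(a, p, r):
    """Hoare-style partition around x = a[p]: rearrange a[p..r] (inclusive)
    so that every key in a[p..j] <= every key in a[j+1..r], and return j."""
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:          # scan right-to-left for a key <= x
            j -= 1
        i += 1
        while a[i] < x:          # scan left-to-right for a key >= x
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def quicksort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q)       # note: q, not q-1, with this partition scheme
        quicksort(a, q + 1, r)
```

With this partition scheme the pivot can end up anywhere in the left half, so the recursion must cover a[p..q] in full; recursing on a[p..q-1] would be a bug.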
Quicksort Runtime Analysis: Best Case
․A divide-and-conquer algorithm
T(n) = T(q - p + 1) + T(r - q) + Θ(n)
⎯ Depends on the position of q in A[p..r], but ???
․Best-, worst-, average-case analyses?
․Best case: Perfectly balanced splits---each partition
gives an n/2 : n/2 split.
T(n) = T(n/2) + T(n/2) + Θ(n)
= 2T(n/2) + Θ(n)
․Time complexity: Θ(n lg n)
⎯ Master method? Iteration? Substitution?



Quicksort Runtime Analysis: Worst Case
․ Worst case: Each partition gives a 1 : n - 1 split.
T(n) = T(1) + T(n-1) + Θ(n)
= T(1) + (T(1) + T(n-2) + Θ(n-1)) + Θ(n)
= …
= nT(1) + Θ(Σ_{k=1}^{n} k)
= Θ(n²)



More on Worst-Case Analysis

․The real upper bound:
T(n) = max_{1≤q≤n-1} (T(q) + T(n-q) + Θ(n))
․Guess T(n) ≤ cn² and verify it inductively:
T(n) ≤ max_{1≤q≤n-1} (cq² + c(n-q)² + Θ(n))
= c max_{1≤q≤n-1} (q² + (n-q)²) + Θ(n)
․q² + (n-q)² is maximized at the endpoints q = 1 and q = n-1:
T(n) ≤ c·1² + c(n-1)² + Θ(n)
= cn² - 2c(n-1) + Θ(n)
≤ cn², once c is large enough that 2c(n-1) dominates the Θ(n) term



Quicksort: Average-Case Analysis
․Intuition: Some splits will be close to balanced and
others imbalanced; good and bad splits will be
randomly distributed in the recursion tree.
․Observation: Asymptotically bad run time occurs only
when we have many bad splits in a row.
⎯ A bad split followed by a good split results in a good partitioning
after one extra step!
⎯ Thus, we will still get O(nlgn) run time.



Randomized Quicksort
․How to modify quicksort to get good average-case
behavior on all inputs?
․Randomization!
⎯ Randomly permute input, or
⎯ Choose the partitioning element x randomly at each iteration.

RANDOMIZED-PARTITION(A, p, r)
1. i ← RANDOM(p, r)
2. exchange A[p] ↔ A[i]
3. return PARTITION(A, p, r)
RANDOMIZED-QUICKSORT(A, p, r)
1. if p < r then
2. q ← RANDOMIZED-PARTITION(A, p, r)
3. RANDOMIZED-QUICKSORT(A, p, q)
4. RANDOMIZED-QUICKSORT(A, q+1, r)
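A runnable sketch of the second option, a random pivot at each call (again a 0-indexed Python rendering assumed by this writeup, reusing the Hoare-style partition from earlier):

```python
import random

def partition(a, p, r):
    # Hoare-style partition around x = a[p], as in the non-randomized version.
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:
            j -= 1
        i += 1
        while a[i] < x:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def randomized_partition(a, p, r):
    i = random.randint(p, r)   # pivot index chosen uniformly from [p, r]
    a[p], a[i] = a[i], a[p]    # move it to the front, then partition as usual
    return partition(a, p, r)

def randomized_quicksort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q)
        randomized_quicksort(a, q + 1, r)
```

Randomizing the pivot makes no single input consistently bad: the expected O(n lg n) bound then holds for every input, not just on average over inputs.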



Average-Case Recurrence
․Assume that all keys are distinct.
․Partition into lower side : upper side = 1 : n-1 with probability 2/n; each other split q : n-q with probability 1/n. Why?
․Averaging over the partition index q:
T(n) = (1/n)(T(1) + T(n-1) + Σ_{q=1}^{n-1} (T(q) + T(n-q))) + Θ(n)



Average-Case Recurrence (cont'd)
․ Guess T(n) ≤ an lg n + b and verify it inductively.
․ Need to show that Σ_{k=1}^{n-1} k lg k ≤ (1/2)n² lg n - (1/8)n².
․ Substituting this bound for Σ_{k=1}^{n-1} k lg k, we have T(n) ≤ an lg n + b.
․ Practically, quicksort is often 2-3 times faster than merge sort or heapsort.



Decision-Tree Model for Comparison-Based Sorter
․Consider only the comparisons in the sorter.
․Each internal node in the tree corresponds to a comparison.
․Start at the root and do the first comparison: ≤ ⇒ go to the left branch; > ⇒ go to the right branch.
․Each leaf represents one ordering (permutation) of the input (n! leaves!).



Ω(nlgn) Lower Bound for Comparison-Based Sorters
․ There must be n! leaves in the decision tree.
․ Worst-case # of comparisons = # of edges on the longest root-to-leaf path, i.e., the tree height.
․ Theorem: Any decision tree that sorts n elements has height
Ω(nlgn).
Let h be the height of the tree T.
⎯ T has ≥ n! leaves.
⎯ T is binary, so it has ≤ 2^h leaves.
2^h ≥ n!
h ≥ lg(n!) = Ω(nlgn) /* Stirling's approximation: n! > (n/e)^n */
․ Thus, any comparison-based sorter takes Ω(nlgn) time in the
worst case.
․ Merge sort and heapsort are asymptotically optimal comparison
sorts.



Counting Sort: A Non-comparison-Based Sorter
․ Requirement: Input integers are in known range [1..k].
․ Idea: For each x, find # of elements ≤ x (say m, excluding x) and
put x in the (m +1)st slot.
․ Runs in O(n+k) time, but needs extra O(n+k) space.
․ Example: A: input; B: output; C: working array.



Counting Sort
COUNTING-SORT(A, B, k)
1. for i ← 1 to k do
2. C[i] ← 0
3. for j ← 1 to length[A] do
4. C[A[j]] ← C[A[j]] + 1
5. /* C[i] now contains the number of elements equal to i. */
6. for i ← 2 to k do
7. C[i] ← C[i] + C[i-1]
8. /* C[i] now contains the number of elements ≤ i. */
9. for j ← length[A] downto 1 do
10. B[C[A[j]]] ← A[j]
11. C[A[j]] ← C[A[j]] - 1
․Linear time if k = O(n).
․Stable sorters: counting sort, insertion sort, merge sort.
․Unstable sorters: heap sort, quicksort.
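A runnable rendering of COUNTING-SORT (a sketch assumed by this writeup: 1-indexed keys as on the slide, stored in 0-indexed Python lists):

```python
def counting_sort(a, k):
    """Stable counting sort of keys in 1..k; returns a new sorted list."""
    c = [0] * (k + 1)            # c[i] = # of elements equal to i (index 0 unused)
    for key in a:
        c[key] += 1
    for i in range(2, k + 1):
        c[i] += c[i - 1]         # now c[i] = # of elements <= i
    b = [0] * len(a)
    for key in reversed(a):      # reverse scan keeps equal keys in input order
        b[c[key] - 1] = key      # c[key] is a 1-based output position
        c[key] -= 1
    return b
```

The reverse scan in the last loop is exactly what makes the sort stable, which is what RADIX-SORT depends on in the next slide.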



Radix Sort
RADIX-SORT(A, B, k)
1. for i ← 1 to d do
2. Use a stable sort to sort array A on digit i

input digit 1 digit 2 digit 3
928 520 101 101
101 101 401 228
401 401 308 308
228 928 520 329
329 228 928 401
308 308 228 520
520 329 329 928
(each column after the first shows the array after a stable sort on that digit, least significant digit first)

․Time complexity: Θ(d(n+k)) for n d-digit numbers in


which each digit has k possible values.
⎯ Which sorter?
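The answer to "which sorter?" is a stable one, e.g. counting sort on each digit; a sketch assumed by this writeup (decimal digits, non-negative integers):

```python
def radix_sort(a, d):
    """LSD radix sort of non-negative integers with at most d decimal digits,
    using a stable counting sort on each digit: Theta(d(n+k)) with k = 10."""
    for exp in (10 ** i for i in range(d)):
        c = [0] * 10
        for x in a:
            c[(x // exp) % 10] += 1          # count occurrences of each digit
        for i in range(1, 10):
            c[i] += c[i - 1]                 # prefix sums: final positions
        b = [0] * len(a)
        for x in reversed(a):                # reverse scan preserves stability
            digit = (x // exp) % 10
            b[c[digit] - 1] = x
            c[digit] -= 1
        a = b
    return a
```

Running it on the slide's example reproduces the columns above pass by pass.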



Order Statistics
․Def: Let A be an ordered set containing n elements.
The i-th order statistic is the i-th smallest element.
⎯ Minimum: 1st order statistic
⎯ Maximum: n-th order statistic
⎯ Median: the ⌈n/2⌉-th order statistic (the lower median when n is even).

․The Selection Problem: Find the i-th order statistic for


a given i.
⎯ Input: A set A of n (distinct) numbers and a number i, 1 ≤ i ≤ n.
⎯ Output: The element x ∈ A that is larger than exactly (i -1)
elements of A.
․Naive selection: sort A and return A[i].
⎯ Time complexity: O(nlgn).
⎯ Can we do better??



Finding Minimum (Maximum)
Minimum(A)
1. min ← A[1];
2. for i ← 2 to length[A] do
3. if min > A[i]
4. then min ← A[i];
5. return min;

․Exactly n-1 comparisons.


⎯ Best possible?
⎯ Expected # of times executed for line 4: O(lgn).
․Naive simultaneous minimum and maximum: 2n-3
comparisons.
⎯ Best possible?



Simultaneous Minimum and Maximum

․ T(n): # of comparisons used for n elements.


T(n) = 1, if n = 2
T(n) = 2T(n/2) + 2, if n > 2

․ Assume n = 2k.
T(n) = 2T(n/2) + 2
= 2(2T(n/4) + 2) + 2
= 2k-1T(2) + (2k-1 + 2k-2 + … + 2)
= 2k-1 + 2k - 2
= 3n/2 - 2
․ This divide-and-conquer algorithm is optimal!
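The same 3n/2 - 2 comparison count can be reached iteratively by processing elements in pairs: compare the pair first, then the smaller against the running minimum and the larger against the running maximum (3 comparisons per pair instead of 4). A sketch assumed by this writeup:

```python
def min_max(a):
    """Simultaneous minimum and maximum in about 3n/2 comparisons."""
    n = len(a)
    if n % 2:                    # odd length: seed both with the first element
        lo = hi = a[0]
        start = 1
    else:                        # even length: seed with the first pair
        lo, hi = min(a[0], a[1]), max(a[0], a[1])
        start = 2
    for i in range(start, n, 2):
        # One comparison orders the pair, then one against lo, one against hi.
        x, y = (a[i], a[i + 1]) if a[i] <= a[i + 1] else (a[i + 1], a[i])
        lo = min(lo, x)
        hi = max(hi, y)
    return lo, hi
```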
Selection in Linear Expected Time
Randomized-Select(A,p,r,i)
1. if p = r
2. then return A[p];
3. q ← Randomized-Partition(A,p,r);
4. k ← q – p + 1;
5. if i ≤ k
6. then return Randomized-Select(A,p,q,i);
7. else return Randomized-Select(A,q+1,r, i-k).
․ Randomized-Partition first swaps A[p] with a random element of A
and then proceeds as in regular PARTITION.
․ Randomized-Select is like Randomized-Quicksort, except that we
only need to make one recursive call.
․ Time complexity
⎯ Worst case: 1 : n-1 partitions.
T(n) = T(n-1) + Θ(n) = Θ(n²)
⎯ Best case: T(n) = Θ(n)
⎯ Average case? Like quicksort, asymptotically close to the best case.
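Randomized-Select can be sketched in Python by inlining the random pivot swap and a Hoare-style partition (0-indexing, distinct keys, and the inlined partition are assumptions of this writeup):

```python
import random

def randomized_select(a, p, r, i):
    """Return the i-th smallest key (1-indexed rank) of a[p..r] (inclusive),
    recursing into only one side of the partition."""
    if p == r:
        return a[p]
    j = random.randint(p, r)
    a[p], a[j] = a[j], a[p]          # random pivot, as in RANDOMIZED-PARTITION
    x = a[p]
    lo, hi = p - 1, r + 1
    while True:                      # Hoare-style partition around x
        hi -= 1
        while a[hi] > x:
            hi -= 1
        lo += 1
        while a[lo] < x:
            lo += 1
        if lo < hi:
            a[lo], a[hi] = a[hi], a[lo]
        else:
            q = hi
            break
    k = q - p + 1                    # size of the low side a[p..q]
    if i <= k:
        return randomized_select(a, p, q, i)
    return randomized_select(a, q + 1, r, i - k)
```

Unlike quicksort's two recursive calls, only one side is searched, which is what drops the expected cost from Θ(n lg n) to Θ(n).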



Selection in Linear Expected Time: Average Case

․ Assume T(n) ≤ cn.

․ Thus, on average, Randomized-Select runs in linear time.


Selection in Worst-Case Linear Time
․ Key: Guarantee a good split when array is partitioned.
․ Select(A, p, r, i)
1. Divide input array A into ⎣ n/5⎦ groups of size 5 (possibly with a
leftover group of size < 5).
2. Find the median of each of the ⎡n/5⎤ groups.
3. Call Select recursively to find the median x of the ⎡n/5⎤ medians.
4. Partition array around x, splitting it into two arrays of A[p, q]
(with k elements) and A[q+1, r] (with n-k elements).
5. if (i ≤ k) then Select(A, p, q, i) else Select(A, q + 1, r, i - k).
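The five steps above can be sketched as follows. This is an illustrative rendering assumed by this writeup, not the slides' in-place version: it partitions into new lists for clarity and assumes distinct keys, as the selection problem statement does.

```python
def select(a, i):
    """Worst-case linear-time selection: the i-th smallest (1-indexed) of a."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Steps 1-2: split into groups of <= 5 and take each group's median.
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # Step 3: recursively find the median x of the medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x (distinct keys assumed).
    low = [y for y in a if y < x]
    high = [y for y in a if y > x]
    k = len(low)
    # Step 5: recurse into the side containing the i-th smallest.
    if i <= k:
        return select(low, i)
    if i == k + 1:
        return x
    return select(high, i - k - 1)
```

Sorting a 5-element group costs O(1), so steps 1-2 and 4 are O(n), matching the runtime breakdown on the next slide.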



Runtime Analysis
․ Main idea: Select guarantees that x causes a good partition: at least 3n/10 - 6 elements are > x (and at least 3n/10 - 6 are < x) → the worst-case split has at most 7n/10 + 6 elements in the bigger subproblem.
․ Run time: T(n) = T( ⎡n/5⎤ ) + T(7n/10+6) + O(n).
1. O(n): break into groups.
2. O(n): finding medians (constant time for 5 elements).
3. T(⎡n/5⎤): recursive call to find median of median.
4. O(n): partition.
5. T(7n/10+6): searching in the bigger partition.
․ Apply the substitution method to prove that T(n)=O(n).

