Lecture 4 - Quicksort
Merge Sort - Discussion
• Running time is insensitive to the input
• Advantage:
– Guaranteed to run in Θ(n lg n)
• Disadvantage:
– Requires extra space proportional to N
Quicksort
A[p…q] ≤ A[q+1…r]
• Sort an array A[p…r]
• Divide
– Partition the array A into 2 subarrays A[p..q] and A[q+1..r],
such that each element of A[p..q] is smaller than or equal to
each element in A[q+1..r]
– The index (pivot) q is computed
• Conquer
– Recursively sort A[p..q] and A[q+1..r] using Quicksort
• Combine
– Trivial: the subarrays are sorted in place, so no work is needed to
combine them; the entire array is now sorted
QUICKSORT
Alg.: QUICKSORT(A, p, r)
   if p < r
      then q ← PARTITION(A, p, r)
           QUICKSORT(A, p, q)
           QUICKSORT(A, q + 1, r)
Partitioning the Array
• Idea
– Select a pivot element x around which to partition
– Grows two regions: A[p…i] ≤ x and x ≤ A[j…r]
Partitioning the Array
Alg.: PARTITION(A, p, r)
   x ← A[p]   (pivot)
   i ← p – 1
   j ← r + 1
   while TRUE
      do repeat j ← j – 1
            until A[j] ≤ x
         repeat i ← i + 1
            until A[i] ≥ x
         if i < j
            then exchange A[i] ↔ A[j]
            else return j
On return, j = q and A[p…q] ≤ A[q+1…r]
Running time: Θ(n), where n = r – p + 1
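The two-pointer PARTITION and the QUICKSORT that calls it can be sketched in Python as follows. This is a minimal sketch, assuming 0-based indices and inclusive bounds p and r; the function names `hoare_partition` and `quicksort` are illustrative and do not come from the slides.

```python
def hoare_partition(a, p, r):
    """Partition a[p..r] (inclusive) around x = a[p].

    Returns q such that every element of a[p..q] is <= every element of a[q+1..r].
    """
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        # repeat j <- j - 1 until a[j] <= x
        j -= 1
        while a[j] > x:
            j -= 1
        # repeat i <- i + 1 until a[i] >= x
        i += 1
        while a[i] < x:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place; the pivot position q stays in the left subarray."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = hoare_partition(a, p, r)
        quicksort(a, p, q)
        quicksort(a, q + 1, r)


data = [5, 3, 2, 6, 4, 1, 3, 7]
q = hoare_partition(data, 0, len(data) - 1)
print(q, data)   # 4 [3, 3, 2, 1, 4, 6, 5, 7]
quicksort(data)
print(data)      # [1, 2, 3, 3, 4, 5, 6, 7]
```

Note that the recursive calls use (p, q) and (q + 1, r): with this PARTITION the returned index is a split point, not the final position of the pivot.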
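Example
A[p…r] = 5 3 2 6 4 1 3 7, pivot x = A[p] = 5
– j stops on the last 3, i stops on 5; i < j, exchange: 3 3 2 6 4 1 5 7
– j stops on 1, i stops on 6; i < j, exchange: 3 3 2 1 4 6 5 7
– j stops on 4, i stops on 6; i > j, return j = q
– Result: A[p…q] = 3 3 2 1 4 and A[q+1…r] = 6 5 7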
Another Way to PARTITION
• The pivot element is not included in either of the
two subarrays
Another Way to PARTITION
Alg.: PARTITION(A, p, r)
   x ← A[r]
   i ← p – 1
   for j ← p to r – 1
      do if A[j] ≤ x
            then i ← i + 1
                 exchange A[i] ↔ A[j]
   exchange A[i + 1] ↔ A[r]
   return i + 1
During the loop: A[p…i] ≤ x | A[i+1…j-1] > x | A[j…r-1] unknown | A[r] = x (pivot)
• Chooses the last element of the array as a pivot
• Grows a subarray [p..i] of elements ≤ x
• Grows a subarray [i+1..j-1] of elements > x
• Running time: Θ(n), where n = r – p + 1
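A possible Python rendering of this PARTITION (often called the Lomuto scheme) is sketched below, assuming 0-based indices and inclusive bounds. The names `partition` and `quicksort` and the sample array are illustrative choices, not taken from the slides.

```python
def partition(a, p, r):
    """Partition a[p..r] around the last element; return the pivot's final index."""
    x = a[r]                      # pivot
    i = p - 1                     # a[p..i] holds elements <= x
    for j in range(p, r):         # a[i+1..j-1] holds elements > x
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # put the pivot between the two regions
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place; the pivot at index q is excluded from both calls."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)


data = [2, 8, 7, 1, 3, 5, 6, 4]
q = partition(data, 0, len(data) - 1)
print(q, data)   # 3 [2, 1, 3, 4, 7, 5, 6, 8]
quicksort(data)
print(data)      # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the pivot lands in its final position, the recursive calls here use (p, q – 1) and (q + 1, r).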
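Loop Invariant
A[p…i] ≤ x | A[i+1…j-1] > x | A[j…r-1] unknown | A[r] = x (pivot)
Initialization: before the loop starts, i = p – 1 and j = p, so A[p…i] and
A[i+1…j-1] are empty and all of A[p…r-1] is unknown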
Loop Invariant
Maintenance: While the loop is running
– If A[j] ≤ pivot, then i is incremented, A[j]
and A[i] are swapped, and then j is
incremented
– If A[j] > pivot, then only j is incremented
Maintenance of Loop Invariant
If A[j] > pivot:
• only increment j
If A[j] ≤ pivot:
• i is incremented, A[j] and A[i] are swapped and then j is incremented
Loop Invariant
Termination: when the loop ends, j = r, so A[p…i] ≤ x, A[i+1…j-1] > x,
and A[r] = x (pivot); no unknown elements remain
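One way to convince yourself of the invariant is to check it with assertions while the loop runs. The sketch below assumes the last-element-pivot PARTITION with 0-based, inclusive bounds; `partition_checked` is a hypothetical name, and the assertions are there only for illustration (they make each call quadratic).

```python
def partition_checked(a, p, r):
    """Last-element-pivot partition that asserts the loop invariant as it runs."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        # Invariant at the start of every iteration:
        assert all(v <= x for v in a[p:i + 1])    # a[p..i] <= x
        assert all(v > x for v in a[i + 1:j])     # a[i+1..j-1] > x
        assert a[r] == x                          # pivot still at a[r]
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    # Termination: j has reached r, so nothing is left unexamined.
    assert all(v <= x for v in a[p:i + 1])
    assert all(v > x for v in a[i + 1:r])
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


data = [2, 8, 7, 1, 3, 5, 6, 4]
q = partition_checked(data, 0, len(data) - 1)
print(q, data)
```

If any assertion failed, the invariant would be violated; on a correct implementation the function runs silently and returns the pivot index.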
Performance of Quicksort
• Worst-case partitioning
– One region has one element and one has n – 1 elements
– Maximally unbalanced
• Recurrence
T(n) = T(n – 1) + Θ(n),
T(1) = Θ(1)
T(n) = T(n – 1) + Θ(n) = Θ(Σ_{k=1..n} k) = Θ(n²)
• Recursion tree: subproblem sizes n, n – 1, n – 2, …, 2, each splitting off a single element
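As a rough check of the quadratic bound, the sketch below counts element comparisons when quicksort runs on already sorted input. It assumes the last-element-pivot PARTITION, for which sorted input gives splits of size n – 1 and 0 (rather than n – 1 and 1), so the recurrence is still T(n) = T(n – 1) + Θ(n). The name `count_comparisons` is hypothetical, and an explicit stack replaces recursion only to avoid Python's recursion limit on this input.

```python
def count_comparisons(values):
    """Quicksort a copy of `values` (last element as pivot); return the comparison count."""
    a = list(values)
    comparisons = 0
    stack = [(0, len(a) - 1)]          # pending subarrays (p, r), inclusive bounds
    while stack:
        p, r = stack.pop()
        if p >= r:
            continue
        x = a[r]
        i = p - 1
        for j in range(p, r):
            comparisons += 1           # one comparison of a[j] against the pivot
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        q = i + 1
        stack.append((p, q - 1))
        stack.append((q + 1, r))
    return comparisons


for n in (10, 100, 1000):
    # On sorted input the count equals n(n - 1)/2.
    print(n, count_comparisons(range(n)), n * (n - 1) // 2)
```

For sorted input of size n the count is (n – 1) + (n – 2) + … + 1 = n(n – 1)/2, which matches Θ(n²).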
Performance of Quicksort
• Best-case partitioning
– Partitioning produces two regions of size n/2
• Recurrence
T(n) = 2T(n/2) + Θ(n)
T(n) = Θ(n lg n) (Master theorem)
Performance of Quicksort
• Balanced partitioning
– Average case closer to best case than worst case
– Partitioning always produces a constant split
• E.g.:
9-to-1 proportional split
T(n) = T(9n/10) + T(n/10) + n
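A short recursion-tree argument, sketched here as a worked bound rather than a full proof, shows why even a 9-to-1 split gives O(n lg n): every level of the tree costs at most n, and the deepest leaf is reached by repeatedly taking the 9/10 branch.

```latex
\begin{aligned}
T(n) &= T\!\left(\tfrac{9n}{10}\right) + T\!\left(\tfrac{n}{10}\right) + n \\
\text{cost per level} &\le n \\
\text{depth of deepest leaf} &= \log_{10/9} n = \frac{\lg n}{\lg(10/9)} \approx 6.58\,\lg n \\
T(n) &\le n\,\bigl(\log_{10/9} n + 1\bigr) = O(n \lg n)
\end{aligned}
```

The constant factor is larger than in the perfectly balanced case, but the growth rate is the same.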
More Intuition
• Average case
– All permutations of the input numbers are equally likely
– On a random input array, we will have a mix of well balanced
and unbalanced splits
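– Good and bad splits are randomly distributed throughout
the tree
– A bad split followed by a good split: n → 1 and n – 1, then n – 1 → (n – 1)/2 and (n – 1)/2;
combined partitioning cost: 2n – 1 = Θ(n)
– A single nearly balanced split: n → (n – 1)/2 + 1 and (n – 1)/2;
partitioning cost: n = Θ(n)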
Randomizing Quicksort
• Two ways to achieve this:
Randomized Algorithms
• The behaviour is determined in part by values produced
by a random-number generator within each run of the
algorithm:
– RANDOM(a, b) returns an integer r, where a ≤ r ≤ b and each
of the b-a+1 possible values of r is equally likely
• No input can elicit worst case behavior
– Worst case occurs only if we get “unlucky” numbers from the
random number generator
– Despite this possibility, worst case = Θ(n²) and average
case = O(n lg n)
Randomized PARTITION
Alg.: RANDOMIZED-PARTITION(A, p, r)
   i ← RANDOM(p, r)
   exchange A[r] ↔ A[i]
   return PARTITION(A, p, r)
Randomized Quicksort
Alg.: RANDOMIZED-QUICKSORT(A, p, r)
   if p < r
      then q ← RANDOMIZED-PARTITION(A, p, r)
           RANDOMIZED-QUICKSORT(A, p, q – 1)
           RANDOMIZED-QUICKSORT(A, q + 1, r)
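The randomized variant can be sketched in Python as below. Two assumptions are worth stating: `random.randint(p, r)` plays the role of RANDOM(p, r) (both ends inclusive), and the randomly chosen element is swapped into A[r] so that the last-element-pivot PARTITION picks it up. The function names are illustrative.

```python
import random


def partition(a, p, r):
    """Partition a[p..r] around a[r]; return the pivot's final index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_partition(a, p, r):
    """Pick a uniformly random pivot in a[p..r], move it to a[r], then partition."""
    i = random.randint(p, r)        # inclusive on both ends, like RANDOM(p, r)
    a[r], a[i] = a[i], a[r]
    return partition(a, p, r)


def randomized_quicksort(a, p=0, r=None):
    """Sort a[p..r] in place using a random pivot at every level."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)


data = [5, 3, 2, 6, 4, 1, 3, 7]
randomized_quicksort(data)
print(data)   # [1, 2, 3, 3, 4, 5, 6, 7]
```

The sorted result is the same on every run; only the sequence of pivots, and therefore the running time, depends on the random numbers.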
Revision
• A random variable:
– A variable that takes on one of multiple
different values
– each occurring with some probability.
• When there are a finite (or countable)
number of such values, the random
variable is discrete.
• Random variables contrast with "regular"
variables, which have a fixed (though
often unknown) value.
Revision
• For instance, a single roll of a standard die
can be modeled by the discrete random
variable
X =
   1 if the die shows 1
   2 if the die shows 2
   3 if the die shows 3
   4 if the die shows 4
   5 if the die shows 5
   6 if the die shows 6
where each case occurs with probability 1/6.
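For illustration, the die can be modeled in Python as a table from values to probabilities. This is only a sketch: the names `pmf` and `roll` are hypothetical, and exact fractions are used so that the probabilities sum to exactly 1.

```python
from fractions import Fraction
import random

# Discrete random variable X: each face 1..6 occurs with probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all possible values must sum to 1.
assert sum(pmf.values()) == 1


def roll():
    """Draw one value of X; every face is equally likely."""
    return random.choice(list(pmf))


print([roll() for _ in range(10)])
```

Each call to `roll()` produces one of the six values, which is what distinguishes X from a "regular" variable with a single fixed value.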
Randomized Quicksort Analysis
Sorting Challenge 1
Problem: Sort a file of huge records with tiny keys
Example application: Reorganize your MP3 files
Sorting Files with Huge Records and Small Keys
• Selection sort?
Sorting Challenge 2
Problem: Sort a huge randomly-ordered file
of small records
Application: Process transaction records for a
phone company
Sorting Huge Randomly-Ordered Files
• Selection sort?
– NO, always takes quadratic time
• Bubble sort?
– NO, quadratic time for randomly-ordered keys
• Insertion sort?
– NO, quadratic time for randomly-ordered keys
Sorting Challenge 3
Problem: Sort a file that is already almost in
order
Applications:
– Re-sort a huge database after a few changes
– Double-check that someone else sorted a file
Which sorting method to use?
A. A system sort, guaranteed to run in time N lg N
B. Selection sort
C. Bubble sort
D. A custom algorithm for almost in-order files
E. Insertion sort
Sorting Files That are Almost in Order
• Selection sort?
– NO, always takes quadratic time
• Bubble sort?
– NO, bad for some definitions of “almost in order”
– Ex: B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
• Insertion sort?
– YES, takes linear time for most definitions of “almost
in order”
• Fast system sort or custom method?
– Probably not: insertion sort is simpler and faster
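To see why insertion sort suits almost-in-order files, the sketch below counts how many element shifts it performs. It is a plain textbook insertion sort, written here for illustration (the slides do not give code for it), and it is run on the B…Z A example from the bubble sort bullet.

```python
def insertion_sort(a):
    """Sort list `a` in place; return the number of element shifts performed."""
    shifts = 0
    for k in range(1, len(a)):
        key = a[k]
        m = k - 1
        # Shift larger elements one slot right until key's position is found.
        while m >= 0 and a[m] > key:
            a[m + 1] = a[m]
            m -= 1
            shifts += 1
        a[m + 1] = key
    return shifts


almost_sorted = list("BCDEFGHIJKLMNOPQRSTUVWXYZA")
print(insertion_sort(almost_sorted), "".join(almost_sorted))
# 25 ABCDEFGHIJKLMNOPQRSTUVWXYZ

reversed_input = list("ZYXWVUTSRQPONMLKJIHGFEDCBA")
print(insertion_sort(reversed_input))
# 325
```

On the almost-sorted input only the final A has to move, giving 25 shifts for 26 keys, while fully reversed input needs 26 · 25 / 2 = 325 shifts.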