Cormen Sort

The document describes how bucket sort works and analyzes its running time. Bucket sort assumes inputs are uniformly distributed between 0 and 1, placing each value into a bucket based on its position. It sorts each bucket using insertion sort and concatenates the buckets to output the sorted list. The analysis shows that if inputs are uniformly distributed, bucket sort runs in expected linear time.

Uploaded by

Gayathri U

Chapter 8 Sorting in Linear Time

8.4 Bucket sort

Bucket sort runs in linear time when the input is drawn from a uniform distribution.
Like counting sort, bucket sort is fast because it assumes something about the input.
Whereas counting sort assumes that the input consists of integers in a small range,
bucket sort assumes that the input is generated by a random process that distributes
elements uniformly over the interval [0, 1). (See Section C.2 for a definition of
uniform distribution.)
The idea of bucket sort is to divide the interval [0, 1) into n equal-sized
subintervals, or buckets, and then distribute the n input numbers into the buckets. Since
the inputs are uniformly distributed over [0, 1), we don’t expect many numbers to
fall into each bucket. To produce the output, we simply sort the numbers in each
bucket and then go through the buckets in order, listing the elements in each.
Our code for bucket sort assumes that the input is an n-element array A and
that each element A[i] in the array satisfies 0 ≤ A[i] < 1. The code requires an
auxiliary array B[0 . . n − 1] of linked lists (buckets) and assumes that there is a
mechanism for maintaining such lists. (Section 10.2 describes how to implement
basic operations on linked lists.)

BUCKET-SORT(A)
1  n ← length[A]
2  for i ← 1 to n
3      do insert A[i] into list B[⌊n A[i]⌋]
4  for i ← 0 to n − 1
5      do sort list B[i] with insertion sort
6  concatenate the lists B[0], B[1], . . . , B[n − 1] together in order

Figure 8.4 shows the operation of bucket sort on an input array of 10 numbers.
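
The pseudocode can be sketched in Python as follows. This is a minimal illustration, not the book's code: it uses 0-based Python lists in place of the linked lists B[0 . . n − 1], and the names `insertion_sort` and `bucket_sort` are chosen here for the example. The `min(n - 1, ...)` guard is an added precaution against floating-point rounding and is not part of the pseudocode.

```python
import math


def insertion_sort(items):
    """Sort a list in place with insertion sort and return it."""
    for k in range(1, len(items)):
        key = items[k]
        j = k - 1
        # Shift larger elements one slot to the right.
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items


def bucket_sort(a):
    """Return a sorted copy of a, assuming 0 <= x < 1 for every x in a."""
    n = len(a)
    buckets = [[] for _ in range(n)]          # plays the role of B[0 .. n-1]
    for x in a:                               # lines 2-3: distribute
        if not 0 <= x < 1:
            raise ValueError("bucket_sort expects values in [0, 1)")
        index = min(n - 1, math.floor(n * x))  # guard is an added assumption
        buckets[index].append(x)
    result = []
    for bucket in buckets:                    # lines 4-6: sort, concatenate
        result.extend(insertion_sort(bucket))
    return result


# The input array of Figure 8.4.
figure_input = [.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]
print(bucket_sort(figure_input))
```

Python lists stand in for linked lists because appending to them is O(1) amortized, which keeps the distribution step linear.
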
To see that this algorithm works, consider two elements A[i] and A[j]. Assume
without loss of generality that A[i] ≤ A[j]. Since ⌊n A[i]⌋ ≤ ⌊n A[j]⌋,
element A[i] is placed either into the same bucket as A[j] or into a bucket with a
lower index. If A[i] and A[j] are placed into the same bucket, then the for loop of
lines 4–5 puts them into the proper order. If A[i] and A[j] are placed into different
buckets, then line 6 puts them into the proper order. Therefore, bucket sort works
correctly.
To analyze the running time, observe that all lines except line 5 take O(n) time in
the worst case. It remains to analyze the total time taken by the n calls to insertion
sort in line 5.

      (a)                  (b)
  i   A[i]             i   B[i]
  1   .78              0   (empty)
  2   .17              1   .12  .17
  3   .39              2   .21  .23  .26
  4   .26              3   .39
  5   .72              4   (empty)
  6   .94              5   (empty)
  7   .21              6   .68
  8   .12              7   .72  .78
  9   .23              8   (empty)
 10   .68              9   .94

Figure 8.4 The operation of BUCKET-SORT. (a) The input array A[1 . . 10]. (b) The array B[0 . . 9]
of sorted lists (buckets) after line 5 of the algorithm. Bucket i holds values in the half-open
interval [i/10, (i + 1)/10). The sorted output consists of a concatenation in order of the lists
B[0], B[1], . . . , B[9].

To analyze the cost of the calls to insertion sort, let $n_i$ be the random variable
denoting the number of elements placed in bucket B[i]. Since insertion sort runs
in quadratic time (see Section 2.2), the running time of bucket sort is

\[
T(n) = \Theta(n) + \sum_{i=0}^{n-1} O(n_i^2) .
\]
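
To make the summation concrete, the sketch below computes the bucket sizes and the sum of their squares for the input of Figure 8.4. The function names `bucket_sizes` and `sum_of_squared_bucket_sizes` are hypothetical, introduced only for this illustration, and the `min(n - 1, ...)` guard is an added precaution against floating-point rounding.

```python
def bucket_sizes(a):
    """Return [n_0, ..., n_{n-1}], the number of elements landing in each bucket."""
    n = len(a)
    sizes = [0] * n
    for x in a:
        # int() truncates, which equals the floor for non-negative values.
        sizes[min(n - 1, int(n * x))] += 1
    return sizes


def sum_of_squared_bucket_sizes(a):
    """Return the sum over all buckets of n_i squared."""
    return sum(s * s for s in bucket_sizes(a))


figure_input = [.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]
print(bucket_sizes(figure_input))                 # [0, 2, 3, 1, 0, 0, 1, 2, 0, 1]
print(sum_of_squared_bucket_sizes(figure_input))  # 20
```

For this input the sum is 20, which is twice n = 10, so the insertion-sort term stays proportional to n.
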

Taking expectations of both sides and using linearity of expectation, we have

\[
\begin{aligned}
E[T(n)] &= E\left[\Theta(n) + \sum_{i=0}^{n-1} O(n_i^2)\right] \\
        &= \Theta(n) + \sum_{i=0}^{n-1} E[O(n_i^2)] && \text{(by linearity of expectation)} \\
        &= \Theta(n) + \sum_{i=0}^{n-1} O(E[n_i^2]) && \text{(by equation (C.21))} .
\end{aligned}
\tag{8.1}
\]

We claim that

\[
E[n_i^2] = 2 - 1/n \tag{8.2}
\]

for i = 0, 1, . . . , n − 1. It is no surprise that each bucket i has the same value of
$E[n_i^2]$, since each value in the input array A is equally likely to fall in any bucket.
To prove equation (8.2), we define indicator random variables

\[
X_{ij} = I\{A[j] \text{ falls in bucket } i\}
\]

for i = 0, 1, . . . , n − 1 and j = 1, 2, . . . , n. Thus,

\[
n_i = \sum_{j=1}^{n} X_{ij} .
\]

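To compute $E[n_i^2]$, we expand the square and regroup terms:

\[
\begin{aligned}
E[n_i^2] &= E\left[\left(\sum_{j=1}^{n} X_{ij}\right)^{2}\right] \\
         &= E\left[\sum_{j=1}^{n} \sum_{k=1}^{n} X_{ij} X_{ik}\right] \\
         &= E\left[\sum_{j=1}^{n} X_{ij}^2 + \sum_{1 \le j \le n} \sum_{\substack{1 \le k \le n \\ k \ne j}} X_{ij} X_{ik}\right] \\
         &= \sum_{j=1}^{n} E[X_{ij}^2] + \sum_{1 \le j \le n} \sum_{\substack{1 \le k \le n \\ k \ne j}} E[X_{ij} X_{ik}] ,
\end{aligned}
\tag{8.3}
\]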

where the last line follows by linearity of expectation. We evaluate the two
summations separately. Indicator random variable $X_{ij}$ is 1 with probability 1/n and 0
otherwise, and therefore

\[
\begin{aligned}
E[X_{ij}^2] &= 1 \cdot \frac{1}{n} + 0 \cdot \left(1 - \frac{1}{n}\right) \\
            &= \frac{1}{n} .
\end{aligned}
\]

When k ≠ j, the variables $X_{ij}$ and $X_{ik}$ are independent, and hence

\[
\begin{aligned}
E[X_{ij} X_{ik}] &= E[X_{ij}] \, E[X_{ik}] \\
                 &= \frac{1}{n} \cdot \frac{1}{n} \\
                 &= \frac{1}{n^2} .
\end{aligned}
\]

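Substituting these two expected values in equation (8.3), we obtain

\[
\begin{aligned}
E[n_i^2] &= \sum_{j=1}^{n} \frac{1}{n} + \sum_{1 \le j \le n} \sum_{\substack{1 \le k \le n \\ k \ne j}} \frac{1}{n^2} \\
         &= n \cdot \frac{1}{n} + n(n-1) \cdot \frac{1}{n^2} \\
         &= 1 + \frac{n-1}{n} \\
         &= 2 - \frac{1}{n} ,
\end{aligned}
\]
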

which proves equation (8.2).
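
As an informal sanity check on equation (8.2), and not a substitute for the proof, the following sketch estimates the second moment of the size of bucket 0 by simulation. The function name `estimate_second_moment`, the seed, and the trial count are choices made for this illustration only.

```python
import random


def estimate_second_moment(n, trials, seed=0):
    """Estimate E[n_0^2] when n values are drawn uniformly from [0, 1)."""
    rng = random.Random(seed)        # fixed seed so runs are repeatable
    total = 0
    for _ in range(trials):
        count = 0                    # n_0: elements landing in bucket 0
        for _ in range(n):
            if int(n * rng.random()) == 0:
                count += 1
        total += count * count
    return total / trials


n = 10
estimate = estimate_second_moment(n, trials=20000)
print(f"simulated: {estimate:.3f}   predicted by (8.2): {2 - 1 / n:.3f}")
```

With enough trials the simulated value should settle close to 2 − 1/n, up to sampling noise.
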


Using this expected value in equation (8.1), we conclude that the expected time
for bucket sort is Θ(n) + n · O(2 − 1/n) = Θ(n). Thus, the entire bucket sort
algorithm runs in linear expected time.
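
Written out step by step, the substitution reads as follows (a worked restatement of the sentence above, using only equations (8.1) and (8.2)):

\[
\begin{aligned}
E[T(n)] &= \Theta(n) + \sum_{i=0}^{n-1} O(E[n_i^2]) \\
        &= \Theta(n) + \sum_{i=0}^{n-1} O(2 - 1/n) \\
        &= \Theta(n) + O\bigl(n (2 - 1/n)\bigr) \\
        &= \Theta(n) + O(2n - 1) \\
        &= \Theta(n) .
\end{aligned}
\]
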
Even if the input is not drawn from a uniform distribution, bucket sort may still
run in linear time. As long as the input has the property that the sum of the squares
of the bucket sizes is linear in the total number of elements, equation (8.1) tells us
that bucket sort will run in linear time.
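
The condition on the sum of squared bucket sizes can be checked directly for a given input. The sketch below compares two hand-built, non-random inputs; the function name `sum_of_squared_bucket_sizes` and both inputs are hypothetical examples constructed for this illustration.

```python
def sum_of_squared_bucket_sizes(a):
    """Return the sum over all n buckets of (bucket size) squared."""
    n = len(a)
    sizes = [0] * n
    for x in a:
        # int() truncates, which equals the floor for non-negative values.
        sizes[min(n - 1, int(n * x))] += 1
    return sum(s * s for s in sizes)


n = 1000

# Two values in every even-numbered bucket, none in the odd ones:
# not uniform, yet the sum of squares is (n/2) * 4 = 2n, linear in n.
paired = []
for i in range(0, n, 2):
    paired += [(i + 0.25) / n, (i + 0.75) / n]

# Every value lies in [0, 1/n), so all n elements share bucket 0
# and the sum of squares is n * n.
clustered = [i / (n * n) for i in range(n)]

print(sum_of_squared_bucket_sizes(paired))
print(sum_of_squared_bucket_sizes(clustered))
```

The first input keeps the insertion-sort term linear even though it is far from uniformly distributed, while the second shows how the term grows when the elements cluster in one bucket.
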

Exercises

8.4-1
Using Figure 8.4 as a model, illustrate the operation of BUCKET-SORT on the array
A = ⟨.79, .13, .16, .64, .39, .20, .89, .53, .71, .42⟩.

8.4-2
What is the worst-case running time for the bucket-sort algorithm? What simple
change to the algorithm preserves its linear expected running time and makes its
worst-case running time O(n lg n)?

8.4-3
Let X be a random variable that is equal to the number of heads in two flips of a
fair coin. What is $E[X^2]$? What is $E^2[X]$?

8.4-4 ⋆
We are given n points in the unit circle, $p_i = (x_i, y_i)$, such that $0 < x_i^2 + y_i^2 \le 1$
for i = 1, 2, . . . , n. Suppose that the points are uniformly distributed; that is,
the probability of finding a point in any region of the circle is proportional to the
area of that region. Design a Θ(n) expected-time algorithm to sort the n points by
their distances $d_i = \sqrt{x_i^2 + y_i^2}$ from the origin. (Hint: Design the bucket sizes in
BUCKET-SORT to reflect the uniform distribution of the points in the unit circle.)

8.4-5 ⋆
A probability distribution function P(x) for a random variable X is defined by
P(x) = Pr {X ≤ x}. Suppose that a list of n random variables $X_1, X_2, \ldots, X_n$
is drawn from a continuous probability distribution function P that is computable
in O(1) time. Show how to sort these numbers in linear expected time.
