Lecture 9: August 29
Instructor: Prof. Prateek Vishnoi Indian Institute of Technology, Mandi
We now take a short break from sorting algorithms to tackle a different kind of problem. We will eventually return to sorting with Heap Sort, but first let us try to solve another problem.
Problem Statement
Given an unsorted array A of n elements and an integer k (1 ≤ k ≤ n), find the k-th smallest element of A, also called the k-th order statistic.
First Approach
Sort the array and return the k-th element of the sorted array.
We have already seen the proofs of correctness and the running-time analyses of several sorting techniques.
From that analysis, we conclude that this approach takes O(n log n) time.
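As a sketch, the first approach fits in a few lines of Python (the function name and the 1-indexed convention for k are our own choices, not from the notes):

```python
def kth_smallest_by_sorting(A, k):
    """Return the k-th smallest element of A (k is 1-indexed).

    Sorting dominates the running time: O(n log n) comparisons.
    """
    return sorted(A)[k - 1]
```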
Lighter Question
What if k = 1, i.e., we only want the minimum element?
Approach
Initialize a variable min = A[1].
Iterate over the array and check whether the current element is smaller than min. If YES, update min to that element; otherwise, ignore it.
The time complexity of the above approach is O(n).
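The linear scan above can be sketched in Python (0-indexed here, unlike the 1-indexed A[1] of the notes):

```python
def find_min(A):
    """Return the minimum of a non-empty list with a single O(n) scan."""
    minimum = A[0]            # corresponds to min = A[1] in the notes
    for x in A[1:]:
        if x < minimum:       # a smaller element is found: update min
            minimum = x
    return minimum
```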
Convince yourself that this method cannot be extended to general k, especially when k = Θ(n).
Remember Quick Sort, where the pivot element p divides the array into two parts such that all elements in the left subarray are < p and all elements in the right subarray are > p. Can we use this somehow?
Randomised Solution
QuickSelect Algorithm
QUICKSELECT(A, k)
Pick a pivot element p at random from A.
Split A into subarrays LEFT and RIGHT by comparing each element to p, using the PARTITION function of Quick Sort.
If |LEFT| = k − 1, return p.
If |LEFT| ≥ k, return QUICKSELECT(LEFT, k); otherwise, return QUICKSELECT(RIGHT, k − |LEFT| − 1).
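A minimal Python sketch of QuickSelect. For clarity we build LEFT and RIGHT as new lists rather than using Quick Sort's in-place PARTITION, and we count the elements equal to the pivot so duplicates are handled (the lecture's partition implicitly assumes distinct elements):

```python
import random

def quickselect(A, k):
    """Return the k-th smallest element of A (k is 1-indexed)."""
    pivot = random.choice(A)                     # random pivot, as in the notes
    left = [x for x in A if x < pivot]           # elements < pivot (LEFT)
    right = [x for x in A if x > pivot]          # elements > pivot (RIGHT)
    equal = len(A) - len(left) - len(right)      # copies of the pivot itself
    if k <= len(left):
        return quickselect(left, k)              # answer lies in LEFT
    if k <= len(left) + equal:
        return pivot                             # pivot is the k-th order statistic
    return quickselect(right, k - len(left) - equal)  # recurse on RIGHT
```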
Complexity Analysis
Best Case
The best case occurs when the pivot selected in the first iteration turns out to be the k-th order statistic itself. In this case, the partition function is called only once, so the recurrence relation is:
T (n) = Θ(n)
Worst case
The worst case occurs when, at every iteration, the selected pivot divides the array into two parts of sizes n − 1 and 0 and the pivot is not the k-th order statistic, until only a single element remains. The recurrence relation is:
T (n) = T (n − 1) + Θ(n)
which solves to T (n) = Θ(n²).
Average Case
It is hard to find the exact expected number of comparisons, so we will instead try to bound the recurrence. Formally, let T (n) denote the number of comparisons performed by QuickSelect on any input of size n. The recurrence relation is:
T (n) ≤ cn + T (X)
where X is a random variable taking values in {0, 1, . . . , n − 1}, corresponding to the size of the subproblem that is solved recursively. Taking expectations,
E[T (n)] ≤ cn + E[T (X)] (9.1)
We cannot simply solve this recurrence, because we do not yet know what X or E[T (X)] looks like.
Before giving a formal proof, here’s some intuition. First of all, how large is X , the size of the array
given to the recursive call? It depends on two things: the value of k and the randomly chosen pivot. After
partitioning the input into LEFT and RIGHT, whose size adds up to (n − 1), the algorithm recursively
calls QuickSelect on one of them, but which one? Since we are interested in the behavior for a worst-case
input, we can assume pessimistically that the value of k will always make us choose the bigger of LEFT and
RIGHT. Therefore the question becomes: if we choose a random pivot and split the input into LEFT and
RIGHT, how large is the larger of the two of them? Well, possible sizes of the splits (ignoring rounding) are:
(0, n − 1), (1, n − 2), (2, n − 3) . . . (n/2 − 2, n/2 + 1), (n/2 − 1, n/2)
Thus, the larger of the two parts is a uniformly random number in {n/2, (n/2 + 1), . . . , (n − 1)}.
So, the expected size of the larger half is about 3n/4, again, ignoring rounding errors. Another way to say this
is that if we split a candy bar at random into two pieces, the expected size of the larger piece is 3/4 of the bar.
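The 3/4 claim is easy to check empirically. The simulation below (ours, not from the notes) picks a pivot rank uniformly at random among n positions and records the size of the larger of the two resulting parts:

```python
import random

def expected_larger_fraction(n, trials=100_000, seed=0):
    """Estimate E[larger part] / n when a random pivot splits n elements
    into parts of sizes i and n - 1 - i, with i uniform on {0, ..., n - 1}."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    total = 0
    for _ in range(trials):
        i = rng.randrange(n)         # pivot rank, chosen uniformly
        total += max(i, n - 1 - i)   # size of the larger part
    return total / (trials * n)
```

For large n the estimate comes out close to 3/4, matching the candy-bar intuition.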
We need to correctly analyze the expected value of T (X). To do so, we can consider with what probabilities
does X take on certain values, and analyze the corresponding behavior of T . So, with what probability is X
at most 3n/4? This happens when the smaller of LEFT and RIGHT contains at least a quarter of the elements,
i.e., when the pivot is not in the bottom quarter or top quarter. This means the pivot needs to be in the
middle half of the data, which happens with probability 1/2. The other half the time, the size of X will be
larger, at most n − 1.
" !# " !#
1 3n 1 1 3n 1
E[T (X)] ≤ E T + E[T (n − 1)] ≤ E T + E[T (n)]
2 4 2 2 4 2
Substituting this bound into equation (9.1), we get
" !#
1 3n 1
E[T (n)] ≤ cn + E T + E[T (n)]
2 4 2