Theoretical Computer Science 188 (1997) 221-230
Note
Optimal algorithms for generalized searching
in sorted matrices
Abstract
We present a set of optimal and asymptotically optimal sequential and parallel algorithms
for the problem of searching on an m × n sorted matrix in the general case when m ≤ n.
Our two sequential algorithms have a time complexity of O(m log(2n/m)), which is shown
to be optimal. Our parallel algorithm runs in O(log(log m/log log m) log(2n/m^α)) time using
m/log(log m/log log m) processors on a COMMON CRCW PRAM, where 0 < α < 1 is a mono-
tonically decreasing function of m, and is asymptotically work-optimal. The two sequential
algorithms differ mainly in the ways of matrix partitioning: one uses row-searching and the other
applies diagonal-searching. The parallel algorithm is based on some non-trivial matrix partition-
ing and processor allocation schemes. All the proposed algorithms can be easily generalized for
searching on a set of sorted matrices.
Keywords: CRCW PRAM; Matrix search problem; Optimal algorithm; Processors; Sorted matrix;
Time complexity; Work-optimal
1. Introduction
We say that a matrix is sorted if the elements in each row and in each column are sorted in
non-decreasing order. Order statistics, especially selec-
tion, on sorted matrices has received much attention [2, 7, 8, 11, 12, 15, 16, 18, 19] due
to its important applications in many fields [8, 9, 13, 14, 17]. Closely related to selec-
tion is the problem of searching a sorted matrix for the occurrence of a given element
(key), which we call the matrix search problem. This problem arises in many
* E-mail: h.shen@cit.gu.edu.au.
This work was done while the author was visiting the Department of Computer Science, Åbo Akademi
University, Finland.
applications such as image processing and computational biology, and hence has at-
tracted considerable attention [1, 3-6, 16].
It has been proven that searching on an n × n sorted matrix requires Ω(n) time [3].
Optimal sequential algorithms for this case exist in the literature [1, 5]. A work-optimal
parallel algorithm for this case was given in [16]; it runs in O(log log n) time
on a COMMON CRCW PRAM. Other parallel algorithms have also been developed
for searching on sorted matrices [4] and on matrices with sorted columns [10].
In this paper we study the problem of generalized searching in an m × n sorted matrix,
where m ≤ n. Clearly, for this problem Ω(n) is no longer a lower bound when m = o(n),
and hence simply applying the existing n × n matrix searching algorithms cannot
reach the optimum in this case. Neither can a trivial generalization of the existing
results, splitting X into ⌈n/m⌉ m × m submatrices and searching each submatrix
individually, reduce the total work to below O(n) (O(⌈n/m⌉m) = O(n)). It seems that
not much work has been reported on optimal solutions to the generalized matrix search
problem in the case m < n.
The main contributions of this paper are the following:
• We propose two optimal sequential algorithms based on row-searching and diagonal-
searching, respectively, both running in O(m log(2n/m)) time. We claim the optimality
by showing that Ω(m log(n/m)) is a lower time bound for the matrix search problem
in the general case when m ≤ n.
• We present an asymptotically work-optimal parallel algorithm that runs in
O(log(log m/log log m) log(2n/m^α)) time using m/log(log m/log log m) processors
on a COMMON CRCW PRAM, where 0 < α < 1 is a monotonically decreasing
function of m.
We present our optimal sequential algorithms in Section 2 and asymptotically optimal
parallel algorithm in Section 3, and conclude the paper in Section 4 with some open
problems for future research.
2. Optimal sequential algorithms
Throughout the paper we assume that X is an m × n sorted matrix, 4 ≤ m ≤ n, and
e the element to be searched for. When m < 4, simply applying the naive algorithm
that searches the rows one by one reaches the optimum.
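For completeness, the naive row-by-row scheme mentioned above can be sketched as follows (a minimal Python illustration; the function name is ours):

```python
from bisect import bisect_left

def search_rows(X, e):
    """Binary-search each row of X for e: O(m log n) time in total,
    which is optimal when the number of rows m is a small constant."""
    for i, row in enumerate(X):
        j = bisect_left(row, e)
        if j < len(row) and row[j] == e:
            return (i, j)
    return None
```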
The basic idea behind both our algorithms is the following: searching proceeds in
phases on submatrices of reduced sizes, where in each phase a maximal number
of elements which cannot be candidates for e are discarded.
We lay X in the Cartesian plane and let X(0,0) (the smallest) be at the southwest
corner and X(m - 1, n - 1) (the largest) at the northeast corner. Our first algorithm
works by repeatedly searching for a pivot element on the middle row of X which splits
X into submatrices. The algorithm is given as the following procedure and runs by
calling Search-1(e, X[(0,0),(m - 1, n - 1)]):
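The row-pivot recursion can be sketched in Python as follows (a reconstruction from the description above, not the paper's own pseudocode; we use standard array indexing, so X[0][0] is the smallest element, and all names are ours):

```python
from bisect import bisect_left

def search1(X, e, top=0, bottom=None, left=0, right=None):
    """Row-pivot search in a sorted matrix: binary-search the middle
    row of the current submatrix (rows [top, bottom), columns
    [left, right)) for a pivot, discard two quadrants, and recurse
    on the other two.  Returns a position of e, or None."""
    if bottom is None:
        bottom, right = len(X), len(X[0])
    if top >= bottom or left >= right:
        return None
    r = (top + bottom) // 2                  # middle row of the submatrix
    j = bisect_left(X[r], e, left, right)    # pivot position in that row
    if j < right and X[r][j] == e:
        return (r, j)
    # Every X[i][c] with i <= r, c < j is < e, and every X[i][c] with
    # i >= r, c >= j is > e; only two quadrants remain candidates.
    return (search1(X, e, top, r, j, right)
            or search1(X, e, r + 1, bottom, left, j))
```

Each level of recursion halves the number of rows, matching the recursive structure analysed below.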
Lemma 1. During each phase of recursion in algorithm Search-1, all the elements
discarded cannot be candidates for e.
Proof. The lemma results directly from a standard argument based on the following
fact:
In each phase of recursion X is divided into 4 submatrices according to the pivot
element x_{i,j} found in Step 3: X_SW = X[(r,c),(i,j)], X_NW = X[(i+1,c),(r',j)], X_NE =
X[(i+1,j+1),(r',c')] and X_SE = X[(r,j+1),(i,c')]. Clearly, e ∉ X_SW if e > x_{i,j},
and e ∉ X_NE if e < x_{i,j+1}. □
Now we analyze the time complexity of the algorithm. Let t(m,n) be the time
complexity of searching on X. Clearly, the algorithm decomposes t(m,n) into the three
parts required for finding x_{i,j}, searching on X_NW and searching on X_SE. So we have
the following recurrence:

t(m, n) = t(⌈m/2⌉, j) + t(⌊m/2⌋, n - j) + O(log n),   t(1, n) = O(log n).   (1)

It is easy to verify that t(m,n) is maximized when |X_NW| = |X_SE|, that is, j = n/2.
In this case X is halved in both dimensions in each phase of recursion, so at the end
there are m remaining submatrices, all of dimension 1 × n/m, to be searched. Thus,
we obtain the solution of Eq. (1) as follows, and leave the detailed proof to the reader:

t(m, n) = O(m log(2n/m)).   (2)
Our second algorithm splits X in each phase by searching for a pivot on the main
diagonal of the middle m × m submatrix, rather than on the middle row of X. The
main diagonal of a matrix is drawn from its southwest corner to its northeast corner. The
algorithm is presented as follows:
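A Python sketch of this diagonal-pivot scheme (again a reconstruction from the description rather than the paper's printed procedure; the helper class and all names are ours, and standard array indexing is used, so the sorted "southwest-to-northeast" diagonal is X[i][i]-style):

```python
from bisect import bisect_left

class _Diag:
    """Read-only sorted view of the diagonal X[top+i][c0+i], i = 0..k-1."""
    def __init__(self, X, top, c0, k):
        self.X, self.top, self.c0, self.k = X, top, c0, k
    def __len__(self):
        return self.k
    def __getitem__(self, i):
        return self.X[self.top + i][self.c0 + i]

def search2(X, e, top=0, bottom=None, left=0, right=None):
    """Diagonal-pivot search: binary-search the main diagonal of the
    middle square block of the current submatrix, discard the two
    quadrants that cannot contain e, and recurse on the rest."""
    if bottom is None:
        bottom, right = len(X), len(X[0])
    h, w = bottom - top, right - left
    if h <= 0 or w <= 0:
        return None
    c0 = left + max(0, (w - h) // 2)   # west column of the middle block
    k = min(h, w)                      # diagonal length
    d = bisect_left(_Diag(X, top, c0, k), e)
    if d < k and X[top + d][c0 + d] == e:
        return (top + d, c0 + d)
    # Cells dominated by the pivot pair on the diagonal are < e or > e;
    # only the two off-diagonal quadrants remain candidates.
    return (search2(X, e, top, top + d, c0 + d, right)
            or search2(X, e, top + d, bottom, left, c0 + d))
```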
Letting d denote the position of the pivot found on the diagonal, the time complexity
satisfies a recurrence analogous to Eq. (1):

t(m, n) = t(d, n_1) + t(m - d, n_2) + O(log m),   n_1 + n_2 = n.   (3)

Clearly, t(m,n) reaches its maximum when d = m/2. In this case each phase of re-
cursion halves both dimensions, so there are m submatrices of dimension 1 × n/m
remaining at the end, which will be searched for e. Thus, it is easy to show that the
solution of Eq. (3) is

t(m, n) = O(m log(2n/m)).   (4)

We now show that Ω(m log(n/m)) is a lower bound on the time complexity of our
search problem, and hence prove the optimality of both of the above algorithms.
Fig. 1. The off-diagonal slices X_0, X_1, ..., X_{m-1} of X.

Lemma 2. Searching an m × n sorted matrix for a given element e, m ≤ n, requires
Ω(m log(n/m)) time in the worst case.
Proof. Along the same line as in [3] for proving the lower bound Ω(n) for searching
in an n × n square sorted matrix, we use the following argument for our proof.
Construct the off-diagonal slice X_i = {X[i, (m-i-1)n/m], X[i, (m-i-1)n/m +
1], ..., X[i, (m-i)n/m - 1]} in row i of X, 0 ≤ i ≤ m - 1, as depicted in Fig. 1. Clearly,
X_i contains a sorted sequence of n/m elements.
We know that searching X_i for e in the worst case requires
log |X_i| = log(n/m) comparisons and hence Ω(log(n/m)) time for any i. Since for all x_i ∈
X_i and all x_j ∈ X_j there is no order between x_i and x_j for 0 ≤ i ≠ j ≤ m - 1 (this can be
easily seen from Fig. 1), searching for e in X_0 ∪ X_1 ∪ ... ∪ X_{m-1} requires searching each
individual X_i for e, for i = 0, 1, ..., m - 1, which in the worst case takes Ω(m log(n/m))
time. The lemma follows immediately from the fact that e may fall into any X_i and
hence searching X contains searching ∪_{i=0}^{m-1} X_i. □
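The slice construction in this proof can be made concrete as follows (a small Python check, assuming for simplicity that m divides n; names are ours):

```python
def slices(m, n):
    """Column ranges of the off-diagonal slices X_i used in the proof:
    X_i holds the n/m entries of row i lying in columns
    (m-i-1)*n/m .. (m-i)*n/m - 1 (m is assumed to divide n)."""
    w = n // m
    return [(i, range((m - i - 1) * w, (m - i) * w)) for i in range(m)]

# No entry of X_i is ordered against any entry of X_j for i < j:
# X_j lies in a strictly higher row but in strictly smaller columns,
# so the sorted-matrix order forces nothing in either direction.
for (i, ci) in slices(4, 8):
    for (j, cj) in slices(4, 8):
        if i < j:
            assert max(cj) < min(ci)
```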
We say that an algorithm is optimal if its time complexity matches the lower bound
for the problem. By Eqs. (2) and (4) and Lemma 2, we obtain our first theorem:

Theorem 1. Algorithms Search-1 and Search-2 solve the matrix search problem on
an m × n sorted matrix, 4 ≤ m ≤ n, in O(m log(2n/m)) time, which is optimal.

3. An optimal parallel algorithm
Now we consider the generalized matrix search problem in the parallel environment.
For searching on an n × n sorted matrix X, an algorithm running in O(log log n) time
using O(n/log log n) processors on a COMMON CRCW PRAM was given in [16].
We call a submatrix (cell) active if it may contain e. The following lemma bounds
the number of active cells in a partition of X:

Lemma 3. If X is partitioned into u × v submatrices, then e can be a non-extreme
element of at most u + v of them.

Proof. We use a standard approach and draw a main diagonal connecting x_min to x_max
in each submatrix. A submatrix may contain e iff x_min ≤ e ≤ x_max. Clearly, all these
main diagonals lie on at most u + v diagonals of X. Since each diagonal of X is sorted
from its southwest end to its northeast end, on each diagonal there is at most one pair of
adjacent points x_{i,j} and x_{i+1,j+1} coinciding with submatrix extreme points such
that x_{i,j} < e < x_{i+1,j+1}. Hence e may be a non-extreme element in at most u + v
submatrices. □
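The bound can be checked empirically on a small example (a hedged sketch; the particular matrix and block size are our own illustration, not from the paper):

```python
def active_blocks(X, e, bh, bw):
    """Count blocks of an evenly blocked sorted matrix whose corner
    extremes strictly straddle e -- the quantity Lemma 3 bounds by u + v."""
    count = 0
    for r in range(0, len(X), bh):
        for c in range(0, len(X[0]), bw):
            if X[r][c] < e < X[r + bh - 1][c + bw - 1]:
                count += 1
    return count

# A 16 x 16 matrix sorted along rows and columns, cut into u = v = 4 blocks:
X = [[i + 16 * j for j in range(16)] for i in range(16)]
assert active_blocks(X, 40, 4, 4) <= 4 + 4
```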
Our algorithm works by partitioning X into m^{1/2} × m^{1/2-ε} cells of size m^{1/2} ×
n/m^{1/2-ε} for some small constant 0 < ε < 1/2, and identifying all active cells (at most
m^{1/2} + m^{1/2-ε} by Lemma 3). This partitioning process is repeated until the column-
splitting factor (m^{2^{-i}-ε} in phase i) shrinks to 1. Finally, the sorted arrays in every active
cell are searched for e using binary search. We present the algorithm as follows:
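The cell-partitioning idea can be followed step by step in the following sequential Python simulation (a sketch under simplifying assumptions: the splitting factors are fixed constants, whereas the algorithm refines factors of the form m^{2^{-i}} and m^{2^{-i}-ε} across phases, and the parallel steps run here as loops; all names are ours):

```python
from bisect import bisect_left

def cell_search(X, e, gr=2, gc=2):
    """Sequential simulation of the cell-partitioning scheme: keep a
    worklist of active cells -- those with x_min <= e <= x_max --
    subdivide each into a gr x gc grid of sub-cells, discard inactive
    ones, and finally binary-search every surviving row."""
    m, n = len(X), len(X[0])
    cells = [(0, m, 0, n)]                       # (top, bottom, left, right)
    while any(b - t > 1 for (t, b, l, r) in cells):
        nxt = []
        for (t, b, l, r) in cells:
            rh = max(1, (b - t + gr - 1) // gr)  # sub-cell height
            cw = max(1, (r - l + gc - 1) // gc)  # sub-cell width
            for i in range(t, b, rh):
                for j in range(l, r, cw):
                    bi, rj = min(i + rh, b), min(j + cw, r)
                    if X[i][j] <= e <= X[bi - 1][rj - 1]:   # active cell?
                        nxt.append((i, bi, j, rj))
        cells = nxt
    for (t, b, l, r) in cells:                   # binary-search remaining rows
        for i in range(t, b):
            j = bisect_left(X[i], e, l, r)
            if j < r and X[i][j] == e:
                return (i, j)
    return None
```

In the PRAM algorithm the activity tests of one phase run concurrently, one processor group per cell, which is what the processor-counting lemma below accounts for.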
The correctness of the algorithm is implied by the fact that only active cells may
contain e and thus all the inactive cells can be discarded in each phase of partitioning.
Now we analyze the above algorithm. Clearly, the while-loop in CRCW-Search iter-
ates log(1/ε) times. The following lemmas are needed for our analysis:

Lemma 4. The total number of processors used in Step 2, the while-loop of CRCW-
Search, is bounded by 3m^{1-ε}, where 1 < λ = 1 + m^{-ε} < 2 and 0 ≤ ε < 1/2.
Proof. In the ith phase of partitioning, each cell in CELLS_{i-1} is partitioned into
m^{2^{-i}} × m^{2^{-i}-ε} new cells, and among these cells there are at most m^{2^{-i}} + m^{2^{-i}-ε} = λm^{2^{-i}} active
ones by Lemma 3, where 0 ≤ ε < 1/2. Hence the total number of active cells in CELLS_i satisfies

|CELLS_0| = 1,   |CELLS_i| ≤ λm^{2^{-i}} |CELLS_{i-1}| ≤ λ^i m^{1-2^{-i}},   (5)

where λ = 1 + m^{-ε}.
In Step 2.2 there are m^{2^{-i}} × m^{2^{-i}-ε} = m^{2^{1-i}-ε} processors assigned to each cell in
CELLS_{i-1}, making the total number of processors required for this step

p_i = |CELLS_{i-1}| m^{2^{1-i}-ε} ≤ λ^{i-1} m^{1-ε}.   (6)
Clearly, p_i is the total number of processors required for the ith iteration of the
while-loop, since this number also suffices for Step 2.3, and |CELLS_i| ≤ p_i processors, by Eqs.
(5) and (6), are used in Step 2.4.
The number of processors required for Step 2, the while-loop, is the maximum number
of processors required over all iterations:

p = max_i(p_i) ≤ λ^{i*-1} m^{1-ε} < 3m^{1-ε},   (7)

where i* is the number of iterations of the while-loop. □
Because we use a COMMON CRCW PRAM, Step 2.3 can be completed in O(1)
time. This is achieved by simply letting every processor whose comparison yields
equality write 1 to a shared variable s with initial value 0, so that at the end,
we know that e is found if s = 1 and not found otherwise.
All other steps inside the while-loop of Step 2 can clearly be done in O(1) time. So
Step 2 requires a total time of O(log(1/ε)). The value of ε is chosen such that the total
work of Step 2 is optimal:

ε = log log m/log m,   (8)

which gives

i* = log(1/ε) = log(log m/log log m).   (9)
By Eq. (9), Step 2 iterates i* = log(log m/log log m) times, and when it terminates,
since λ = 1 + m^{-ε} = 1 + 1/log m, the number of cells in CELLS_{i*} satisfies

|CELLS_{i*}| ≤ λ^{i*} m^{1-2^{-i*}} = (1 + 1/log m)^{i*} m^{1-log log m/log m} ≤ 2m/log m.   (10)

Each cell in CELLS_{i*} has

m^{2^{-i*}} = m^{log log m/log m} = log m   (11)

rows, each of length at most

2n/m^α,   (12)

where

0 < α = 1 - (log(log m/log log m) + 1) log log m/log m < 1.
Theorem 2. Algorithm CRCW-Search solves the matrix search problem on an m × n
sorted matrix, m ≤ n, in O(log(log m/log log m) log(2n/m^α)) time using
m/log(log m/log log m) processors on a COMMON CRCW PRAM.

Proof. We show the processor allocation scheme for Step 3. By Eq. (10), at most
2m/log m active cells in total exist when entering Step 3. So in Step 3.1 we can
allocate log m/(2 log(log m/log log m)) processors to each active cell. By Eq. (11)
there are log m (sorted) rows in each cell; therefore, we assign a group of
2 log(log m/log log m) rows to one processor within the cell (Step 3.2.1). Since a row
in each cell is of length at most 2n/m^α by Eq. (12), binary search on it in Step 3.2.2 requires
O(log(2n/m^α)) time. Thus, the total time required for Step 3 is O(log(log m/log log m)
log(2n/m^α)). □
4. Concluding remarks
Acknowledgements
The author wishes to thank Ralph-Johan Back for his support during the course
of this work while the author was visiting Åbo Akademi University. Thanks also to
an anonymous referee for various constructive comments that have strengthened the
results of this paper.