Randomized Algorithms (DSA, CP3151)

(1) Randomized algorithms introduce randomness into algorithms, so that some actions are taken at random. The running time is therefore also random, and the average-case runtime matters more than the worst case. (2) Randomized algorithms can occasionally produce wrong solutions, but the probability of error is very small. (3) This chapter presents a randomized algorithm that solves the closest pair problem in O(n) average-case time, compared with O(n log n) worst-case time for previous algorithms. It works by randomly clustering points and computing distances only within clusters.


Chapter 11

Randomized Algorithms

The concept of randomized algorithms is relatively new. In every algorithm introduced so far, each step in the algorithm is deterministic. That is, we never, in the middle of executing an algorithm, make an arbitrary choice. In randomized algorithms, which we shall introduce in this chapter, we do make arbitrary choices. This means that some actions are taken at random.

Since some actions are executed at random, the randomized algorithms introduced later have the following properties:

(1) In the case of optimization problems, a randomized algorithm gives an optimal solution. However, since random actions are taken in the algorithm, the time complexity of a randomized optimization algorithm is randomized. Thus, the average-case time complexity of a randomized optimization algorithm is more important than its worst-case time complexity.
(2) In the case of decision problems, a randomized algorithm does make mistakes from time to time. Yet, the probability of producing wrong solutions is exceedingly small; otherwise, the randomized algorithm would not be useful.

A RANDOMIZED ALGORITHM TO SOLVE THE CLOSEST PAIR PROBLEM


The closest pair problem was introduced in Chapter 4. In Chapter 4, we showed that this problem can be solved by the divide-and-conquer approach in O(n log n) time. This time complexity is for worst cases. In this section, we shall show that there exists a randomized algorithm whose average-case time complexity for solving the closest pair problem is O(n).


Let x1, x2, ..., xn be n points in the 2-dimensional plane. The closest pair problem is to find the pair xi and xj for which the distance between xi and xj is the smallest among all possible pairs of points. A straightforward approach to solve this problem is to evaluate all the n(n - 1)/2 inter-point distances and find the minimum among all these distances.
The main idea of the randomized algorithm is based upon the following observation: if two points xi and xj are distant from each other, then their distance is unlikely to be the shortest and thus can well be ignored. With this idea, the randomized algorithm would first partition the points into several clusters in such a way that points within each cluster are close to one another. We then only calculate distances among points within the same cluster.
Let us consider Figure 11-1. As shown in Figure 11-1, there are six points. If we partition these six points into three clusters S1 = {x1, x2}, S2 = {x5, x6} and S3 = {x3, x4}, then we shall now only calculate three distances, namely d(x1, x2), d(x5, x6) and d(x3, x4). Afterwards, we find the minimum among these three distances. If we do not partition the points into clusters, we must calculate (6 * 5)/2 = 15 distances.
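As a concrete baseline, the straightforward approach that evaluates all n(n - 1)/2 distances can be sketched as follows; the point set below is an arbitrary illustration, not the points of Figure 11-1.

```python
from itertools import combinations
from math import dist

def closest_pair_brute_force(points):
    """Evaluate all n(n-1)/2 inter-point distances and return the closest pair."""
    return min(combinations(points, 2), key=lambda pair: dist(*pair))

# Six sample points; here (9, 2) and (8.5, 2.2) form the closest pair.
pts = [(0, 0), (5, 1), (1, 4), (4, 4), (9, 2), (8.5, 2.2)]
```

For six points this loop performs 15 distance computations, exactly the count mentioned above.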

FIGURE 11-1 The partition of points.

Of course, this discussion is quite misleading because there is no guarantee that the strategy would work. In fact, the strategy may be seen as a divide-without-conquer strategy: there is a dividing process, but no merging process. Let us consider Figure 11-2. In Figure 11-2, we can see that the closest pair is {x1, x3}. Yet, we have partitioned these two points into two different clusters.
If we partition the entire space into squares with side lengths equal to δ, which is not smaller than the shortest distance, then after all the within-cluster

FIGURE 11-2 A case to show the importance of inter-cluster distances.

distances are calculated, we may double the side length of the squares and produce larger squares. The shortest distance must be within one of these enlarged squares. Figure 11-3 shows how four enlarged squares correspond to one square. Each enlarged square belongs to a certain type, as indicated in Figure 11-3.

FIGURE 11-3 The production of four enlarged squares (labelled type 1, type 2, type 3 and type 4).

Let us imagine that the entire space is already divided into squares with width δ. Then the enlargement of these squares will induce four sets of enlarged squares, denoted as T1, T2, T3 and T4, corresponding to type 1, type 2, type 3 and type 4 squares respectively. A typical case is illustrated in Figure 11-4.
Of course, the critical question is to find the appropriate mesh size δ. If δ is too large, the original square will be very large and a large number of distances will be calculated. In fact, if δ is very large, there is almost no dividing and our problem becomes the original problem. On the other hand, δ cannot be too small because it cannot be smaller than the shortest distance. In the randomized algorithm, we randomly select a subset of points and find the shortest distance among this subset of points. This shortest distance will become our δ.

FIGURE 11-4 Four sets of enlarged squares.

Algorithm 11-1 A randomized algorithm for finding a closest pair

Input: A set S consisting of n elements x1, x2, ..., xn, where S ⊆ R^2.
Output: The closest pair in S.
Step 1. Randomly choose a set S1 = {xi1, xi2, ..., xim}, where m = n^(2/3). Find the closest pair of S1 and let the distance between this pair of points be denoted as δ.
Step 2. Construct a set of squares T with mesh size δ.
Step 3. Construct four sets of squares T1, T2, T3 and T4 derived from T by doubling the mesh size to 2δ.
Step 4. For each Ti, find the induced decomposition S = S1(i) ∪ S2(i) ∪ ..., 1 ≤ i ≤ 4, where each Sj(i) is a non-empty intersection of S with a square of Ti.
Step 5. For each pair of points xp, xq belonging to the same Sj(i), compute d(xp, xq). Let xa and xb be the pair of points with the shortest distance among these pairs. Return xa and xb as the closest pair.
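The steps above can be sketched in Python. This is our own rendering, not the textbook's exact procedure: the sample size n^(2/3) follows Step 1, and the four grids of mesh 2δ, shifted by 0 or δ in each axis, realize T1 through T4. The sketch assumes the points are distinct, so the sampled distance δ is positive.

```python
import math
import random
from collections import defaultdict
from itertools import combinations

def sample_delta(points, m):
    """Step 1: the shortest distance within a random sample of m points."""
    s = random.sample(points, m)
    return min(math.dist(p, q) for p, q in combinations(s, 2))

def randomized_closest_pair(points):
    n = len(points)
    delta = sample_delta(points, max(2, round(n ** (2 / 3))))
    best = None
    # Steps 3-5: four grids of mesh 2*delta, shifted by 0 or delta in x and y.
    # Any two points at distance <= delta share a cell in at least one grid,
    # so only within-cell distances need to be computed.
    for dx in (0.0, delta):
        for dy in (0.0, delta):
            cells = defaultdict(list)
            for p in points:
                cells[(int((p[0] + dx) // (2 * delta)),
                       int((p[1] + dy) // (2 * delta)))].append(p)
            for bucket in cells.values():
                for p, q in combinations(bucket, 2):
                    d = math.dist(p, q)
                    if best is None or d < best[0]:
                        best = (d, p, q)
    return best[1], best[2]

pts = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5),
       (5.3, 5.1), (2, 7), (8, 3), (3, 2), (7, 8)]
p, q = randomized_closest_pair(pts)
# The answer is always a true closest pair; only the running time is random.
```

Since δ is at least the true shortest distance, the closest pair always falls inside one cell of one of the four shifted grids, which is why the result is exact even though the sample is random.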

Example 11-1

We are given a set S of 27 points, shown in Figure 11-5. In Step 1, we randomly choose m = 9 elements x1, x2, ..., x9. It can be seen that the closest pair among these sampled points is (x1, x2). We use the distance δ between x1 and x2 as the mesh size to construct the squares required in Step 2. There will be four sets of enlarged squares:

T1 = {[0:2δ, 0:2δ], [2δ:4δ, 0:2δ], [4δ:6δ, 0:2δ], ..., [4δ:6δ, 4δ:6δ]}.
T2 = {[δ:3δ, 0:2δ], [3δ:5δ, 0:2δ], ..., [3δ:5δ, 4δ:6δ]}.
T3 = {[0:2δ, δ:3δ], [2δ:4δ, δ:3δ], [4δ:6δ, δ:3δ], ..., [4δ:6δ, 3δ:5δ]}.
T4 = {[δ:3δ, δ:3δ], [3δ:5δ, δ:3δ], ..., [3δ:5δ, 3δ:5δ]}.

FIGURE 11-5 An example illustrating the randomized closest pair algorithm.

The total number of mutual distance computations is:

N(T1) = 28.
N(T2) = 26.
N(T3) = 24.
N(T4) = 22.

Each N(Ti) is obtained by summing the within-square pair counts C(|Sj(i)|, 2) over the non-empty squares of Ti.

Action of B:

(a) Suppose that, again, B randomly chooses the bits b1, b2, b3 and b4.
(b) Suppose that the random numbers chosen are 9, 4, 7 and 10.
(c) It can be easily proved that (w1, w2, w3, w4) = (12, 3, 1, 9).
(d) Send (12, 3, 1, 9) to A.

Action of A:

(a) Receive (12, 3, 1, 9) from B.
(b) (13, 12) ∈ QR, so c1 = 0.
(13, 3) ∈ QR, so c2 = 0.
(13, 1) ∈ QR, so c3 = 0.
(13, 9) ∈ QR, so c4 = 0.
(c) Send (0, 0, 0, 0) to B.

Action of B:
Since it is not true that bi = ci for all i, B accepts the fact that 4 is a quadratic residue mod 13.
The theoretical basis of this randomized algorithm is number theory and is beyond the scope of this book.
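A's answers can be checked directly: the quadratic residues modulo a prime p are exactly the nonzero squares mod p. A minimal sketch:

```python
def quadratic_residues(p):
    """The set of quadratic residues modulo a prime p (nonzero squares mod p)."""
    return {x * x % p for x in range(1, p)}

qr13 = quadratic_residues(13)
# 4 = 2*2 is a quadratic residue mod 13, and so are 12, 3, 1 and 9,
# consistent with A's answers in the protocol above.
```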

A RANDOMIZED LINEAR TIME ALGORITHM FOR MINIMUM SPANNING TREES

In Chapter 4, we presented two minimum spanning tree algorithms based on the greedy method. One of them is Kruskal's algorithm, whose time complexity is O(m log n), and the other is Prim's algorithm. The time complexity of a sophisticated version of this algorithm is O(n + mα(m, n)), where n (m) is the number of nodes (edges) of a graph and α(m, n) is the inverse Ackermann function. In this section, we shall present a randomized minimum spanning tree algorithm whose expected time complexity is O(n + m).
This algorithm is based upon the so-called Boruvka step, proposed by Boruvka in 1926. The following lemma illustrates the idea behind the Boruvka step:

LEMMA 1: Let V1 and V2 be non-empty vertex sets with V1 ∪ V2 = V and V1 ∩ V2 = ∅, and let the edge (v, u) be the minimum-weighted edge with one end point in V1 and the other end point in V2. Then (v, u) must be contained in the minimum spanning tree of G.

Lemma 1 can be stated in another form as follows: in a graph G, for any node u, among all edges incident on u, if edge (u, v) has the smallest weight, then (u, v) must be an edge in the minimum spanning tree of G. It is easy to prove Lemma 1. Consider Figure 11-7. For node c, among all edges incident on c, edge (c, e) has the smallest weight. Therefore, edge (c, e) must be included in the minimum spanning tree of G. Similarly, it can be easily proved that edge (f, g) must also be included.

FIGURE 11-7 A graph.

Now, let us select all edges of the graph in Figure 11-7 which must be included in the minimum spanning tree based upon Lemma 1. The resulting connected components are shown in Figure 11-8; in that figure, the dotted lines are the edges within the connected components.
Let us contract all nodes in each connected component to one node. There are thus five nodes now, as shown in Figure 11-9. After we eliminate multiple edges and loops, the result is shown in Figure 11-10.
Since the resulting graph consists of more than one node, we may apply Lemma 1 again. The result is shown in Figure 11-11 and the selected edges are (a, d), (c, l) and (g, h).
After contracting the nodes in each connected component, we now have two nodes, as shown in Figure 11-12.

FIGURE 11-8 The selection of edges in the Boruvka step.

FIGURE 11-9 The construction of nodes (the contracted nodes include cde, hjk and ilmn).

FIGURE 11-10 The result of applying the first Boruvka step.

FIGURE 11-11 The second selection of edges.

FIGURE 11-12 The second construction of nodes (the two contracted nodes are abcdeilmn and fghjk).

Again, after we eliminate multiple edges and select edge (i, h), we can contract all nodes into one node. The process is completed. All the edges selected constitute the minimum spanning tree, which is shown in Figure 11-13.

FIGURE 11-13 The minimum spanning tree obtained by the Boruvka step.

Boruvka's algorithm for finding the minimum spanning tree applies the Boruvka step recursively until the resulting graph is reduced to a single vertex. Let the input of the Boruvka step be a graph G(V, E) and the output be a graph G'(V', E'). The Boruvka step is described next.

The Boruvka Step

1. For each node u, find and mark the edge (u, v) with the smallest weight connected to it. Find all the connected components determined by the marked edges.
2. Contract each connected component determined by the marked edges into one single vertex. Let the resulting graph be G'(V', E'). Eliminate multiple edges and loops.
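Under the usual assumption of distinct edge weights, one Boruvka step can be sketched as follows. The representation is our own choice: vertices are labelled 0..n-1 and edges are (weight, u, v) triples.

```python
def boruvka_step(n, edges):
    """One Boruvka step: mark each vertex's lightest incident edge, contract
    the components they determine, and eliminate loops and multiple edges
    (keeping only the lightest parallel edge).  Assumes distinct weights.
    Returns (n', contracted edges, marked edges)."""
    # 1. For each node, find the minimum-weight incident edge and mark it.
    lightest = [None] * n
    for w, u, v in edges:
        for x in (u, v):
            if lightest[x] is None or w < lightest[x][0]:
                lightest[x] = (w, u, v)
    marked = {e for e in lightest if e is not None}
    # Find the connected components determined by the marked edges (union-find).
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for _, u, v in marked:
        parent[find(u)] = find(v)
    # 2. Contract each component to a single vertex, relabelled 0..n'-1.
    roots = sorted({find(x) for x in range(n)})
    label = {r: i for i, r in enumerate(roots)}
    best = {}
    for w, u, v in edges:
        a, b = sorted((label[find(u)], label[find(v)]))
        if a == b:
            continue  # loop: both end points fell into one component
        if (a, b) not in best or w < best[(a, b)]:
            best[(a, b)] = w  # keep only the lightest multiple edge
    edges2 = [(w, a, b) for (a, b), w in best.items()]
    return len(roots), edges2, marked
```

Repeating this step until one vertex remains, and collecting the marked edges, yields Boruvka's algorithm.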

The time complexity for one Boruvka step is O(n + m) where |V| = n and |E| = m. Since G is connected, m ≥ n - 1. Hence, O(n + m) = O(m). Since each connected component determined by the marked edges contains at least two vertices, after each Boruvka step is executed, the number of remaining vertices is smaller than one half of the original one. Hence, the total number of Boruvka steps executed is O(log n). The total time complexity of the Boruvka algorithm is O(m log n).
To use the Boruvka step efficiently, we must use a new concept. Consider Figure 11-14. In Figure 11-14(b), graph Gs is a subgraph of the graph G in Figure 11-14(a). A minimum spanning forest F of Gs is shown in Figure 11-14(c). In Figure 11-14(d), the minimum spanning forest F is embedded in the original graph G. All the edges which are not edges of F are now dotted edges. Let us consider edge (e, f). The weight of (e, f) is 7. Yet, there is a path between e and f in the forest F, namely (e, d), (d, g) and (g, f), and the weight of (e, f) is larger than the maximum weight of the edges in this path. According to a lemma which will be presented below, the edge (e, f) cannot be an edge of a minimum spanning tree of G. Before presenting the lemma, let us define a term called F-heavy.
Let w(x, y) denote the weight of edge (x, y) in G. Let Gs denote a subgraph of G. Let F denote a minimum spanning forest of Gs. Let wF(x, y) denote the maximum weight of an edge on the path connecting x and y in F. If x and y are not connected in F, let wF(x, y) = ∞. We say that edge (x, y) is F-heavy (F-light) with respect to F if w(x, y) > wF(x, y) (w(x, y) ≤ wF(x, y)).
Consider Figure 11-14(d). We can see that edges (e, f), (a, d) and (c, g) are all F-heavy with respect to F. Having defined this new concept, we can state the following lemma, which is quite important for using the Boruvka step.
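The definition can be made concrete with a small sketch. The forest below, with weights 2, 5 and 3 on the path e-d-g-f, is a hypothetical stand-in for Figure 11-14(c), chosen so that w(e, f) = 7 exceeds wF(e, f) = 5.

```python
import math
from collections import defaultdict

def max_weight_on_path(forest_edges, x, y):
    """w_F(x, y): the maximum edge weight on the path joining x and y in the
    forest F, or infinity if x and y lie in different trees of F."""
    adj = defaultdict(list)
    for w, u, v in forest_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    stack = [(x, None, 0)]  # (node, parent, max edge weight seen so far)
    while stack:
        node, par, mx = stack.pop()
        if node == y:
            return mx
        for nxt, w in adj[node]:
            if nxt != par:
                stack.append((nxt, node, max(mx, w)))
    return math.inf

def is_f_heavy(forest_edges, weight, x, y):
    """Edge (x, y) is F-heavy iff w(x, y) > w_F(x, y)."""
    return weight > max_weight_on_path(forest_edges, x, y)

F = [(2, 'e', 'd'), (5, 'd', 'g'), (3, 'g', 'f')]
```

An F-heavy edge such as (e, f) here can be discarded, since replacing any forest edge on the e-f path with it could only increase the total weight.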

LEMMA 2: Let Gs be a subgraph of a graph G(V, E). Let F be a minimum spanning forest of Gs. The F-heavy edges in G with respect to F cannot be minimum spanning tree edges of G.

FIGURE 11-14 F-heavy edges.

We shall not prove the above lemma. With this lemma, we know that edges (e, f), (a, d) and (c, g) cannot be minimum spanning tree edges.
We need another lemma to fully utilize the Boruvka step mechanism. This is Lemma 3.

LEMMA 3: Let H be a subgraph obtained from G by including each edge independently with probability p, and let F be the minimum spanning forest of H. The expected number of F-light edges in G is at most n/p, where n is the number of vertices of G.
The randomized minimum spanning tree algorithm is presented next.

Algorithm 11-5 A randomized minimum spanning tree algorithm

Input: A weighted connected graph G.
Output: A minimum spanning tree of G.
Step 1. Apply the Boruvka step three times. Let the resulting graph be G1(V1, E1). If G1 contains one node, return the set of edges marked in Step 1 and exit.
Step 2. Obtain a subgraph H of G1 by selecting each edge independently with probability 1/2. Apply the algorithm recursively to H to obtain a minimum spanning forest F of H. Get a graph G2(V2, E2) by deleting all F-heavy edges in G1 with respect to F.
Step 3. Apply the algorithm recursively to G2.

Let us analyze the time complexity of Algorithm 11-5 next.

Let T(|V|, |E|) denote the expected running time of the algorithm for a graph G(V, E). Each execution of Step 1 takes O(|V| + |E|) time. After Step 1 is executed, we have |V1| ≤ |V|/8 and |E1| ≤ |E|. For Step 2, the time needed to compute H is O(|V1| + |E1|) = O(|V| + |E|). The time needed for computing F is T(|V1|, |E1|/2) ≤ T(|V|/8, |E|/2). The time needed for deleting all F-heavy edges is O(|V1| + |E1|) = O(|V| + |E|). Using Lemma 3, the expected value of |E2| is at most 2|V1| ≤ |V|/4. Hence, the expected time needed to execute Step 3 is T(|V2|, |E2|) ≤ T(|V|/8, |V|/4). Let |V| = n and |E| = m. Then we have the following recurrence relation:

T(n, m) ≤ T(n/8, m/2) + T(n/8, n/4) + c(n + m),    (1)

for some constant c. It can be proved that

T(n, m) ≤ 2c(n + m).    (2)

We encourage the reader to check this solution by substituting (2) into (1). Hence, the expected running time of the algorithm is O(n + m).
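Carrying out the suggested substitution of (2) into the right-hand side of (1) confirms the bound term by term:

```latex
T(n, m) \le 2c\left(\frac{n}{8} + \frac{m}{2}\right)
          + 2c\left(\frac{n}{8} + \frac{n}{4}\right) + c(n + m)
        = \left(\frac{c}{4} + \frac{3c}{4} + c\right) n + (c + c)\, m
        = 2c(n + m).
```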

NOTES AND REFERENCES

A survey of randomized algorithms can be found in Maffioli (1986). Gill (1987) and Kurtz (1987) also surveyed randomized algorithms. It is appropriate to point out that the term randomized algorithm means different things to different people. It sometimes denotes an algorithm which is good in average-case analysis; that is, an algorithm which behaves differently for different sets of data. We insist, in this book, that a randomized algorithm is an algorithm which uses the flipping of coins in the process. In other words, for the same input data, because of the randomized process, the program may behave very differently.
That the closest pair problem can be solved by a randomized algorithm was proposed by Rabin (1976). For recent results, see Clarkson (1988). For testing primes by randomized algorithms, see Solovay and Strassen (1977) and Rabin (1980). That the prime number problem is a polynomial problem was proved lately by Agrawal, Kayal and Saxena (2004). The randomized algorithm for pattern matching appeared in Karp and Rabin (1987). The randomized algorithm for interactive proofs can be found in Goldwasser, Micali and Rackoff (1988). Galil, Haber and Yung (1989) suggested a further improvement of their method. The randomized minimum spanning tree algorithm can be found in Karger, Klein and Tarjan (1995). The Boruvka step can be found in Boruvka (1926), and the method to delete F-heavy edges can be found in Dixon, Rauch and Tarjan (1992). Randomized algorithms are also discussed extensively in Brassard and Bratley (1988).

FURTHER READING MATERIALS

Randomized algorithms can be classified into two kinds: sequential and parallel. Although this textbook is restricted to sequential algorithms, we are still going to recommend some randomized parallel algorithms.
For randomized sequential algorithms, we recommend Agarwal and Sharir (1996); Anderson and Woll (1997); Chazelle, Edelsbrunner, Guibas, Sharir and Snoeyink (1993); Cheriyan and Harerup (1995); Clarkson (1987); Clarkson (1988); d'Amore and Liberatore (1994); Dyer and Frieze (1989); Goldwasser and Micali (1984); Kannan, Mount and Tayur (1995); Karger and Stein (1996); Karp (1986); Karp, Motwani and Raghavan (1988); Matousek (1991); Matousek (1995); Megiddo and Zemel (1986); Mulmuley, Vazirani and Vazirani (1987); Raghavan and Thompson (1987); Teia (1993); Ting and Yao (1994); Wenger (1997); Wu and Tang (1992); Yao (1991); and Zemel (1987).

For randomized parallel algorithms, we recommend Alon, Babai and Itai (1986); Anderson (1987); Luby (1986) and Spirakis (1988).
A large batch of newly published papers includes Aiello, Rajagopalan and Venkatesan (1998); Albers (2002); Alberts and Henzinger (1995); Arora and Brinkman (2002); Bartal and Grove (2000); Chen and Hwang (2003); Deng and Mahajan (1991); Epstein, Noga, Seiden, Sgall and Woeginger (1999); Froda (2000); Har-Peled (2000); Kleffe and Borodovsky (1992); Klein and Subramanian (1997); Leonardi, Spaccamela, Presciutti and Ros (2001); Meacham (1981) and Sgall (1996).

Exercises

11.1 Write a program to implement the randomized algorithm for solving the closest pair problem. Test your program.
11.2 Use the randomized prime number testing algorithm to determine whether the following numbers are prime or not: 13, 15, 17.
11.3 Use the randomized pattern matching algorithm on the following two strings:
X = 0101
Y = 0010111
11.4 Use the algorithm introduced in Section 11-5 to determine whether 5 is a quadratic residue of 13 or not. Show an example in which you would draw a wrong conclusion.
11.5 Read Sections 8-5 and 8-6 of Brassard and Bratley (1988).
