Randomized Algorithms Dsa CP3151
Randomized Algorithms
distances are calculated, we may double the side length of the square and produce larger squares. The shortest distance must be within one of these enlarged squares. Figure 11-3 shows how four enlarged squares correspond to one square. Each enlarged square belongs to a certain type, indicated in Figure 11-3.
[Figure 11-3: The four types of enlarged squares, type 1 to type 4.]
Let us imagine that the entire space is already divided into squares with width $\delta$. Then the enlargement of these squares will induce four sets of enlarged squares, denoted as $T_1$, $T_2$, $T_3$ and $T_4$, corresponding to type 1, type 2, type 3 and type 4 squares respectively. A typical case is illustrated in Figure 11-4.
Of course, the critical question is to find the appropriate mesh size $\delta$. If $\delta$ is too large, each original square will be very large and a large number of distances will be calculated. In fact, if $\delta$ is very large, there is almost no dividing and our problem becomes the original problem. On the other hand, $\delta$ cannot be too small, because it cannot be smaller than the shortest distance. In the randomized algorithm, we randomly select a subset of points and find the shortest distance among this subset of points. This shortest distance becomes our $\delta$.
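The choice of $\delta$ can be sketched as follows. This is a minimal sketch: the function name `sample_delta` is hypothetical, and the subset size $m \approx n^{2/3}$ is an assumption that matches Example 11-1, where 9 of 27 points are sampled. Because the sampled closest pair is also a pair of the full set, the returned $\delta$ is never smaller than the true shortest distance.

```python
import math
import random
from itertools import combinations

def sample_delta(points: list[tuple[float, float]], rng: random.Random) -> float:
    """Return the closest-pair distance among a random subset of `points`.

    Assumption: subset size m = round(n ** (2/3)), clamped to [2, n].
    The result is >= the true closest-pair distance of the full set.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    m = max(2, min(n, round(n ** (2 / 3))))
    subset = rng.sample(points, m)
    # Brute force on the small subset: C(m, 2) distance computations.
    return min(math.dist(p, q) for p, q in combinations(subset, 2))
```

For 27 points this samples 9 of them and computes $\binom{9}{2} = 36$ distances.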
Step 5. For each pair of points $x_p, x_q \in S_j^{(i)}$, compute $d(x_p, x_q)$. Let $x_a$ and $x_b$ be the pair of points with the shortest distance among these pairs. Return $x_a$ and $x_b$ as the closest pair.
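The grid part of the algorithm (building the squares of width $\delta$, the four sets of enlarged squares, and the pairwise checks of Step 5) can be sketched as follows. This is a minimal sketch under stated assumptions: the name `closest_pair_with_mesh` is hypothetical, and the assignment of the four shift offsets to types 1 to 4 is an assumption. If $\delta$ is at least the shortest distance, the two closest points lie in adjacent $\delta$-squares, so at least one of the four shifted $2\delta$ grids places them in a common enlarged square.

```python
import math
from collections import defaultdict
from itertools import combinations

def closest_pair_with_mesh(points, delta):
    """Search the four shifted grids of 2*delta squares (T1..T4).

    Assumes delta > 0 and delta >= the true closest-pair distance.
    Returns (p, q, distance, number_of_distance_computations).
    """
    best = (None, None, math.inf)
    computations = 0
    # Shifts (in units of delta) of the 2x2 blocks: one per type of enlarged square.
    for ox, oy in ((0, 0), (1, 0), (0, 1), (1, 1)):
        blocks = defaultdict(list)
        for p in points:
            i = math.floor(p[0] / delta)  # index of the delta-square containing p
            j = math.floor(p[1] / delta)
            blocks[((i - ox) // 2, (j - oy) // 2)].append(p)
        # All mutual distances inside each enlarged square: sum of C(k, 2).
        for members in blocks.values():
            for p, q in combinations(members, 2):
                computations += 1
                d = math.dist(p, q)
                if d < best[2]:
                    best = (p, q, d)
    return best[0], best[1], best[2], computations
```

The returned count is $N(T_1) + N(T_2) + N(T_3) + N(T_4)$, the quantity tallied in Example 11-1.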
Example 11-1
We are given a set $S$ of 27 points, shown in Figure 11-5. In Step 1, we randomly choose 9 elements $x_1, x_2, \ldots, x_9$. It can be seen that the closest pair is $(x_1, x_2)$. We use the distance between $x_1$ and $x_2$ as the mesh size to construct squares as required in Step 2. There will be four sets of squares $T_1$, $T_2$, $T_3$ and $T_4$.
[Figure 11-5: The 27 points of Example 11-1, with the axes marked in multiples of $\delta$ ($2\delta$ to $6\delta$).]
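For each set of enlarged squares, the number of mutual distance computations is a sum of terms $\binom{k}{2}$, one for every enlarged square that contains $k \ge 2$ points:

$$N(T_1) = 28, \quad N(T_2) = 26, \quad N(T_3) = 24, \quad N(T_4) = 22.$$

In total, $28 + 26 + 24 + 22 = 100$ distances are computed. A straightforward method would compute all $\binom{27}{2} = 351$ distances.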
Since it is not true that $b_i = c_i$ for all $i$, B accepts the fact that 4 is a quadratic residue mod 13.
The theoretical basis of this randomized algorithm is number theory, which is beyond the scope of this book.
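The exchange between A and B can be simulated as follows. This is a minimal sketch that assumes the standard Goldwasser, Micali and Rackoff protocol for quadratic non-residuosity. In each round, B secretly flips a bit $b_i$ and sends $w_i = z_i^2$ (if $b_i = 0$) or $w_i = x z_i^2$ (if $b_i = 1$) mod $n$. A, who is assumed all-powerful and is modelled here by brute force, answers $c_i = 0$ if $w_i$ is a quadratic residue and $c_i = 1$ otherwise. The function names are hypothetical, and $x$ is assumed coprime to $n$.

```python
import math
import random

def is_quadratic_residue(w: int, n: int) -> bool:
    """Brute force: is w congruent to z^2 mod n for some z coprime to n?"""
    return any((z * z - w) % n == 0 for z in range(1, n) if math.gcd(z, n) == 1)

def nonresidue_protocol(x: int, n: int, rounds: int, rng: random.Random):
    """B accepts "x is a quadratic non-residue mod n" only if c_i == b_i for all i."""
    bits, answers = [], []
    for _ in range(rounds):
        b = rng.randrange(2)              # B's secret coin flip
        z = rng.randrange(1, n)
        while math.gcd(z, n) != 1:        # z must be invertible mod n
            z = rng.randrange(1, n)
        w = (z * z) % n if b == 0 else (x * z * z) % n
        c = 0 if is_quadratic_residue(w, n) else 1  # A's answer
        bits.append(b)
        answers.append(c)
    return bits == answers, bits, answers
```

When $x$ is a residue, as with $x = 4$ and $n = 13$, every $w_i$ is a residue. A therefore always answers $c_i = 0$ and matches $b_i$ only by luck, with probability $2^{-k}$ over $k$ rounds.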
Now, let us select all edges of the graph in Figure 11-7 that must be included in the minimum spanning tree according to Lemma 1. The resulting connected components are shown in Figure 11-8. In the graph in Figure 11-8, the dotted lines are the edges that form these connected components.
Let us contract all nodes in each connected component to one node. There are now five nodes, shown in Figure 11-9. After we eliminate multiple edges and loops, the result is shown in Figure 11-10.
Since the resulting graph consists of more than one node, we may apply Lemma 1 again. The result is shown in Figure 11-11, and the selected edges are (a, d), (c, l) and (g, h).
After contracting the nodes in each connected component, we now have two nodes, shown in Figure 11-12.
FIGURE 11-11 The second selection of edges.
Again, after we eliminate multiple edges and select edge (i, h), we can contract all nodes into one node, and the process is complete. All the selected edges constitute the minimum spanning tree, which is shown in Figure 11-13.
Boruvka's algorithm for finding the minimum spanning tree applies the Boruvka step recursively until the resulting graph is reduced to a single vertex. Let the input of the Boruvka step be a graph $G(V, E)$ and the output be a graph $G'(V', E')$. The Boruvka step is described next.
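Boruvka's algorithm as a whole can be sketched as follows. This is a minimal sketch that assumes Lemma 1 is the usual rule: the lightest edge leaving a vertex (or contracted node) belongs to a minimum spanning tree. The name `boruvka_mst` is hypothetical. Instead of building $G'$ explicitly, the sketch simulates contraction with a union-find structure. Loops then show up as edges whose endpoints share a representative, and multiple edges are handled implicitly because only the lightest one can be chosen. Ties are broken by edge index, so all weights act as distinct.

```python
def boruvka_mst(n: int, edges: list[tuple[int, int, float]]) -> list[int]:
    """Return indices of minimum spanning forest edges; vertices are 0..n-1."""
    parent = list(range(n))

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    chosen = []
    while True:
        # One Boruvka step: each contracted node marks its lightest outgoing edge.
        cheapest = {}
        for idx, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # a loop after contraction
            key = (w, idx)
            for r in (ru, rv):
                if r not in cheapest or key < cheapest[r]:
                    cheapest[r] = key
        if not cheapest:
            return chosen  # each connected component is now a single node
        # Contract along the marked edges.
        for _, idx in cheapest.values():
            u, v, _ = edges[idx]
            ru, rv = find(u), find(v)
            if ru != rv:  # an edge may be marked by both endpoints
                parent[ru] = rv
                chosen.append(idx)
```

Each pass of the `while` loop costs $O(m)$ union-find work and at least halves the number of nodes that still have incident edges.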
The time complexity for one Boruvka step is $O(n + m)$, where $|V| = n$ and $|E| = m$. Since $G$ is connected, $m \ge n - 1$. Hence, $O(n + m) = O(m)$. Since each connected component determined by the marked edges contains at least two vertices, the number of remaining vertices after each Boruvka step is at most one half of the original number. Hence, the total number of Boruvka steps executed is $O(\log n)$, and the total time complexity of Boruvka's algorithm is $O(m \log n)$.
To use the Boruvka step efficiently, we need a new concept. Consider Figure 11-14. In Figure 11-14(b), graph $G_S$ is a subgraph of graph $G$ in Figure 11-14(a). A minimum spanning forest $F$ of $G_S$ is shown in Figure 11-14(c). In Figure 11-14(d), the minimum spanning forest $F$ is embedded in the original graph $G$, and all the edges that are not edges of $F$ are drawn as dotted edges. Let us consider edge (e, f). The weight of (e, f) is 7. Yet there is a path between e and f in the forest $F$, namely (e, d), (d, g), (g, f). The weight of (e, f) is larger than the maximum weight of the edges in this path. According to a lemma presented below, the edge (e, f) therefore cannot be an edge of a minimum spanning tree of $G$. Before presenting the lemma, let us define a term called F-heavy.
Let $w(x, y)$ denote the weight of edge $(x, y)$ in $G$. Let $G_S$ denote a subgraph of $G$, and let $F$ denote a minimum spanning forest of $G_S$. Let $w_F(x, y)$ denote the maximum weight of an edge in the path connecting $x$ and $y$ in $F$. If $x$ and $y$ are not connected in $F$, let $w_F(x, y) = \infty$. We say that edge $(x, y)$ is F-heavy (F-light) with respect to $F$ if $w(x, y) > w_F(x, y)$ ($w(x, y) \le w_F(x, y)$).
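The definition can be checked directly in code. This is a minimal sketch: the names `max_weight_on_path` and `is_f_heavy` are hypothetical, and the sample forest in the usage note is invented for illustration, not taken from Figure 11-14. Because $F$ is a forest, the path between two vertices is unique, so a depth-first search that carries the largest weight seen so far computes $w_F(x, y)$.

```python
import math
from collections import defaultdict

def max_weight_on_path(forest_edges, x, y):
    """w_F(x, y): largest edge weight on the x-y path in forest F, or inf."""
    adj = defaultdict(list)
    for u, v, w in forest_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # DFS from x; each stack entry carries the max weight on the path so far.
    stack = [(x, -math.inf)]
    seen = {x}
    while stack:
        node, best = stack.pop()
        if node == y:
            return best
        for nxt, w in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, max(best, w)))
    return math.inf  # x and y are not connected in F

def is_f_heavy(edge, forest_edges) -> bool:
    """An edge (x, y, w) is F-heavy iff w > w_F(x, y); otherwise it is F-light."""
    x, y, w = edge
    return w > max_weight_on_path(forest_edges, x, y)
```

With a hypothetical forest `[("e", "d", 2), ("d", "g", 3), ("g", "f", 5)]`, we get $w_F(e, f) = 5$. An edge (e, f) of weight 7 is then F-heavy, while one of weight 5 is F-light, since equality counts as light.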
Consider Figure 11-14(d). We can see that edges (e, f), (a, d) and (c, g) are all F-heavy with respect to $F$. With this concept defined, we can state the following lemma, which is essential for using the Boruvka step.
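[Figure 11-14: (a) graph $G$; (b) subgraph $G_S$; (c) a minimum spanning forest $F$ of $G_S$; (d) $F$ embedded in $G$, with the other edges dotted.]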
We shall not prove the above lemma. With this lemma, we know that edges (e, f), (a, d) and (c, g) cannot be minimum spanning tree edges.
We need one more lemma, Lemma 3, to make full use of the Boruvka step.
LEMMA 3: Let $H$ be a subgraph obtained from $G$ by including each edge independently with probability $p$, and let $F$ be the minimum spanning forest of $H$. The expected number of F-light edges in $G$ is at most $n/p$, where $n$ is the number of vertices of $G$.
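Lemma 3 can be checked empirically with a small simulation. This is a minimal sketch: the name `count_f_light` is hypothetical, and it assumes distinct edge weights. The sketch scans the edges of $G$ in increasing weight order while running Kruskal's algorithm on $H$. At the moment an edge is scanned, the union-find structure holds exactly the edges of $F$ that are lighter than it, so the edge is F-light precisely when its endpoints are still disconnected. As a design choice, the coin deciding membership in $H$ is flipped only for such edges. This does not change the distribution of $F$, because an F-heavy edge would be rejected by Kruskal's algorithm anyway.

```python
import random

def count_f_light(n, edges, p, rng):
    """Sample H (each edge kept with probability p), build its minimum
    spanning forest F by Kruskal, and return the number of F-light edges
    of G. Edges are (u, v, w) with distinct weights; vertices are 0..n-1.
    """
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    light = 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:              # no lighter F-path joins u and v: F-light
            light += 1
            if rng.random() < p:  # edge is in H, so Kruskal adds it to F
                parent[ru] = rv
    return light
```

Each F-light edge is an independent trial that enlarges $F$ with probability $p$, and $F$ can grow at most $n - 1$ times. This is the intuition behind the $n/p$ bound.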
The randomized minimum spanning tree algorithm is described next.
Each execution of Step 1 takes $O(|V| + |E|)$ time. After Step 1 is executed, we have $|V_1| \le |V|/8$ and $|E_1| \le |E|$. For Step 2, the time needed to compute $H$ is $O(|V_1| + |E_1|) = O(|V| + |E|)$. The expected time needed for computing $F$ is $T(|V_1|, |E_1|/2) \le T(|V|/8, |E|/2)$. The time needed for deleting all F-heavy edges is $O(|V_1| + |E_1|) = O(|V| + |E|)$. By Lemma 3 with $p = 1/2$, the expected value of $|E_2|$ is at most $|V_1|/p = 2|V_1| \le |V|/4$. Hence, the expected time needed to execute Step 3 is $T(|V_2|, |E_2|) \le T(|V|/8, |V|/4)$. Let $|V| = n$ and $|E| = m$. Then, for some constant $c$, we have the following recurrence relation:

$$T(n, m) \le T(n/8, m/2) + T(n/8, n/4) + c(n + m). \qquad (1)$$

Its solution is

$$T(n, m) = 2c(n + m). \qquad (2)$$

We encourage the reader to check this solution by substituting (2) into (1). Hence, the expected running time of the algorithm is $O(n + m)$.
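The suggested substitution can be carried out explicitly. Let $c$ be the constant hidden in the $O(n + m)$ work of Steps 1 and 2, and plug the guess $T(n, m) = 2c(n + m)$ into each term of the recurrence's right-hand side:

$$
\begin{aligned}
T\!\left(\tfrac{n}{8}, \tfrac{m}{2}\right) + T\!\left(\tfrac{n}{8}, \tfrac{n}{4}\right) + c(n + m)
&= 2c\left(\frac{n}{8} + \frac{m}{2}\right) + 2c\left(\frac{n}{8} + \frac{n}{4}\right) + c(n + m) \\
&= \frac{cn}{4} + cm + \frac{3cn}{4} + cn + cm \\
&= 2c(n + m).
\end{aligned}
$$

The guess therefore satisfies the recurrence with equality.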
NOTES AND REFERENCES
A survey of randomized algorithms can be found in Maffioli (1986). Gill (1987) and Kurtz (1987) also surveyed randomized algorithms. It is appropriate to point out that the term "randomized algorithm" means different things to different people. It sometimes denotes an algorithm that is good in average case analysis; that is, the algorithm behaves differently for different sets of data. We insist, in this book, that a randomized algorithm is an algorithm that uses flipping of coins in the process. In other words, for the same input data, the program may behave very differently because of the randomized process.
That the closest pair problem can be solved by a randomized algorithm was proposed by Rabin (1976). For recent results, see Clarkson (1988). For testing primes by randomized algorithms, see Solovay and Strassen (1977) and Rabin (1980). That primality testing can be done in polynomial time was proved recently by Agrawal, Kayal and Saxena (2004). The randomized algorithm for pattern matching appeared in Karp and Rabin (1987). The randomized algorithm for interactive proofs can be found in Goldwasser, Micali and Rackoff (1988). Galil, Haber and Yung (1989) suggested a further improvement of their method. The randomized minimum spanning tree algorithm can be found in Karger, Klein and Tarjan (1995). The Boruvka step can be found in Boruvka (1926), and the method to delete F-heavy edges can be found in Dixon, Rauch and Tarjan (1992). Randomized algorithms are also discussed extensively in Brassard and Bratley (1988).
Exercises
11.1 Write a program to implement the randomized algorithm for solving the closest pair problem. Test your program.
11.2 Use the randomized prime number testing algorithm to determine
whether the following numbers are prime or not:
13, 15, 17.
11.3 Use the randomized pattern matching algorithm on the following
two strings:
x = 0101
y = 0010111
11.4 Use the algorithm introduced in Section 11-5 to determine whether 5 is a quadratic residue of 13 or not. Show an example in which you would draw a wrong conclusion.
11.5 Read Sections 8-5 and 8-6 of Brassard and Bratley (1988).