
CSC 263 Lecture 11

September 13, 2006

19 Kruskal’s Algorithm for MCST


Kruskal’s algorithm uses a Union-Find ADT. We need to define this before proceeding with the
algorithm.

19.1 The Disjoint Set ADT (also called the Union-Find ADT)
Two sets A and B are disjoint if their intersection is empty: A ∩ B = ∅. In other words, if there
is no element in both sets, then the sets are disjoint. The following abstract data type, called
“Disjoint Set” or “Union-Find,” deals with a group of sets where each set is disjoint from every
other set (i.e. they are pairwise disjoint).
Object: A collection of nonempty, pairwise disjoint sets: S1 , . . . , Sk . Each set contains a special
element called its representative.
Operations:

• MAKE-SET(x): Takes an element x that is not in any of the current sets, and adds the set
{x} to the collection. The representative of this new set is x.

• FIND-SET(x): Given an element x, return the representative of the set that contains x (or
NIL if x does not belong to any set).

• UNION(x,y): Given two distinct elements x and y, let Si be the set that contains x and
Sj be the set that contains y. This operation adds the set Si ∪ Sj to the collection and it
removes Si and Sj (since all the sets must be disjoint). It also picks a representative for the
new set (how it chooses the representative is implementation dependent). Note: if x and y
originally belong to the same set, then UNION(x,y) has no effect.

The Union-Find ADT provides us with an easy method for testing whether an undirected graph
is connected:

For all v in V do
    MAKE-SET(v)
For all (u,v) in E do
    UNION(u,v)

Now we can test whether there is a path between u and v by testing FIND-SET(u) = FIND-SET(v).
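As a sketch, the connectivity test above might look like this in Python, using a minimal parent-pointer Union-Find (the names make_set, find_set, union and the sample graph are assumptions for illustration, not from the notes):

```python
# Minimal parent-pointer Union-Find (illustrative sketch).
parent = {}

def make_set(x):
    parent[x] = x          # x is its own representative

def find_set(x):
    while parent[x] != x:  # follow pointers up to the representative
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx != ry:
        parent[ry] = rx    # merge: ry's set joins rx's set

# Hypothetical sample graph: 'a'-'b'-'c' connected, 'd' isolated.
V = ['a', 'b', 'c', 'd']
E = [('a', 'b'), ('b', 'c')]
for v in V:
    make_set(v)
for (u, v) in E:
    union(u, v)

print(find_set('a') == find_set('c'))  # True: a path exists
print(find_set('a') == find_set('d'))  # False: no path
```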

19.2 Pseudocode for Kruskal
KRUSKAL-MST(G=(V,E), w: E -> Z)
    A := {};
    sort edges so w(e_1) <= w(e_2) <= ... <= w(e_m);
    for each vertex v in V, MAKE-SET(v);
    for i := 1 to m do
        (let (u_i, v_i) = e_i)
        if FIND-SET(u_i) != FIND-SET(v_i) then
            UNION(u_i, v_i);
            A := A U {e_i}
        end if
    end for
END KRUSKAL-MST

Intuitively, Kruskal’s algorithm grows an MCST A by repeatedly adding the “lightest” edge
from E that won’t create a cycle.
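The pseudocode might be rendered in Python as follows; the graph representation (an edge list plus a weight dictionary) and all names are illustrative assumptions, with a simple parent-pointer Union-Find inlined:

```python
# Illustrative Python rendering of KRUSKAL-MST (a sketch, not the notes' code).
def kruskal_mst(V, E, w):
    """V: vertices; E: edges as (u, v) pairs; w: dict edge -> weight."""
    parent = {v: v for v in V}      # MAKE-SET for every vertex

    def find_set(x):                # follow parent pointers to the root
        while parent[x] != x:
            x = parent[x]
        return x

    A = set()
    for (u, v) in sorted(E, key=lambda e: w[e]):  # w(e_1) <= ... <= w(e_m)
        ru, rv = find_set(u), find_set(v)
        if ru != rv:                # adding e creates no cycle
            parent[rv] = ru         # UNION
            A.add((u, v))
    return A

# Hypothetical example graph.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (1, 3), (3, 4)]
w = {(1, 2): 1, (2, 3): 2, (1, 3): 3, (3, 4): 4}
mst = kruskal_mst(V, E, w)
print(sum(w[e] for e in mst))  # 7: edges (1,2), (2,3), (3,4)
```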

19.3 Correctness
We can argue correctness in a similar way to the way we proved correctness for Prim’s algorithm.

Theorem. If G = (V, E) is a connected, undirected, weighted graph, A is a subset of the edges
of some MCST T of G, and e is any edge of minimum weight which does not create a cycle with A,
then A ∪ {e} is a subset of the edges of some MCST of G.

Proof. We use a similar argument as before. If e is part of T , then we are finished. If not, then e
forms a cycle with T . In that case, there must be some other edge e′ on this cycle that is in T but
not contained in A (because e does not form a cycle with A). Also, e′ cannot form a cycle with A,
because otherwise it would form a cycle with T . By assumption, w(e) ≤ w(e′ ). Let T ′ = T ∪ {e} − {e′ }.
Then, as before, w(T ′ ) ≤ w(T ) and A ∪ {e} ⊆ T ′ .

19.4 Data Structures for Union-Find


1. Linked lists: Represent each set by a linked list, where each node is an element. The
representative element is the head of the list. Each node contains a pointer back to the head.
The head also contains a pointer to the tail. We can implement the operations as follows
(list_x is the list containing x and list_y is the list containing y):

• MAKE-SET(x): Just create a list of one node containing x. Time: O(1).

• FIND-SET(x): Just follow x’s pointer back to the head and return the head. Time:
O(1).
• UNION(x,y): Append list_y to the end of list_x. Since we can find the head of list_y
and the tail of list_x in constant time, this takes O(1) time. The representative of the
combined list is the head of list_x, but the nodes of list_y still point to the head of list_y.
Updating them to point to the head of list_x takes time Θ(length of list_y).
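A minimal Python sketch of this representation, with dictionaries standing in for real pointer-linked nodes (all names are assumptions): head[x] plays the role of each node's back-pointer to the head, and members[r] stands in for the chain of nodes from head to tail.

```python
# Linked-list Union-Find (naive append), an illustrative sketch.
head = {}      # element -> representative (head of its list)
members = {}   # representative -> elements of its list, head-to-tail order

def make_set(x):
    head[x] = x
    members[x] = [x]

def find_set(x):
    return head[x]          # one pointer follow: O(1)

def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx == ry:
        return
    for z in members[ry]:   # repoint every node of list_y: Theta(len(list_y))
        head[z] = rx
    members[rx].extend(members[ry])  # append list_y to the tail of list_x
    del members[ry]

for v in ['a', 'b', 'c']:
    make_set(v)
union('a', 'b')
union('a', 'c')
print(find_set('c'))  # 'a'
```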

The worst-case sequence complexity for m of these operations is certainly O(m²): no list will
contain more than m elements since we can’t call MAKE-SET more than m times. The most
expensive operation is UNION; if we call this m times on lists of length m, it will take time
O(m²). Obviously this is an overestimate of the time, since we can’t call both MAKE-SET and
UNION m times each.
We can show, however, that the worst-case sequence complexity of m operations is Ω(m²).
To do this, we have to give a sequence that will take time Ω(m²): start by calling MAKE-SET
m/2 + 1 times on elements x_1, x_2, . . . , x_{m/2+1}. Now do the loop:

for i = 2 to m/2 do
    UNION(x_i, x_1)

This will create a longer and longer list that keeps getting appended to a single element. The
execution of the loop takes time Θ(m²).

2. Linked lists with union-by-weight: Everything remains the same except we will store the
length of each linked list at the head. Whenever we do a UNION, we will take the shorter list
and append it to the longer list. So, UNION(x,y) will no longer take Θ(length of list_y), but
rather O(min{length(list_x), length(list_y)}). This type of union is called “union-by-weight”
(where “weight” just refers to the length of the list).
It might seem like union-by-weight doesn’t make much of a difference, but it greatly affects
the worst-case sequence complexity. Consider a sequence of m operations and let n be the
number of MAKE-SET operations in the sequence (so there are never more than n elements in
total). UNION is the only expensive operation and it’s expensive because of the number of
times we might have to update pointers to the head of the list. For some arbitrary element
x, we want to prove an upper bound on the number of times that x’s head pointer can be
updated during the sequence of m operations. Note that this happens only when list_x is
unioned with a list that is no shorter (because we update pointers only for the shorter list).
This means that each time x’s back pointer is updated, x’s new list is at least twice the
size of its old list. But the length of list_x can double only log n times before it has length
greater than n (which it can’t have because there are only n elements). So we update x’s
head pointer at most log n times. Since x could be any of n possible elements, we do a total of
at most n log n pointer updates. So the cost for all the UNIONs in the sequence is O(n log n).
The other operations can cost at most O(m) so the total worst-case sequence complexity is
O(m + n log n).
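Only UNION changes under union-by-weight. A sketch, again with dictionaries standing in for linked nodes and hypothetical names; the length of each list is just len(members[r]) at its head:

```python
# Linked-list Union-Find with union-by-weight (illustrative sketch).
head, members = {}, {}

def make_set(x):
    head[x] = x
    members[x] = [x]

def union(x, y):
    rx, ry = head[x], head[y]        # FIND-SET is a single lookup
    if rx == ry:
        return
    if len(members[rx]) < len(members[ry]):
        rx, ry = ry, rx              # ensure rx names the longer list
    for z in members[ry]:            # repoint only the shorter list
        head[z] = rx
    members[rx].extend(members[ry])  # append shorter list to longer one
    del members[ry]

for v in range(6):
    make_set(v)
union(0, 1)      # {0, 1}
union(2, 3)      # {2, 3}
union(4, 0)      # singleton {4} is shorter, so 4 joins {0, 1}
print(head[4])   # 0
```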

3. Trees: Represent each set by a tree, where each element points to its parent and the root
points back to itself. The representative of a set is the root. Note that the trees are not
necessarily binary trees: the number of children of a node can be arbitrarily large (or small).

• MAKE-SET(x): Just create a tree with a single node x. Time: O(1).


• FIND-SET(x): Follow the parent pointers from x until you reach the root. Return the
root. Time: Θ(height of tree).
• UNION(x,y): Let root_x be the root of the tree containing x, tree_x, and let root_y be the
root of the tree containing y, tree_y. We can find root_x and root_y using FIND-SET(x) and
FIND-SET(y). Then make root_y a child of root_x. Since we have to do both FIND-SETs,
the running time is Θ(max{height(tree_x), height(tree_y)}).
(Figure: UNION(x,y) makes root_y, the root of y’s tree, a child of root_x.)
The worst-case sequence complexity for m operations is just like the linked list case, since we
can create a tree which is just a list:

for i = 1 to m/4 do
    MAKE-SET(x_i)
for i = 1 to m/4 - 1 do
    UNION(x_(i+1), x_i)

(Figure: the resulting tree is a path x_{m/4}, x_{m/4-1}, . . . , x_1 of height m/4 − 1.)
Creating this tree takes m/4 MAKE-SET operations and m/4 − 1 UNION operations. The running
time for m/2 + 1 FIND-SET operations on x_1 now is (m/4)(m/2 + 1) = Θ(m²).
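This worst case is easy to reproduce. A small sketch (hypothetical names) that builds the degenerate tree for m = 40 and measures the depth of x_1:

```python
# Naive tree Union-Find degenerating into a path (illustrative sketch).
parent = {}

def make_set(x):
    parent[x] = x

def find_set(x):
    while parent[x] != x:     # Theta(height) walk to the root
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx != ry:
        parent[ry] = rx       # root of y's tree becomes a child of x's root

def depth(x):
    d = 0
    while parent[x] != x:
        x = parent[x]
        d += 1
    return d

m = 40
k = m // 4                    # m/4 MAKE-SETs
for i in range(1, k + 1):
    make_set(i)
for i in range(1, k):         # UNION(x_(i+1), x_i), m/4 - 1 times
    union(i + 1, i)

print(depth(1))               # 9, i.e. m/4 - 1: each FIND-SET(x_1) is Theta(m)
```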

Exercise. How do we know there is no sequence of operations that takes longer than
Θ(m²)?

4. Trees with union-by-rank: We improved the performance of the linked-list implementation
by using “weight” or “size” information during UNION. We will do the same thing for
trees, using “rank” information. The rank of a tree is an integer that will be stored at the
root:

• MAKE-SET(x): Same as before. Set rank = 0.

• UNION(x,y): If rank(tree_x) ≥ rank(tree_y), then make root_y a child of root_x. Other-
wise, make root_x a child of root_y. The rank of the combined tree is rank(tree_x) + 1 if
rank(tree_x) = rank(tree_y), and max{rank(tree_x), rank(tree_y)} otherwise. The running
time is still Θ(max{height(tree_x), height(tree_y)}).
• FIND-SET(x): Same as before.

We can prove two things about union-by-rank:

(a) The rank of any tree created by a sequence of these operations is equal to its height.
(b) The rank of any tree created by a sequence of these operations is O(log n), where n is
the number of MAKE-SETs in the sequence.

These two facts imply that the running times of FIND-SET and UNION are O(log n), so the
worst-case sequence complexity of m operations is O(m log n).
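The rank bookkeeping can be sketched as follows (hypothetical names, not from the notes; rank is meaningful only at roots):

```python
# Trees with union-by-rank (illustrative sketch).
parent, rank = {}, {}

def make_set(x):
    parent[x] = x          # a root points to itself
    rank[x] = 0

def find_set(x):
    while parent[x] != x:  # follow parent pointers to the root
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx == ry:
        return
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx            # ensure rx has the larger rank
    parent[ry] = rx                # lower-rank root becomes a child
    if rank[rx] == rank[ry]:
        rank[rx] += 1              # equal ranks: rank grows by one

# Union 8 singletons pairwise; the final rank is log2(8) = 3.
for v in range(1, 9):
    make_set(v)
for a, b in [(1, 2), (3, 4), (5, 6), (7, 8), (1, 3), (5, 7), (1, 5)]:
    union(a, b)
print(rank[find_set(1)])  # 3
```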

5. Trees with union-by-rank and path compression: In addition to doing union-by-rank,
there is another way to improve the tree implementation of Union-Find: when performing
FIND-SET(x), keep track of the nodes visited on the path from x to rootx (in a stack or queue),
and once the root is found, update the parent pointers of each of these nodes to point directly
to the root. This at most doubles the running time of the current FIND-SET operation, but
it can speed up future FIND-SETs. This technique is called “path compression.”
This is the state-of-the-art data structure for Union-Find. Its worst-case sequence complexity
is O(m log* n) (see Section 22.4 of the text for a proof). The function log* n is a very slowly
growing function; it is equal to the number of times you need to apply log to n before the
answer is at most 1. For example, if n = 15, then 3 < log n < 4, so 1 < log log n < 2 and
log log log n < 1. So log* 15 = 3. Also, if n = 2^65536 = 2^(2^(2^(2^2))), then log* n = 5.
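The two-pass version of FIND-SET with path compression can be sketched as follows (hypothetical names; the initial parent table encodes a path 1 → 2 → 3, with 3 as root):

```python
# FIND-SET with path compression (illustrative sketch).
parent = {1: 2, 2: 3, 3: 3}     # a path 1 -> 2 -> 3; 3 is the root

def find_set(x):
    root = x
    while parent[root] != root:  # pass 1: locate the root
        root = parent[root]
    while parent[x] != root:     # pass 2: repoint each visited node
        parent[x], x = root, parent[x]
    return root

print(find_set(1))  # 3
print(parent[1])    # 3: node 1 now points directly at the root
```

The second pass at most doubles the work of the find, but every future FIND-SET on a repointed node finishes in one step.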

19.5 Complexity of Kruskal’s Algorithm


Let’s assume that m, the number of edges, is at least n − 1, where n is the number of vertices;
otherwise G is not connected and there is no spanning tree. Sorting the edges can be done in
time O(m log m) using mergesort, for example. Let’s also assume that we implement Union-Find
using linked lists with union-by-weight. We do n MAKE-SETs, at most 2m FIND-SETs and at most
m UNIONs. The first two take time O(n) and O(m), respectively. The last can take time at most
O(n log n), since in that amount of time we would have built up the set of all vertices. Hence, the
running time of Kruskal is O(m log m + n + m + n log n) = O(m log m), using n − 1 ≤ m.
