Lecture 11
19.1 The Disjoint Set ADT (also called the Union-Find ADT)
Two sets A and B are disjoint if their intersection is empty: A ∩ B = ∅. In other words, if there
is no element in both sets, then the sets are disjoint. The following abstract data type, called
“Disjoint Set” or “Union-Find,” deals with a group of sets where each set is disjoint from every
other set (i.e. they are pairwise disjoint).
Object: A collection of nonempty, pairwise disjoint sets $S_1, \ldots, S_k$. Each set contains a special
element called its representative.
Operations:
• MAKE-SET(x): Takes an element x that is not in any of the current sets, and adds the set
{x} to the collection. The representative of this new set is x.
• FIND-SET(x): Given an element x, return the representative of the set that contains x (or
NIL if x does not belong to any set).
• UNION(x,y): Given two distinct elements x and y, let $S_i$ be the set that contains x and
$S_j$ be the set that contains y. This operation adds the set $S_i \cup S_j$ to the collection and
removes $S_i$ and $S_j$ (so that the sets remain pairwise disjoint). It also picks a representative for the
new set (how it chooses the representative is implementation dependent). Note: if x and y
already belong to the same set, then UNION(x,y) has no effect.
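To make the semantics of the three operations concrete, here is a minimal Python sketch that stores every set explicitly. The class and method names (`DisjointSets`, `make_set`, `find_set`, `union`) are our own; NIL is modelled as `None`, and after a UNION the sketch keeps FIND-SET(x) as the representative, which is just one of the implementation-dependent choices the ADT allows.

```python
from typing import Dict, Hashable, Optional, Set


class DisjointSets:
    """Naive Disjoint Set ADT: every element maps to its representative,
    and every representative maps to the full set it represents."""

    def __init__(self) -> None:
        self.rep: Dict[Hashable, Hashable] = {}           # element -> representative
        self.members: Dict[Hashable, Set[Hashable]] = {}  # representative -> its set

    def make_set(self, x: Hashable) -> None:
        # MAKE-SET requires x to be in none of the current sets.
        if x in self.rep:
            raise ValueError(f"{x!r} is already in some set")
        self.rep[x] = x
        self.members[x] = {x}

    def find_set(self, x: Hashable) -> Optional[Hashable]:
        # None plays the role of NIL for elements not in any set.
        return self.rep.get(x)

    def union(self, x: Hashable, y: Hashable) -> None:
        rx, ry = self.find_set(x), self.find_set(y)
        if rx is None or ry is None:
            raise KeyError("both elements must belong to some set")
        if rx == ry:
            return  # same set: UNION has no effect
        # Replace S_i and S_j by S_i ∪ S_j, keeping rx as the representative.
        for z in self.members[ry]:
            self.rep[z] = rx
        self.members[rx] |= self.members.pop(ry)
```

For example, after `make_set` on "a", "b", "c", "d" followed by `union("a", "b")`, `union("c", "d")` and `union("b", "d")`, all four elements report "a" as their representative, while `find_set("z")` returns `None`.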
The Union-Find ADT provides us with an easy method for testing whether an undirected graph
G = (V, E) is connected:
For all v in V do
    MAKE-SET(v)
For all (u,v) in E do
    UNION(u,v)
Now there is a path between u and v if and only if FIND-SET(u) = FIND-SET(v), and G is
connected if and only if all vertices have the same representative.
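A minimal Python sketch of this test, assuming vertices are hashable values and edges are pairs. It uses the simplest parent-pointer implementation of the ADT (any correct implementation would do), and the helper names `components`, `has_path` and `is_connected` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def components(vertices: Iterable[Hashable],
               edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Hashable]:
    """Run MAKE-SET on every vertex and UNION on every edge, then return
    each vertex's representative (its FIND-SET value)."""
    parent: Dict[Hashable, Hashable] = {}

    def find_set(x):
        while parent[x] != x:  # walk up to the root of x's tree
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find_set(x), find_set(y)
        if rx != ry:
            parent[ry] = rx    # merge the two sets

    for v in vertices:
        parent[v] = v          # MAKE-SET(v)
    for u, v in edges:
        union(u, v)            # UNION(u, v)
    return {v: find_set(v) for v in parent}


def has_path(reps: Dict[Hashable, Hashable], u: Hashable, v: Hashable) -> bool:
    # A path exists exactly when u and v have the same representative.
    return reps[u] == reps[v]


def is_connected(reps: Dict[Hashable, Hashable]) -> bool:
    # Connected exactly when every vertex has the same representative.
    return len(set(reps.values())) <= 1
```

With vertices 1 to 5 and edges (1,2), (2,3), (4,5), vertices 1 and 3 share a representative but 1 and 4 do not, so the graph is not connected.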
19.2 Pseudocode for Kruskal
KRUSKAL-MST(G=(V,E), w:E->Z)
    A := {};
    sort edges so w(e_1) <= w(e_2) <= ... <= w(e_m);
    for each vertex v in V, MAKE-SET(v);
    for i := 1 to m do
        (let (u_i,v_i) = e_i)
        if FIND-SET(u_i) != FIND-SET(v_i) then
            UNION(u_i,v_i);
            A := A U {e_i};
        end if
    end for
END KRUSKAL-MST
Intuitively, Kruskal’s algorithm grows A into an MCST by repeatedly adding the lightest remaining
edge of E that does not create a cycle with the edges already in A.
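The pseudocode translates almost line for line into Python. This is a sketch under our own assumptions: edges are `(u, v, weight)` triples of an undirected graph, and the disjoint sets use plain parent pointers with UNION linking the two roots (no union-by-weight or union-by-rank), so it is not tuned for speed. The name `kruskal_mst` is hypothetical.

```python
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, int]  # (u, v, weight)


def kruskal_mst(vertices: List[Hashable], edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm following KRUSKAL-MST; returns the edge set A."""
    parent = {v: v for v in vertices}  # MAKE-SET(v) for every vertex

    def find_set(x):
        while parent[x] != x:          # follow parent pointers to the root
            x = parent[x]
        return x

    A: List[Edge] = []
    # Consider edges so that w(e_1) <= w(e_2) <= ... <= w(e_m).
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find_set(u), find_set(v)
        if ru != rv:                   # e_i joins two different trees of A: no cycle
            parent[rv] = ru            # UNION(u_i, v_i), done on the two roots
            A.append((u, v, w))
    return A
```

On the edges (a,b,1), (b,c,2), (a,c,3), (c,d,4), (b,d,5), the sketch skips (a,c,3) and (b,d,5) because each would close a cycle, and returns the three remaining edges, of total weight 7.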
19.3 Correctness
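We argue correctness in the same way as for Prim’s algorithm, by showing that after each iteration
A is contained in some MCST.
Proof. Suppose $A \subseteq T$ for some MCST $T$, and the algorithm next adds the edge $e$ to $A$. If $e$ is
part of $T$, then $A \cup \{e\} \subseteq T$ and we are finished. If not, then $e$ forms a cycle with $T$. This
cycle must contain some other edge $e'$ that is in $T$ but not in $A$ (because $e$ does not form a cycle
with $A$). Also, $e'$ cannot form a cycle with $A$, because $A \cup \{e'\} \subseteq T$ and $T$ has no cycles. If $e'$
had come before $e$ in the sorted order, the algorithm would have added it, since it forms no cycle
with $A$ (nor with the part of $A$ built up to that point); hence $w(e) \le w(e')$. Let
$T' = T \cup \{e\} - \{e'\}$. Removing an edge of the cycle leaves a spanning tree, and
$w(T') = w(T) + w(e) - w(e') \le w(T)$, so $T'$ is an MCST with $A \cup \{e\} \subseteq T'$.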
The worst-case sequence complexity for m of these operations is certainly $O(m^2)$: no list will
contain more than m elements, since we can’t call MAKE-SET more than m times. The most
expensive operation is UNION; if we call it m times on lists of length m, it takes time
$O(m^2)$. This is clearly an overestimate, since we can’t call both MAKE-SET and
UNION m times.
We can show, however, that the worst-case sequence complexity of m operations is $\Omega(m^2)$.
To do this, we give a sequence that takes time $\Omega(m^2)$: start by calling MAKE-SET
m/2 + 1 times on elements $x_1, x_2, \ldots, x_{m/2+1}$. Now do the loop:
for i = 2 to m/2 do
    UNION(x_i, x_1)
This gives (m/2 + 1) + (m/2 − 1) = m operations. Each UNION appends the ever-longer list
containing x_1 to the single-element list of x_i, so iteration i updates i − 1 head pointers, and the
execution of the loop takes time $\sum_{i=2}^{m/2} (i-1) = \Theta(m^2)$.
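The following Python sketch replays this sequence and counts head-pointer updates. It models the plain linked-list representation under our own assumptions, chosen to match the cost O(length of list y) quoted for this version: every element stores a pointer to the head of its list (its representative), and UNION(x,y) appends y’s list to x’s list, updating the head pointer of every element of y’s list. A Python list stands in for the linked list, so `extend` is not the O(1) tail splice a real linked list would use; only `pointer_updates` is meant to reflect the cost analysed above. The names `LinkedListSets` and `bad_sequence` are hypothetical.

```python
from typing import Dict, List


class LinkedListSets:
    """Plain linked-list sets: UNION(x, y) appends y's list to x's list."""

    def __init__(self) -> None:
        self.head: Dict[int, int] = {}         # element -> head (representative)
        self.lists: Dict[int, List[int]] = {}  # head -> elements of its list
        self.pointer_updates = 0

    def make_set(self, x: int) -> None:
        self.head[x] = x
        self.lists[x] = [x]

    def find_set(self, x: int) -> int:
        return self.head[x]                    # O(1): follow the head pointer

    def union(self, x: int, y: int) -> None:
        hx, hy = self.head[x], self.head[y]
        if hx == hy:
            return
        moved = self.lists.pop(hy)
        for z in moved:                        # cost: length of y's list
            self.head[z] = hx
            self.pointer_updates += 1
        self.lists[hx].extend(moved)


def bad_sequence(m: int) -> int:
    """The Omega(m^2) sequence above (m even); returns the pointer updates."""
    s = LinkedListSets()
    for i in range(1, m // 2 + 2):             # MAKE-SET(x_1), ..., MAKE-SET(x_{m/2+1})
        s.make_set(i)
    for i in range(2, m // 2 + 1):             # UNION(x_i, x_1) for i = 2 .. m/2
        s.union(i, 1)
    return s.pointer_updates
```

For m = 100 the loop performs 1 + 2 + ... + 49 = 1225 head-pointer updates, and doubling m to 200 gives 4950, roughly four times as many, as $\Theta(m^2)$ predicts.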
2. Linked lists with union-by-weight: Everything remains the same except we will store the
length of each linked list at the head. Whenever we do a UNION, we will take the shorter list
and append it to the longer list. So, UNION(x,y) will no longer take $O(\mathrm{length}(list_y))$, but
rather $O(\min\{\mathrm{length}(list_x), \mathrm{length}(list_y)\})$. This type of union is called “union-by-weight”
(where “weight” just refers to the length of the list).
It might seem like union-by-weight doesn’t make much of a difference, but it greatly affects
the worst-case sequence complexity. Consider a sequence of m operations and let n be the
number of MAKE-SET operations in the sequence (so there are never more than n elements in
total). UNION is the only expensive operation, and it is expensive because of the number of
head pointers we might have to update. For an arbitrary element x, we want to prove an upper
bound on the number of times that x’s head pointer can be updated during the sequence of m
operations. Note that this happens only when list_x is unioned with a list that is no shorter
(because we update pointers only for the shorter list). This means that each time x’s head
pointer is updated, x’s new list is at least twice the size of its old list. Since list_x starts with
length 1, after k updates it has length at least $2^k$; as no list can have more than n elements,
$k \le \log_2 n$. So we update x’s head pointer at most $\log_2 n$ times. Since x could be any of the n
elements, we do a total of at most $n \log_2 n$ pointer updates, so the cost of all the UNIONs in the
sequence is $O(n \log n)$. Every other step costs O(1) per operation, i.e. O(m) in total, so the total
worst-case sequence complexity is $O(m + n \log n)$.
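A Python sketch of union-by-weight, under the same modelling assumptions as the plain linked-list version (a Python list per set, whose `len` plays the role of the stored length). It records how often each element’s head pointer changes, so the $\log_2 n$ bound can be checked directly. `WeightedListSets` is a hypothetical name, and `balanced_merges` is a hypothetical driver that merges $n = 2^k$ singletons in rounds of equal-sized pairs, a case where the bound is reached.

```python
from collections import Counter
from typing import Dict, List


class WeightedListSets:
    """Linked-list sets with union-by-weight: the shorter list is appended
    to the longer one, so only the shorter list's head pointers change."""

    def __init__(self) -> None:
        self.head: Dict[int, int] = {}
        self.lists: Dict[int, List[int]] = {}  # head -> members; len() is the weight
        self.updates: Counter = Counter()      # element -> head-pointer changes

    def make_set(self, x: int) -> None:
        self.head[x] = x
        self.lists[x] = [x]

    def find_set(self, x: int) -> int:
        return self.head[x]

    def union(self, x: int, y: int) -> None:
        hx, hy = self.head[x], self.head[y]
        if hx == hy:
            return
        if len(self.lists[hx]) < len(self.lists[hy]):
            hx, hy = hy, hx                    # make hy the head of the shorter list
        moved = self.lists.pop(hy)
        for z in moved:                        # cost: min of the two lengths
            self.head[z] = hx
            self.updates[z] += 1
        self.lists[hx].extend(moved)


def balanced_merges(k: int) -> WeightedListSets:
    """Merge n = 2**k singletons in rounds of equal-sized pairs."""
    n = 2 ** k
    s = WeightedListSets()
    for x in range(n):
        s.make_set(x)
    step = 1
    while step < n:
        for x in range(0, n, 2 * step):
            s.union(x, x + step)               # two lists of length `step`
        step *= 2
    return s
```

With k = 10 (n = 1024), the busiest element has its head pointer updated exactly 10 = log_2 n times, and the total is 5120 = (n/2) log_2 n updates. Replaying the earlier $\Omega(m^2)$ sequence with m = 100 on this class costs only 49 updates, one per UNION.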
3. Trees: Represent each set by a tree, where each element points to its parent and the root
points back to itself. The representative of a set is the root. Note that the trees are not
necessarily binary trees: the number of children of a node can be arbitrarily large (or small).
FIND-SET(y). Then make root_y a child of root_x. Since we have to do both FIND-SETs,
the running time is $\Theta(\max\{\mathrm{height}(tree_x), \mathrm{height}(tree_y)\})$.
[Figure: UNION(x,y) makes root_y, the root of the tree containing y, a child of root_x, the root of the tree containing x.]
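A minimal Python sketch of the tree representation with this plain UNION (no balancing), assuming hashable elements; the class name `TreeSets` is our own. FIND-SET follows parent pointers up to the root, so its cost is the depth of the element.

```python
from typing import Dict, Hashable


class TreeSets:
    """Each set is a rooted tree of parent pointers; a root points to
    itself and is the representative of its set."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def make_set(self, x: Hashable) -> None:
        self.parent[x] = x                   # a one-node tree

    def find_set(self, x: Hashable) -> Hashable:
        while self.parent[x] != x:           # walk up: cost = depth of x
            x = self.parent[x]
        return x

    def union(self, x: Hashable, y: Hashable) -> None:
        root_x, root_y = self.find_set(x), self.find_set(y)
        if root_x != root_y:
            self.parent[root_y] = root_x     # root_y becomes a child of root_x
```

After `union("a", "b")` and then `union("c", "a")`, the tree is the path b → a → c, so `find_set("b")` follows two pointers and returns "c".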
The worst-case sequence complexity for m operations is $\Theta(m^2)$, just as in the linked-list case,
since we can create a tree that is just a list:
for i = 1 to m/4 do
    MAKE-SET(x_i)
for i = 1 to m/4 - 1 do
    UNION(x_(i+1), x_i)
[Figure: UNION(x_{m/4}, x_{m/4−1}) hangs the chain x_{m/4−1}, ..., x_1 (m/4 − 1 nodes, rooted at x_{m/4−1}) below x_{m/4}, producing a single chain of m/4 nodes rooted at x_{m/4}.]
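Creating this tree takes m/4 MAKE-SET operations and m/4 − 1 UNION operations, and leaves x_1
at depth m/4 − 1. Each of m/2 + 1 subsequent calls FIND-SET(x_1) (bringing the total to m
operations) visits all m/4 nodes of the chain, so their running time is $(m/4)(m/2 + 1) = \Theta(m^2)$.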
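To check this arithmetic, the following Python sketch builds the chain with the plain tree UNION and then counts the nodes visited by the m/2 + 1 calls to FIND-SET(x_1). The function name `chain_find_cost` is hypothetical, each element x_i is represented by the integer i, and m is assumed to be divisible by 4.

```python
from typing import Dict


def chain_find_cost(m: int) -> int:
    """Build the chain x_1 -> x_2 -> ... -> x_{m/4}, then call FIND-SET(x_1)
    m/2 + 1 times; return the total number of nodes those calls visit."""
    parent: Dict[int, int] = {}
    visited = 0

    def find_set(x: int) -> int:
        nonlocal visited
        visited += 1                  # count x itself
        while parent[x] != x:
            x = parent[x]
            visited += 1              # count each ancestor reached
        return x

    def union(x: int, y: int) -> None:
        root_x, root_y = find_set(x), find_set(y)
        if root_x != root_y:
            parent[root_y] = root_x   # plain tree UNION, no balancing

    q = m // 4
    for i in range(1, q + 1):         # MAKE-SET(x_i), i = 1 .. m/4
        parent[i] = i
    for i in range(1, q):             # UNION(x_(i+1), x_i), i = 1 .. m/4 - 1
        union(i + 1, i)
    visited = 0                       # count only the FIND-SET phase
    for _ in range(m // 2 + 1):
        find_set(1)
    return visited
```

For m = 100 the chain has 25 nodes, and the 51 searches visit 25 × 51 = 1275 nodes in total.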
Exercise. How do we know there is not a sequence of operations that takes longer than
$\Theta(m^2)$?
We can prove two things about union-by-rank:
(a) The rank of any tree created by a sequence of these operations is equal to its height.
(b) The rank of any tree created by a sequence of these operations is O(log n), where n is
the number of MAKE-SETs in the sequence.
These two facts imply that the running times of FIND-SET and UNION are O(log n), so the
worst-case sequence complexity of m operations is O(m log n).
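The definition of union-by-rank is not reproduced in this section, so the following Python sketch assumes the usual rule without path compression: MAKE-SET gives rank 0, UNION hangs the root of smaller rank under the root of larger rank, and on a tie picks one root and increases its rank by 1. The class name `RankedTreeSets` and the helpers `random_run` and `heights` are hypothetical; they let us check facts (a) and (b) on a random sequence of unions.

```python
import random
from typing import Dict


class RankedTreeSets:
    """Parent-pointer trees with union-by-rank and no path compression."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}
        self.rank: Dict[int, int] = {}   # only meaningful at roots

    def make_set(self, x: int) -> None:
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x: int) -> int:
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def depth(self, x: int) -> int:
        # Number of parent pointers followed from x to its root.
        d = 0
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx              # rx now has rank >= ry's rank
        self.parent[ry] = rx             # lower (or equal) rank goes below
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1           # tie: the tree grows by one level


def random_run(n: int, unions: int, seed: int = 0) -> RankedTreeSets:
    """n MAKE-SETs followed by `unions` UNIONs on random pairs."""
    rng = random.Random(seed)
    s = RankedTreeSets()
    for x in range(n):
        s.make_set(x)
    for _ in range(unions):
        s.union(rng.randrange(n), rng.randrange(n))
    return s


def heights(s: RankedTreeSets) -> Dict[int, int]:
    """Map each root to the height of its tree."""
    h: Dict[int, int] = {}
    for x in s.parent:
        r = s.find_set(x)
        h[r] = max(h.get(r, 0), s.depth(x))
    return h
```

For any seed, comparing `heights(random_run(1000, 3000))` with the stored ranks shows every root’s rank equal to its tree’s height, and no height exceeds log_2 1000 < 10, matching (a) and (b).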