Minimum spanning tree
Def. A spanning tree of G is a subgraph T that:
Is connected.
Is acyclic.
Includes all of the vertices.
Given. Undirected graph G with positive edge weights (connected).
Goal. Find a min weight spanning tree.
http://www.flickr.com/photos/ewedistrict/21980840 http://algo.inria.fr/broutin/gallery.html
MST dithering
http://www.bccrc.ca/ci/ta01_archlevel.html
http://www.flickr.com/photos/quasimondo/2695389651
Applications
Simplifying assumptions.
Graph is connected.
Edge weights are distinct (no two edge weights are equal).
Consequence. MST exists and is unique.

Def. A cut in a graph is a partition of its vertices into two (nonempty) sets.
Def. A crossing edge connects a vertex in one set with a vertex in the other.
Cut property. Given any cut, the crossing edge of min weight is in the MST.
Cut property: correctness proof
Cut property. Given any cut, the crossing edge of min weight is in the MST.
Pf. Suppose min-weight crossing edge e is not in the MST.
Adding e to the MST creates a cycle.
Some other edge f in cycle must be a crossing edge.
Removing f and adding e is also a spanning tree.
Since weight of e is less than the weight of f, that spanning tree is lower weight.
Contradiction. ▪

Greedy MST algorithm
Start with all edges colored gray.
Find cut with no black crossing edges; color its min-weight edge black.
Repeat until V - 1 edges are colored black.

An edge-weighted graph (edges sorted by weight):
0-7 0.16
2-3 0.17
1-7 0.19
0-2 0.26
5-7 0.28
1-3 0.29
1-5 0.32
2-7 0.34
4-5 0.35
1-2 0.36
4-7 0.37
0-4 0.38
6-2 0.40
3-6 0.52
6-0 0.58
6-4 0.93
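The cut property can be checked by hand on the 8-vertex sample graph used throughout these slides. The sketch below is only an illustration: the class name `CutDemo` and the method `minCrossingEdge` are hypothetical, and the graph is stored as parallel arrays rather than with the course's Edge type.

```java
import java.util.Set;

class CutDemo {
    // The 16 edges of the sample graph, as {v, w} pairs, sorted by weight.
    static final int[][] EDGES = {
        {0, 7}, {2, 3}, {1, 7}, {0, 2}, {5, 7}, {1, 3}, {1, 5}, {2, 7},
        {4, 5}, {1, 2}, {4, 7}, {0, 4}, {6, 2}, {3, 6}, {6, 0}, {6, 4}
    };
    static final double[] WEIGHTS = {
        0.16, 0.17, 0.19, 0.26, 0.28, 0.29, 0.32, 0.34,
        0.35, 0.36, 0.37, 0.38, 0.40, 0.52, 0.58, 0.93
    };

    // Returns the index of the min-weight edge crossing the cut (side, all other
    // vertices), or -1 if no edge crosses the cut.
    static int minCrossingEdge(Set<Integer> side) {
        int best = -1;
        for (int i = 0; i < EDGES.length; i++) {
            boolean vIn = side.contains(EDGES[i][0]);
            boolean wIn = side.contains(EDGES[i][1]);
            // An edge crosses the cut when exactly one endpoint is in 'side'.
            if (vIn != wIn && (best == -1 || WEIGHTS[i] < WEIGHTS[best])) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        int i = minCrossingEdge(Set.of(4, 5, 6));
        System.out.println(EDGES[i][0] + "-" + EDGES[i][1] + " " + WEIGHTS[i]);
    }
}
```

For the cut {4, 5, 6} versus the remaining vertices, the minimum crossing edge is 5-7, which is one of the MST edges listed for this graph.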
Greedy MST algorithm: correctness proof
Proposition. The greedy algorithm computes the MST.
Pf.
Any edge colored black is in the MST (via cut property).
Fewer than V - 1 black edges ⇒ cut with no black crossing edges
(consider cut whose vertices are any one connected component).
[Figure: fewer than V - 1 edges colored black; a cut with no black crossing edges.]
MST edges: 0-2 5-7 6-2 0-7 2-3 1-7 4-5
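The greedy algorithm can be run literally, without any clever data structure. The following is a minimal and deliberately slow sketch, not one of the efficient implementations discussed later. It assumes a connected graph with distinct weights; the class name `GreedyMstSketch`, the parallel-array graph representation, and the rule for picking the cut are all choices made for this illustration.

```java
import java.util.TreeSet;

class GreedyMstSketch {
    // Sample graph: EDGES[i] = {v, w}, WEIGHTS[i] = weight of that edge.
    static final int[][] EDGES = {
        {0, 7}, {2, 3}, {1, 7}, {0, 2}, {5, 7}, {1, 3}, {1, 5}, {2, 7},
        {4, 5}, {1, 2}, {4, 7}, {0, 4}, {6, 2}, {3, 6}, {6, 0}, {6, 4}
    };
    static final double[] WEIGHTS = {
        0.16, 0.17, 0.19, 0.26, 0.28, 0.29, 0.32, 0.34,
        0.35, 0.36, 0.37, 0.38, 0.40, 0.52, 0.58, 0.93
    };

    // Returns the indices of the edges colored black (the MST edges).
    static TreeSet<Integer> mst(int V, int[][] edges, double[] weights) {
        int[] comp = new int[V];             // comp[v] = label of v's black-edge component
        for (int v = 0; v < V; v++) comp[v] = v;
        TreeSet<Integer> black = new TreeSet<>();
        while (black.size() < V - 1) {
            // Any one connected component gives a cut with no black crossing edges.
            // Which component is used does not matter; this rule is arbitrary.
            int c = comp[black.size() % V];
            int best = -1;
            for (int i = 0; i < edges.length; i++) {
                boolean vIn = comp[edges[i][0]] == c;
                boolean wIn = comp[edges[i][1]] == c;
                if (vIn != wIn && (best == -1 || weights[i] < weights[best])) best = i;
            }
            if (best == -1) throw new IllegalArgumentException("graph is not connected");
            black.add(best);                 // color the min-weight crossing edge black
            // Merge the component on the other side of the edge into component c.
            int absorbed = (comp[edges[best][0]] == c) ? comp[edges[best][1]] : comp[edges[best][0]];
            for (int x = 0; x < V; x++) if (comp[x] == absorbed) comp[x] = c;
        }
        return black;
    }
}
```

Each round scans every edge, so the running time is proportional to V times E. Kruskal's and Prim's algorithms can be seen as specific, faster ways of choosing the cut and finding its minimum edge.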
Greedy MST algorithm: efficient implementations
Proposition. The greedy algorithm computes the MST.
Efficient implementations. Choose cut? Find min-weight edge?
Ex 1. Kruskal's algorithm. [stay tuned]
Ex 2. Prim's algorithm. [stay tuned]
Ex 3. Borůvka's algorithm.

Removing two simplifying assumptions
Q. What if edge weights are not all distinct?
A. Greedy MST algorithm still correct if equal weights are present! (our correctness proof fails, but that can be fixed)
Q. What if graph is not connected?
A. Compute minimum spanning forest = MST of each component.

Various MST anomalies
MST may not be unique when weights have equal values:
1 2 1.00
1 3 0.50
2 4 1.00
3 4 0.50
Weights can be 0 or negative:
4 6 0.62
5 6 0.88
1 5 0.02
0 4 -0.99
1 6 0
0 2 0.22
1 2 0.50
1 3 0.97
2 6 0.17
No MST if graph is not connected (can independently compute MSTs of components):
4 5 0.61
4 6 0.62
5 6 0.88
1 5 0.11
2 3 0.35
0 3 0.6
1 6 0.10
0 2 0.22
Weights need not be proportional to distance:
4 6 0.62
5 6 0.88
1 5 0.02
0 4 0.64
1 6 0.90
0 2 0.22
1 2 0.50
1 3 0.97
2 6 0.17

http://algs4.cs.princeton.edu
Greed is good
Gordon Gekko (Michael Douglas), address to Teldar Paper stockholders in Wall Street (1987)
Weighted edge API
Edge(int v, int w, double weight)    create a weighted edge v-w
int either()                         either endpoint
int other(int v)                     the endpoint that's not v
int compareTo(Edge that)             compare this edge to that edge

Weighted edge: Java implementation
public Edge(int v, int w, double weight)   // constructor
{
   this.v = v;
   this.w = w;
   this.weight = weight;
}

public int either()   // either endpoint
{ return v; }

MST API
MST(EdgeWeightedGraph G)             constructor
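The implementation fragment stops after either(). A complete class consistent with the API might look like the sketch below. The bodies of other() and compareTo() follow directly from their API descriptions; the weight() accessor, toString(), and the exception thrown for a non-endpoint are assumptions added for illustration.

```java
class Edge implements Comparable<Edge> {
    private final int v, w;        // the two endpoints
    private final double weight;

    public Edge(int v, int w, double weight) {
        this.v = v;
        this.w = w;
        this.weight = weight;
    }

    // Either endpoint (callers typically follow up with other()).
    public int either() { return v; }

    // The endpoint that is not the given vertex.
    public int other(int vertex) {
        if (vertex == v) return w;
        if (vertex == w) return v;
        throw new IllegalArgumentException("not an endpoint: " + vertex);
    }

    // Assumed accessor; not listed in the API table.
    public double weight() { return weight; }

    // Compare edges by weight.
    public int compareTo(Edge that) {
        return Double.compare(this.weight, that.weight);
    }

    @Override
    public String toString() { return v + "-" + w + " " + weight; }
}
```

A typical client idiom is `int v = e.either(), w = e.other(v);`, which recovers both endpoints without exposing the fields.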
Kruskal's algorithm demo
Consider edges in ascending order of weight.
Add next edge to tree T unless doing so would create a cycle.
[Figure: graph edges sorted by weight.]
Kruskal's algorithm: implementation challenge
Challenge. Would adding edge v–w to tree T create a cycle? If not, add it.
Case 1: adding v–w creates a cycle.
Case 2: add v–w to T and merge sets containing v and w.
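One way to answer the cycle question is to keep a disjoint-set (union-find) structure over the vertices: v–w creates a cycle exactly when v and w are already in the same set. The sketch below assumes that approach and uses plain arrays instead of the course's Edge, MinPQ, and graph classes; `KruskalSketch` and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KruskalSketch {
    // Sample graph: EDGES[i] = {v, w}, WEIGHTS[i] = weight of that edge.
    static final int[][] EDGES = {
        {0, 7}, {2, 3}, {1, 7}, {0, 2}, {5, 7}, {1, 3}, {1, 5}, {2, 7},
        {4, 5}, {1, 2}, {4, 7}, {0, 4}, {6, 2}, {3, 6}, {6, 0}, {6, 4}
    };
    static final double[] WEIGHTS = {
        0.16, 0.17, 0.19, 0.26, 0.28, 0.29, 0.32, 0.34,
        0.35, 0.36, 0.37, 0.38, 0.40, 0.52, 0.58, 0.93
    };

    // Root of the set containing x, with path halving.
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Returns the indices of the MST edges in the order Kruskal's algorithm adds them.
    static List<Integer> mst(int V, int[][] edges, double[] weights) {
        int[] parent = new int[V];
        for (int v = 0; v < V; v++) parent[v] = v;

        // Consider edges in ascending order of weight.
        Integer[] order = new Integer[edges.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(weights[a], weights[b]));

        List<Integer> mst = new ArrayList<>();
        for (int i : order) {
            if (mst.size() == V - 1) break;
            int rv = find(parent, edges[i][0]);
            int rw = find(parent, edges[i][1]);
            if (rv == rw) continue;   // Case 1: v and w already connected, edge would create a cycle
            parent[rv] = rw;          // Case 2: merge the sets containing v and w
            mst.add(i);               //         and add v-w to T
        }
        return mst;
    }
}
```

On the sample graph this adds 0-7, 2-3, 1-7, 0-2, 5-7, 4-5, 6-2 in that order. If the graph is not connected, the same loop simply ends with fewer than V - 1 edges, which is a minimum spanning forest.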
public KruskalMST(EdgeWeightedGraph G)
{
   // build priority queue (or sort)
   MinPQ<Edge> pq = new MinPQ<Edge>(G.edges());

Pf.
operation   frequency   time per op
build pq    1           E
Algorithms
ROBERT SEDGEWICK | KEVIN WAYNE
4.3 MINIMUM SPANNING TREES
‣ introduction
‣ greedy algorithm
‣ edge-weighted graph API
‣ Kruskal's algorithm
‣ Prim's algorithm
MST edges: 0-7 1-7 0-2 2-3 5-7 4-5 6-2
Prim's algorithm: proof of correctness
Proposition. [Jarník 1930, Dijkstra 1957, Prim 1959] Prim's algorithm computes the MST.
Pf. Prim's algorithm is a special case of the greedy MST algorithm: take the cut whose one side is the set of vertices in T. No crossing edge is in T, and the edge Prim's algorithm adds is the min-weight crossing edge.

Prim's algorithm: implementation challenge
Challenge. Find the min weight edge with exactly one endpoint in T.
How difficult?
E    try all edges
Challenge. Find the min weight edge with exactly one endpoint in T.
Lazy solution. Maintain a PQ of edges with (at least) one endpoint in T.
Key = edge; priority = weight of edge.
Delete-min to determine next edge e = v–w to add to T.
Disregard if both endpoints v and w are marked (both in T).
Otherwise, let w be the unmarked vertex (not in T):
– add to PQ any edge incident to w (assuming other endpoint not in T)
– add e to T and mark w

Start with vertex 0 and greedily grow tree T.
Add to T the min weight edge with exactly one endpoint in T.
Repeat until V - 1 edges.

1-7 is min weight edge with exactly one endpoint in T.
Priority queue of crossing edges:
1-7 0.19
0-2 0.26
5-7 0.28
2-7 0.34
4-7 0.37
0-4 0.38
6-0 0.58
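The lazy solution can be sketched with the standard library's `java.util.PriorityQueue` standing in for MinPQ. This is an illustration under stated assumptions, not the course implementation: the graph is given as parallel arrays, edges are referred to by index, and the names `LazyPrimSketch`, `mst`, and `visit` are chosen for this example. The graph is assumed to be connected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class LazyPrimSketch {
    // Sample graph: EDGES[i] = {v, w}, WEIGHTS[i] = weight of that edge.
    static final int[][] EDGES = {
        {0, 7}, {2, 3}, {1, 7}, {0, 2}, {5, 7}, {1, 3}, {1, 5}, {2, 7},
        {4, 5}, {1, 2}, {4, 7}, {0, 4}, {6, 2}, {3, 6}, {6, 0}, {6, 4}
    };
    static final double[] WEIGHTS = {
        0.16, 0.17, 0.19, 0.26, 0.28, 0.29, 0.32, 0.34,
        0.35, 0.36, 0.37, 0.38, 0.40, 0.52, 0.58, 0.93
    };

    // Returns the indices of the MST edges in the order they are added to T.
    static List<Integer> mst(int V, int[][] edges, double[] weights) {
        // adj.get(v) = indices of the edges incident to v
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int i = 0; i < edges.length; i++) {
            adj.get(edges[i][0]).add(i);
            adj.get(edges[i][1]).add(i);
        }
        boolean[] marked = new boolean[V];     // marked[v] = is v in T?
        PriorityQueue<Integer> pq =
            new PriorityQueue<>((a, b) -> Double.compare(weights[a], weights[b]));
        List<Integer> mst = new ArrayList<>();

        visit(0, adj, edges, marked, pq);      // start with vertex 0
        while (!pq.isEmpty() && mst.size() < V - 1) {
            int e = pq.poll();                 // delete-min
            int v = edges[e][0], w = edges[e][1];
            if (marked[v] && marked[w]) continue;   // obsolete edge: both endpoints in T
            mst.add(e);
            visit(marked[v] ? w : v, adj, edges, marked, pq);
        }
        return mst;
    }

    // Mark v and add to the PQ every incident edge whose other endpoint is not in T.
    private static void visit(int v, List<List<Integer>> adj, int[][] edges,
                              boolean[] marked, PriorityQueue<Integer> pq) {
        marked[v] = true;
        for (int e : adj.get(v)) {
            int other = (edges[e][0] == v) ? edges[e][1] : edges[e][0];
            if (!marked[other]) pq.add(e);
        }
    }
}
```

On the sample graph the edges are added in the order 0-7, 1-7, 0-2, 2-3, 5-7, 4-5, 6-2. Obsolete edges such as 1-3 and 2-7 stay on the queue until they reach the front, which is what makes this version "lazy".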
Prim's algorithm: lazy implementation
public LazyPrimMST(EdgeWeightedGraph G)
{
   pq = new MinPQ<Edge>();
   mst = new Queue<Edge>();
   marked = new boolean[G.V()];
   visit(G, 0);   // assume G is connected
Prim's algorithm: eager implementation
Challenge. Find min weight edge with exactly one endpoint in T.
Eager solution. Maintain a PQ of vertices connected by an edge to T, where priority of vertex v = weight of shortest edge connecting v to T (pq has at most one entry per vertex).
Delete min vertex v and add its associated edge e = v–w to T.
Update PQ by considering all edges e = v–x incident to v
– ignore if x is already in T
– add x to PQ if not already on it
– decrease priority of x if v–x becomes shortest edge connecting x to T

Prim's algorithm (eager) demo
Start with vertex 0 and greedily grow tree T.
Add to T the min weight edge with exactly one endpoint in T.
Repeat until V - 1 edges.

[Figure: per-vertex table during the demo; in the original, red entries are on the PQ and black entries are on the MST.]
v   edge   weight
0
1   1-7    0.19
2   0-2    0.26
3   1-3    0.29
4   0-4    0.38
5   5-7    0.28
6   6-0    0.58
7   0-7    0.16
Start with vertex 0 and greedily grow tree T.
Add to T the min weight edge with exactly one endpoint in T.
Repeat until V - 1 edges.

v   edgeTo[]   distTo[]
0   -          -
7   0–7        0.16
1   1–7        0.19
2   0–2        0.26
3   2–3        0.17
5   5–7        0.28
4   4–5        0.35
6   6–2        0.40

Indexed priority queue
Associate an index between 0 and N - 1 with each key in a priority queue.
Supports insert and delete-the-minimum.
Supports decrease-key given the index of the key.

public class IndexMinPQ<Key extends Comparable<Key>>
IndexMinPQ(int N)                   create indexed priority queue with indices 0, 1, …, N – 1
void insert(int i, Key key)         associate key with index i
void decreaseKey(int i, Key key)    decrease the key associated with index i
boolean contains(int i)             is i an index on the priority queue?
int delMin()                        remove a minimal key and return its associated index
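To make the indexed priority queue API concrete, here is a compact binary-heap sketch. It is a simplification, not the textbook class: keys are plain `double` values rather than a generic Key, the class is named `IndexMinPQSketch` to avoid suggesting it is the real `IndexMinPQ`, and the `isEmpty()` helper is an addition. The inverse array `qp[]` is used here to locate an index's heap position.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

class IndexMinPQSketch {
    private final double[] keys; // keys[i] = priority of index i
    private final int[] pq;      // pq[k]   = index stored at heap position k (1-based)
    private final int[] qp;      // qp[i]   = heap position of index i, or -1 if absent
    private int n;               // number of entries on the queue

    IndexMinPQSketch(int N) {
        keys = new double[N];
        pq = new int[N + 1];
        qp = new int[N];
        Arrays.fill(qp, -1);
    }

    boolean isEmpty() { return n == 0; }

    boolean contains(int i) { return qp[i] != -1; }

    void insert(int i, double key) {
        if (contains(i)) throw new IllegalArgumentException("index already on queue: " + i);
        n++;
        qp[i] = n;
        pq[n] = i;
        keys[i] = key;
        swim(n);
    }

    // Caller is responsible for passing a key no larger than the current one.
    void decreaseKey(int i, double key) {
        if (!contains(i)) throw new NoSuchElementException("index not on queue: " + i);
        keys[i] = key;
        swim(qp[i]);
    }

    int delMin() {
        if (n == 0) throw new NoSuchElementException("queue is empty");
        int min = pq[1];
        exch(1, n--);     // move the minimum to the end, then shrink the heap
        sink(1);
        qp[min] = -1;
        return min;
    }

    private boolean greater(int a, int b) { return keys[pq[a]] > keys[pq[b]]; }

    // Swap heap positions a and b, keeping qp[] consistent with pq[].
    private void exch(int a, int b) {
        int t = pq[a]; pq[a] = pq[b]; pq[b] = t;
        qp[pq[a]] = a;
        qp[pq[b]] = b;
    }

    private void swim(int k) {
        while (k > 1 && greater(k / 2, k)) { exch(k, k / 2); k /= 2; }
    }

    private void sink(int k) {
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && greater(j, j + 1)) j++;   // pick the smaller child
            if (!greater(k, j)) break;
            exch(k, j);
            k = j;
        }
    }
}
```

In eager Prim, the index is a vertex and the key is the weight of the lightest known edge connecting that vertex to T, so decreaseKey(x, w) is the call made when a shorter edge v–x is found.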
Indexed priority queue implementation
Binary heap implementation. [see Section 2.4 of textbook]
Start with same code as MinPQ.
Maintain parallel arrays keys[], pq[], and qp[] so that:
– keys[i] is the priority of i
– pq[i] is the index of the key in heap position i

Prim's algorithm: which priority queue?
Depends on PQ implementation: V insert, V delete-min, E decrease-key.

PQ implementation   insert   delete-min   decrease-key   total
unordered array     1        V            1              V^2
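The "unordered array" row corresponds to keeping distTo[] in a plain array: decrease-key is a single assignment, and delete-min is a linear scan over the vertices. A minimal sketch of eager Prim in that style follows. The class name `ArrayPrimSketch`, the edge-index representation, and the use of -1 for "no edge" are assumptions of this example; the graph is assumed to be connected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayPrimSketch {
    // Sample graph: EDGES[i] = {v, w}, WEIGHTS[i] = weight of that edge.
    static final int[][] EDGES = {
        {0, 7}, {2, 3}, {1, 7}, {0, 2}, {5, 7}, {1, 3}, {1, 5}, {2, 7},
        {4, 5}, {1, 2}, {4, 7}, {0, 4}, {6, 2}, {3, 6}, {6, 0}, {6, 4}
    };
    static final double[] WEIGHTS = {
        0.16, 0.17, 0.19, 0.26, 0.28, 0.29, 0.32, 0.34,
        0.35, 0.36, 0.37, 0.38, 0.40, 0.52, 0.58, 0.93
    };

    // Returns edgeTo[], where edgeTo[v] is the index of the MST edge that connects
    // v to the tree (-1 for the start vertex 0).
    static int[] edgeTo(int V, int[][] edges, double[] weights) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int i = 0; i < edges.length; i++) {
            adj.get(edges[i][0]).add(i);
            adj.get(edges[i][1]).add(i);
        }
        double[] distTo = new double[V];    // distTo[v] = weight of lightest edge from v to T
        int[] edgeTo = new int[V];
        boolean[] inTree = new boolean[V];
        Arrays.fill(distTo, Double.POSITIVE_INFINITY);
        Arrays.fill(edgeTo, -1);
        distTo[0] = 0.0;

        for (int round = 0; round < V; round++) {
            // delete-min: linear scan of the unordered array (time proportional to V)
            int v = -1;
            for (int x = 0; x < V; x++)
                if (!inTree[x] && (v == -1 || distTo[x] < distTo[v])) v = x;
            if (distTo[v] == Double.POSITIVE_INFINITY)
                throw new IllegalArgumentException("graph is not connected");
            inTree[v] = true;

            // decrease-key: constant time per incident edge
            for (int e : adj.get(v)) {
                int x = (edges[e][0] == v) ? edges[e][1] : edges[e][0];
                if (inTree[x]) continue;            // ignore if x is already in T
                if (weights[e] < distTo[x]) {       // v-x becomes shortest edge connecting x to T
                    distTo[x] = weights[e];
                    edgeTo[x] = e;
                }
            }
        }
        return edgeTo;
    }
}
```

The V scans of length V dominate, giving the V^2 total in the table. That is a reasonable choice for dense graphs, where E is itself close to V^2.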
Euclidean MST
Given N points in the plane, find MST connecting them, where the distances between point pairs are their Euclidean distances.
Brute force. Compute ~ N^2 / 2 distances and run Prim's algorithm.
Ingenuity. Exploit geometry and do it in ~ c N log N.

Scientific application: clustering
k-clustering. Divide a set of objects into k coherent groups.
Distance function. Numeric value specifying "closeness" of two objects.
Goal. Divide into clusters so that objects in different clusters are far apart.
Applications.
Routing in mobile ad hoc networks.
Document categorization for web search.
Similarity searching in medical image databases.
Skycat: cluster 10^9 sky objects into stars, quasars, galaxies.
Single-link clustering
Single link. Distance between two clusters equals the distance between the two closest objects (one in each cluster).
Single-link clustering. Given an integer k, find a k-clustering that maximizes the distance between two closest clusters.
[Figure: 4-clustering.]

Single-link clustering algorithm
"Well-known" algorithm in science literature for single-link clustering:
Form V clusters of one object each.
Find the closest pair of objects such that each object is in a different cluster, and merge the two clusters.
Repeat until there are exactly k clusters.
Observation. This is Kruskal's algorithm (stopping when k connected components).
Alternate solution. Run Prim; then delete k – 1 max weight edges.
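The observation above can be sketched directly: sort all object pairs by distance and merge clusters Kruskal-style until k remain. The code below is an illustration under assumptions, taking a symmetric distance matrix as input; `SingleLinkSketch`, `cluster`, and the helper `distancesOnLine` (which builds distances for points on a line) are hypothetical names.

```java
import java.util.Arrays;

class SingleLinkSketch {
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Returns a cluster label for each object; objects with equal labels are in
    // the same cluster. dist must be a symmetric n-by-n matrix, and 1 <= k <= n.
    static int[] cluster(double[][] dist, int k) {
        int n = dist.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;   // form n clusters of one object each

        // Enumerate all pairs i < j and sort them by distance (Kruskal's edge order).
        int m = n * (n - 1) / 2;
        int[][] pairs = new int[m][];
        Integer[] order = new Integer[m];
        int p = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                pairs[p] = new int[] { i, j };
                order[p] = p;
                p++;
            }
        Arrays.sort(order, (a, b) -> Double.compare(
            dist[pairs[a][0]][pairs[a][1]], dist[pairs[b][0]][pairs[b][1]]));

        int components = n;
        for (int e : order) {
            if (components == k) break;              // stop when k connected components remain
            int ri = find(parent, pairs[e][0]);
            int rj = find(parent, pairs[e][1]);
            if (ri == rj) continue;                  // already in the same cluster
            parent[ri] = rj;                         // merge the two clusters
            components--;
        }

        int[] label = new int[n];
        for (int i = 0; i < n; i++) label[i] = find(parent, i);
        return label;
    }

    // Hypothetical helper: pairwise distances for points on a line.
    static double[][] distancesOnLine(double[] xs) {
        double[][] d = new double[xs.length][xs.length];
        for (int i = 0; i < xs.length; i++)
            for (int j = 0; j < xs.length; j++)
                d[i][j] = Math.abs(xs[i] - xs[j]);
        return d;
    }
}
```

For example, the points 0, 1, 2, 10, 11, 20 with k = 3 fall into the clusters {0, 1, 2}, {10, 11}, and {20}. Sorting all pairs costs on the order of n^2 log n time, so this sketch suits small inputs.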
Dendrogram of cancers in human
[Figure: dendrogram over gene 1 … gene n; legend: gene expressed, gene not expressed.]
Reference: Botstein & Brown group