[Go to graphs; shortest path problems]
Minimum-Cost Spanning Tree (MST)
Definition: Given connected graph G with positive edge weights, find a minimum weight subset of edges that connects all of the vertices.
A spanning tree of a graph G is a subgraph T that is connected and acyclic.
MST of G is always a spanning tree.
The subset of edges (1) has minimum total cost as measured by summing the weights of all of the edges in the subset, and (2) keeps the vertices connected.
Applications
- Construction of electrical power network.
- Otakar Boruvka (1926)
- Electrical Power Company of Western Moravia in Brno (part of the Czech Republic); most economical construction of electrical power network.
- Concrete engineering problem is now a cornerstone problem in combinatorial optimization.
- Network design (telephone, electrical, hydraulic, TV cable, computer, road)
e.g., Connecting a set of cities by telephone lines in such a way as to require the least amount of cable. - Cluster analysis
- MST in Molecular Epidemiology (and caveats)(see a paper)
- (For bored!) Approximation algorithms for NP-hard problems such as the Traveling Salesperson Problem:
TSP, the traveling salesperson problem: "A salesperson is required to visit each of the $n$ given cities once and only once, starting from any city and returning to the original place of departure. What route, or tour, should he choose in order to minimize the total distance traveled?" (i.e., the goal is to find a Hamiltonian cycle with minimum cost)
MST heuristic (TSP-MST) is a 2-approximation algorithm
(an interesting movie explored the traveling salesperson problem and possible implications of solving the P=NP? problem)
More definitions: connectivity & cut
- A graph is said to be connected if every pair of vertices in the graph is connected.
- A graph $G$ is said to be $k$-connected (or $k$-vertex connected, or $k$-point connected) if there does not exist a set of $k-1$ vertices whose removal disconnects the graph, i.e., the vertex connectivity of $G$ is $\geq k$.
- A graph G = (V,E) can be partitioned into two disjoint sets, A and B ($A\cup B = V$, $A\cap B = \{\}$), by simply removing edges connecting the two parts. $cut(A, B) = \sum_{u \in A, v \in B} w(u, v)$.
Prim's algorithm
- Start with ANY vertex s and greedily grow a tree T from s. At each step, add the edge with smallest weight (the cheapest edge) that connects one node in the tree and one node not yet in the tree.
- Very similar to Dijkstra's algorithm: at each step when a
vertex is processed, Dijkstra's algorithm seeks the next closest vertex
to the start vertex, while Prim's algorithm seeks for the next clostest
vertex to ANY vertex currently in the MST. (MinSpanTree.java)
- Prim's algorithm is a greedy algorithm; does it work correctly?
- Cycle property: Let $C$ be any cycle in $G$, and let $f$ be the
max cost edge belonging to C. Then the MST $T^*$ does not contain $f$.
(Simplifying assumption: all edge weights are distinct.)
Proof by contradiction:
- Suppose $f$ belongs to $T^*$.
- Deleting $f$ from $T^*$ disconnects T*. Let $S$ be one side of the cut.
- Some other edge in $C$, say $e$, has exactly one endpoint in $S$.
- $T = T^* \cup \{e\} - \{f\}$ is also a spanning tree
- Since $c_e < c_f$, $cost(T) < cost(T^*)$, this is a contradition.
- Cut property: Let $S$ be any subset of vertices, and let $e$ be
the min cost edge with exactly one endpoint in S. Then the MST $T^*$
contains $e$. (Simplifying assumption: all edge weights are distinct.)
Proof by contradiction:
- Suppose $e$ does not belong to $T^*$.
- Adding $e$ to $T^*$ creates a (unique) cycle $C$ in $T^*$.
- Some other edge in $C$, say $f$, has exactly one endpoint in $S$.
- $T = T^* \cup \{e\} - \{f\}$ is also a spanning tree.
- Since $c_e < c_f$, $cost(T) < cost(T^*)$; this is a contradiction.
- Prim's algorithm computes the MST.
Proof:- Let $S$ be the subset of vertices in current tree $T$.
- Prim adds the cheapest edge $e$ with exactly one endpoint in $S$.
- Cut property asserts that $e$ is in the MST.
- The MST satisfies the optimal substructure property, i.e.:
"an optimal (minimum spanning) tree is composed of optimal (minimum spanning) subtrees".
- Let T be an MST of G, and $e$ an edge (u,v) in T.
- Removing $e$ (u,v) will partition T into two trees: T1 and T2.
- Claim: T1 is an MST of G1 = (V1,E1) and T2 is an MST of G2 = (V2, E2).
(Do V1 and V2 share vertices? why?)
- Proof by "cut and paste":
w(T) = w(u, v) + w(T1) + w(T2)
there cannot be a better tree than T1 or T2,
otherwise, using cut and paste,
a spanning tree T' could be created with smaller total weight than T.
The greedy choice property, i.e. "a locally optimal choice is globally optimal" is the distinctive feature of greedy algorithms.
-
Kruskal's Algorithm
Consider edges in ascending order of weight (cost). Add the next edge to T unless doing so would create a cycle (how to check?).
The edges can be processed in order of weight by using a min-heap, which is generally faster than sorting the edges first, because in practice we need only visit a small fraction of the edges before completing the MST.
More details: the algorithm first partitions the set of vertices into |V| subsets, each containing one vertex; then iteratively processes the edges in order of weight, adding an edge to the MST and merge their corresponding subsets into one (skipping the edge that connects two vertices in the same subset), until only one subset remains. Checking if two nodes are in different subsets and merging two subsets can be done efficiently using "parent pointer implementation" of trees (each subset is a tree) (check Shaffer 6.2) Read more on Union-Find data structure.
In this graph (Figure 11.20), edge D->F was skipped; adding it would otherwise create a cycleReverse-delete algorithm
The reverse-delete algorithm is the reverse of Kruskal's algorithm: start with the original graph, then delete edges.Application: Clustering of Maximum Spacing (slides)
Intuition: Greedily cluster objects in increasing order of distance; stop when there are k clusters
- Initialization: Let $C$ be a set of $n$ clusters, with each object in its own cluster.
- Process pairs of objects in increasing order of distance.
Let (p, q) be the next pair with $p \in C_p$ and $q \in C_q$.
If $C_p != C_q$m add new cluster $C_p \cup C_q$ to $C$, delete $C_p$ and $C_q$ from C.
- Stop when there are $k$ clusters in C.
- Same as Kruskal's algorithm but don't add last $k-1$ edges in MST.
Functional brain network analysis using minimum spanning trees in Multiple Sclerosis.
see the paper; how it works; the algorithm used to infer MST: Krusal's algorithm