Friday, 1 August 2014

What's Up With MST Fast Implementations?!

In the last post we looked at Prim's and Kruskal's Algorithms and proved that they were correct i.e. that they produced minimum spanning trees (MSTs). In this post we'll look at their running times and consider if there are ways to speed them up (spoiler alert: there is, and it's beautiful).

Prim's:

In order to know how to go faster than our current implementation of Prim's Algorithm, we need to know how fast our implementation is! Let's look at the pseudocode again (this time using the set notation where V is the set of vertices, X is the set of vertices in our MST so far, V-X is the set of vertices not in the MST yet, and T is the set of edges in our MST so far)

1 Pick some vertex and add it to X
2 While not all vertices are in X:
3     If there is an edge crossing the cut (X,V-X) of minimum cost:
4         Add the edge to T
5         Move its incident vertex in V-X from V-X to X
6 return the graph (X,T)
     
What would be the asymptotic running time of this algorithm?

at line 1 we just need to pick some vertex so that can be done in constant, O(1), time. As for the while loop, you're going to be adding n-1 vertices to X because there are n vertices and you added the first in step 1, meaning there are O(n-1) = O(n) loops. For each loop you need to go work out whether an edge is crossing the cut, meaning you need to look at each edge and determine if one endpoint is in X and the other is in V-X (the vertices could store their set as a boolean value). This means we'll need to scan all m edges with each loop, giving us O(m) time for each loop (lines 4,5 are both constant time so we don't need to worry about those). So with O(n) loops each with O(m) time complexity we have an overall time complexity of O(nm).

Considering that m >=  n - 1 (as shown in the previous post on MSTs) our best case scenario is O(n^2), which is reason enough to ask 'can we do better?'

So with each loop we're looking for the minimum edge crossing the cut, and a data structure whose life is all about repeated minimum computations is the heap. Recall from the post on heaps that heaps can have nodes inserted and deleted in logarithmic time, and also perform the extract-min operation in logarithmic time. Also, building the heap from scratch can be done in linear time (if you do it right).

A good idea would be to give each edge a node in the heap so that the root is always the cheapest edge. This is good for looking at the cheapest edges first but the odds of the graph's cheapest edge crossing the cut (X,V-X) at some point in the algorithm is fairly low. A better idea is to use vertices, with the key of each node corresponding to the vertex's edge cost for its edge crossing the cut (meaning if it doesn't have an edge crossing the cut it's key will be set to inf.). This means we'll be changing the key values of the nodes a lot which will require a lot more reassembly of the heap than if we used edges, but it also means that the root of the heap will always give you the vertex you want to add to X (and the corresponding edge) meaning you can just perform a series of extract-mins and you'll end up with the MST.

So with our heap we want to maintain two invariants:
1) the elements of the heap are vertices in V-X
2) for v in V-X, key[v] = cheapest edge (u,v) with u in X, where if no such edge exists, key[v] = inf.

What's the time complexity of building this heap in the first place? Well first you need to find out what the key of each vertex is, which requires you to check every edge once. You go from edge to edge asking 'is this edge crossing the cut?' and if so, if the edge cost is lower than the incident vertex's current key then the vertex's key becomes the cost of the edge. So that's O(m) time. The next step is simply building the heap using the keys which can be done in linear time so we end up with a total of O(m+n) speed.

So if we just went and did n extract-mins to make our spanning tree, that would be O(nlogn) time, but we need to do more than that because after extracting a vertex, and putting it in X, the cut changes and vertices which previously had a key of inf. i.e. no edge crossing the cut, may now need to have their key re-evaluated. Luckily the only vertices we ever need to consider after adding a vertex v to X are the ones incident to v which are in V-X (we'll call such a vertex w), as their edges are going to end up crossing the cut. For a vertex w we simply delete their corresponding key from the heap (this will require us to keep track of where a vertex's key is on a heap, which can just be stored with the vertex itself) then recompute w's key i.e. the cost of the cheapest edge incident to w which crosses the cut, then we insert w's new key back into the heap.

How many times will we be doing this? Each edge can only be responsible for at most one delete-insert combo, as one of its incident vertices will get sucked into X and then with the incident vertex still in V-X we just see whether this edge cost is lower than the vertex's current key and if so, we delete the vertex's key, assign the edge cost to the vertex's key and re-insert it. As deletions and insertions both take O(logn) time and we can have at most m of these operations, we'll end up with O(m*2logn) = O(mlogn) time in terms of maintaining invariant #2 across the life of the algorithm.

To recap:
1) initialising the keys and building the heap costs O(m+n)
2) extract-mins cost O(logn) and there are n-1 of them to do giving us O(nlogn) for that
3) maintaining invariant #2 costs O(mlogn) time

So our total time complexity is O(m + n + nlogn + mlogn) and as m >= n-1 we can always use m as an upperbound for n meaning our mlogn term is the 'strongest', leaving us with a final time complexity of O(mlogn), which is far better than the naive implementation running time of O(mn).

Moving onto Kruskal's:

What's the time complexity of the naive implementation of Kruskal's? Well considering that we go through each edge from cheapest to most expensive, we must first sort them by cost, which can be done via a quicksort in O(mlogm) time. As for each iteration of the algorithm's while loop, if the edge doesn't produce a cycle then we add it, and the definition of a cycle is where you've got two paths between u and v which don't share any edges. If placing an edge which is incident to u and v produces a cycle, then there must already be a path between u and v. In order to see whether such a path exists we can just do a simple graph search in linear time (I'll leave the implentation of that to another post), and considering we may consider every edge at some point then our time complexity for this is O(mn). That leaves a total running time of O(mlogm + mn) and working out which is the strongest term is half-simple: m is bounded by n^2 because if you consider the adjacency matrix you can have a max of n(n+1)/2 edges. You can therefore replace logm with log(n^2) which is 2logn and as they differ by a constant they are interchangeable, so our running time ends up being O(mlogn + mn) and obviously n is stronger than logn so our strongest term is the mn thus we're left with O(mn).

What would really speed this algorithm up would be a data structure that let us remember whether two nodes have a path between eachother i.e. they're in the same cluster. How about we make a data structure called a union-find which works like this: each cluster has a 'leader' vertex which acts as an identifier for that cluster, and all vertices store the index of their cluster's leader (or SOMETHING to identify it). This means that to check whether two vertices are in the same cluster we can just ask whether two vertices have the same leader, meaning this step can be done in constant time. This is called the 'Find' operation so it's no surprise that the second operation of this data structure is the 'Union' operation. This is where you get two clusters and fuse them together, which is what happens in Kruskal's when you add an edge that bridges two clusters together. If we perform the union operation on clusters A and B where A has fewer vertices (we can store the size of the clusters in a hash table or something), then we'd want to set the leader of all the vertices in A to the leader of B, because it will require less changes. Now we need to ask how many times can this happen in a single run of the algorithm?

Let's consider a single vertex v. Say in the first iteration it's lucky enough to form a cluster with another vertex u which is incident to the first edge added to the MST. Because each vertex started off as its own leader (as they were in clusters of size 1) now we just randomly choose which one becomes the new leader, and maybe that leader is u, so v switches its leader to u (meaning 1 switch so far). We're trying to find the worst case scenario for our vertex in terms of number of switches so if the next edge just connects another vertex to our cluster of two, v's leader won't change because it belongs to the larger cluster. But if instead the second edge gets added to a completely different place, forming another cluster of two, and then those two clusters need to be combined in the next iteration, then worst case says our vertex v will need to swtich its leader again (2 switches now). Notice that each switch means the cluster has at least doubled in size, and starting a cluster of size 1, the cluster can only double in size log2(n) times, meaning worst case scenario is that a vertex switches its leader O(logn) times. Considering there are n vertices this means we'll never have anymore than O(nlogn) switches.

So combining our initial sorting O(mlogm) and our cycle-checks (which includes the total O(nlogn) leader swtiches and finding whether two vertices are in the same cluster which is O(1) and happens O(m) times) we end up with a time complexity of O(mlogm + nlogn + m) where the strongest term is mlogm so we end up with O(mlogm) timecomplexity, meaning now our bottleneck is the sorting that occurs at the beginning of the algorithm.

This is the same running time we got with the speedy version of Prim's as O(mlogm) and O(mlogn) are, as explained above, interchangeable.

NOT BAD. So not-bad infact that I had a tear in my eye when I found out why the faster implementation of Prim's Algorithm with the O(mlogn) factor worked, and similarly the way that the union-find data structure for Kruskal's gives each vertex a max of log2(n) leader changes is awesome.

Sorry I couldn't snaz this up with pictures, but I have a lot to get through in a very short period of time. Next up will be the storage of signed integers, and maybe some stuff on floating point values, then I'll dive headfirst (with little regard for my mental health) into evolutionary dynamics.

No comments:

Post a Comment