Sunday, 11 May 2014

What's Up With Dijkstra's Algorithm?!

What's Up With Walter White's Dijkstra's Algorithm?!

YEAH MR DIJKSTRA, YEAH COMPUTER SCIENCE!

Alrighty so we've nailed the terminology of graphs and we've touched on how edges can have weights which represent things such as lengths. Well now it is time to make an algorithm which will takes the input of 2 vertices A and B and tells you the path that will get you from A to B.

Here's the graph we'll be using to work out this problem:

We want to get from vertex 0 to vertex 4 via the shortest possible path. One thing we want to be able to do is work out the distance that each vertex has from the starting vertex, but as there are multiple paths to take there's a high chance we'll start off thinking the 'tentative' distance to a vertex is X and find out there's another distance Y (via a different path) where Y<X and in that case we'd want to replace X with Y. So for example you could start off at 0 (whose distance from itself is 0) and look at its neighbours and think "the distance to 1 is 14 and the distance to 6 is 8". You want to store this in the vertices so how about making a list called dist[] with 7 elements (the number of vertices) and the index refers to the vertex e.g. I'll set dist[1] to 14 and dist[6] to 8. It might also be useful to know the previous vertex that 1 and 6 are connected to, so that by the time we reach the end vertex we can trace a path backwards to 0. So we'll make another list called prev[] and we'll make prev[6]=0 and prev[1]=0.

After this is done, there's no reason to ever need to refer back to 0 because we've considered all of its neighbours and we know for sure what its tentative distance is (0 by definition). Hence we say this vertex has been 'visited' and it would be good to keep track of what vertexes have been visited, so we'll make another list called visited[] (this one can be boolean) where each vertex starts off 'false' (i.e. unvisited) and after we've analysed all of its neighbours we consider the vertex to be visited and set the value to 'true'. Note the terminology I'm using here: I'm saying that if a vertex has been visited it's own tentative distance has already been worked out through being analysed by previous vertexes and all of its neighbours have been analysed by it. If a vertex is being analysed, I mean the algorithm is deciding whether the distance from the vertex being visited to the analysed vertex can give the analysed vertex a shorter tentative distance (and if so, changing dist[] and prev[] to match the new shorter path).

The process that happens when you consider the neighbours of a vertex is you look through the set of edges E which is in the form {u,v,w} where u and v are the connected vertices and w is the weight or 'length' of the edge. You check to see if the vertex you;re visiting shows up in any of the edges by searching through the first two columns (where u and v are). For each edge which does have your vertex in it, the next thing you want to do is determine whether the other vertex has already been visited or not because if it has, it's already given the current vertex a tentative distance and its own tentative distance has already been determined, so it doesn't need to be considered.

dist[], prev[] and visited[] are the only three lists we'll need for this algorithm so don't worry this isn't going to get out of hand.

Now in deciding which vertex to visit next (we can choose between 1 and 6) it's a good idea to pick the one with the lowest tentative distance, which seems intuitive considering we're looking for a short path to the end vertex, but I'll give you the real reason in a bit. So we'll move to 6. 6 has three neighbours but we've already considered its link with 0 so let's now just look at 1 and 5. The tentative distance of 6 plus the distance between 6 and 1 along their link is 8+2 = 10. 10 is less than 14, the current tentative distance of 1 so we've found a faster path to it. The reason this is important in terms of finding the shortest path to the destination vertex is that if that path happens to involve vertex 1, we'd want to ensure that getting from 0 to 1 is done in the shortest path possible too. So, knowing 10<14 we'll set 10 as the new tentative distance of 1 i.e. dist[1]=10 and as now you'll be going from 6 to 1 on the path to the destination vertex, we'll set 6 as the new previous vertex of 1 i.e. prev[1]=6.

Now we can look at another of 6's neighbours: vertex 5. It doesn't have a tentative distance yet but the idea is to replace higher distances with lower ones so how about when this algorithm starts we set the tentative distance of every vertex (except the start vertex) to infinity. This means as soon as you find a distance to it, that distance will be less than infinity and therefore it'll replace the infinity. There's also an added benefit in that if you look at the list of unvisited vertices and all of their distances are infinity, it means there's clearly no way to reach them as they haven't been considered neighbours (and thus given a distance) by any of the visited vertices, so this is good for stopping the algorithm and saying 'the destination is disconnected from the starting vertex'. Anyway for now, we know that adding dist[6] and the distance between 6 and 5 along their link gives us 8+4 = 12 and 12<infinity so we'll set dist[5]=12 and we'll set prev[5]=6. Now that we've analysed all of 6's neighbours, we'll set it to visited meaning visited[6]=true. And then we move on to the next unvisited vertex with the shortest tentative distance.

But before we do that, one question you may have right now is what happens when the vertex being analysed can actually provide a shorter distance for the vertex currently being visited. The answer is that due to only visiting vertices with the shortest tentative distance, we ensure this doesn't happen. For example, just looking at vertices 0, 6 and 1, we visited 6 and not 1 because we knew that no matter how short the link was between 6 and 1, dist[6]<dist[1] meaning dist[1]+distancebetween(1,6) will also be greater than dist[6] so there's no chance dist[6] will change because only smaller distances will change it. By the way the distancebetween function is going to be based on retreiving the weight from the edge that has 1 and 6 as its two vertices. So now we know the real reason for only visiting vertices of the shortest tentative distance; to ensure you never need to change the tentative distance of the vertex you're visiting and the distance changing only occurs one way. The only time this would change is if edges could be negative, and I speculate that in that case you'd only visit vertices once all of their edges had been analysed by other visited vertices.

Let's look at where our graph is at now. Red = dist[vertex], Blue = prev[vertex]. I set prev[0] to -1 so that when you're tracing back through the path at the end of the algorithm you can say 'stop when the prev[currentvertex]=-1' but you could also say 'stop when the currentvertex is the startingvertex'.

We know what to do now, choose the next unvisited vertex with the lowest distance to visit. looks like that's 1. So after handling that we'll have:



Now despite having analysed 4, our destination vertex, we don't know if we know the shortest path to it. Going 0,1,2,7,4 looks like it's going to have a better distance than the current distance of 32. So we simply continue the algorithm until we are able to visit 4 (i.e. when 4 is the unvisited vertex with the lowest distance).

At this point we can say we know the shortest distance to 4, however if there were more vertices left that hadn't been visited, we know that they couldn't provide a shorter distance to 4 or else they'd have already been visited due to our process of only visiting the vertices with the shortest tentative distance. So now we know the destination vertex is next in line to be visited so the algorithm of calculating distances is done. All that's left is tracing back the path and it's convenient that we have all those blue numbers telling us from which vertices all the shortest tentative distances come from. So from 4, we go to 3 and then to 2 and then to 1 and then to 6 and then to 0. This could be set up as a stack where you just do this:

currentvertex = endvertex // which is 4
do
     push to stack currentvertex
     currentvertex = prev[currentvertex]
until prev[currentvertex] = -1

giving you a stack that looks like:

So that's what's up with Dijkstra's Algorithm. Note that in conventionally a vertex is considered 'visited' as soon as you start visiting it rather than afterwards which lets you say 'end the loop once the end vertex is visited' (meaning you won't actually analyse any remaining neighbours) but it makes more sense to me to use the past tense of visited in regards to actually having moved on to another vertex. I'll handle the complexity of the algorithm after I do a post on big O notation and I may look into some more efficient alterations to the original algorithm. Until then see you next time, and I apologise for not taking every opportunity to make a breaking bad joke, bitch.


No comments:

Post a Comment