Wednesday, 6 August 2014

What's Up With Evolutionary Dynamics?!

This post is going to introduce some stuff from the incredible field of evolutionary dynamics. Evo-dynamics is about applying game theory to biology and evolution via mathematical models. It's been used to explain things like cooperation in biology and even concepts like altruism. We'll start off by looking at the Prisoner's Dilemma:

The Prisoner's Dilemma (PD) shows up in both game theory and evolutionary dynamics as a game which motivates the questions of 'when is cooperation a good idea' and 'why is cooperation sometimes selected for'. The idea is that you have two separated prisoners involved in the same crime who are each given two choices:

1) confess to the crime, thus defecting against your friend
2) keep your mouth shut, thus cooperating with your friend

There are three possible outcomes:

1) if both prisoners cooperate (stay quiet), they both get a year in prison
2) if both prisoners defect (confess), they both get 7 years in prison
3) if one prisoner cooperates whilst the other defects, the cooperative prisoner gets 10 years and the defector gets no years

This can be described in payoff-matrix form where the row = your choice and the column = their choice:
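In years of prison (so lower is better), it looks like this:

                   they cooperate    they defect
    you cooperate       1 year         10 years
    you defect          0 years         7 years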

So which choice is rational?

Let's say they cooperate (remain silent). If you defect you go free, whereas if you cooperate you get a year in prison.
Let's say they defect. If you defect you only get 7 years in prison, whereas if you cooperate you get 10.

No matter what the other prisoner chooses, it is rational for you to defect in order to minimise your sentence. When there is a set of strategies such that no player can benefit by unilaterally switching their own strategy, it's called a Nash Equilibrium. So in this case, defect-defect is a Nash Equilibrium because either player could only do worse by switching to cooperation.

What would happen if we applied a similar game to a population whose fitness depends on the outcome of the game?

So each individual in our population is hardwired to either cooperate or defect every time they come across another individual. Let's handle some terminology: we've got an infinite population, so we can describe the abundance of one 'species' (strategy) as a frequency in [0,1]. By the way, when I say species, it could just as well refer to members of the same species using different strategies. In this example we'll say x is the frequency of C (cooperators) and 1-x is the frequency of D (defectors).

The fitness of a species is an indication of how fast its frequency increases, and it's given by:

 f_i(x) = \sum_{j=1}^{n}{a_{ij} x_j},

which reads 'for a species i, the fitness of i is the sum of its payoffs against every species (including itself), each multiplied by the frequency of that species', where x is the vector of species frequencies, a is the payoff matrix and x_j is the frequency of species j. So calculating it for our species C and D: the fitness of C is 3(x) + 0(1-x) = 3x and the fitness of D is 5(x) + 1(1-x) = 4x + 1. How do these fitnesses affect reproduction (i.e. changes in frequency)? We can answer that with the replicator equation:


 \dot{x_i} = x_i(f_i(x) - \phi(x)),

which can be read as 'the rate at which a species' frequency changes with respect to time is given by that species' frequency times the difference between its fitness and the average fitness across all species', where \phi(x) = \sum_j{x_j f_j(x)} is the average fitness. This is intuitive: if a species' fitness equals the average, its frequency won't change; if its fitness is above the average, its frequency will rise; and otherwise it will fall.

Considering the replicator equation, we can see what happens when we have a population of C's and D's. If we start with a 50/50 split, the average fitness is 3x/2 + (4x+1)/2 = (7x+1)/2. Clearly the D's have a fitness above the average, so they will increase their frequency and the C's will tend to a frequency of 0. At that point, when x = 0, the population is in equilibrium, with all the replication rates being 0 (because there are no more C's).

BUT what if we started off with x = 1, i.e. there are no D's? Now the average fitness is just the fitness of the C's, 3x, so the population won't change, meaning this is another equilibrium point. The difference between the two is that the first is a 'stable' equilibrium: small deviations from it get corrected as the system naturally tends back towards the equilibrium point. The second is unstable: add a minuscule amount of D's to the mix, and their superior strategy against cooperators lets them easily invade the population and move the frequency of C's down from 1.
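If you want to watch this play out numerically, here's a minimal sketch (plain Python with Euler stepping, using the payoff numbers from the example; the function name is mine):

# Replicator dynamics for cooperators (C) vs defectors (D).
# Payoffs from above: C vs C = 3, C vs D = 0, D vs C = 5, D vs D = 1.
def x_dot(x):
    f_C = 3 * x                    # fitness of C: 3x + 0(1-x)
    f_D = 4 * x + 1                # fitness of D: 5x + 1(1-x)
    phi = x * f_C + (1 - x) * f_D  # average fitness
    return x * (f_C - phi)         # the replicator equation

x, dt = 0.5, 0.01                  # start at a 50/50 split
for _ in range(5000):
    x += x_dot(x) * dt
print(round(x, 6))                 # ~0.0: the cooperators die out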

The things I'm describing here can be represented by a simplex, which is the set of all points describing the frequencies of each species in a population. If we were only considering a population of one species, the simplex would be 0-dimensional, i.e. a single point. Because we're dealing with two species, we have a line:


The right of the simplex represents x = 1 (i.e. there are only C's) and the left represents x = 0 (i.e. there are only D's). A closed circle represents a stable equilibrium and an open circle an unstable one. The arrow indicates that from any point on it, the frequency of C's moves in that direction (in this case, towards 0).

Considering the only thing which affects this simplex is the payoff matrix, let's take a payoff matrix with payoffs a, b, c and d and see what kind of simplex we get depending on their relative values, with species A and B (this time x = frequency of A's):

To recap on what this means: if two A's meet, they'll both get a payoff of a. If an A meets a B, it gets a payoff of b. If a B meets an A it gets a payoff of c, and if two B's meet they both get a payoff of d. The fitness of A and B is their average payoff.

So the fitness of A = ax + b(1-x) and the fitness of B = cx + d(1-x)

the average fitness is:
 x(ax + b(1 - x)) + (1 - x)(cx + d(1 - x))

and the replicator equation for A is:
x((ax + b(1-x)) - (x(ax + b(1 - x)) + (1 - x)(cx + d(1 - x))))
= -x(x-1)(x(a - b - c + d) + b - d)

And yes, I used a calculator for that.
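If you'd rather make the computer do the algebra, a quick sketch with sympy (assuming you have it installed) gets the same factorisation:

from sympy import symbols, expand, factor

x, a, b, c, d = symbols('x a b c d')
f_A = a*x + b*(1 - x)           # fitness of A
f_B = c*x + d*(1 - x)           # fitness of B
phi = x*f_A + (1 - x)*f_B       # average fitness
print(factor(expand(x*(f_A - phi))))
# prints something equivalent to -x*(x - 1)*(x*(a - b - c + d) + b - d)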

We can now consider what happens for various relative values of a,b,c and d.

So let's say a > c and b > d. This means A will dominate B no matter what, because it always gets a higher payoff than B and thus has an above-average fitness while there are still B's floating around.

If it's the other way around and a < c and b < d then B will dominate over A.

If a > c but b < d then we can consider three situations:
1) there aren't many A's to start with, so despite A's getting a better payoff against other A's than B's get, there aren't enough A's for that to matter: a rare A mostly meets B's (earning b) while the B's mostly meet other B's (earning d > b), so the A's go extinct.
2) there aren't many B's to start off with, and the same logic applies in reverse: the B's go extinct
3) there is a sweet spot with just enough A's and just enough B's for them to be in equilibrium, albeit an unstable one. To work out where this is, you can solve A's replicator equation for when the rate of change of its frequency is 0, and you end up with x = (d-b)/(a-b-c+d).

if a < c and b > d then each species gets a better payoff when interacting with the other species than with its own. This means we get a trend towards the middle of the simplex, to a stable equilibrium. The position of that equilibrium is the very same x = (d-b)/(a-b-c+d) as above; only its stability has changed, WHAT A TIME TO BE ALIVE (that's not sarcasm).

Finally, if a = c and b = d then it's impossible for selection to discriminate between the species, because their payoffs are identical. This gives you a neutral simplex where no frequencies change, but that picture isn't accurate for finite populations, where neutral drift is enough to eventually wipe out one species or the other.
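To summarise the four cases in code (a sketch; the labels and function name are my own):

def classify_2x2(a, b, c, d):
    # x = frequency of A; interior equilibrium at x* = (d-b)/(a-b-c+d)
    if a > c and b > d:
        return "A dominates B"
    if a < c and b < d:
        return "B dominates A"
    if a > c and b < d:
        return "bistable: unstable interior equilibrium"
    if a < c and b > d:
        return "coexistence: stable interior equilibrium"
    return "neutral or borderline"

print(classify_2x2(3, 0, 5, 1))  # the PD payoffs from earlier: "B dominates A"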

Here's a pictorial representation:



So we've looked at populations with two species. Before we go into more depth on the Prisoner's Dilemma, let's consider the implications of having three species. As 1 species in a population gives us a 0D simplex (a point) and 2 gave us a line, 3 gives us a triangle (the next step would be a triangular pyramid, and beyond that the dimensions go past what can easily be visualised). Let's consider the game of scissors-paper-rock (S, P and R), which has the quality that S dominates P, P dominates R and R dominates S. Here's the payoff matrix A:

What would happen if we had all three frequencies initialised to 1/3? There's no good reason why any of them would win over the others (their fitnesses are all equal), so that is an equilibrium point. Is it a stable one? It turns out to be a neutral equilibrium surrounded by neutral oscillations which can come close to the edges but never touch them. This means if we had x_S (the frequency of S) = 1/5, x_P = 1/5 and x_R = 2/5, we'd see the point on the simplex orbit around the equilibrium point, passing through (1/5, 2/5, 1/5) and (2/5, 1/5, 1/5). If we generalised the payoff matrix to be:


Then we can work out what the simplex will look like. If the determinant is 0 then we have the neutral oscillations from the traditional SPR game. If it's <0 then we've still got the interior equilibrium point (whether it's actually in the centre depends on the matrix), but it's unstable, and all other points spiral out towards the edges asymptotically, so the boundary of the simplex is an attractor (called a heteroclinic cycle). If the determinant is >0 then we still have the spiral, but this time the interior equilibrium point is the attractor (so it's stable).
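Here's a sketch of those dynamics for the classic SPR matrix (win = 1, loss = -1, draw = 0, which has determinant 0, so we expect neutral orbits):

# Scissors-paper-rock replicator dynamics; x = [x_S, x_P, x_R].
A = [[ 0,  1, -1],
     [-1,  0,  1],
     [ 1, -1,  0]]

def step(x, dt=0.001):
    f = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    phi = sum(xi * fi for xi, fi in zip(x, f))
    return [x[i] + x[i] * (f[i] - phi) * dt for i in range(3)]

x = [0.2, 0.2, 0.6]
for _ in range(100000):
    x = step(x)
print([round(v, 3) for v in x])
# still orbiting around (1/3, 1/3, 1/3) rather than converging to it
# (with naive Euler stepping the orbit also drifts outward very slowly --
# a taste of the rounding-error problem mentioned below)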

Note that although the oscillations can get asymptotically close to the edges, a rounding error can make a species completely extinct and then you'll end up with the remaining dominating species taking over completely. Not to mention that in finite populations there are no asymptotic protections against a species going extinct.

Now that we have some intuition behind how a population works with three species, let's go back to the Prisoner's Dilemma. The two strategies we had were Defect and Cooperate, but these alone don't tell us much about how species behave. What would be more accurate is if two interacting individuals played the game an average of m times. Now the number of possible strategies really opens up. The multi-game analogue of the single-game defect strategy is the always-defect strategy (ALLD). Likewise we have the always-cooperate strategy (ALLC). The third strategy we're going to look at is tit-for-tat (TFT): your first move is to cooperate, and after that you just do whatever your opponent did in the previous round. This time we're going to use a generalised payoff matrix for a single round with the variables T, R, P and S, where T > R > P > S. You can remember them as Temptation to defect, Reward for cooperating, Punishment for mutual defection and Sucker's payoff for being the one who cooperates when the opponent defects.

Let's consider how these strategies fare against each other:

ALLD vs ALLC: ALLD wins because in each round it gets the highest payoff T, and ALLC gets the lowest payoff S.

ALLC vs TFT: both get the R payoff every round, which is a big step up for ALLC, though neither strategy walks away with as much as ALLD gets when it's up against ALLC.

ALLD vs TFT: TFT starts off cooperating and gets the Sucker's payoff whilst ALLD gets the Temptation payoff, but after that TFT just defects for the remaining m-1 rounds, so TFT walks away with S + P(m-1) and ALLD walks away with a slightly higher payoff T + P(m-1).

ALLC vs ALLC: both get a pretty good R payoff each round.

TFT vs TFT: same as above

ALLD vs ALLD: measly payoff of P each round.

We can make this more interesting (and more accurate) by giving TFT a complexity cost 'c' which is subtracted from its total payoff in each iteration. This means ALLC will now beat it and ALLC-ALLC interactions yield higher payoffs than TFT-TFT interactions.

As you can see, ALLD always beats its opponent head-to-head. HOWEVER, two ALLDs playing each other get a far lower payoff than TFTs and ALLCs interacting among themselves. What does this mean? If we start with (1/3, 1/3, 1/3) frequencies then ALLD is going to dominate, but as its frequency increases its fitness decreases, because there are fewer TFTs and ALLCs to exploit. Meanwhile TFT's fitness increases, because TFT is only slightly outplayed by ALLD but gets high payoffs from all other interactions. So we'll see TFT grow and ALLD shrink. But now with fewer ALLDs floating around, ALLC can rise again, as it fares better than TFT in a population low in ALLDs. And then we're back to the beginning, where ALLDs can exploit the high numbers of ALLCs.

So we know we're going to get oscillations, but will they be neutral or inward/outward spirals? If we follow the convention of saying T = 5, R = 3, P = 1, S = 0.1, m = 10 and c = 0.8, then we get a matrix like this:
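Here's a sketch that builds that matrix of total payoffs, reading 'subtracted from its total payoff in each iteration' as c being docked once per m-round pairing (that's my reading; the relative orderings match what was said above):

T_, R_, P_, S_ = 5, 3, 1, 0.1
m, c = 10, 0.8

payoff = {
    ("ALLC", "ALLC"): R_ * m,
    ("ALLC", "ALLD"): S_ * m,
    ("ALLC", "TFT"):  R_ * m,
    ("ALLD", "ALLC"): T_ * m,
    ("ALLD", "ALLD"): P_ * m,
    ("ALLD", "TFT"):  T_ + P_ * (m - 1),   # exploits round 1, then P's
    ("TFT",  "ALLC"): R_ * m - c,
    ("TFT",  "ALLD"): S_ + P_ * (m - 1) - c,
    ("TFT",  "TFT"):  R_ * m - c,
}

strategies = ["ALLC", "ALLD", "TFT"]
for row in strategies:
    print(row, [round(payoff[(row, col)], 1) for col in strategies])
# ALLC [30, 1.0, 30]
# ALLD [50, 10, 14]
# TFT [29.2, 8.3, 29.2]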

I've looked all over for a picture of a simplex demonstrating this payoff matrix under the replicator equation but I can't find one. However, Imhof, Fudenberg and Nowak's 2005 study on the Prisoner's Dilemma has some simplexes under the replicator-mutator equation, which assigns a rate u at which one species mutates into another (I'm assuming mutation from one species to any other is equally likely, applied at reproduction, which is proportional to fitness). Here's the replicator-mutator equation:
 \dot{x_i} = \sum_{j=1}^{n}{x_j f_j(x) Q_{ji}} - \phi(x)x_i,
where Q is the matrix describing the probabilities of species j mutating to species i.
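For three species and a single mutation rate u, one simple choice of Q (my assumption, matching the guess above) keeps a strategy with probability 1 - u and mutates to each of the other two with probability u/2:

u = 0.01
Q = [[1 - u if i == j else u / 2 for j in range(3)] for i in range(3)]

def replicator_mutator_step(x, f, dt=0.01):
    # x_i' = sum_j x_j f_j(x) Q_ji - phi(x) x_i
    phi = sum(xj * fj for xj, fj in zip(x, f))
    return [x[i] + dt * (sum(x[j] * f[j] * Q[j][i] for j in range(3)) - phi * x[i])
            for i in range(3)]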

And here's some simplexes under this equation:

Fig. 1: simplexes under the replicator-mutator equation (from Imhof, Fudenberg and Nowak, 2005).


As you can see, this alone produces some crazy stuff. With very low mutation (u = 10^-8) we converge to the conventional replicator equation, and we can see that ALLD wins out in the end. With more mutation the selection isn't as cut-throat, so with u = 10^-2 there is convergence to a stable equilibrium with a high frequency of TFT. This is an interesting change brought about merely by increasing the rate of mutation. Nonetheless, the deterministic approach to the repeated Prisoner's Dilemma overwhelmingly favours ALLD, which is intuitive given that ALLD-ALLD is the only Nash equilibrium in the game. But is this what we see in finite populations?

The answer is no. In finite populations what we see is oscillations whose time-average sits near the TFT vertex (i.e. most of the time is spent around there). Why is this the case? Good question. Finding out requires an understanding of how evolutionary dynamics works with finite populations, which involves a whole lot of different processes (e.g. the Moran process) and stochastic equations, but as this post has gotten long enough, I'll leave you with the cliffhanger that finite populations and infinite populations can give us drastically differing results when simulating the same game.

Special thanks to Martin Nowak, whose textbook on evolutionary dynamics provided a few of the images in this post and a lot of insight into how evo-dynamics works.

Saturday, 2 August 2014

What's Up With Signed Integers?!

In an earlier post we talked about how binary numbers worked (in fact we talked about how all bases work) for positive numbers with both integer and fractional parts. In this post we're going to talk about how negative integers are represented in binary.

To set the scene, let's say we have a computer which, due to a lack of foresight into memory allocation requirements, only holds 4 bits of memory. If we were to consider those bits to represent an unsigned number, here's what we would interpret the states as:

0000 = 0    0001 = 1    0010 = 2    0011 = 3
0100 = 4    0101 = 5    0110 = 6    0111 = 7
1000 = 8    1001 = 9    1010 = 10   1011 = 11
1100 = 12   1101 = 13   1110 = 14   1111 = 15

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

So we have 16 numbers, 0-15. No surprises so far.

If we wanted to have as many negative numbers represented by our 4 bits as positive numbers, then we're going to end up with half as many positive numbers that we can represent. One possible way of going about this (which was my first impression) would be to reserve one bit for the sign of the number. Let's say we take the left-most (most significant) bit and say that if it's 0, we're dealing with a positive number and if it's a 1 we're dealing with a negative number. How will our representations change?

0000 =  0   0001 =  1   0010 =  2   0011 =  3
0100 =  4   0101 =  5   0110 =  6   0111 =  7
1000 = -0   1001 = -1   1010 = -2   1011 = -3
1100 = -4   1101 = -5   1110 = -6   1111 = -7

0,1,2,3,4,5,6,7,-0,-1,-2,-3,-4,-5,-6,-7
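A quick sketch of that encoding in Python (4 bits, top bit for the sign; the function name is mine):

def sign_magnitude(n, bits=4):
    sign = 1 if n < 0 else 0              # most significant bit holds the sign
    return (sign << (bits - 1)) | abs(n)  # remaining bits hold the magnitude

print(format(sign_magnitude(-5), '04b'))  # 1101, matching the table above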

Not bad but a few problems arise:

Firstly, we have two ways of representing 0. Not only is this a waste of a state (namely the binary number 1000), but computers very often need to compare numbers to 0, so with two representations every such check has to test for both +0 and -0, and the extra comparison is costly when we're dealing with processors that do these kinds of things millions of times a second.

Secondly, in terms of arithmetic, the computer needs to read the sign bit of both numbers if it wants to work out whether it needs to perform addition or subtraction on them.

Thirdly, it would be nice to have the numbers in order, like -7,-6,-5...0,1,2,3...6,7, so that if you're incrementing or decrementing a number you don't need to worry about shifting gears when you get to 0.

Before I introduce the next method for representing signed integers, it's a good time to mention that these quarrels I have with the above method (called the sign-and-magnitude method) are not necessarily going to be solved by some super-method, and the method that computers actually use today (which will be later on in the post) was and still is subject to a lot of heated debate in light of the other methods.

The next method which is intuitive to me is the 'Excess K' method where you take your regular unsigned numbers and just give them an offset of K. For example we could choose K to be 8 meaning that each of our numbers will be translated 8 units to the left:

0000 = -8    0001 = -7    0010 = -6    0011 = -5
0100 = -4    0101 = -3    0110 = -2    0111 = -1
1000 =  0    1001 =  1    1010 =  2    1011 =  3
1100 =  4    1101 =  5    1110 =  6    1111 =  7

-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7

Note that you could have chosen your K to be 7, giving a range of -7 to 8, which is just as valid.

Well, we fixed quarrel #3 from the sign-and-magnitude method: an increase/decrease of 1 in the binary representation corresponds to an increase/decrease of 1 in the signed number itself. We also now have only a single 0. The only problem here, and it is a big problem, is arithmetic. If you want to add 1 and -5 then you're dealing with 1001 and 0011. Adding those means you're dealing with (1 + 8) + (-5 + 8) = 12, and then you need to cancel out the excess 8's by subtracting 2*8, which gives you -4 (at least it gives the correct answer). Actually, if you used that process you'd then need to add 8 back to give -4 its excess-K representation, so we may as well only ever subtract off one 8 after doing an addition.
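As a sketch of that workflow (the function names are mine):

K = 8

def encode(n):
    return n + K     # store n shifted by the offset K

def add_encoded(p, q):
    # (a + K) + (b + K) = (a + b) + 2K, so subtract one K to stay in excess-K
    return p + q - K

one, minus_five = encode(1), encode(-5)  # 9 (1001) and 3 (0011), as in the table
print(add_encoded(one, minus_five) - K)  # decode by subtracting K: prints -4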

The other methods are the complement methods. What is a complement? It's easier to think about in terms of decimals:

For base 10 we can look at the 10's complement or the 9's complement of any number. If I have the number 531, the 10's complement of that is simply 1000 - 531 = 469. Why would we ever want to use this? Let's say I want to subtract 531 from 812. I suck at subtraction, so it would be easier to just add 812 to -531, which by the above equation is (469 - 1000): i.e. add 469 and 812, then subtract off 1000 when I'm done. 469 + 812 = 1281, meaning our answer is 1281 - 1000 = 281.

To generalise, the 10's complement of a number n with d digits is 10^d - n (except that the complement of 0 is 0, which is reasonable when you consider that the number of digits is irrelevant to representing 0). 10's complements are good because removing the 10^d from the result of the addition at the end is easy, although actually calculating the complement in the first place isn't as easy as it could be. What you can do is subtract n from 999 instead of 1000 (meaning you can do the subtraction digit by digit, with no borrowing) and then add 1 at the end: you're subtracting the rightmost digit from 10 and every other digit from 9, which is just another way of subtracting n from 1000, because 1000 = 999 + 1.

So our 10's complement is 10^d - n and the way we implemented it actually utilised the 9's complement i.e. (10^d - 1 - n). So if I wanted the 9's complement of 531 that's just 999 - 531 = 468 (the 10's complement is 469 because 999 and 1000 differ by 1).

So there's 9's and 10's complements, and you may be wondering why we can't have an 8's complement or a 7's complement when dealing with decimal numbers. The answer is that if you did, and you wanted to subtract your number from say 8888, then if one of the digits is a 9 you'll need to take the subtraction beyond the current digit which defeats the purpose of having a complement.

Anyway now that we know what a complement is we can apply the concept to binary numbers. We shall start off with the 1's complement.

If we have a number 0010 (2), then the 1's complement of it is 1111 - 0010 = 1101. The bits just get inverted. The formula for the 1's complement is 2^d - 1 - n. The 2's complement (the only other possible complement) is 2^d - n, so the 2's complement of 0010 is 10000 - 0010 = 1110. Just like how you could consider the 10's complement as the 9's complement plus 1, the 2's complement is the 1's complement plus 1. So the 2's complement of 0010 could be calculated as 1111 - 0010 + 1 = 1101 + 1 = 1110.

Notice what happens when you add the 1 at the end: if the rightmost digit is a 0 (meaning it was a 1 before the inversion happened) then it becomes a 1 (i.e. the inverted digit is inverted again, going back to how it was in n originally). If the rightmost digit is a 1 then it becomes a 0, and the 1 gets carried along until there's a 0 to hold it. For example, if n is 0101011000, then when calculating the 2's complement you first invert the digits to get 1010100111, then you add one to get 101010[1000]. The digits in the brackets are exactly what they were when we started, so the easiest way to get the 2's complement is to scan from right to left, leave all the zeros and the first 1 alone, and invert every digit after that.

Before I get into how this helps represent negative numbers in binary, notice that the complement of a complement is the original number. For example if we start with n and take the 2's complement we get (2^d - n) and if we take the 2's complement of that we get 2^d - (2^d - n) = 2^d - 2^d + n = n.
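You can watch the invert-and-add-one rule (and the complement-of-a-complement fact) in a couple of lines of Python:

def twos_complement(n, bits=4):
    mask = (1 << bits) - 1  # e.g. 1111 for 4 bits
    return (~n + 1) & mask  # invert the bits, add 1, keep only d bits

print(format(twos_complement(0b0010), '04b'))                   # 1110
print(format(twos_complement(twos_complement(0b0010)), '04b'))  # 0010 again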

So how does this translate into representing negative numbers? Say I only allowed myself to represent decimal numbers with 3 digits. Clearly I need to halve my range of positive values in order to have equal representation of positive and negative numbers. We can say that 000 to 499 are positive, and anything past that is negative and you just take the 10's complement to find out its magnitude. e.g. 500 becomes -(1000 - 500) = -500, and 999 becomes -(1000 - 999) = -1. So we get a range of 0 to 499 in positive numbers and -1 to -500 in negative numbers.

In 9's complement we again say 000 to 499 just remain unchanged and represent positive numbers. Then everything else represents the negative of the 9's complement so 500 becomes -(999 - 500) = -499, and 999 becomes -(999 - 999) = -0. So we can represent 0 to 499 in positive numbers and -0 to -499 in negative numbers.

In 2's complement let's say again we can only store 4 bits, so 0000 (0) to 0111 (7) remain unchanged and 1000 (8) to 1111 (15) represent the negative of their 2's complements so with the table we used above:

0000 =  0   0001 =  1   0010 =  2   0011 =  3
0100 =  4   0101 =  5   0110 =  6   0111 =  7
1000 = -8   1001 = -7   1010 = -6   1011 = -5
1100 = -4   1101 = -3   1110 = -2   1111 = -1

Notice that you can easily tell when a number is negative because the leftmost digit is 1.

In 1's complement the 0000 (0) to 0111 (7) remain unchanged, but the 1000 (8) to 1111 (15) represent the negative of their 1's complement:

0000 =  0   0001 =  1   0010 =  2   0011 =  3
0100 =  4   0101 =  5   0110 =  6   0111 =  7
1000 = -7   1001 = -6   1010 = -5   1011 = -4
1100 = -3   1101 = -2   1110 = -1   1111 = -0

Subtraction can be seen as adding one number to another number's complement, i.e. A - B = A + (-B). Here's how a decimal computer would go about subtracting 531 from 712 using the 10's complement:

531 --> 1000 - 531 = 469

712 + 469 = 1181

because we end up with an extra digit, remove the leftmost 1, because that represents the 1000 we added on at the beginning

1181 --> 181 and that's our answer.

If we ended up with a three-digit number (i.e. no extra digit appeared), then we can't subtract off the 1000 by dropping a digit, so instead we take the complement of the number and store that, e.g. 280 - 435:

435 --> 1000 - 435 = 565

280 + 565 = 845

845 is the 10's complement representation of the actual answer -155, because we are in excess of 1000. So if we wanted to print the answer we'd just go 845 - 1000 = -(1000 - 845) = -(10's complement of 845) = -155, i.e. take the 10's complement of 845 and stick a minus sign in front of it.

With 9's complement the process is exactly the same, except that if you get an extra digit holding a 1 (an end-carry) you need to do an end-around carry. For example, let's do 712 - 531 using 9's complement:

531 --> 999 - 531 = 468

712 + 468 = 1180

you've got an end-carry, but you can't just remove it, because that would be removing 1000 when you want to remove 999 to get the true answer. So instead, remove the leftmost digit and then add 1, i.e. 1000 = 999 + 1:

1180 --> 180 --> 181, which is the same as we got with the 10's complement.

If you're doing 9's complement subtraction with 280 - 435:

435 --> 999 - 435 = 564

280 + 564 = 844, which is correct as the 9's complement representation of the negative answer, but if you want to print the actual result you'd go 999 - 844 = 155, so the answer is -155 (you've just taken the 9's complement again).

A question that just popped up in my head is 'what happens if you do the subtraction and end up with 999, because you won't have the extra digit telling you to subtract off 999 to get 0?', but then I remembered that 999 = -0, so that's a self-solving problem.

As for 2's complement subtraction, it's very similar: if we wanted to subtract 11010 (26) from 10010 (18), then we follow the same process:

11010 --> 100000 - 11010 = 00110

10010 + 00110 = 11000, and considering there's no 6th digit, we're dealing with a negative number, so we can store it like that, and if we want to print out what it is, we take the negative of the 2's complement and say it's -(100000 - 11000) = -01000 = -8.

If we wanted to subtract 18 from 26, we'd end up with an extra 1 digit, in which case we could just throw that away (discarding the 100000 we added at the beginning) to get our answer.

For 1's complement subtraction let's see what subtracting 26 from 18 is like:

11010 --> 11111 - 11010 = 00101

10010 + 00101 = 10111, which is negative, and to find its value we take the negative of the complement, giving us -01000 = -8. If we had done 26 - 18 we'd have an end-carry, so we'd need to discard that digit and add 1, because 100000 = 11111 + 1, so subtracting 100000 and then adding 1 is the same as subtracting 11111.

So the general process to calculate A - B is: take B's complement, calculate A + (B's complement), then interpret the result. If A > B (i.e. the answer is positive), that means either discarding the extra digit (10's and 2's complement) or discarding the extra digit and adding 1 (9's and 1's complement). If A < B (i.e. the answer is negative), we can leave the value as it is if we want to store it, or take the negative of its complement if we want to print it.
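Here's that whole recipe as one sketch for the radix-complement flavours (i.e. the 10's/2's kind); the function name and layout are mine:

def subtract_via_complement(a, b, digits, base=10):
    # computes a - b as a + (base^digits - b)
    modulus = base ** digits
    total = a + (modulus - b)    # add a to b's complement
    if total >= modulus:         # end-carry: the answer is positive
        return total - modulus   # discard the extra digit
    return -(modulus - total)    # no end-carry: negate the complement to print

print(subtract_via_complement(712, 531, 3))             # 181
print(subtract_via_complement(280, 435, 3))             # -155
print(subtract_via_complement(0b10010, 0b11010, 5, 2))  # -8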

So which is better for binary? 1's complement or 2's complement?

Why to go with 1's and not 2's:

  • getting the complement is easier with 1's complement because you just invert each bit
  • 1's is symmetric, i.e. 4 bits goes from -7 to 7, but 2's is asymmetric (-8 to 7)


Why to go with 2's and not 1's:

  • 2's can be implemented with two rules: get the 1's complement and add 1 or search from right to left until you find a 1 and after that you start inverting
  • when subtracting with 2's complement you can just discard the end-carry whereas with 1's you need to discard it and add 1
  • 1's complement has two zeros which is very bad considering how often computers compare values to zero
  • whether the number is positive or negative, if a number in 2's complement is even, then the rightmost bit is 0, which is good for testing whether an integer is even (in 1's complement that bit is 0 for negative odd numbers)

Seems pretty compelling to me to choose 2's complement, which is why it's the most commonly used system for representing integers in binary.

Friday, 1 August 2014

What's Up With MST Fast Implementations?!

In the last post we looked at Prim's and Kruskal's Algorithms and proved that they were correct i.e. that they produced minimum spanning trees (MSTs). In this post we'll look at their running times and consider if there are ways to speed them up (spoiler alert: there is, and it's beautiful).

Prim's:

In order to know how to go faster than our current implementation of Prim's Algorithm, we need to know how fast our implementation is! Let's look at the pseudocode again (this time using the set notation where V is the set of vertices, X is the set of vertices in our MST so far, V-X is the set of vertices not in the MST yet, and T is the set of edges in our MST so far)

1 Pick some vertex and add it to X
2 While not all vertices are in X:
3     Pick a minimum-cost edge crossing the cut (X,V-X)
4     Add the edge to T
5     Move its incident vertex in V-X from V-X to X
6 return the graph (X,T)
     
What would be the asymptotic running time of this algorithm?

At line 1 we just pick some vertex, which can be done in constant, O(1), time. As for the while loop, we're going to add n-1 vertices to X (there are n vertices and we added the first in line 1), meaning there are O(n-1) = O(n) iterations. For each iteration we need to work out which edges cross the cut, meaning we look at each edge and determine whether one endpoint is in X and the other is in V-X (the vertices could store their set as a boolean value). This means we scan all m edges in each iteration, giving us O(m) time per iteration (lines 4 and 5 are both constant time, so we don't need to worry about those). So with O(n) iterations, each taking O(m) time, we have an overall time complexity of O(nm).

Considering that m >=  n - 1 (as shown in the previous post on MSTs) our best case scenario is O(n^2), which is reason enough to ask 'can we do better?'

So with each loop we're looking for the minimum edge crossing the cut, and a data structure whose life is all about repeated minimum computations is the heap. Recall from the post on heaps that heaps can have nodes inserted and deleted in logarithmic time, and also perform the extract-min operation in logarithmic time. Also, building the heap from scratch can be done in linear time (if you do it right).

A first idea would be to give each edge a node in the heap so that the root is always the cheapest edge. This is good for looking at the cheapest edges first, but the odds of the graph's cheapest remaining edge crossing the cut (X,V-X) at any given point in the algorithm are fairly low. A better idea is to use vertices, with the key of each node being the cost of that vertex's cheapest edge crossing the cut (and if it has no edge crossing the cut, its key is set to inf.). This means we'll be changing the key values of the nodes a lot, which requires more reassembly of the heap than if we used edges, but it also means the root of the heap always gives you the vertex you want to add to X (and the corresponding edge), so you can just perform a series of extract-mins and you'll end up with the MST.

So with our heap we want to maintain two invariants:
1) the elements of the heap are vertices in V-X
2) for v in V-X, key[v] = cost of the cheapest edge (u,v) with u in X, where if no such edge exists, key[v] = inf.

What's the time complexity of building this heap in the first place? First you need to find the key of each vertex, which requires checking every edge once: you go from edge to edge asking 'is this edge crossing the cut?', and if it is, and the edge cost is lower than the incident vertex's current key, then that vertex's key becomes the cost of the edge. That's O(m) time. The next step is simply building the heap from the keys, which can be done in linear time, so we end up with a total of O(m+n) time.

So if we just did n extract-mins to make our spanning tree, that would be O(nlogn) time, but we need to do more than that: after extracting a vertex and putting it in X, the cut changes, and vertices which previously had a key of inf. (i.e. no edge crossing the cut) may need their keys re-evaluated. Luckily, the only vertices we ever need to consider after adding a vertex v to X are the ones incident to v which are still in V-X (we'll call such a vertex w), as their edges to v now cross the cut. For a vertex w, we simply delete its corresponding key from the heap (this requires keeping track of where a vertex's key sits in the heap, which can just be stored with the vertex itself), then recompute w's key, i.e. the cost of the cheapest edge incident to w which crosses the cut, then insert w's new key back into the heap.

How many times will we be doing this? Each edge is responsible for at most one delete-insert combo: one of its incident vertices gets sucked into X, and with the other incident vertex still in V-X, we just see whether this edge's cost is lower than that vertex's current key, and if so we delete the vertex's key, assign the edge cost to it and re-insert it. As deletions and insertions both take O(logn) time and there are at most m of these operations, we end up with O(m*2logn) = O(mlogn) time spent maintaining invariant #2 across the life of the algorithm.

To recap:
1) initialising the keys and building the heap costs O(m+n)
2) extract-mins cost O(logn) and there are n-1 of them to do giving us O(nlogn) for that
3) maintaining invariant #2 costs O(mlogn) time

So our total time complexity is O(m + n + nlogn + mlogn), and as m >= n-1 we can always use m as an upper bound for n, meaning our mlogn term is the 'strongest', leaving us with a final time complexity of O(mlogn), which is far better than the naive implementation's running time of O(mn).
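Here's a compact Python sketch of heap-based Prim's. One caveat: instead of the delete-and-reinsert bookkeeping described above, it uses the lazy-deletion trick (push the better key and skip stale entries when they surface), since Python's heapq has no delete operation; the O(mlogn) bound still holds:

import heapq

def prim_mst(n, adj):
    # adj[u] = list of (cost, v) pairs; vertices are 0..n-1
    in_X = [False] * n
    heap = [(0, 0, -1)]             # (key, vertex, parent); start from vertex 0
    T, total = [], 0
    while heap:
        key, v, parent = heapq.heappop(heap)
        if in_X[v]:
            continue                # stale entry: skip it (lazy deletion)
        in_X[v] = True
        total += key
        if parent != -1:
            T.append((parent, v))
        for cost, w in adj[v]:      # re-evaluate keys of neighbours in V-X
            if not in_X[w]:
                heapq.heappush(heap, (cost, w, v))
    return T, total

# tiny example: a square with one diagonal
adj = {0: [(1, 1), (4, 2)], 1: [(1, 0), (2, 2), (5, 3)],
       2: [(4, 0), (2, 1), (1, 3)], 3: [(5, 1), (1, 2)]}
print(prim_mst(4, adj))             # ([(0, 1), (1, 2), (2, 3)], 4)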

Moving onto Kruskal's:

What's the time complexity of the naive implementation of Kruskal's? Considering that we go through the edges from cheapest to most expensive, we must first sort them by cost, which can be done via a quicksort in O(mlogm) time. As for each iteration of the algorithm's while loop: we add the edge if it doesn't produce a cycle, and a cycle means you've got two paths between some u and v which don't share any edges. If placing an edge incident to u and v produces a cycle, then there must already be a path between u and v. To see whether such a path exists we can just do a simple graph search in linear time (I'll leave the implementation of that to another post), and considering we may do this for every edge, the cycle-checking costs O(mn) overall. That leaves a total running time of O(mlogm + mn), and working out which is the strongest term is simple enough: m is bounded by n^2, because looking at the adjacency matrix of a simple graph you can have at most n(n-1)/2 edges. You can therefore replace logm with log(n^2) = 2logn, and as they differ by a constant factor, logm and logn are interchangeable. So our running time is O(mlogn + mn), and as n is obviously stronger than logn, our strongest term is mn, leaving us with O(mn).

What would really speed this algorithm up is a data structure that remembers whether two vertices already have a path between each other, i.e. whether they're in the same cluster. How about we make a data structure called a union-find, which works like this: each cluster has a 'leader' vertex which acts as an identifier for that cluster, and all vertices store the index of their cluster's leader (or SOMETHING to identify it). This means that to check whether two vertices are in the same cluster, we just ask whether they have the same leader, so this step can be done in constant time. This is called the 'Find' operation, so it's no surprise that the second operation of this data structure is the 'Union' operation. This is where you take two clusters and fuse them together, which is what happens in Kruskal's when you add an edge that bridges two clusters. If we perform the union operation on clusters A and B where A has fewer vertices (we can store the sizes of the clusters in a hash table or something), then we want to set the leader of all the vertices in A to the leader of B, because that requires fewer changes. Now we need to ask: how many times can this happen in a single run of the algorithm?

Let's consider a single vertex v. Say in the first iteration it's lucky enough to form a cluster with another vertex u, the two being incident to the first edge added to the MST. Because each vertex started off as its own leader (they were in clusters of size 1), we just arbitrarily choose which one becomes the new leader; maybe that leader is u, so v switches its leader to u (1 switch so far). We're trying to find the worst case scenario for our vertex in terms of number of switches, so if the next edge just connects another vertex to our cluster of two, v's leader won't change, because it belongs to the larger cluster. But if instead the second edge gets added to a completely different place, forming another cluster of two, and then those two clusters get combined in the next iteration, then worst case our vertex v needs to switch its leader again (2 switches now). Notice that each switch means v's cluster has at least doubled in size, and starting from a cluster of size 1, the cluster can only double in size log2(n) times, meaning the worst case is a vertex switching its leader O(logn) times. Considering there are n vertices, this means we'll never have more than O(nlogn) switches.

So combining our initial sort, O(mlogm), with the cycle-checks (which include the O(nlogn) total leader switches, plus finding whether two vertices are in the same cluster, which is O(1) and happens O(m) times), we end up with a time complexity of O(mlogm + nlogn + m), whose strongest term is mlogm. So we end up with O(mlogm) time complexity, meaning our bottleneck is now the sorting that occurs at the beginning of the algorithm.
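And here's a sketch of Kruskal's with exactly that union-find (leader pointers plus relabel-the-smaller-cluster); the names are mine:

def kruskal_mst(n, edges):
    # edges: list of (cost, u, v); vertices are 0..n-1
    leader = list(range(n))                # every vertex starts as its own leader
    members = {v: [v] for v in range(n)}   # leader -> vertices in its cluster
    T = []
    for cost, u, v in sorted(edges):       # cheapest edge first: O(mlogm)
        lu, lv = leader[u], leader[v]
        if lu == lv:
            continue                       # same cluster: edge would close a cycle
        if len(members[lu]) > len(members[lv]):
            lu, lv = lv, lu                # make lu the smaller cluster
        for w in members[lu]:              # relabel the smaller cluster's vertices
            leader[w] = lv
        members[lv] += members.pop(lu)
        T.append((u, v))
    return T

edges = [(1, 0, 1), (2, 1, 2), (4, 0, 2), (1, 2, 3), (5, 1, 3)]
print(kruskal_mst(4, edges))               # [(0, 1), (2, 3), (1, 2)]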

This is the same running time we got with the speedy version of Prim's as O(mlogm) and O(mlogn) are, as explained above, interchangeable.

NOT BAD. So not-bad in fact that I had a tear in my eye when I found out why the faster implementation of Prim's Algorithm with the O(mlogn) running time worked, and similarly the way that the union-find data structure in Kruskal's gives each vertex a max of log2(n) leader changes is awesome.

Sorry I couldn't snaz this up with pictures, but I have a lot to get through in a very short period of time. Next up will be the storage of signed integers, and maybe some stuff on floating point values, then I'll dive headfirst (with little regard for my mental health) into evolutionary dynamics.