Sunday, 1 June 2014

What's Up With Complexity?!

Well, I said my next post was going to be on binary and whatnot, but I decided to space out the more interesting topics and settle for something dry like data structures. Then I remembered that I cannot be bothered making a post on data structures, so I reverted to the idea of doing something fun and settled on complexity.

Complexity is just a solid way of measuring the rate at which an algorithm's processing time increases as the size of the input increases. I'm going to be talking here mostly about inputs which are lists (and some graphs), because these have fairly variable sizes and thus the running time of algorithms which take them as input is very dependent on that size.

We'll start off by talking about running time. This is just the time, measured in (hopefully) seconds, that an algorithm takes to produce an output given some input. It depends on all sorts of things: how fast your computer's processor is, the size of your input (consider the time taken to sort a list of 2 elements compared to 1 million elements), the quality of the code after compilation, and the algorithm's time complexity. If we used running time to compare the efficiency of two different algorithms, we'd need to test them on the same machine with the same inputs and so on, which isn't a very convenient method of comparison (albeit certainly a rigorous one). The time complexity is the part which depends only on the algorithm itself, or at least it depends on the algorithm enough to let you easily compare different algorithms.

The time complexity of an algorithm is a measure of the speed of an algorithm as a function of the size of the input, given some abstract machine to work with. What do I mean by abstract machine? Well, there may be processor architectures which are optimised for different things: for example, the EXAMPLE2000 processor can assign one variable to another in no time whatsoever but takes an entire year to process a simple multiplication, whereas the ANOTHEREXAMPLE3000 processor can do pretty much everything in a millisecond but printing expressions takes a week. The 'abstract machine' is just some model for the time cost of everything that your algorithm can do. Now I know I said above that the time complexity doesn't care about the speed of your processor, and that's true: processor speed and time complexity are separate components of the running time (e.g. two identical processors where one is overclocked give different running times for the same algorithm, but the time complexity doesn't change). You still need an abstract machine, though, as a framework for deciding the relative time cost of everything your algorithm does.

We're going to use a simple but fairly representative abstract machine where each assignment (x<--y), print, return and comparison (x<y) takes one step (where a step is some unit of time that we can use for comparison). Arithmetic operations won't have any time cost (this part is not representative, but wherever you see an operation there will be an assignment or comparison nearby anyway, so we'll consider them covered). This will also be a random-access machine, meaning accessing memory has no time cost either. And our machine can't do anything fancy like automatically swap elements in a list; it's basically bare-bones in terms of functionality (but don't worry, it's not like it's assembly language, which I will be doing a post on in the future and it's going to be awesome).

Okay let's have a look at a simple algorithm: the Power(x,n) algorithm which takes x and n and outputs x^n.

Power(x,n):
value <-- 1 //assignment so +1 to time complexity
k <-- 1 // another +1
while (k<=n) do { //comparison so +1
    value <-- value * x //assignment so +1
    k <-- k + 1 // another +1
}
return value //return statement so +1

Alrighty, how many steps did we count? 1+1+1+1+1+1 = 6 steps. But hang on, we ignored the fact that there's a while loop in there, meaning part of the code can be executed multiple times. The question is how many times it will be executed. k starts at 1 and gets incremented at the end of the loop body, so if n is 3 the body runs for k=1, k=2 and k=3 (after the second increment k=n, so k<=n is still true and the body runs one last time). So the three +1's inside the loop (the k<=n check, the value assignment and the k assignment) occur 3 times, and then there's one more +1 when the while loop checks k<=n a final time and finds that k is now larger than n. So what do we have: 3 +1's outside the loop, 3 +1's per iteration for 3 iterations, and an extra +1 for that final comparison of k and n. That leaves us with 3*1 + 3*3 + 1 = 13 steps. But this was for n=3; in general the expression is 3*1 + 3*n + 1 = 3n+4. We can call the time complexity t(n), so for this algorithm t(n)=3n+4.
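If you want to check that formula empirically, here's a quick Python sketch (my own, not part of the original pseudocode) which runs the same algorithm with a step counter bolted on:

def power_steps(x, n):
    """Power(x, n) instrumented with the abstract machine above:
    every assignment, comparison and return costs one step."""
    steps = 0
    value = 1; steps += 1               # value <-- 1
    k = 1; steps += 1                   # k <-- 1
    while True:
        steps += 1                      # the k <= n comparison
        if not (k <= n):
            break
        value = value * x; steps += 1   # value <-- value * x
        k = k + 1; steps += 1           # k <-- k + 1
    steps += 1                          # return value
    return value, steps

for n in range(5):
    print(n, power_steps(2, n))         # the step count always comes out as 3n + 4

For n=3 it reports 13 steps, matching the count above.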

I'm going to make a digression here and say that working out how many times a loop will execute is a pain, but you don't always need to plug in a concrete value like n=3 the way we did above. There are only a few common possibilities for the loop: the condition can be k<n or k<=n, and k can start off being 0 or 1. We can display the number of iterations in each case using a table:

k starts at     condition     iterations
1               k <= n        n
1               k < n         n - 1
0               k <= n        n + 1
0               k < n         n

What about with do-until loops? If we have a loop like this:

k <-- 1
do
    //insert code here
    k <-- k + 1
until k = n

then you're going to have n-1 iterations (and if k <-- 0 at the beginning, you'll have n). In that sense do-until loops are identical to while loops with a k<n comparison, except that a do-until loop always runs at least once no matter what, because the condition is only checked at the end of each pass.
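Python doesn't have a do-until loop, but the usual while True / break translation keeps the 'condition checked at the end' behaviour and makes it easy to count iterations (a quick sketch of mine):

def do_until_iterations(n):
    """k <-- 1; do ...; k <-- k + 1; until k = n
    (assumes n >= 2, otherwise this loop never terminates)."""
    k = 1
    iterations = 0
    while True:
        iterations += 1    # the body always runs at least once
        k = k + 1
        if k == n:         # the 'until' check happens at the end
            break
    return iterations

print(do_until_iterations(5))   # 4, i.e. n - 1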

Alright, one more digression (but this one is a bit more important): if you are counting the steps in an algorithm and come across something like this:

1    if (a<b) { 
2        do something worth 1 step
3    }
4    if (a>b) {
5        do something else worth 1 step
6    }

you might be tempted to count 4 steps here, but in fact at most 3 steps will ever be executed: if a<b the processor skips line 5, if a>b it skips line 2, and if a=b it skips both lines 2 and 5. So here the worst case scenario (a<b or a>b) is 3 steps and the best case scenario (a=b) is 2 steps. Unless otherwise specified, any discussion of running time or time complexity refers to the worst case scenario.

Anyway, those two digressions illustrate that the nuances of loops and if statements are a pain, and it would be great if there were a way to not worry about them. It turns out that asymptotic analysis is the perfect tool for ignoring exactly those kinds of things. Asymptotic analysis is all about describing the rate at which an algorithm's running time grows for large values of n. Looking at the result t(n)=3n+4 from above, we can see that for enormous values of n, the 4 doesn't really make much of a difference. That is, the leading n term becomes a better and better approximation of the real value of t(n) as n becomes huge. If we take n=2, t(2) = 6+4 = 10 while 3n = 3*2 = 6, so here the approximation is off by a factor of about 1.7. But if we take n=10,000, t(10,000) = 30,000 + 4 = 30,004 whereas 3n = 30,000. Now they only differ by a pathetic factor of about 1.0001.
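If you want to watch that gap shrink for yourself, here's a tiny Python check (my own snippet):

def t(n):
    return 3*n + 4

for n in (2, 100, 10_000, 1_000_000):
    print(n, t(n) / (3*n))   # the ratio heads towards 1 as n grows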

What if an n^2 term showed up in our t(n) function? How could an n^2 term show up? We could have a nested loop, something like:

poweroftwo(n):
value <-- 1
k <-- 1 
while (k<=n) do {
    l <-- 1
    while (l<=n) do{
        value <-- value + n
        l <-- l + 1 //holy crap 1 and l are similar in this font
    }
    k <-- k + 1
}
return value

So outside all of the loops we have 3 +1's (the two initial assignments and the return). For the outer while loop, each iteration has the k<=n comparison, the l <-- 1 assignment and the k <-- k + 1 assignment, so that's 3 +1's per iteration (and an extra +1 for the final k<=n check that fails). For the inner while loop you've got a comparison and 2 assignments, meaning 3 +1's per iteration (and an extra +1 for the final l<=n check). So you've got 3 + [n*(3 + [n*(3) + 1]) + 1] where [,] are around each while loop and (,) are around the per-iteration steps, not including the final failed check. So all in all we'll have t(n) = 4 + n*(3n + 4) = 3n^2 + 4n + 4.
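Same sanity check as before, this time for the nested loops (again my own Python sketch layered over the pseudocode above):

def poweroftwo_steps(n):
    """poweroftwo(n) instrumented with the same
    one-step-per-assignment/comparison/return accounting."""
    steps = 0
    value = 1; steps += 1              # value <-- 1
    k = 1; steps += 1                  # k <-- 1
    while True:
        steps += 1                     # k <= n check
        if not (k <= n):
            break
        l = 1; steps += 1              # l <-- 1
        while True:
            steps += 1                 # l <= n check
            if not (l <= n):
                break
            value = value + n; steps += 1
            l = l + 1; steps += 1
        k = k + 1; steps += 1
    steps += 1                         # return value
    return steps

for n in range(5):
    print(n, poweroftwo_steps(n), 3*n*n + 4*n + 4)   # the two columns match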
Now just like before, the 4n and 4 terms don't really matter once n becomes huge, because the ratio between t(n) and the leading 3n^2 term creeps closer and closer to 1 (the absolute difference keeps growing, but not fast enough to stop that ratio shrinking). Consider the graphs of t(n) against n for small values of n:


Compared to large values of n:


So what does this tell us? If we want to be able to efficiently measure algorithmic time complexity, all we need to worry about are the leading terms in our time complexity functions, because these will always dominate as n becomes larger. 

There is a problem with just choosing something like 3n^2 to compare different algorithms, which is that in reality different machines implement algorithms differently, meaning that '3' could be a '2' or a '10' on some other machine. In our case there were three steps in the inner loop, but we ignored all the operations like + and *, so that 3 in 3n^2 could be completely different in reality. If you wanted to get that specific you really would need to look at the architecture of the processor you intend to use. But although the '3' is arbitrary, the n^2 is not, and I doubt there's a processor in the world which can turn that n^2 into an 'n' or a constant. And the n^2 is all you need to understand how the running time of the algorithm grows as n becomes large. Luckily for us, when you're comparing two time complexities of different orders, e.g. one has an n^3 leading term and the other has an n^2 leading term, you are guaranteed that past some value of n, regardless of all the constant factors, the one containing the n^3 will end up on top.

A more formal way of saying 'on top' is that our function with the higher power of n is an 'upper bound' of the function with the lower power. This is where big O notation comes in. We say that t(n) ϵ O(f(n)) if there is some positive constant C and some value L for which C*f(n) > t(n) for all n > L.

For the t(n) we obtained above, 3n^2 + 4n + 4, we can say it is an element of O(n^2), because if you take f(n) = n^2 and give it a constant factor like 4, then past some value of n it's always going to be above t(n). (Concretely, 4n^2 > 3n^2 + 4n + 4 exactly when n^2 > 4n + 4, which holds for every n > 4, so C = 4 and L = 4 do the job.) If we only gave it a factor of 3, the leftover 4n + 4 would stop our f(n) from being an upper bound; it's that extra n^2 in 4n^2 which covers the smaller terms, since n^2 gets larger than 4n + 4 as n becomes large. Here's a graph showing this:


You might be thinking 'surely I can just say f(n) = n and make C so big that C*f(n) will always be an upper bound for t(n), which makes this whole big O notation pointless!'. I know that's what I thought when I started learning about big O notation, but the fact is that no matter how large you make your value of C, the function of the higher order will always end up on top; the only thing that changes is the value of L.

Let's see if we can say that our t(n) ϵ O(n) when we pick f(n) = n and C = 1000, i.e. we have 3n^2 + 4n + 4 competing against 1000n.


Looks convincing, but let's zoom out:



BAM. Looks like L is roughly 330 here. The point is that once one function has a higher-order leading term, the coefficients stop mattering for large values of n.
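If you don't trust the graph, a brute-force check in Python (my own) finds the exact crossover:

# Find the first n where t(n) = 3n^2 + 4n + 4 overtakes 1000n for good.
n = 1
while 3*n*n + 4*n + 4 <= 1000*n:
    n += 1
print(n)   # 332, which matches the ~330 eyeballed from the graph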

The second thing you might be thinking is the converse of this: can't we just choose our f(n) to be stupendously large, ending up with something like t(n) ϵ O(n^n^n^9001)? Well yes, we can. It's completely within the definition of big O notation, but that doesn't make it a very useful thing to say. Big O notation started off in mathematics and was carried over into computer science, and I don't think mathematicians were particularly concerned with whether C*f(n) was the least upper bound of t(n); all they cared about was that after some point n=L, C*f(n) was an upper bound. In practice, we want to express our time complexity function t(n) as an element of the set of all functions with the same approximate rate of growth for large values of n, but in theory, big O notation is really about expressing our t(n) as an element of the set of all functions which are eventually bounded above by C*f(n). So when I say 3n^2 + 4n + 4 is an element of O(n^2), I really mean it grows in the same way that n^2 does for large values of n, even though I'm theoretically allowed to say it's also an element of O(n^10).

For our algorithm with the nested loop, we would say its complexity is O(n^2) (i.e. its time complexity function can be bounded by n^2 times some factor). For a single loop we would say its complexity is O(n). Let's look at all the common complexity 'classes', in order of increasing complexity:

constant        O(1)
logarithmic     O(logn)
linear          O(n)
linearithmic    O(nlogn) (a blend of 'linear' and 'logarithmic'; sometimes loosely called superlinear)
quadratic       O(n^2)
polynomial      O(n^c)
exponential     O(2^n)
factorial       O(n!)

O(1) applies to any algorithm whose running time is independent of the size of the input, e.g. some algorithm like 'PrintTimeOfDay' which doesn't even take an input, or an algorithm which adds two values a and b.

O(logn) shows up in divide-and-conquer algorithms where with each recursion/loop you're breaking down the size of the problem by a constant factor. In a binary search, that factor is 2. Worst case scenario, you don't even find the target, meaning you halved the number of elements in your sublist until you got to one element and then you worked out the element isn't the one you're searching for. Let's say you start off with n=16 elements and you're searching for an element like '1' when the lowest in the sorted list is '2':
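(The original post had a worked diagram here; this little Python sketch of mine reproduces the same bookkeeping, shrinking the sublist by half each loop and counting the loops. It isn't a full binary search, just the worst-case loop count.)

def worst_case_loops(n):
    """Halve the sublist each loop until one element is left, then do
    one final check and give up -- the worst case described above."""
    loops, size = 0, n
    while True:
        loops += 1                 # check the middle of the current sublist
        print(f"loop {loops}: sublist has {size} element(s)")
        if size == 1:
            break                  # one element left, and it's not the target
        size //= 2                 # discard the half that can't hold the target
    return loops

print(worst_case_loops(16))   # 16 -> 8 -> 4 -> 2 -> 1, giving 5 loops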
We started with n=16 elements and got k=5 loops. If we had started with just 8 elements there would have been one less loop, i.e. k=4, and with 4 elements we'd end up with k=3. The relationship here is n=2^(k-1). Rearranging this we get k=log2(n)+1. This is a pretty good basis for a time complexity function t(n), because the nuts and bolts of the algorithm will only contribute smaller terms and constant factors, e.g. maybe each full loop costs 5 steps, making t(n) = 5*(log2(n)+1) = 5*log2(n)+5. But clearly the leading term here is log2(n), so we can say that t(n) ϵ O(log2(n)). Yet above, the complexity class that divide and conquer falls into is written as just O(log(n)), without the base 2. Why is this? Because you can convert between any two logarithms with different bases using nothing more than a constant factor (and constant factors are discarded in big O notation).

To prove this, I'll say loga(n) = C*logb(n) where a and b are different bases and C is some constant. We can divide both sides by logb(n) to get:

loga(n) / logb(n) = C

and using the change-of-base law, we have:

loga(n) / logb(n) = loga(b)

so C = loga(b). Let's go back to the original equation and substitute that in:

loga(n) = C*logb(n)  -->  loga(n) = loga(b)*logb(n)

and we know that a and b were just arbitrary bases, so loga(b) is an arbitrary constant (it doesn't depend on n). So we've proven that the only difference between two logs with different bases is a constant factor, and therefore we don't need to bother including the base when we express the complexity in big O notation, because big O notation doesn't care about constant factors.
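Here's a quick numerical sanity check of that identity (my own snippet):

import math

a, b, n = 2, 10, 12345
print(math.log(n, a))                    # log2(12345)
print(math.log(b, a) * math.log(n, b))   # log2(10) * log10(12345) -- the same number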

We've talked about the worst case scenario of a binary search but not the best case. The best case, as with many algorithms, is to get it right the first time, i.e. in the first loop you immediately find that the middle element of the list matches your target, so you can output its index without doing any extra work. This clearly doesn't depend on the size of the input at all, so the best-case time complexity is O(1).

You may be thinking 'what about the average case?'. The average case is not always easy to work out analytically; sometimes it gets estimated by running the algorithm huge numbers of times on randomised inputs and looking at how the average running time grows. The worst-case complexity is usually a good enough measure of what kind of running times we should expect from an algorithm.

So now we've covered constant and logarithmic time complexity, and we did linear already; next up is linearithmic. This is just a combination of linear and logarithmic, e.g. for every element in the list you need to perform something which has a complexity of O(logn). So far we haven't come across one of these, but binary trees and Quicksort both involve that O(nlogn) complexity, so I'll go into depth when we come to those (see the sketch below for the general shape in the meantime).
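To give the flavour anyway, here's one common shape that produces O(nlogn): doing an O(logn) binary search for each of the n items. This is my own illustration, using Python's bisect module rather than anything from the post:

import bisect

def count_at_most(sorted_list, queries):
    """For each of the n queries, one binary search costing O(log n),
    so the whole thing is O(n log n)."""
    return [bisect.bisect_right(sorted_list, q) for q in queries]

data = list(range(0, 1000, 3))              # already sorted
print(count_at_most(data, [10, 500, 998]))  # how many elements are <= each query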

Polynomial time, i.e. O(n^c) where c is some constant greater than 1, shows up whenever you've got nested loops (just like with quadratic time). There are also some situations, like particularly efficient matrix multiplication algorithms, which have complexities such as O(n^log2(7)) in the case of Strassen's algorithm. This is a good time to mention that because big O notation is all about upper bounds, and O(anything) is just a set of functions, each big O set of lower complexity is a subset of the sets of higher complexity. For example, the running time of a logarithmic-time algorithm can be bounded by a polynomial function like n^5, and so, although not polynomial in the traditional sense, logarithmic-time algorithms are considered 'polynomial-time' algorithms simply because O(logn) is a subset of O(n^c). Any algorithm with a polynomial time complexity is considered pretty good, and an algorithm which runs in polynomial time, even if it's n^3 or n^5, is probably going to perform better for large values of n than an algorithm with an n! or 2^n term in its t(n) function.

As for O(2^n), the base deserves a comment. 2^n and 3^n differ by a constant factor in the exponent: 2^(log2(3)*n) = 3^n. But this isn't the same kind of constant factor we've seen before, because sitting in the exponent it really does change the rate of growth: 3^n grows faster than any constant times 2^n, so strictly speaking 3^n is not in O(2^n). In practice, though, everything of the form c^n with c greater than 1 gets lumped together as 'exponential time', because whatever the base, the running time blows up hopelessly fast as n grows.
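A one-line check of that exponent identity (my snippet):

import math

n = 10
print(2 ** (math.log2(3) * n), 3 ** n)   # ~59049.0 and 59049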

As for situations where O(2^n) shows up, I'll be making a post about the travelling salesman problem which can be solved via 'dynamic programming'. The same problem can be solved using brute force (i.e. generate all the permutations of paths that our salesman can take and then output the valid ones) and if the word 'permutation' gives any indication, the complexity of that particular approach is O(n!). The problems whose best algorithms require exponential, factorial or even greater complexity (something like maybe O(n^n!)) are called 'intractable' problems. 'Intractable' means 'hard to deal with' i.e. for large values of n the running time really gets out of hand fast.

I wanted to show a comparison table of running times for various complexities and values of n, but this blog's budget is already cutting enough corners (i.e. I spent 5 minutes looking for one on the internet and couldn't find it), so take my word that for large values of n, exponential and factorial running times are absolutely enormous. I don't care how super your super-computer is: if you can compute one billion steps per second and you're dealing with an algorithm where t(n) = n! and n = 100, you will still need to wait 100!/1,000,000,000/60/60/24/365/1,000,000,000 ≈ 3*10^132 billion years before your algorithm is done.
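If you'd rather generate that table than take my word for it, a few lines of Python will do it (my own snippet; one billion steps per second assumed, as above):

import math

SPEED = 1e9   # steps per second

def seconds(steps):
    return steps / SPEED

for n in (10, 20, 50, 100):
    print(f"n = {n}:")
    print("   n^2 :", seconds(n**2), "seconds")
    print("   2^n :", seconds(2**n), "seconds")
    print("   n!  :", seconds(math.factorial(n)), "seconds")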

For everything I've said here about time complexity, the same can be said about space complexity. Space complexity is just an expression of the amount of memory you need to be able to use your algorithm e.g. sorting a list of n elements may require O(n) space to be able to store the list in memory whilst it's being used. Luckily with the level of compression technology and data-storage technology we have today, space complexity doesn't need to be considered as often as time complexity, but it still is a factor to consider if you have enormous inputs or enormous temporary chunks of data being used in the algorithm, and if you don't have much space to actually store it all.

Another thing to mention is that n is not always the only thing which can affect the complexity. For graphs where vertices and edges are stored separately, you can have algorithms that have time complexities along the lines of O(E+V). However even for graphs you can just consider n to be E+V, and simplify the problem if you want to.
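As a concrete example of where O(E+V) comes from, here's a bog-standard breadth-first search (my own sketch, assuming the graph is stored as an adjacency-list dictionary): it dequeues each vertex once and looks at each edge a constant number of times.

from collections import deque

def bfs(adj, start):
    """Visit every vertex reachable from start; O(V + E) overall."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)       # each vertex dequeued once: the O(V) part
        for w in adj[v]:      # each edge examined a constant number of times: the O(E) part
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

print(bfs({1: [2, 3], 2: [4], 3: [4], 4: []}, 1))   # [1, 2, 3, 4]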

Now, when would using big O notation not prove very useful? As stated above, the notation is best suited to large values of n, meaning that some 'higher-complexity' algorithms may in fact have lower running times for very small values of n. Consider n^3 vs 2^n: before 2^n really takes off (roughly n between 2 and 9), it's actually underneath its polynomial counterpart, so for those values of n it's the better time complexity (see the snippet below). This is why people sometimes include multiple algorithms in their code and switch gears between them once n passes some threshold; in practice that threshold tends to be tuned empirically rather than read straight off the L in the formal definition of big O notation, but the idea is the same, and real libraries do this (plenty of standard library sorts, for instance, drop down to a 'worse-complexity' insertion sort once the sub-lists get small enough).
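A quick way to see that window (my own snippet):

for n in range(1, 13):
    winner = "2^n smaller" if 2**n < n**3 else "n^3 smaller"
    print(n, n**3, 2**n, winner)
# 2^n is the smaller of the two only for n between 2 and 9.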

Also, constant factors often matter a lot. Although I showed you about half-way through this post that 1000n eventually gets overtaken by the higher-order function, before that crossover it can be on top by a very large margin. Again, gear changers would be good here. This applies to the smaller terms in our t(n) function too, as they can dominate the function for smaller values of n, depending on their constant factors.

Algorithms which go to great lengths to lower their complexity often become exceedingly 'complex' in terms of readability and comprehendability (that's not a word? It is now), and if you're working with a team that would rather keep things simple because efficiency isn't that much of a priority, it could be beneficial to just stick to the 'simpler' but more time-complex algorithms. It's a bit of a hilarious contradiction, having the easier-to-understand algorithms be the more complex ones, but it makes sense when you consider that the usual process of making algorithms is to start off with something intuitive and then, in the search for more efficient algorithms to solve the same problem, venture further and further from what comes easily to your understanding.

SO that concludes my enormous post about complexity. I was going to touch on Big Ω ('big omega') and Big Θ ('big theta') notation but I'll save that for another time. Actually, they won't take long to explain. Big Ω is about finding a lower bound on an algorithm's growth: the only difference is that the '>' sign from the big O definition gets flipped. Big Θ is the set which is the intersection of the big O and big Ω sets, i.e. if t(n) is an element of Θ(n^2) then t(n) can be upper-bounded by C1*n^2 where C1 is some positive constant and lower-bounded by C2*n^2 where C2 is some other positive constant. If your function is in Θ(n^2), you're guaranteed that the highest-order term in your function is some constant times n^2, which raises the question of why we don't use big Θ for everything. Part of the answer is that with big O notation you can lump complexities into all the higher classes, so I can say 'polynomial' includes O(logn) for convenience (well, maybe that's more of a weakness than a strength). More importantly, a big Θ bound is harder to come by: you won't always have a single f(n) which, depending on the constant, can act as both an upper bound and a lower bound for your t(n). So we just go with big O notation, partly because in practice it's basically used as if it were big Θ, and partly for the convenience of sticking to one kind of notation.

OKAY, now I'm done, there you go. Complexity is quite a beast, but at the end of the day it's just one of the tools we have for comparing different algorithms.
