Monday, 12 May 2014

What's Up With Bubble Sorts?!

Anybody who's anybody in the world of computer science knows that bubble sorts kinda suck. However, they are a good introduction to sorting and get you warmed up for the cooler sorting algorithms.

Let's say I have a list of numbers 1,5,3,2,6,7,4,2 and I want to end up with 1,2,2,3,4,5,6,7 (i.e. I want to sort the list based on the size of the numbers). One of the ways is to start from the left and compare each pair of numbers and if the one on the right is greater than the one on the left, swap the two. I'd end up with something like this:

 1 5 3 2 6 7 4 2
[1 5]3 2 6 7 4 2 //so far so good
 1[5 3]2 6 7 4 2 // here 3<5 so we swap them
 1 3 5 2 6 7 4 2
 1 3[5 2]6 7 4 2 // 2<5 so swap
 1 3 2 5 6 7 4 2
 1 3 2[5 6]7 4 2 // 6>5 so don't swap
 1 3 2 5[6 7]4 2 // 7>6 so don't swap
 1 3 2 5 6[7 4]2 // 4<7 so swap
 1 3 2 5 6 4 7 2
 1 3 2 5 6 4 [7 2] // 2<7 so swap
 1 3 2 5 6 4 2 7

so we went through the list one time, performing a few swaps that got us closer to what looked like a sorted list. On the next iteration (or 'sweep):

 1 3 2 5 6 4 2 7
becomes
1 2 3 5 4 2 6 7
which becomes
1 2 3 4 2 5 6 7
which becomes
1 2 3 2 4 5 6 7
which becomes
1 2 2 3 4 5 6 7

Although a human can identify that as a sorted list, a computer can't, so just run another sweep of the numbers and if you don't swap anything then you know it's sorted (for each sweep you can have a counter set to zero which you increment with each swap and if that's at zero at the end you know you've sorted the list.)

each of these lists is the result of doing the long swapping process shown above and you can see that because an element can be swapped, if it gets moved to the right it can be swapped again. This allows large numbers to go a long way to the right in a single iteration but it means that smaller numbers take several iterations to go left for example the 2 took three whole iterations to get from the 5th position to the 3rd position. I'm seriously just speculating here but you might be able to find an optimum time to stop scanning left to right and start scanning right to left to quickly move all the residual small numbers to the left. To demonstrate: compare the two situations:

start with 9111
1911
1191
1119
final sweep:
1119
1119
1119

start with 9991
9991
9991
9919
9919
9199
1999
1999
1999
final sweep:
1999
1999
1999

Clearly going left-to-right in comparisons favours larger numbers and so if you don't really have any large numbers to heave all the way from the left of the list to the right you could swap the direction the algorithm goes in. Again bubble sorts aren't very efficient anyway but this stuff is interesting.

As for swapping, it's impossible to swap two elements without introducing a temporary place to store one of them. Like if you drew two circles on piece of paper and put a coin in each and tried to swap the coins whilst only allowing one coin to be off the paper at any point in time and only allowing one coin to occupy a circle at any point in time, you just can't do it. However if you introduce a third circle  you can put one of the coins there whilst their previous circle gets occupied by the other coin, allowing the coin in the third circle to be moved into the previous circle of the other coin. It looks something like this:

swap(X,Y):
temp <-- list[X]
list[X] <-- list[Y]
list[Y] <-- temp
bammmm

here's a pseudocode bubblesort algorithm from wikipedia:

procedure bubbleSort( A : list of sortable items )
   n = length(A)
   repeat     
     swapped = false
     for i = 1 to  n-1 inclusive do
       /* if this pair is out of order */
       if A[i-1] > A[i] then
         /* swap them and remember something changed */
         swap( A[i-1], A[i] )
         swapped = true
       end if
     end for
   until not swapped
end procedure

Here they used a boolean value to determine whether a swap was made which is certainly more efficient than my idea but using a counter allows for some level of self-analysis of the algorithm if you want to implement that.

And whilst getting that code of wikipedia I realised the thing I mentioned above has actually been considered by other computer scientists and is referred to as rabbit and turtles and has been circumvented with a few more optimised algorithms. Refer to the cocktail sort (the thing I was leaning towards) or the comb sort (no idea what this is) for more on that. Until next time, goodbye.

No comments:

Post a Comment