Since I have a few hours to kill on the plane this evening (and I won’t be falling asleep anytime soon due to the delicious Pumpkin Spice Latte I just consumed), I decided to write a post about Comb Sort. Context: on Wednesday I will be presenting this to my Analysis of Algorithms class… this algorithm did not come up during my interview prep, so if you are deep in the process, stressing about how you need to know yet-another-sort-algo don’t fret!
Yo, Bubble Sort Kinda Blows
While Bubble Sort maintains an admirable O(1) space complexity, it has an O(n^2) time complexity. Compared to any decent sorting algorithm, quicksort, mergesort, etc. this is quite inefficient. So, one may wonder, is it possible to speed it up? Where are the inefficencies? Where is the bottleneck? While I cannot promise you Bubble Sort will ever be as fast as a decent O(n log n) algorithm (and it won’t), there is room for optimization!
An Abstract Intuition
Let’s pretend you are a hunter and its your job to hunt as many turtles as possible. Now, turtles are slow creatures, so, we will assume they do not run away from you, We will also assume there are infinitely many turtles, just for the sake of argument.
Since you are a tough guy, your weapon of choice is a club. Thus, you must travel within a yard of each turtle in order to be in striking distance.
For many seasons, you are able to provide for your family. However, your family grows. By the time your family exceeds 100 people, you are having problems keeping up – your strategy for traveling within striking distance of each turtle seriously inhibits the efficiency of your hunting. You begin to wonder, how can you improve upon your hunting? Would it be possible to reduce the distance you travel between turtles?
It turns out the answer to this question is yes! A neighboring family has discovered a hunting algorithm that works considerably more efficiently: in order to reduce the distance traveled, they use a grappling hook to bring the turtles closer.
This preprocessing allows them to perform less steps-per-turtle, allowing them to feed families 100x larger than yours with ease.
Turtle and Rabbits
So how can we apply this to Bubble Sort? (With my best Gavin Belson impression) “Consider the tortoise…”.
In Bubble Sort, there are two, let’s call them, “outlier values”, turtles and rabbits. Assuming an ascending order sort, we have the following definitions:
Turtle: a small value at the end of the list
Rabbit: a large value at the beginning of the list
While Rabbits are cheap to move, Turtles are incredibly expensive to move. Consider the following list:
[X, X, X, X, 1]
If every X > 1, 1 must be swapped (or inverted) 4 times. Now consider a boat load more X’s and several more turtles… I think you see where I am going.
Key Intuition: a bottleneck of bubble sort is the inversion count related to turtles. What if we could reduce the inversion count by increasing the swap gap from 1 to some much larger n (where n < length of the array).
This is the idea behind Comb Sort! In Comb Sort land, it’s open season and you’re going turtle hunting with a long range sniper. Comb sort works by establishing a gap, and swapping values across that gap. Starting with a gap of length of array - 1 and decrementing by the magic 1.3 until we get to 1, we essentially preprocess the unsorted array, reducing the work of Bubble Sort at the end.
In code it looks like this:
# the magical 1.3 GAP_FACTOR = 1.3 def swap(arr, a, b): arr[a], arr[b] = arr[b], arr[a] def getNextGap(gap): gap *= GAP_FACTOR if gap < 1: return 1 return int(gap) def combsort(nums): n = len(nums) gap = n swapped = True while gap != 1 or swapped == True: gap = getNextGap(gap) swapped = False for i in range(0, n - gap): if nums[i] > nums[i + gap]: swap(nums, i, i + gap) swapped = True return nums
How Effective is this Preprocessing?
Curious to see how this performed, I created a list of 10,000 randomly shuffled numbers and clocked (using cProfile) Bubble Sort and Comb Sort. The numbers line up with the intuition:
- ~25,000,000 swaps
- ~8.79 seconds
- ~65,000 swaps
- ~0.05 seconds
While this data may be seriously skewed by some arbitrarilly bad distribution of the random shuffle in my one sample point, this at least goes to show that preprocessing the unsorted list before performing Bubble Sort is an effective strategy for optimization.