Talk:Comb sort

comb11?

comb11 suggests that, instead of a "constant scale factor of 1.25", some different method for selecting the gap would serve better.

It's perhaps also worth noting that the average ratio between prime numbers is almost the same as the "optimal gap ratio" for a sequence of maybe 17 prime numbers. This average ratio increases for shorter sequences of prime numbers and decreases for longer sequences of prime numbers. However, I imagine considerable testing would be needed to ascertain whether a "prime gap" approach would be a significant improvement. —Preceding unsigned comment added by 72.83.127.158 (talk) 14:05, 27 January 2010 (UTC)
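A minimal C sketch of what such a "prime gap" variant might look like. Everything here is an assumption for illustration: the descending prime table was chosen ad hoc so that consecutive ratios hover around 1.3, and combsort_prime_gaps is a made-up name, not an algorithm from any paper. A trailing bubble phase guarantees correctness whatever the table does.

 /* Hypothetical "prime gap" comb sort: the gap walks down a fixed table
    of primes instead of shrinking by a constant factor. */
 static const int prime_gaps[] = { 911, 701, 523, 401, 307, 233, 179,
                                   137, 103, 79, 61, 47, 37, 29, 23,
                                   17, 13, 11, 7, 5, 3, 2,
                                   1 };   /* not prime; final pass */
 
 void combsort_prime_gaps(int a[], int n)
 {
     int g, i, t, gap, swapped;
 
     for (g = 0; g < (int)(sizeof prime_gaps / sizeof prime_gaps[0]); g++) {
         gap = prime_gaps[g];
         if (gap >= n)
             continue;              /* skip gaps too large for this array */
         for (i = 0; i + gap < n; i++)
             if (a[i] > a[i + gap]) {
                 t = a[i]; a[i] = a[i + gap]; a[i + gap] = t;
             }
     }
     do {                           /* bubble passes until no swaps remain */
         swapped = 0;
         for (i = 0; i + 1 < n; i++)
             if (a[i] > a[i + 1]) {
                 t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                 swapped = 1;
             }
     } while (swapped);
 }

Arrays longer than the largest table entry would need the table extended upward; the final bubble phase keeps the sketch correct regardless.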

Rivals quicksort?

I think something must be wrong with the Python implementation if this truly is supposed to rival quicksort. Even a simple bubble sort in Python beats it.


Combsort is an improved version of bubblesort that can be almost as good as more complex algorithms like quicksort.

Combsort's running time varies with the gap-size schedule, so I suspect some of those empty fields might need "varies" as opposed to O notation responses. The most optimized gap size I've ever seen offered is at http://world.std.com/~jdveale/combsort.htm and uses a table. Alternatively, the original version ( http://cs.clackamas.cc.or.us/molatore/cs260Spr01/combsort.htm ) uses /=1.3 instead. I suspect that the other number, /=1.279604943109628, is more precise.
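For reference, here is a minimal C sketch of the basic algorithm with the shrink factor left as a parameter, so the 1.3 and 1.279604943109628 variants can be timed against each other. It assumes the usual convention of truncating the gap to an integer each pass; nothing here is taken from either of the cited pages.

 void combsort(int a[], int n, double shrink)
 {
     int i, t;
     int gap = n;
     int swapped = 1;
 
     while (gap > 1 || swapped) {
         gap = (int)(gap / shrink);  /* shrink and truncate the gap */
         if (gap < 1)
             gap = 1;                /* gap-1 passes are plain bubble passes */
         swapped = 0;
         for (i = 0; i + gap < n; i++)
             if (a[i] > a[i + gap]) {
                 t = a[i]; a[i] = a[i + gap]; a[i + gap] = t;
                 swapped = 1;
             }
     }
 }

Called as combsort(data, len, 1.3) or combsort(data, len, 1.279604943109628).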


How in-depth should these encyclopedia articles be? Another entry could be hybrid sort, which uses quicksort on lists larger than 6 elements and switches to insertion sort when the quicksort divisions are small enough (20 or fewer elements), because on very small lists insertion sort is actually faster. Then there are the many specialized methods for special data. Examples are Bentley–Sedgewick, Sadakane's algorithm, and forward radix sort for sorting all suffixes of a string, as is required for the Burrows–Wheeler transform.
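A C sketch of that hybrid idea (the cutoff of 20 is taken from the comment above; the names, the middle-element pivot and the Hoare-style partition are illustrative choices, not a reference implementation):

 #define CUTOFF 20   /* threshold below which insertion sort takes over */
 
 static void insertion_sort(int a[], int lo, int hi)
 {
     int i, j, v;
     for (i = lo + 1; i <= hi; i++) {
         v = a[i];
         for (j = i - 1; j >= lo && a[j] > v; j--)
             a[j + 1] = a[j];        /* shift larger elements right */
         a[j + 1] = v;
     }
 }
 
 void hybrid_sort(int a[], int lo, int hi)
 {
     int i, j, t, pivot;
 
     if (hi - lo + 1 <= CUTOFF) {    /* small range: insertion sort is faster */
         insertion_sort(a, lo, hi);
         return;
     }
     pivot = a[lo + (hi - lo) / 2];  /* middle pivot, Hoare-style partition */
     i = lo;
     j = hi;
     while (i <= j) {
         while (a[i] < pivot) i++;
         while (a[j] > pivot) j--;
         if (i <= j) {
             t = a[i]; a[i] = a[j]; a[j] = t;
             i++; j--;
         }
     }
     if (lo < j) hybrid_sort(a, lo, j);
     if (i < hi) hybrid_sort(a, i, hi);
 }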

Also, the algorithms should be split into two groups: the comparison-based ones, which cannot be faster than n log n but do not need additional knowledge of the data (Quick Sort, Heap Sort, Merge Sort, Bubble Sort, etc.), and the ones that depend on the data being within a range of values or otherwise comparable to an external reference (Bucket Sort and Radix Sort).


Come to think of it, I think combsort might not be stable: a swap across a gap can carry an element past an equal element, changing their relative order.


How is this different to shell sort? According to NIST, shell sort completely sorts at each iteration whereas comb sort does only one pass per iteration.
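The contrast is easiest to see in the inner loops. A C sketch of a single gap-g pass of each (illustrative only, not code from either article):

 /* Shellsort pass: a full insertion sort of each gap-g subsequence,
    so a displaced element is carried as far as it needs to go. */
 void shell_pass(int a[], int n, int g)
 {
     int i, j, v;
     for (i = g; i < n; i++) {
         v = a[i];
         for (j = i; j >= g && a[j - g] > v; j -= g)
             a[j] = a[j - g];        /* keep shifting by g */
         a[j] = v;
     }
 }
 
 /* Comb sort pass: one left-to-right sweep of compare-and-swap,
    so each element moves at most one gap per pass. */
 void comb_pass(int a[], int n, int g)
 {
     int i, t;
     for (i = 0; i + g < n; i++)
         if (a[i] > a[i + g]) {
             t = a[i]; a[i] = a[i + g]; a[i + g] = t;
         }
 }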


Rabbits and turtles are fine, but does anyone know what the asymptotic complexity is? Arvindn 08:04, 9 Oct 2004 (UTC)


Surprisingly, the asymptotic complexity is O(N^2) (yes, the mean is O(N^2), not just the worst case!) but with a very, very low coefficient (it's something above 10^-80 for a gap factor of 1.3, but how much more I can't say). Combsort still has a turtle problem.

The number of elements, M, that can be sorted to the last place in the array after the first P passes has an upper bound (which can be calculated from P and the gap factor G) regardless of N. For any reasonable gap factor G you can select a P such that the remaining ~ log(N)/log(G) passes cannot move an element from last place in the array to the median element of the array, as each pass moves each value downwards at most once. For G=1.3 for example, P=7 will do.

Say we know the median value in a randomly ordered array of distinct values, and we rewrite the array, writing 0 to every element containing a value less than the median, and 1 to every element containing a value more than the median. The probability that the value 0 is found in -all- the elements that can be sorted to the last place in the array after the first P passes is 2^(-M). So the probability of O(N^2) running time is at least 2^(-M), for some M, determined from G, regardless of N. So the asymptotic complexity is O(N^2).

Finishing off with an insertion sort (rather than doing any bubble passes with interval of 1) deals with the "bad news" input mentioned above, and probably improves the mean complexity. It may even be O(N*log(N)). The worst case running time, however, is probably still worse than order (N*log(N)). How much worse, I can't say.

Alternating between ascending and descending passes (as per the cocktail shaker sort) also addresses the problem (in fact, gap factors approaching 1.4 become workable if you do this) perhaps because all "turtles" are effectively "rabbits" half the time. It might still be possible to construct "bad news" input, but I don't have good enough math skills to figure out the asymptotic complexity of "cocktail shaker combsort" as N tends to infinity. Not by a long shot.

james_barbetti@yahoo.com
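A C sketch of that bidirectional variant, assuming the usual truncating gap update with factor 1.3; the name cocktail_combsort is invented for this sketch:

 void cocktail_combsort(int a[], int n)
 {
     int i, t;
     int gap = n;
     int swapped = 1;
 
     while (gap > 1 || swapped) {
         gap = (int)(gap / 1.3);
         if (gap < 1)
             gap = 1;
         swapped = 0;
         for (i = 0; i + gap < n; i++)       /* ascending pass */
             if (a[i] > a[i + gap]) {
                 t = a[i]; a[i] = a[i + gap]; a[i + gap] = t;
                 swapped = 1;
             }
         for (i = n - 1 - gap; i >= 0; i--)  /* descending pass */
             if (a[i] > a[i + gap]) {
                 t = a[i]; a[i] = a[i + gap]; a[i + gap] = t;
                 swapped = 1;
             }
     }
 }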

C++ Implementation

I just wrote a version for C++

Hmm... I should template it, shouldn't I? I'll work on that.

Update: Done!

template <typename RandomAccessIterator>
void combSort11(RandomAccessIterator first, RandomAccessIterator last)
{
   bool flag = true;
   int leng = last - first;
   int offset = leng;
   RandomAccessIterator array = first;
   const double shrink = 1.247330950103979;
   while (offset!=1 && !flag)    // finished only if offset is 1 and no swaps were made
   {
       flag = true;
       if (offset>1)
           offset /= shrink;
       if (offset==9 || offset==10)    // makes it Combsort11
           offset = 11;
       for (int i=0; i<(leng-offset); i++)
           if (array[i]>array[i+offset])
           {
               std::swap(array[i], array[i+offset]);
               flag = false;
           }
   }
}

DevastatorIIC 01:14, 8 March 2006 (UTC)

I guess flag should be initialized to false instead. --Como (talk) 14:33, 9 April 2008 (UTC). No, no, no: the && should be ||. --Como (talk) 14:41, 9 April 2008 (UTC)

I wrote a Java implementation example... see if it's compact/efficient enough, please.

 Pulveriser 16:24, 17 April 2006 (UTC)
The C++ implementation doesn't work. The Java implementation currently on the article page doesn't work for the array (1 20 30 40 50 6 70 80 90). I'm removing the Java implementation from the article for the moment; frankly, I don't think we need a Java implementation if the Python implementation is correct (which I'm not sure of, but it seems plausible). One imperative language is enough. --Quuxplusone 21:24, 11 May 2006 (UTC)

You're right... it doesn't work. I did a comparison test against a mergesort algorithm and it fails sometimes... will fix it. And I think the Python implementation is insufficient, since not everyone knows Python and pseudocode is absent. Once pseudocode is written, perhaps only one language will be sufficient. Pulveriser 09:16, 25 May 2006 (UTC)

UPDATE: fixed it... I had forgotten to add the "continue as bubblesort till no swapping" condition. Am posting it here.

private static int newGap(int gap) {
    gap = gap * 10 / 13;          // integer shrink by a factor of 1.3
    if (gap == 9 || gap == 10)    // the "combsort11" rule
        gap = 11;
    if (gap < 1)
        return 1;
    return gap;
}

private static void sort(int a[]) {
    int gap = a.length;
    int temp;
    boolean found = true;
    while (gap > 1 || found) {    // stop only after a gap-1 pass with no swaps
        found = false;
        gap = newGap(gap);
        for (int i = 0; i + gap < a.length; i++) {
            if (a[i] > a[i + gap]) {
                found = true;
                temp = a[i];
                a[i] = a[i + gap];
                a[i + gap] = temp;
            }
        }
    }
}

Pulveriser 09:46, 25 May 2006 (UTC)

oh yeah, almost forgot... decide whether the Java implementation should be put up on the page... if I don't hear any objections in a week, I'll put it up. Pulveriser 09:50, 25 May 2006 (UTC)

oops! forgot a line in the code! Pulveriser 10:35, 25 May 2006 (UTC)

damn, forgot another line... Pulveriser 10:37, 25 May 2006 (UTC)

Comb sort is a rediscovery and popularization of an algorithm that had been known for years in the theoretical CS community under the names "brick sort" or "Dobosiewicz sort". See, for instance:

Wlodzimierz Dobosiewicz: An Efficient Variation of Bubble Sort. Inf. Process. Lett. 11(1): 5-6 (1980)

I think this article should give proper credit to Dobosiewicz rather than solely mention the Byte paper.

I think Dobosiewicz's home page is the following: http://www.cis.uoguelph.ca/?q=user/6 . Hence, it might be a good idea to reach him and invite him to comment on or contribute his view of the history of comb/brick sort...

The "brick sort" with which I'm familiar is the one in Knuth, where you just compare each element to the one on its right, then the one on its left, then the one on its right..., so named because a diagram of its sorting network looks like a brick wall. Comb sort is nothing like brick sort, except that it's also a sorting network. Comb sort is also much more efficient; brick sort is even less efficient than bubble sort, unless you can do the comparisons in parallel. --Quuxplusone 21:24, 11 May 2006 (UTC)[reply]
Hmm. So much for my calling it brick sort; that's a mistake.

Sedgewick credits Dobosiewicz with having observed in 1980 that the insertion passes in Shellsort could be replaced with bubble sort passes, and with then devising the variation in question. The author then proceeds to define three sorts which improve three "naive" sorts (brick, bubble and shaker) by performing those algorithms in multi-pass decreasing-gap fashion.

So clearly, Dobosiewicz is the inventor, and the guys in the Byte paper merely rediscovered it.

I finally got a copy of Dobosiewicz's paper. It's true: he used a starting gap of n/2 and then he applied, after every bubble-like pass, a shrink factor of 1.3333... while gap>1. Then he applied bubblesort with a simple optimization consisting of stopping each pass at the position of the last swap of the previous pass. He compared the algorithm with quicksort, shellsort and bubblesort, with up to 10000 elements. Knuth proposed in "The Art of Computer Programming, vol. 3, Sorting and Searching", 1973, exercise 5.2.1-40, a variation of shellsort that made it very, very similar to combsort / Dobosiewiczsort. He proposed, for the passes with gap>1, just comparing one pair of elements ("Thus, the result of interchanges is not propagated..." as it would be in the normal insertionsort applied in shellsort). The only difference is the final pass. Knuth: insertionsort. Dobosiewicz: optimized bubblesort. Lacey&Box: unoptimized (not even with triangular loops) bubblesort. Sincerely, I would call it Dobosiewiczsort. At least, Knuth and Dobosiewicz should be mentioned in the article. --Como (talk) 09:31, 23 May 2008 (UTC)
And, by the way, the final sort would be faster with insertionsort, as Knuth proposed. Note that insertionsort naturally favors the case of "almost sorted" data. It does this far more efficiently than bubblesort. --Como (talk) 09:41, 23 May 2008 (UTC)

Comparison sort?

Is comb sort a comparison sort? ~ Booyabazooka 22:31, 11 May 2006 (UTC)

(this comment was copied from my talk page —Booya Bazooka)
Yes, comb sort is a comparison sort. It compares the values to be sorted, and sorts them accordingly. A non-comparison sort uses keys, not the values, to sort. I'm not sure if comb sort belongs in the comparison sort article (as it only lists well-known ones (and comb sort isn't)), but you could mention that it is in its own article. DevastatorIIC 09:06, 25 May 2006 (UTC)
Thanks - I asked not with the intention of putting it in the Comparison sort article, but for putting it in the category. ~ Booya Bazooka 09:12, 25 May 2006 (UTC)
What about Smoothsort, Timsort and Cycle sort? I'm sure Combsort is better known than they are. 83.31.149.76 (talk) 17:37, 30 October 2015 (UTC)

Implementations

The current pseudocode in the article is buggy. It doesn't use the swaps variable, so it loops endlessly. After fixing that, the loop conditions are wrong: it will fail for arrays containing 0, 1 or 2 elements. --John Corbett <wikipedia@pictographer.com>

I say someone should write a pseudocode and/or C implementation, then move the others to Wikibooks, à la bubble sort. --DevastatorIIC 13:51, 31 May 2006 (UTC)

I concur on this point; although it should unquestionably be pseudocode, not C (some of us, ahem, don't know C yet...). I'm hesitant to go ahead and transwiki these implementations now, because that would leave this article without any code at all. ~ Booya Bazooka 17:45, 31 May 2006 (UTC)
Is this anywhere close to good? Feel free to tear it apart. --DevastatorIIC 01:47, 1 June 2006 (UTC)
Done tearing apart ;)
Modified it to use a more rigid syntax. I think yours was a tad too close to literary prose. Jafet 07:51, 23 July 2006 (UTC)
function update_gap(gap)
    gap := floor(gap / 1.3)
    if gap = 9 or gap = 10
        gap := 11
    endif
    if gap < 1
        gap := 1
    endif
    return gap
endfunction

function combsort(data)
    var gap, swapped, index
    gap := length(data)
    swapped := true
    while gap > 1 or swapped = true
        gap := update_gap(gap)
        swapped := false
        for index := 0 to length(data) - gap - 1
            if data[index + gap] < data[index]
                swap(data[index + gap], data[index])
                swapped := true
            endif
        endfor
    endwhile
endfunction

Broken links

Both http://world.std.com/~jdveale/combsort.htm and http://cs.clackamas.cc.or.us/molatore/cs260Spr01/combsort.htm are gone from the web for reasons unknown. I noticed that many other webpages outside of Wikipedia also reference these nonexistent webpages. A brief search failed to relocate these articles. Jafet 07:51, 23 July 2006 (UTC)

What does this sentence mean...?

This sentence in the intro doesn't make sense to me:

"improves on bubble sort and rivals in speed more complex algorithms like Quicksort"

The fragment "more complex algorithms ..." seems misplaced. Is there a missing word or something? Perhaps "more than"??? Fresheneesz 19:39, 20 September 2006 (UTC)

answer: "rivals" is a (tensed) verb.

False complexity at Sorting algorithm

This algorithm is not an O(n log n) worst case or average case algorithm, as noted above - but is listed as such at sorting algorithm. Did the original article include a complexity analysis? If not, it shouldn't list any complexity there - it's OR to do our own nontrivial algorithm analysis. If Dobosiewicz's algorithm is the same, I'm sure he included an analysis that we can use.

Also, er, the note claiming it has "small code size" in sorting algorithm is silly - shellsort has very similar code and better complexity and is accompanied by no such note.

Also, the phrasing "almost as good as [...] quicksort" is implausible - for both small and large lists I seriously doubt it could be competitive in any practical implementation, because I believe it does a lot more comparisons overall. Dcoetzee 03:43, 7 December 2007 (UTC)


I've tested it for a while. The "and swaps = 0" clause makes it look like... O(n²) worst case. My experiments showed that this clause is indeed necessary... as long as I use a shrink factor of 1.3. In some cases it required up to 6 combs with gap 1. Though, if I use the other number (by the way, that phi is the golden ratio!!) and round the gap in a conservative way (gap := (gap * 1000 + 999) / 1247), then no additional comb of gap 1 is required (in my tests). I tried all possible permutations of up to 11 elements (no repeated values) and some millions of random permutations of up to 100,000 elements.

Now the question is: can anybody prove it? (that a single last comb of gap 1 is enough, and therefore the algorithm is O(n log n) worst case with a shrink factor of 1.247...) Sadly, the links above in Talk:Comb_sort#Rivals_quicksort.3F seem to be broken. Where did all that come from??? --Como (talk) 19:21, 3 April 2008 (UTC)
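For concreteness, here is roughly what that conservatively rounded schedule looks like in C, in the style of the code below. This is a sketch of one reading of the comment, not a proven algorithm: the rounded-up division keeps the effective shrink factor strictly below 1.247, a clamp forces progress on small gaps where the formula would stall, and gap * 1000 would need a wider type for very large arrays.

 void combsort_1247 (unsigned int A[], unsigned int len)
 {
     unsigned int i, j, tmp, gap, next;
 
     gap = len;
     while (gap > 1)
     {
         next = (gap * 1000 + 999) / 1247;     /* round up: factor < 1.247 */
         gap = (next < gap) ? next : gap - 1;  /* force progress on small gaps */
 
         i = 0;
         j = gap;
         while (j < len)                       /* one pass with current gap */
         {
             if (A[i] > A[j])
             {
                 tmp = A[i];
                 A[i] = A[j];
                 A[j] = tmp;
             }
             i = i + 1;
             j = j + 1;
         }
     }
 }

The loop ends right after the single gap-1 pass, matching the observation above that no repeated gap-1 combs were needed in those tests.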

Update: I searched for the critical shrink factor. The smallest bad shrink factor I found is 9/7. Every smaller shrink factor killed all turtles in my experiments. Here is my algorithm in non-cryptic C:

 void combsort_7_9 (unsigned int A[], unsigned int len)
 {
   unsigned int i, j, tmp, gap;
   
   gap = len;
                   // No "sorted" flag! The number of
   while (gap>1)   // iterations only depends on the size
   {
     if (gap<10)              // Below 10, descend one
       gap = gap - 1;         // by one until zero
     else
       gap = (gap*7)/9 + 1;   // Integer division!!
                              // +1: ensure shrink factor
     i = 0;                   //     strictly < 9/7
     j = gap;
     
     while (j<len)     // Make a pass with current gap
     {
       if (A[i]>A[j])    // If not sorted ...
       {
         tmp = A[i];
         A[i] = A[j];      // ... swap!
         A[j] = tmp;
       }
       
       i = i + 1;        // Advance
       j = j + 1;
     }
   }
 }

Can anybody find a killer sequence that remains unsorted after this? --Como (talk) 10:44, 17 April 2008 (UTC)

Update: Somebody did find a killer sequence! See "Combsort: shrink factor for guaranteed O(n log n) worst case time?" in comp.theory, thread started on Thu, 10 Apr 2008. --Como (talk) 15:20, 3 June 2008 (UTC)

Last version discussed there (in optimized C):

 const unsigned long combtable_sqrt2_primes_and_insertion[] =
 {
   /* Note that small combs have been removed */
   
   5UL, 7UL, 9UL /* Not prime!! */,
   11UL, 13UL, 17UL, 23UL, 31UL, 43UL, 59UL, 83UL, 113UL,
   157UL, 211UL, 293UL, 409UL, 577UL, 811UL, 1129UL,
   1583UL, 2237UL, 3163UL, 4463UL, 6311UL, 8923UL, 12619UL,
   17839UL, 25219UL, 35617UL, 50363UL, 71209UL, 100703UL,
   142403UL, 201359UL, 284759UL, 402697UL, 569497UL,
   805381UL, 1138979UL, 1610753UL, 2277941UL, 3221473UL,
   4555843UL, 6442897UL, 9111629UL, 12885751UL, 18223193UL,
   25771469UL, 36446357UL, 51542927UL, 72892669UL,
   103085789UL, 145785317UL, 206171569UL, 291570607UL,
   412343081UL, 583141177UL, 824686151UL, 1166282329UL,
   1649372281UL, 2332564607UL, 3298744483UL,
   ~0UL
 };
 
 void combsort_with_table_sqrt2_primes_and_insertion (unsigned A[],
                                                      unsigned long len)
 {
   unsigned long comb;
   unsigned n;
   unsigned tmp, * p, * q, * r, * s;
   
   n = 0;
   
   while (len > combtable_sqrt2_primes_and_insertion[n])
     n ++;
   
   while (n>0)
   {
     comb = combtable_sqrt2_primes_and_insertion[--n];
     
     for (p=A, q=p+comb, r=A+len; q!=r; p++, q++)
       if (*p>*q)
       {
         tmp = *p;
         *p = *q;
         *q = tmp;
       }
     
     if (n==0)
       break;
     
     comb = combtable_sqrt2_primes_and_insertion[--n];
     
     for (q=A+len-1, p=q-comb, r=A-1; p!=r; p--, q--)
       if (*p>*q)
       {
         tmp = *p;
         *p = *q;
         *q = tmp;
       }
   }
   
   // End with insertion sort
   
   for (s=A+1, r=A+len; s!=r; s++)
   {
     p = s - 1;
     q = s;
     
     if (*p>*q)
     {
       tmp = *q;
       
       do
       {
         *q = *p;
         q = p--;
       }
       while (q!=A && *p>tmp);
       
       *q = tmp;
     }
   }
 }

--Como (talk) 15:55, 8 February 2010 (UTC)

Citation for complexity

(Brejová, Bronislava (September 2001). "Analyzing variants of Shellsort". Information Processing Letters. 79 (5): 223–227. doi:10.1016/S0020-0190(00)00223-4.) ...the average-case time for both variants is Ω(n^2/c^p), where p is the number of increments and c=2 for Dobosiewicz sort and c=4 for Shaker sort... I don't have access to the full paper, though. The key result is reproduced in a survey by Paul Vitanyi. Interestingly, he mentions that a final insertion sort is required, suggesting that your 'killer sequences' are known to always exist, but he doesn't cite a source for that. Bazzargh (talk) 18:09, 9 October 2008 (UTC)

Note the text "for certain sequences of increments" in the abstract of "Analyzing variants of Shellsort". I don't have the full text either, but I bet that they do not mean prime increments. Knuth already pointed out in 1973 that Shellsort would be slower if the intervals had common factors. Indeed, Shellsort _is_ O(N*N) worst case if all intervals are even. Try to sort the sequence {0,1,0,1,0,1,0,1.....} — Preceding unsigned comment added by Comocomocomocomo (talk • contribs) 14:50, 14 February 2013 (UTC) Oops, I forgot to sign! Here I go: Como (talk) 15:02, 14 February 2013 (UTC)

I don't believe that the average complexity Ω(n^2/2^p) is appropriate here. I have read the paper, and it denotes by p the length of the comb-width sequence, say, (11, 8, 6, 4, 3, 2, 1) (in this case, p = 7). We are using a shrink factor h around 1.3 or 1.4, so p = log n / log h > 2 log n. (The paper's author uses log = log_2.) In this case, Ω(n^2/2^p) < Ω(1). Wow! Well, Ω(n^2/2^p) is only a lower bound that is meaningful when p is sufficiently small (or the shrink factor large). More precisely, Ω(n^2/2^p) is the lower bound on the average number of "inversions" that still remain after the p left-to-right sweeps finish. Inversions here mean any index pairs (i,j) that satisfy i < j and x_i > x_j. If we are to sort the remaining inversions by insertion sort, it takes Ω(n^2/2^p). Ω(n^2/2^p) < Ω(1) means that the extra sort may be unnecessary, but it does not have to be (since this is only a lower bound). The words "lower bound" do not appear in the abstract, but the bound is clearly noted in the text. 114.145.149.16 (talk) 16:44, 4 June 2013 (UTC)

"Simplified C++ Implementation"[edit]

 template<class t> inline void combsort(t*a, int n)
 {
   int g=n,s;
   t*j,*i=a-1,z;
   while(++i<a+n-g?z=*i,z>*(j=i+g)?*i=*j,*j=z:--s:(s=n,i=a-1,g=(g/=1.3)>8&g<11?11:g?:1),s>1);
 }

Yeah, that's simple alright. —Preceding unsigned comment added by 62.92.25.130 (talk) 05:33, 5 August 2009 (UTC)

Simplified Obfuscated C++ implementation. --Paddy (talk) 14:44, 27 May 2010 (UTC)

Wrong Animation

I think the animation of the sorting algorithm is wrong. It looks like a simple bubble sort is done over the data, but not using the whole range at first and increasing it later on. A correct animation can be seen here: http://www.cs.ubc.ca/~harrison/Java/sorting-demo.html. --Walderich (talk) 07:58, 23 February 2012 (UTC)

Comb sort versus Merge Sort, Quick Sort and Shell Sort

I have implemented various sort algorithms in Perl: bubble sort (standard), bubble sort (optimized, i.e. not trying to sort the part of the array already sorted), comb sort (standard), comb sort (optimized with the same improvement when the gap becomes 1), merge sort, quick sort and shell sort. Then I timed these various sorts in a number of situations (with the number of elements to be sorted between 1,000 and 10,000,000, each time running each sort a large number of times to even out possible artifacts).

It will be no surprise to anyone, I think, that bubble sort (even optimized) has very poor performance, especially when the list to be sorted grows. I won't go any further into this, and will limit the rest of this discussion to optimized comb sort, merge sort, quick sort and shell sort. (In fact, I have also implemented a couple of other algorithms, such as selection and insertion sort, but my point here focuses on the four faster algorithms mentioned.)

For arrays populated with random or pseudo-random values, quick sort and shell sort are consistently 2.25 to 2.30 times faster than comb sort, which is itself 1.35 to 1.39 times faster than merge sort. Part of these differences may be linked to the relative efficiency of my own implementations of the different approaches, but an interesting fact is that the gaps are very consistent, whether running on an array of 2,000 or 1,000,000 elements, which indicates that all four algorithms scale up almost identically.

For arrays that are very far from random, the situation is very different. I have tried arrays that are already sorted, sorted in reverse direction, or close to being sorted in either direction. In those cases, quick sort and shell sort exhibit extremely poor performance, whereas merge sort and comb sort continue to behave very well. For example, for arrays with 5,000 elements almost sorted, quick sort and shell sort are both more than 10 times slower than merge sort and comb sort. For arrays sorted in reverse direction (or almost sorted in reverse direction), merge sort and comb sort are about 20 times faster (still with 5,000 elements). With larger arrays almost sorted in either direction, the difference in favor of merge sort and comb sort is even more blatant.

Statistically, the cases where shell sort and quick sort break, compared to the total number of possible cases, are very rare, probably extremely rare in fact. But in real life, the situation is quite different. It is quite common, for example in the index of a database or in the rankings of sport teams, to have data almost sorted and needing to be sorted again. In those cases, quick sort and shell sort, although the fastest algorithms of those tested here for random data, behave very badly.

This has led software vendors and other sorting algorithm implementers to use merge sort (which has an O(n log n) worst case) rather than usually faster algorithms such as quick sort, whose worst-case scenarios are much poorer. As an example, the Perl implementers decided, as of Perl 5.8, to shift the standard sort function from quick sort to merge sort. On average, merge sort is 2 to 3 times slower than quick sort, but merge sort does not suffer from worst-case scenarios, such as almost sorted data, the way quick sort does.

Now, in every single test I have made (and I ran tens of thousands of automated tests in all kinds of conditions), comb sort has proven at least 35% faster than merge sort. I have seen some (very) small fluctuations in performance results, but I haven't seen a single case where comb sort behaves as badly as quick sort, or anywhere near that, for almost sorted data. There may be cases where comb sort also breaks, but I have yet to see one. So, although my tests are far from exhaustive, I submit the idea that comb sort might be a better candidate than merge sort for a general sorting algorithm that does not break in specific situations. I realize that far more work is needed before one can confidently choose comb sort instead of merge sort as the best basic algorithm, but at least there is rather convincing evidence that it might be the case.

As a final note: even if comb sort really has a mean complexity in O(n²), if the coefficient is really around 10^-80, then it is negligible. I know what the theory usually says about it: even if the coefficient is very low, an algorithm in n log n will always end up better than an algorithm in n², provided the data is large enough. But if the n² coefficient is so small, the data is not going to be large enough for years or probably decades ahead. In my humble opinion, it does not really make sense to take into account data volumes that are just not imaginable today or in the foreseeable future. Therefore, although it is probably or possibly theoretically correct to state a complexity of O(n²) for comb sort, it would make more sense in my humble view to state a real-world complexity neglecting this n² factor and giving a real-world approximation (I have no idea whether this should be n log n or something less). Laurent rosenfeld (talk) 00:51, 25 November 2012 (UTC)

Comb sort is O(n log n) on average, just like quick sort, merge sort and heap sort. Shell sort is worse than O(n log n) on average, so I will not discuss it. The advantage of merge sort is that it performs a stable sort, which might be important in some applications. The main drawback of merge sort is that it requires O(n) additional memory. The other three algorithms (comb, quick, and heap) require only O(1) additional space, but they are not stable sort algorithms. Quick sort is the fastest, but it is O(n²) worst case. We must suppose that comb sort is O(n²) worst case too, until somebody proves otherwise (see the discussion above). Security applications and real-time applications are especially sensitive to worst-case scenarios. That is why libraries should not offer them as all-purpose sorting algorithms. As far as I know, the most widespread sorts in libraries are merge sort (a stable sort with O(n log n) average and O(n log n) worst-case time) and introsort (a mixture of quicksort and heapsort that requires only O(1) additional memory and is O(n log n) average and O(n log n) worst-case time). Como (talk) 07:39, 7 December 2012 (UTC)

Cocktail comb sort

Comb sort and cocktail sort are strategies for handling "turtles" in bubble sort. Comb sort moves turtles more than one space, while cocktail sort turns turtles into rabbits temporarily. Have any papers analyzed the combination of the two? --Damian Yerrick (talk) 03:06, 3 December 2014 (UTC)

Confused by the pseudocode

The pseudocode is looping until gap is 1 and swapped is False. But if we have multiple copies of the same number in our list, when we get down to a gap of 1, swapped will never become False because of these lines:

   if input[i] = input[i+gap]
       swapped := true

So you will be stuck in an infinite loop. Please let me know if I am misunderstanding this pseudocode.


I think there is an implicit understanding that the integer division that takes place when `gap` is recalculated will result in 0.

   gap := int(gap / shrink)

This implies that when `gap == 1` is true, then 1/1.3 will result in 0. See Division (mathematics), Case #5 for more specific information regarding this. TraugdorOriginal (talk) 14:28, 8 August 2016 (UTC)

Pseudocode needs a rework

I made an edit to the pseudocode recently regarding a case when the ends of an array are sorted, but the middle is not. This case will cause the Comb Sort to fail on the first pass. A javascript implementation of my edit is as follows:

   'comb': function(a) {
       var n = a.length, gap = n, shrink = 1.3, swapped, ignorePass1 = true;
       do {
           gap = Math.floor(gap/shrink);
           if(gap < 1)
               gap = 1;
            var i = 0;
           swapped = false;
           do{
               if(a[i] > a[i+gap]) {
                   common.swap(a, i, i+gap);
                   swapped = true;
                   ignorePass1 = false;
               }
               i++;
           }while(i+gap < n);
       }while(gap > 1 && (swapped == true || ignorePass1));
   }

In JavaScript and other languages, integer division gives a whole number less than the result that floating-point division would give (unless the quotient is already a whole number).

This should suffice to let the edit stand and be approved by the general community.

A sample array that will fail to be sorted because of this flaw is found here: https://gitlab.com/snippets/23840. The array is 10 characters long. 10/1.3 equals 7. With a gap of 7, the first pass would give no swaps. Since the algorithm thinks the array is already sorted, it will exit on the first pass without actually doing anything.

TraugdorOriginal (talk) 14:36, 8 August 2016 (UTC)

After checking the pseudocode with a colleague, we found that the sort will fail with a segmentation fault if the list to be sorted is only one element in length.

TraugdorOriginal (talk) 20:03, 8 August 2016 (UTC)

History

Article says:

Comb sort is a relatively simple sorting algorithm originally designed by Włodzimierz Dobosiewicz in 1980.[1][2] Later it was rediscovered by Stephen Lacey and Richard Box in 1991.[3]

The comb sort algorithm or something close to it appears 6 years before 1980, in Kernighan and Plauger's 1974 book The Elements of Programming Style. K&P called it "Shell sort", apparently erroneously. In particular, K&P's version uses exchanges like comb sort, rather than insertions ("sifting") as in the Shellsort article. I'm not sure how to fix the text here. 173.228.123.121 (talk) 04:34, 25 February 2018 (UTC)

When Comb sort is Shellsort

If instead of shrinking the gap by a constant factor you take the gaps to consist of all integers of the form 2ᵐ3ⁿ strictly less than the array size, arranged in descending order, then Comb sort performs the same exchanges as Shell sort with that sequence. For example if the array has 32 elements then the largest possible gap is 31 and therefore the sequence is 27 24 18 16 12 9 8 6 4 3 2 1, at which point the input is sorted. For Comb sort with 1.3 as the constant shrink factor, the sequence is 24 18 13 10 7 5 3 2 1 1 1 1 ... until sorted.

The reason for this is that when sorting with gap size g, all elements that are 2g apart and 3g apart are already in their correct order. Since every integer ≥ 2 is expressible as a sum of 2's and 3's, all elements that are kg apart for k ≥ 2 are already in their correct order. (For example 5 = 2 + 3, so if input(i) <= input(i + 2*g) and input(i + 2*g) <= input(i + 2*g + 3*g) then input(i) <= input(i + 5*g).) Hence the only out-of-order pairs of elements that are g apart must be isolated pairs.

In the case of Comb sort, the pass of Comb sort with gap size g will put all those isolated pairs in order, resulting in all pairs that are a multiple of g apart being in order. Every pass of Comb sort takes time O(s) where s is the size of the array. Shellsort more wastefully insertion-sorts those elements in time O(s²/g), failing to notice that the only exchanges it needs to perform are on these isolated pairs. Hence Shellsort takes longer than Comb sort even though it ends up performing exactly the same exchanges.

For an array of size s, there are approximately (log2(s)+1)*(log3(s)+1)/2 integers of the form 2ᵐ3ⁿ < s, or O(log²s). Since each pass of Comb sort takes O(s) exchanges the whole sort requires time O(s*log²s).

This exploits the non-messing-up property of Shellsort, namely that after it has put all the elements that are g apart in their correct order, subsequent passes of Shellsort with other gaps will leave elements that are g apart in order. While Comb sort need not have that property in general (because after the pass with gap size g, elements that are a multiple of g apart need not be in order), it does have that property in the case of this particular sequence because it performs the same exchanges as Shellsort.

I discovered this version of Shellsort 50 years ago, namely on December 26, 1969 while relaxing at the house of my parents-in-law in Tuckahoe, NY the day after Christmas (big family gathering). I later wrote it up as Chapters 3 (the basic result) and 4 (application to sorting networks) of my 1971 Ph.D. thesis "Shellsort and Sorting Networks", which was subsequently published as a book by Garland Press. The method is also in the section on Shellsort in Volume 3 of Knuth's The Art of Computer Programming, as exercise 30 in my edition (might be later in later editions).

The following C code (pardon the compressed style) generates a random array and then sorts it with this method. It prints out the sequence used, then the sorted array. The sequence used is not fully sorted, but does have the property that for every g in the sequence, 2*g and 3*g both appear earlier in the sequence. For S = 100 the sequence is

81 54 27 72 36 18 9 96 48 24 12 6 3 64 32 16 8 4 2 1.

Note that every power of 3 in this sequence is immediately preceded by it times powers of 2, < 100.

#include <stdio.h>
#include <stdlib.h>

#define S 100
int a[S];

// Code to process all powers of 2 times the given g
void sort2(int g) {
    int i, tem;
    if (2*g < S) sort2(2*g);   // Before processing gap g, process gap 2*g
    // Now do the comb sort with gap g
    printf("%d ", g);  // Print the gap just before doing the comb sort with that gap
    for (i = 0; i + g < S; i++)
        if (a[i] > a[i + g])     // swap a[i] and a[i + g]
            {tem = a[i + g]; a[i + g] = a[i]; a[i] = tem;}
}

// Code to process all powers of 3
void sort3(int g) {
    if (3*g < S) sort3(3*g); // Before processing gap g (a power of 3), process gap 3*g
    sort2(g);                // For this power of 3, multiply it by all powers of 2
}

// Code to generate an array, call sort3(1) to kick things off, and print the resulting sorted array
int main() {
    int i;
    for (i = 0; i < S; i++) a[i] = rand()%(10*S);
    sort3(1);     // Perform the actual sort
    printf("\n\n");
    for (i = 0; i < S; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}

Vaughan Pratt (talk) 18:37, 14 May 2019 (UTC)

sorting2.netlify.com has this as the iterative and recursive comb sort! 2A01:119F:21D:7900:C9AF:4B1E:D8B8:D4EA (talk) 08:52, 6 November 2019 (UTC)

The gap sequence does the powers of two first, and then the powers of 3.

..., ..., 128, ..., 192, 64, ..., 96, 32, ..., 144, 48, 16, ..., 216, 72, 24, 8, ..., 108, 36, 12, 4, ..., 162, 54, 18, 6, 2, ..., 243, 81, 27, 9, 3, 1. 2A01:119F:21D:7900:C9AF:4B1E:D8B8:D4EA (talk) 08:57, 6 November 2019 (UTC)

Yes, that order is equally effective because it still satisfies the requirement that 2*g and 3*g precede g for every g in the sequence.
But what's sorting2.netlify.com? Is there something to click on? Vaughan Pratt (talk) 21:09, 21 November 2019 (UTC)