Lucene - Core
  1. Lucene - Core
  2. LUCENE-2719

Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:

      • Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
      • BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).

      SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

      1. LUCENE-2719.patch
        30 kB
        Uwe Schindler
      2. LUCENE-2719.patch
        28 kB
        Uwe Schindler
      3. LUCENE-2719.patch
        28 kB
        Uwe Schindler
      4. LUCENE-2719.patch
        25 kB
        Uwe Schindler
      5. LUCENE-2719.patch
        22 kB
        Uwe Schindler
      6. LUCENE-2719-CollSupport.patch
        45 kB
        Uwe Schindler
      7. LUCENE-2719-CollSupport.patch
        42 kB
        Uwe Schindler
      8. LUCENE-2719-CollSupport.patch
        40 kB
        Uwe Schindler
      9. LUCENE-2719-final-3x.patch
        35 kB
        Uwe Schindler
      10. LUCENE-2719-final-trunk.patch
        59 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Attached you find the patch. Robert offered to do benchmarks with Automaton.

          The patch can be applied to a clean checkout, you no longer need to svn copy old SorterTemplate, as this is a almost complete rewrite.

          This patch removes the CHANGES.txt entry for 3.x, as it readds the class. If we don't merge this to 3.x, the CHANGES should be reverted. As Lucene uses Arrays.sort(Object[]) which is slow at other places, I recommend to add it also to 3.x.

          Please test the stuff with large -Dtests.multiplier! Maybe also verify my modified quickSort!

          Show
          Uwe Schindler added a comment - Attached you find the patch. Robert offered to do benchmarks with Automaton. The patch can be applied to a clean checkout, you no longer need to svn copy old SorterTemplate, as this is a almost complete rewrite. This patch removes the CHANGES.txt entry for 3.x, as it readds the class. If we don't merge this to 3.x, the CHANGES should be reverted. As Lucene uses Arrays.sort(Object[]) which is slow at other places, I recommend to add it also to 3.x. Please test the stuff with large -Dtests.multiplier! Maybe also verify my modified quickSort!
          Hide
          Uwe Schindler added a comment -

          Using this class we can look for more useless quickSort code duplication. One is e.g. in DocFieldProcessorPerThread. Maybe more of them.

          Show
          Uwe Schindler added a comment - Using this class we can look for more useless quickSort code duplication. One is e.g. in DocFieldProcessorPerThread. Maybe more of them.
          Hide
          Yonik Seeley added a comment -

          This also applies to Solr (Yonik?).

          Yep - Solr also has it's own version of quicksort - PrimUtils.sort() to deal with sorting indexes (an int[]) instead of objects (parallel array sorting).

          Show
          Yonik Seeley added a comment - This also applies to Solr (Yonik?). Yep - Solr also has it's own version of quicksort - PrimUtils.sort() to deal with sorting indexes (an int[]) instead of objects (parallel array sorting).
          Hide
          Uwe Schindler added a comment -

          The quickSort in DocFieldProcessorPerThread also converted to ArrayUtil.quickSort().

          Yonik: I will take care of PrimUtils!

          Show
          Uwe Schindler added a comment - The quickSort in DocFieldProcessorPerThread also converted to ArrayUtil.quickSort(). Yonik: I will take care of PrimUtils!
          Hide
          Uwe Schindler added a comment -

          New patch:

          • Cleaned up (removed useless imports)
          • Added test that verifies that mergeSort() is stable - this is now validated and we can use mergeSort() in ConjunctionScorer
          • Further test optimization, so also totally reversed arrays get sorted correctly (special case)
          Show
          Uwe Schindler added a comment - New patch: Cleaned up (removed useless imports) Added test that verifies that mergeSort() is stable - this is now validated and we can use mergeSort() in ConjunctionScorer Further test optimization, so also totally reversed arrays get sorted correctly (special case)
          Hide
          Michael McCandless added a comment -

          This is great Uwe!

          Show
          Michael McCandless added a comment - This is great Uwe!
          Hide
          Uwe Schindler added a comment -

          Modified all sorts to use >>>1 instead /2 (have no real opinion about that).

          Yep - Solr also has it's own version of quicksort - PrimUtils.sort() to deal with sorting indexes (an int[]) instead of objects (parallel array sorting).

          According to java.util.Arrays javadoc: The sorting algorithm is a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 1993). This algorithm offers n*log performance on many data sets that cause other quicksorts to degrade to quadratic performance.

          If I change to SorterTemplate, we will degrade to good old quicksort. We could also upgrade SorterTemplate to use this algo, but I am not sure if thats easy because SorterTemplate only allows swap(index, index) and compare(index, index). But we cannot retrieve e.g. the pivot value. This was also one problem in porting BytesRefHash's quicksort, as it used the value of the pivot (this is why there is an additional check below the commented out assert in the current patch's algo).

          Show
          Uwe Schindler added a comment - Modified all sorts to use >>>1 instead /2 (have no real opinion about that). Yep - Solr also has it's own version of quicksort - PrimUtils.sort() to deal with sorting indexes (an int[]) instead of objects (parallel array sorting). According to java.util.Arrays javadoc: The sorting algorithm is a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 1993). This algorithm offers n*log performance on many data sets that cause other quicksorts to degrade to quadratic performance. If I change to SorterTemplate, we will degrade to good old quicksort. We could also upgrade SorterTemplate to use this algo, but I am not sure if thats easy because SorterTemplate only allows swap(index, index) and compare(index, index). But we cannot retrieve e.g. the pivot value. This was also one problem in porting BytesRefHash's quicksort, as it used the value of the pivot (this is why there is an additional check below the commented out assert in the current patch's algo).
          Hide
          Michael McCandless added a comment -

          I tested indexing first 10M wikipedia 1KB docs and running various queries and found no real perf change, or at least less than the noise in the test...

          Show
          Michael McCandless added a comment - I tested indexing first 10M wikipedia 1KB docs and running various queries and found no real perf change, or at least less than the noise in the test...
          Hide
          Uwe Schindler added a comment -

          Latest updates.

          There is also an additional patch which provides CollectionUtil, now also supporting in-place collection sorts which is much more perfromant for smaller collections. Collections.sort() in JDK does copy the List into array and calls Arrays.sort() which itsself does clone the array and after that copies the arraycontents using a ListIterator back to the List.

          Before committing this, can somebody look into the o.a.l.index package, because I replaced some sorts for field names there. For commit points i used MergeSort, as I am not sure if it should be stable.

          So just a confirmation, if we need stable sort for Indexer and Commit points would be fine.

          Show
          Uwe Schindler added a comment - Latest updates. There is also an additional patch which provides CollectionUtil, now also supporting in-place collection sorts which is much more perfromant for smaller collections. Collections.sort() in JDK does copy the List into array and calls Arrays.sort() which itsself does clone the array and after that copies the arraycontents using a ListIterator back to the List. Before committing this, can somebody look into the o.a.l.index package, because I replaced some sorts for field names there. For commit points i used MergeSort, as I am not sure if it should be stable. So just a confirmation, if we need stable sort for Indexer and Commit points would be fine.
          Hide
          Uwe Schindler added a comment -

          Robert tested yesterday and found out that SorterTemplate.quickSort is not as efficient as it could be. The general problem is:

          • Quicksort needs the value of the pivot/partition element and the main sorting step compares this single value quite often
          • For our in-place algorithm that only used swap(i,j) and compare(i,j), the main loop's swap statements needed an extra check that not the pivot index is swapped and so the pivot changes suddenly. Because of this when the index of the pivot is swapped, the pivot index value needed to be updated.

          I changed SorterTemplate to look more like FieldComparator known from search. It now has not only swap(index1,index2) and compare(index1,index2), it also gets setPivot(index) [stores index' value as pivot] and comparePivot(index) [compares given index' value with previously stored pivot value]. Now the quicksort algorithm is identical to the one seen everywhere in Lucene before. We can now also implement the optimized one from harmony also seen in Solr's PrimUtil. I will look into this, if it makes sense (it makes not always sense as comparing and swapping is more intensive for non-native values!).

          This has also some improvements to BytesRefHash, as there are less de-references of BytesRefs, because the main quickSort loop only compares an index with the in setPivot dereferenced BytesRefs. Before it did this on every compare step!

          Robert: Can you supply your benchmark? Or test again

          Show
          Uwe Schindler added a comment - Robert tested yesterday and found out that SorterTemplate.quickSort is not as efficient as it could be. The general problem is: Quicksort needs the value of the pivot/partition element and the main sorting step compares this single value quite often For our in-place algorithm that only used swap(i,j) and compare(i,j), the main loop's swap statements needed an extra check that not the pivot index is swapped and so the pivot changes suddenly. Because of this when the index of the pivot is swapped, the pivot index value needed to be updated. I changed SorterTemplate to look more like FieldComparator known from search. It now has not only swap(index1,index2) and compare(index1,index2), it also gets setPivot(index) [stores index' value as pivot] and comparePivot(index) [compares given index' value with previously stored pivot value] . Now the quicksort algorithm is identical to the one seen everywhere in Lucene before. We can now also implement the optimized one from harmony also seen in Solr's PrimUtil. I will look into this, if it makes sense (it makes not always sense as comparing and swapping is more intensive for non-native values!). This has also some improvements to BytesRefHash, as there are less de-references of BytesRefs, because the main quickSort loop only compares an index with the in setPivot dereferenced BytesRefs. Before it did this on every compare step! Robert: Can you supply your benchmark? Or test again
          Hide
          Uwe Schindler added a comment -

          After spending the evening with performance tests on BytesRefHash and Fuzzy automatons I cam to the following conclusion, finalized in this hopefully last patch:

          • Automatons use very short, mostly presorted arrays. Quicksort is ineffective for them. Initial tests showed that even insertionSort is enough for most of the Transition arrays. As some automatons also contain very large Transition arrays, it showed, that then insertionSAort gets very slow. Quicksort gets better, but as the array is already sorted, mergesort beats them all. SorterTemplate.mergeSort contains a limit, so when array size is < 12 entries, it uses insertion sort for the sorting (also in later merge steps if the partitioned array gets < 12 entries).
            schindlerMinimize and mccandlessDeterminize are now using mergesort.
          • BytesRefHash gets about 10% speed improvement by the recent extension to SorterTemplate with setPivot/comparePivot abstract methods. This beats the old algorithm which is currently in trunk, as for the quicksort algorithm used, the swapping of entries in the mail loop always compares to the pivot value. If BytesRefHash needs to resolve this values every time, it gets slow. The new patch improves a modified TestBytesRefHash.testSort for perf testing by 11% (runs with -Xbatch -server -Xmx1024m, Java 1.5 on my computer in 12.5 secs on trunk, 11.1 secs with this patch):
          public void testSortPerf() {
            long start = System.currentTimeMillis();
            BytesRef ref = new BytesRef();
            for (int j = 0; j < 200 * RANDOM_MULTIPLIER; j++) {
              for (int i = 0; i < 1797; i++) {
                String str;
                do {
                  str = _TestUtil.randomRealisticUnicodeString(random, 1000);
                } while (str.length() == 0);
                ref.copy(str);
                hash.add(ref);
              }
              hash.sort(BytesRef.getUTF8SortedAsUTF16Comparator());
              hash.clear();
              hash.reinit();
            }
            System.out.println("time: "+(System.currentTimeMillis()-start));
          }
          

          I will commit this patch, which now also makes insertionSort public in SorterTemplate, ArrayUtil and CollectionUtil tomorrow. I tend to also commit this to 3.x (merged to BytesRefHash-similar class from 3.x). This is why the CHANGES.txt removes the SorterTemplate removal message (may need to be modified, because SorterTemplate changed API). If we will only commit to trunk, CHANGES would keep unchanged.

          Show
          Uwe Schindler added a comment - After spending the evening with performance tests on BytesRefHash and Fuzzy automatons I cam to the following conclusion, finalized in this hopefully last patch: Automatons use very short, mostly presorted arrays. Quicksort is ineffective for them. Initial tests showed that even insertionSort is enough for most of the Transition arrays. As some automatons also contain very large Transition arrays, it showed, that then insertionSAort gets very slow. Quicksort gets better, but as the array is already sorted, mergesort beats them all. SorterTemplate.mergeSort contains a limit, so when array size is < 12 entries, it uses insertion sort for the sorting (also in later merge steps if the partitioned array gets < 12 entries). schindlerMinimize and mccandlessDeterminize are now using mergesort. BytesRefHash gets about 10% speed improvement by the recent extension to SorterTemplate with setPivot/comparePivot abstract methods. This beats the old algorithm which is currently in trunk, as for the quicksort algorithm used, the swapping of entries in the mail loop always compares to the pivot value. If BytesRefHash needs to resolve this values every time, it gets slow. The new patch improves a modified TestBytesRefHash.testSort for perf testing by 11% (runs with -Xbatch -server -Xmx1024m, Java 1.5 on my computer in 12.5 secs on trunk, 11.1 secs with this patch): public void testSortPerf() { long start = System .currentTimeMillis(); BytesRef ref = new BytesRef(); for ( int j = 0; j < 200 * RANDOM_MULTIPLIER; j++) { for ( int i = 0; i < 1797; i++) { String str; do { str = _TestUtil.randomRealisticUnicodeString(random, 1000); } while (str.length() == 0); ref.copy(str); hash.add(ref); } hash.sort(BytesRef.getUTF8SortedAsUTF16Comparator()); hash.clear(); hash.reinit(); } System .out.println( "time: " +( System .currentTimeMillis()-start)); } I will commit this patch, which now also makes insertionSort public in SorterTemplate, ArrayUtil and CollectionUtil tomorrow. I tend to also commit this to 3.x (merged to BytesRefHash-similar class from 3.x). This is why the CHANGES.txt removes the SorterTemplate removal message (may need to be modified, because SorterTemplate changed API). If we will only commit to trunk, CHANGES would keep unchanged.
          Hide
          Uwe Schindler added a comment - - edited

          After Robert mentioned the strange comparator in the above benchmark:
          It is just a leftover from the original testSort() test which needed that special order, because it compared the sorted BytesRefHash using a TreeSet of UTF16 strings.
          For the benchmark the comparator has no real effect.

          Show
          Uwe Schindler added a comment - - edited After Robert mentioned the strange comparator in the above benchmark: It is just a leftover from the original testSort() test which needed that special order, because it compared the sorted BytesRefHash using a TreeSet of UTF16 strings. For the benchmark the comparator has no real effect.
          Hide
          Uwe Schindler added a comment -

          Final patch, will get committed... now. It adds some contrib changes and changes.txt/notice.txt and javadocs.

          Show
          Uwe Schindler added a comment - Final patch, will get committed... now. It adds some contrib changes and changes.txt/notice.txt and javadocs.
          Hide
          Uwe Schindler added a comment -

          Committed trunk revision: 1027998

          Now working on 3.x

          Show
          Uwe Schindler added a comment - Committed trunk revision: 1027998 Now working on 3.x
          Hide
          Uwe Schindler added a comment -

          filterdiff'ed patch for 3.x branch - we need that for commit mails, too. The changes in BytesRefHash are merged over to TermsHashPerField. This patch also removes useless synchronization!

          After this also 3.x gets the imporved terms sorting and reduced code duplication.

          I will commit soon.

          Show
          Uwe Schindler added a comment - filterdiff'ed patch for 3.x branch - we need that for commit mails, too. The changes in BytesRefHash are merged over to TermsHashPerField. This patch also removes useless synchronization! After this also 3.x gets the imporved terms sorting and reduced code duplication. I will commit soon.
          Hide
          Uwe Schindler added a comment -

          Committed 3.x branch revision: 1028042

          Thanks to all for performance testing!

          Show
          Uwe Schindler added a comment - Committed 3.x branch revision: 1028042 Thanks to all for performance testing!
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development