Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This has a number of performance problems today:

      1. Suboptimal stored fields merging. This is especially the case with high compression: today it is 7x-64x slower than it should be.
      2. RAM stacking: for all docvalues and norms fields, every instance is loaded into RAM; for string docvalues fields, global ordinals are built for every instance; and none of this is released until the whole merge is complete.

      We can fix these two problems without completely refactoring LeafReader. We won't get a "bulk byte merge", checksum computation will still be suboptimal, and it's not a general solution to "merging with filterreaders", but that stuff can be for another day.
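To illustrate problem 2, here is a minimal, self-contained sketch (hypothetical names, not Lucene's API) of the alternative to RAM stacking: process one field at a time so each field's loaded state can be reclaimed before the next field is read, instead of materializing every field up front and holding all of them until the merge finishes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch (not Lucene code): merge field by field so each
// field's in-memory state becomes unreachable before the next field loads.
public class RamStackingSketch {
    static long mergeFieldByField(Map<String, Supplier<long[]>> fields) {
        long total = 0;
        for (Map.Entry<String, Supplier<long[]>> e : fields.entrySet()) {
            long[] values = e.getValue().get(); // load only this field's data
            for (long v : values) {
                total += v; // stand-in for the real per-field merge work
            }
            // 'values' is unreachable past this point, so its memory can be reclaimed
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Supplier<long[]>> fields = new LinkedHashMap<>();
        fields.put("timestamp", () -> new long[] {1L, 2L});
        fields.put("size", () -> new long[] {3L});
        System.out.println(mergeFieldByField(fields)); // prints 6
    }
}
```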

      1. LUCENE-6131.patch
        228 kB
        Robert Muir

        Activity

        Robert Muir added a comment -

        Patch. Sorry for the file movements, they make it huge. The only actual code change to SortingMergePolicy is in one method, which now looks like this:

        public List<LeafReader> getMergeReaders() throws IOException {
          if (unsortedReaders == null) {
            unsortedReaders = super.getMergeReaders();
            // wrap readers, to be optimal for merge
            List<LeafReader> wrapped = new ArrayList<>(unsortedReaders.size());
            for (LeafReader leaf : unsortedReaders) {
              if (leaf instanceof SegmentReader) {
                leaf = new MergeReaderWrapper((SegmentReader) leaf);
              }
              wrapped.add(leaf);
            }
            final LeafReader atomicView;
            if (wrapped.size() == 1) {
              atomicView = wrapped.get(0);
            } else {
              final CompositeReader multiReader = new MultiReader(wrapped.toArray(new LeafReader[wrapped.size()]));
              atomicView = new SlowCompositeReaderWrapper(multiReader, true);
            }
            docMap = sorter.sort(atomicView);
            sortedView = SortingLeafReader.wrap(atomicView, docMap);
          }
          // a null doc map means that the readers are already sorted
          return docMap == null ? unsortedReaders : Collections.singletonList(sortedView);
        }
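As an aside, the docMap produced by sorter.sort(atomicView) is an old-to-new document ID mapping. A minimal self-contained sketch of that idea (hypothetical, not the Lucene Sorter API):

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of a "doc map": given per-document sort keys
// (e.g. timestamps), compute newDocId = map[oldDocId]. This shows the
// shape of the mapping, not Lucene's actual implementation.
public class DocMapSketch {
    static int[] buildDocMap(long[] keys) {
        Integer[] order = new Integer[keys.length];
        for (int i = 0; i < keys.length; i++) order[i] = i;
        // Order old doc IDs by their sort key.
        Arrays.sort(order, Comparator.comparingLong(i -> keys[i]));
        int[] oldToNew = new int[keys.length];
        for (int newId = 0; newId < order.length; newId++) {
            oldToNew[order[newId]] = newId;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        // Docs 0..2 with timestamps 30, 10, 20 sort as docs 1, 2, 0,
        // so the old->new map is [2, 0, 1].
        System.out.println(Arrays.toString(buildDocMap(new long[] {30L, 10L, 20L})));
    }
}
```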
        
        Adrien Grand added a comment -

        +1 to adding this new boolean merging to SlowCompositeReaderWrapper
        +0 to the MergeReaderWrapper approach. I think the hack is OK until we can figure out a way through LUCENE-6065?

        Robert Muir added a comment -

        Exactly, I don't want to try to rush that issue or make the LeafReader APIs too complicated.

        But at the same time we should have reasonable performance for SortingMP in 5.0, and not leave a trap where it is terribly slow.

        Robert Muir added a comment -

        I ran quick benchmarks, indexing 1M docs of log data and sorting by timestamp. I used 10k-doc segments/logdocMP/serial MS. All fields were indexed and stored, and I enabled DV on timestamp:

        Compression      | No sorting | Sort (trunk) | Sort (patch)
        BEST_SPEED       | 37,552 ms  | 56,095 ms    | 46,309 ms
        BEST_COMPRESSION | 39,132 ms  | 206,068 ms   | 47,395 ms

        So I think it solves the worst of the worst, and we can move forward from here. Another thing that seems not to work is the "already sorted" optimization; for this test it should be kicking in. We should look at that in another issue.
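For reference, the slowdown and speedup factors implied by the table can be computed directly from the timings above (a trivial sketch, using only the numbers reported in this comment):

```java
// Ratios derived from the benchmark table above (BEST_COMPRESSION row).
public class MergeBenchRatios {
    public static void main(String[] args) {
        // Timings (ms) copied from the table.
        double noSort = 39_132, sortTrunk = 206_068, sortPatch = 47_395;
        System.out.printf("trunk sort overhead: %.2fx%n", sortTrunk / noSort);    // ~5.27x
        System.out.printf("patch sort overhead: %.2fx%n", sortPatch / noSort);    // ~1.21x
        System.out.printf("patch speedup:       %.2fx%n", sortTrunk / sortPatch); // ~4.35x
    }
}
```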

        ASF subversion and git services added a comment -

        Commit 1647587 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1647587 ]

        LUCENE-6131: optimize SortingMergePolicy

        ASF subversion and git services added a comment -

        Commit 1647588 from Robert Muir in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1647588 ]

        LUCENE-6131: optimize SortingMergePolicy

        Anshum Gupta added a comment -

        Bulk close after 5.0 release.


          People

          • Assignee:
            Unassigned
          • Reporter:
            Robert Muir
          • Votes:
            0
          • Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:

              Development