Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7101

OfflineSorter's merging is O(N^2) cost for large sorts

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0, 6.1, 7.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Our OfflineSorter acts just like Lucene, writing small initial
      segments of sorted values (from what it was able to sort at once in
      heap), periodically merging them when there are too many, and doing a
      forceMerge(1) in the end.

      But the merge logic is too simplistic today, resulting in O(N^2)
      cost. Smallish sorts usually won't hit it, because the default 128
      merge factor is so high, but e.g. the new 2B points tests do hit the
      N^2 behavior. I suspect the high merge factor hurts performance (OS
      struggles to use what free RAM it has to read-ahead on 128 files), and
      also risks file descriptor exhaustion.

      I think we should implement a simple log merge policy for it, and drop
      its default merge factor to 10.

        Attachments

        1. LUCENE-7101.patch
          10 kB
          Michael McCandless

          Activity

            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              mikemccand Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: