  1. Lucene - Core
  2. LUCENE-7792

Add optional concurrency to OfflineSorter

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.6, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      OfflineSorter is a heavy operation, yet at heart it is an embarrassingly concurrent problem: if you have enough hardware concurrency (e.g. fast SSDs, multiple CPU cores), running it concurrently can be a big speedup.

      E.g., after reading a partition from the input, one thread can sort and write it while another thread reads the next partition, and so on. Merging partitions can also be done in the background. Some things still cannot be concurrent: the initial read from the input must happen on a single thread, as must the final merge and the write to the final output.

      I think I found a fairly non-invasive way to add optional concurrency to this class, by adding an optional ExecutorService to OfflineSorter's ctor (similar to IndexSearcher) and using futures to represent each partition as we sort, and creating Callable classes for sorting and merging partitions.
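
      To make the idea concrete, here is a minimal sketch of the approach described above, using only the JDK. It is not the attached patch and not OfflineSorter's real API: PartitionSorterSketch, SortPartitionTask and partitionSize are hypothetical names, and partitions are kept in memory rather than written to temp files. It assumes a null executor means "do everything on the calling thread".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical in-memory stand-in for an offline sorter; NOT Lucene's OfflineSorter. */
final class PartitionSorterSketch {
  private final ExecutorService exec; // assumption: null means "run on the calling thread"
  private final int partitionSize;    // hypothetical: max items per partition

  PartitionSorterSketch(ExecutorService exec, int partitionSize) {
    if (partitionSize <= 0) {
      throw new IllegalArgumentException("partitionSize must be > 0");
    }
    this.exec = exec;
    this.partitionSize = partitionSize;
  }

  /** Sorts one partition; a real offline sort would also write it to a temp file here. */
  private static final class SortPartitionTask implements Callable<String[]> {
    private final String[] partition;

    SortPartitionTask(String[] partition) {
      this.partition = partition;
    }

    @Override
    public String[] call() {
      Arrays.sort(partition);
      return partition;
    }
  }

  List<String> sort(Iterator<String> input) throws Exception {
    List<Future<String[]>> partitions = new ArrayList<>();
    List<String> buffer = new ArrayList<>(partitionSize);
    // The initial read stays on the calling thread; sorting may run in the background.
    while (input.hasNext()) {
      buffer.add(input.next());
      if (buffer.size() == partitionSize || !input.hasNext()) {
        partitions.add(submit(new SortPartitionTask(buffer.toArray(new String[0]))));
        buffer.clear();
      }
    }
    // Wait for every partition, then do the final merge on the calling thread.
    List<String[]> sorted = new ArrayList<>();
    for (Future<String[]> partition : partitions) {
      sorted.add(partition.get());
    }
    return merge(sorted);
  }

  private Future<String[]> submit(Callable<String[]> task) throws Exception {
    if (exec == null) {
      return CompletableFuture.completedFuture(task.call()); // serial fallback
    }
    return exec.submit(task);
  }

  /** K-way merge; each queue entry is {partition index, offset into that partition}. */
  private static List<String> merge(List<String[]> sorted) {
    PriorityQueue<int[]> queue =
        new PriorityQueue<>(Comparator.comparing((int[] e) -> sorted.get(e[0])[e[1]]));
    for (int i = 0; i < sorted.size(); i++) {
      if (sorted.get(i).length > 0) {
        queue.add(new int[] {i, 0});
      }
    }
    List<String> out = new ArrayList<>();
    while (!queue.isEmpty()) {
      int[] top = queue.poll();
      String[] partition = sorted.get(top[0]);
      out.add(partition[top[1]]);
      if (++top[1] < partition.length) {
        queue.add(top); // re-queue with the advanced offset
      }
    }
    return out;
  }
}
```

      In this sketch, passing null for the executor gives the same result as the concurrent path, which is one way to keep the concurrency strictly optional. Background merging of intermediate partitions is left out to keep the example short.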

      Attachments

        1. LUCENE-7792.patch
          33 kB
          Michael McCandless
        2. LUCENE-7792.patch
          33 kB
          Michael McCandless
        3. LUCENE-7792.patch
          31 kB
          Michael McCandless

            People

              mikemccand Michael McCandless
