Lucene - Core / LUCENE-6294

Generalize how IndexSearcher parallelizes collection execution

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      IndexSearcher takes an ExecutorService that can be used to parallelize collection execution. This is useful if you are willing to trade some throughput for lower latency.

      However, this executor service is only used if you search for top docs. In that case, we create one collector per slice and call TopDocs.merge at the end. If you use search(Query, Collector), the executor service is never used.
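The top-docs strategy described above (one collector per slice, then a merge) can be sketched in plain Java without Lucene. `Hit` and `SliceTopK` are hypothetical stand-ins for scored hits and for the searcher, and `merge` only imitates what `TopDocs.merge` does conceptually; it is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a scored hit (not Lucene's ScoreDoc).
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class SliceTopK {
    // Best hits first; ties broken by doc id so the merged order is deterministic.
    static final Comparator<Hit> BY_SCORE_DESC = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.doc, b.doc);
    };

    // Per-slice collection: keep the k best hits with a min-heap ordered by score.
    static List<Hit> topK(List<Hit> slice, int k) {
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (Hit h : slice) {
            heap.add(h);
            if (heap.size() > k) {
                heap.poll(); // evict the lowest-scoring hit
            }
        }
        List<Hit> best = new ArrayList<>(heap);
        best.sort(BY_SCORE_DESC);
        return best;
    }

    // Merge step: the global top k is always contained in the union of the per-slice top k lists.
    static List<Hit> merge(List<List<Hit>> perSlice, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perSlice) {
            all.addAll(hits);
        }
        all.sort(BY_SCORE_DESC);
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    // One task per slice on the executor, then a single merge on the calling thread.
    static List<Hit> search(List<List<Hit>> slices, int k, Executor executor) {
        List<CompletableFuture<List<Hit>>> futures = new ArrayList<>();
        for (List<Hit> slice : slices) {
            futures.add(CompletableFuture.supplyAsync(() -> topK(slice, k), executor));
        }
        List<List<Hit>> perSlice = new ArrayList<>();
        for (CompletableFuture<List<Hit>> f : futures) {
            perSlice.add(f.join());
        }
        return merge(perSlice, k);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<List<Hit>> slices = Arrays.asList(
                Arrays.asList(new Hit(0, 1f), new Hit(1, 5f), new Hit(2, 3f)),
                Arrays.asList(new Hit(3, 4f), new Hit(4, 2f)));
        for (Hit h : search(slices, 3, pool)) {
            System.out.println(h.doc + " " + h.score);
        }
        pool.shutdown();
    }
}
```

Each slice only needs to keep its own k best hits, because any hit that is not in its slice's top k cannot be in the global top k either.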

      But there are other collectors that could work the same way as top docs collectors, e.g. TotalHitCountCollector, and maybe also some of our users' collectors. So maybe IndexSearcher could expose a generic way to take advantage of the executor service?

      1. LUCENE-6294.patch
        13 kB
        Adrien Grand


          Activity

          Adrien Grand added a comment -

          Here is a patch that demonstrates the idea. This does not change any API on Collector since not all collectors could work this way, but adds a CollectorManager object which can create collectors and merge them. I cut over top docs collection to this new API and also added IndexSearcher.count to exercise it.
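A toy model of the CollectorManager idea, using only the JDK: a manager creates one collector per slice, each slice is collected on the executor, and `reduce` combines the per-slice collectors once all tasks have finished. `SimpleCollector`, `SimpleCollectorManager`, `CountManager` and `ToySearcher` are hypothetical names. Lucene's real `CollectorManager` likewise declares `newCollector()` and `reduce(Collection<C>)`, but it operates on real `Collector`s and leaf readers rather than arrays of doc ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical, simplified collector: receives the matching doc ids of one slice.
interface SimpleCollector {
    void collect(int doc);
}

// Same shape as the proposed CollectorManager: a collector factory plus a reduce step.
interface SimpleCollectorManager<C extends SimpleCollector, T> {
    C newCollector();

    T reduce(Collection<C> collectors);
}

// Counts hits in one slice, in the spirit of TotalHitCountCollector.
final class CountingCollector implements SimpleCollector {
    int count;

    @Override
    public void collect(int doc) {
        count++;
    }
}

// Per-slice counts can simply be summed, so counting parallelizes cleanly.
final class CountManager implements SimpleCollectorManager<CountingCollector, Integer> {
    @Override
    public CountingCollector newCollector() {
        return new CountingCollector();
    }

    @Override
    public Integer reduce(Collection<CountingCollector> collectors) {
        int total = 0;
        for (CountingCollector c : collectors) {
            total += c.count;
        }
        return total;
    }
}

final class ToySearcher {
    // One collector per slice, each slice collected on the executor, then a single reduce.
    static <C extends SimpleCollector, T> T search(
            List<int[]> slices, SimpleCollectorManager<C, T> manager, Executor executor) {
        List<C> collectors = new ArrayList<>();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int[] slice : slices) {
            C collector = manager.newCollector();
            collectors.add(collector);
            futures.add(CompletableFuture.runAsync(() -> {
                for (int doc : slice) {
                    collector.collect(doc);
                }
            }, executor));
        }
        for (CompletableFuture<Void> f : futures) {
            f.join(); // wait for every slice before reducing
        }
        return manager.reduce(collectors);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<int[]> slices = Arrays.asList(new int[] {0, 2, 5}, new int[] {7}, new int[] {});
        System.out.println("hits: " + search(slices, new CountManager(), pool));
        pool.shutdown();
    }
}
```

Keeping the factory and the merge on a separate manager object, rather than on the collector itself, is what lets the same collector class be used both sequentially and in parallel.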

          David Smiley added a comment -

          I didn't look it over in great detail, but I like it. At first I was hoping that there might be a Collector subclass to declare its parallelizability with the reduce method, but then realized it wouldn't look good since the factory method to create itself wouldn't feel right.

          Adrien Grand added a comment -

          Thanks for the feedback David!

          Michael McCandless added a comment -

          +1, I like this approach.

          Ryan Ernst added a comment (edited) -

          +1

          In the javadocs for IndexSearcher.search I think you mean "In contrast to" instead of "On the contrary to"?

          ASF subversion and git services added a comment -

          Commit 1662751 from Adrien Grand in branch 'dev/trunk'
          [ https://svn.apache.org/r1662751 ]

          LUCENE-6294: Generalize how IndexSearcher parallelizes collection execution.

          Adrien Grand added a comment -

          Thanks David, Mike and Ryan for the reviews!

          ASF subversion and git services added a comment -

          Commit 1662761 from Adrien Grand in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1662761 ]

          LUCENE-6294: Generalize how IndexSearcher parallelizes collection execution.

          Shikhar Bhushan added a comment -

          This is great. When testing LUCENE-5299, I saw some improvements from adding a configurable, per-request parallelism throttle based on a semaphore, which might be useful to have here too, i.e. the ability to cap how many segments are searched concurrently. That can help reserve resources for concurrent search requests, or reduce context switching when using an unbounded pool.
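A hypothetical sketch of the per-request throttle described in this comment (the actual LUCENE-5299 code is not shown in this thread): the submitting thread acquires a permit before handing each segment task to a shared pool, so at most `maxParallel` of one request's tasks are in flight at a time, even when the pool itself is unbounded. `ThrottledSearch` and its peak-concurrency counter are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledSearch {
    // Runs one task per segment on a shared pool, but allows at most maxParallel of this
    // request's tasks to be in flight at once. Returns the peak concurrency observed.
    static int run(List<Runnable> segmentTasks, int maxParallel, ExecutorService pool) {
        Semaphore permits = new Semaphore(maxParallel);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Runnable task : segmentTasks) {
            permits.acquireUninterruptibly(); // the submitting thread waits for a free permit
            futures.add(CompletableFuture.runAsync(() -> {
                try {
                    peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    task.run();
                } finally {
                    inFlight.decrementAndGet();
                    permits.release(); // let the next segment of this request start
                }
            }, pool));
        }
        for (CompletableFuture<Void> f : futures) {
            f.join();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool(); // unbounded pool
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            tasks.add(() -> {
                try {
                    Thread.sleep(10); // stand-in for searching one segment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println("peak concurrency: " + run(tasks, 2, pool));
        pool.shutdown();
    }
}
```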

          Adrien Grand added a comment -

          I think a better approach than the semaphore would be to just cap the number of slices of your searcher (see IndexSearcher.slices).
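Capping the number of slices can be illustrated independently of Lucene: the sketch below groups segment ordinals into at most `maxSlices` contiguous slices whose segment counts differ by at most one. `SliceCap.cap` is a hypothetical helper; in Lucene the equivalent grouping would live in an overridden `IndexSearcher.slices`, which builds `LeafSlice`s rather than lists of integers.

```java
import java.util.ArrayList;
import java.util.List;

final class SliceCap {
    // Partition segment ordinals 0..numSegments-1 into at most maxSlices contiguous groups.
    static List<List<Integer>> cap(int numSegments, int maxSlices) {
        if (maxSlices < 1) {
            throw new IllegalArgumentException("maxSlices must be >= 1");
        }
        int numSlices = Math.min(numSegments, maxSlices);
        List<List<Integer>> slices = new ArrayList<>();
        int next = 0;
        for (int s = 0; s < numSlices; s++) {
            // The first (numSegments % numSlices) slices take one extra segment.
            int size = numSegments / numSlices + (s < numSegments % numSlices ? 1 : 0);
            List<Integer> slice = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                slice.add(next++);
            }
            slices.add(slice);
        }
        return slices;
    }

    public static void main(String[] args) {
        System.out.println(cap(7, 3));
    }
}
```

Contiguous grouping caps concurrency, but it says nothing about how much work each slice gets.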

          Shikhar Bhushan added a comment -

          Makes sense! It already seems to be customizable by overriding that method.

          Shikhar Bhushan added a comment (edited) -

          When not using one slice per segment, it would probably be desirable to distribute segments across slices by size, rather than letting all the large segments end up in one slice that is searched sequentially.
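One common way to balance slices by segment size is the greedy largest-first heuristic (often called LPT scheduling): sort segments by size, descending, and always add the next one to the currently lightest slice. This is a swapped-in technique, not something the thread specifies; `BalancedSlices.assign` is hypothetical, and "size" could be a doc count or a byte size. The heuristic is not guaranteed to be optimal, but it avoids piling all the large segments into one slice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class BalancedSlices {
    // Greedy largest-first assignment: visit segments from biggest to smallest and put each
    // one into the slice whose total size is currently the smallest.
    static List<List<Integer>> assign(long[] segmentSizes, int numSlices) {
        if (numSlices < 1) {
            throw new IllegalArgumentException("numSlices must be >= 1");
        }
        Integer[] order = new Integer[segmentSizes.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(segmentSizes[b], segmentSizes[a]));

        List<List<Integer>> slices = new ArrayList<>();
        long[] totals = new long[numSlices];
        // Lightest slice first; ties go to the lower slice index.
        PriorityQueue<Integer> lightest = new PriorityQueue<>((a, b) ->
                totals[a] != totals[b] ? Long.compare(totals[a], totals[b]) : Integer.compare(a, b));
        for (int s = 0; s < numSlices; s++) {
            slices.add(new ArrayList<>());
            lightest.add(s);
        }
        for (int seg : order) {
            int s = lightest.poll();    // remove the slice before updating its total...
            slices.get(s).add(seg);
            totals[s] += segmentSizes[seg];
            lightest.add(s);            // ...then re-insert it so the heap order stays valid
        }
        return slices;
    }

    public static void main(String[] args) {
        long[] sizes = {100, 90, 10, 10, 5, 5}; // e.g. doc counts per segment
        System.out.println(assign(sizes, 2));
    }
}
```

For sizes {100, 90, 10, 10, 5, 5} and two slices, this gives totals of 110 and 110, whereas grouping the first three and last three segments contiguously would give 200 and 20.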

          Timothy Potter added a comment -

          Bulk close after 5.1 release


            People

            • Assignee: Adrien Grand
            • Reporter: Adrien Grand
            • Votes: 0
            • Watchers: 8
