Details

    • Type: Sub-task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

      Description

      Spin-off from parent issue:

      We should discuss how many threads should be spawned. If you have an index with many segments, even small ones, I think only the larger segments should get their own threads; all the others should be handled sequentially. So maybe add a maxThreads count, sort the IndexReaders by maxDoc, spawn maxThreads-1 threads for the bigger readers, and one additional thread for the rest?
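
      A rough sketch of that partitioning, assuming a plain ExecutorService and a hypothetical per-segment search call (the Segment interface and searchSegment method below are stand-ins, not Lucene API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;

public class PartitionedSearchSketch {

    /** Stand-in for a segment reader; only its size matters here. */
    interface Segment {
        int maxDoc();
    }

    /** Placeholder for the actual per-segment search work. */
    static void searchSegment(Segment segment) {
    }

    static void search(List<Segment> segments, int maxThreads, ExecutorService pool)
            throws InterruptedException {
        // Sort readers by maxDoc, largest first.
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingInt(Segment::maxDoc).reversed());

        int big = Math.min(maxThreads - 1, sorted.size());
        List<Callable<Void>> tasks = new ArrayList<>();

        // One task per large segment ...
        for (Segment s : sorted.subList(0, big)) {
            tasks.add(() -> { searchSegment(s); return null; });
        }
        // ... and one task that walks the remaining small segments sequentially.
        List<Segment> rest = sorted.subList(big, sorted.size());
        if (!rest.isEmpty()) {
            tasks.add(() -> { rest.forEach(PartitionedSearchSketch::searchSegment); return null; });
        }
        pool.invokeAll(tasks); // blocks until every per-segment search has finished
    }
}
```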

        Activity

        Earwin Burrfoot added a comment -

        I use the following scheme:

        • There is a fixed pool of threads shared by all searches, that limits total concurrency.
        • Each new search grabs at most a fixed number of threads from this pool (say, 2-3 of 8 in my setup),
        • and these threads churn through segments as through a queue (in maxDoc order, but I think even that is unnecessary).

        No special smart binding between threads and segments (e.g. one thread for each biggie, one thread for all of the small ones)
        means simpler code, and zero possibility of stalling in the situation where there are threads to run and segments to search, but the binding policy does not connect them.
        Using fewer threads per-search than total available is a precaution against biggie searches blocking fast ones.
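
        A minimal sketch of that scheme, with hypothetical Segment/searchSegment stand-ins (not Lucene API): one fixed pool shared by all searches, and each search submitting at most threadsPerSearch workers that drain its own queue of segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedPoolSearchSketch {

    /** Stand-in for a segment reader. */
    interface Segment {
    }

    /** Placeholder for the actual per-segment search work. */
    static void searchSegment(Segment segment) {
    }

    // One fixed pool shared by every search; this bounds total concurrency.
    static final ExecutorService POOL = Executors.newFixedThreadPool(8);

    static void search(List<Segment> segments, int threadsPerSearch) throws InterruptedException {
        Queue<Segment> queue = new ConcurrentLinkedQueue<>(segments);
        int workers = Math.min(threadsPerSearch, segments.size());

        // Each worker pulls segments off this search's queue until it is drained.
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            tasks.add(() -> {
                Segment s;
                while ((s = queue.poll()) != null) {
                    searchSegment(s);
                }
                return null;
            });
        }
        POOL.invokeAll(tasks); // returns when this search's segments are all done
    }
}
```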

        Michael McCandless added a comment -

        Using fewer threads per-search than total available is a precaution against biggie searches blocking fast ones.

        But doesn't that mean that an app w/ rare queries but each query is massive fails to use all available concurrency?

        Earwin Burrfoot added a comment -

        But doesn't that mean that an app w/ rare queries but each query is massive fails to use all available concurrency?

        Yes. But that's not my case, and likely not many other people's either.

        I think if you want to be super-generic, it's better to defer exact threading to the user, instead of doing a one-size-fits-all solution. Else you risk conjuring another ConcurrentMergeScheduler.
        While we're at it, we can throw in some sample implementation, which can satisfy some of the users, but not everyone.

        Doron Cohen added a comment -

        Is it possible that with this, searching a large optimized index (single segment) might be slower than searching an un-optimized index of the same size, since the latter enjoys concurrency? If so, is it too wild for more than one thread to handle that single segment?

        Michael McCandless added a comment -

        I think if you want to be super-generic, it's better to defer exact threading to the user, instead of doing a one-size-fits-all solution. Else you risk conjuring another ConcurrentMergeScheduler.

        I think something like CMS (basically a custom ES w/ proper thread prio/scheduling) will be necessary here.

        Until Java can schedule threads the way an OS schedules processes we'll need to emulate it ourselves.

        You want long-running queries (or merges) to be gracefully down-prioritized so that new/fast queries (merges) finish quickly.

        And you want searches (merges) to use the allowed concurrency fully.
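
        One possible shape for that kind of executor, purely as an illustration (not a worked-out design): back a ThreadPoolExecutor with a PriorityBlockingQueue and order per-segment tasks by the start time of the query that owns them, so work for newly arrived queries is dequeued ahead of work for long-running ones. Tasks have to be handed to execute() directly, since submit() would wrap them in a FutureTask, which is not Comparable.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueryAgePrioritizedPool {

    static class SegmentTask implements Runnable, Comparable<SegmentTask> {
        final long queryStartNanos; // when the owning query began
        final Runnable work;        // one segment's worth of search work

        SegmentTask(long queryStartNanos, Runnable work) {
            this.queryStartNanos = queryStartNanos;
            this.work = work;
        }

        @Override public void run() { work.run(); }

        // A later start time (a younger query) sorts first, so it is dequeued earlier.
        @Override public int compareTo(SegmentTask other) {
            return Long.compare(other.queryStartNanos, this.queryStartNanos);
        }
    }

    // Queued (not yet running) segment work for older queries yields to newer queries.
    static ThreadPoolExecutor newPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new PriorityBlockingQueue<Runnable>());
    }
}
```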

        Earwin Burrfoot added a comment -

        A lot of fork-join type frameworks don't even care, even though scheduling threads is something people supposedly use them for.
        Why? I guess that's due to a low yield/cost ratio.
        You frequently quote "progress, not perfection" in relation to the code, but why don't we apply this same principle to our threading guarantees?
        I don't want to use allowed concurrency fully. That's not realistic. I want 85% of it. That's already a huge leap ahead of single-threaded searches.

        Michael McCandless added a comment -

        You frequently quote "progress, not perfection" in relation to the code, but why don't we apply this same principle to our threading guarantees?

        Oh we should definitely apply progress not perfection here – in fact we already are: for starters (today), we bind concurrency to segments (so eg an "optimized" index has no concurrency), and we just use an ES (punt this thread scheduling problem to the caller). This is better than nothing, but not good enough – we can do better.
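
        For reference, the current "punt to the caller" shape is roughly this (the pool size and helper method are illustrative; the caller owns the executor and must shut it down):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;

public class CallerSuppliedExecutor {

    // The caller supplies (and later shuts down) the pool; IndexSearcher then
    // splits each search across the reader's segments on that pool, so a fully
    // optimized single-segment index gets no intra-query concurrency.
    static IndexSearcher newSearcher(DirectoryReader reader, ExecutorService pool) {
        return new IndexSearcher(reader, pool);
    }

    // e.g.: IndexSearcher searcher = newSearcher(reader, Executors.newFixedThreadPool(8));
}
```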

        There's another quote that applies here: "big dreams, small steps". My comment above is "dreaming" but when it comes time to actually get the real work done / making progress towards that dream, of course we take baby steps / progress not perfection.

        Design discussions should start w/ the big dreams but then once you've got a rough sense of where you want to get to in the future you shift back to the baby steps you do today, in the direction of that future goal.

        Maybe I should wrap my comments in <dream> tags and <babysteps> tags!

        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Uwe Schindler added a comment -

        Move issue to Lucene 4.9.


          People

          • Assignee:
            Unassigned
          • Reporter:
            Uwe Schindler
          • Votes:
            0
          • Watchers:
            7
