Lucene - Core
  1. Lucene - Core
  2. LUCENE-2837

Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We've discussed cleaning up our *Searcher stack for some time... I
      think we should try to do this before releasing 4.0.

      So I'm attaching an initial patch which:

      • Removes Searcher, Searchable, absorbing all their methods into IndexSearcher
      • Removes contrib/remote
      • Removes MultiSearcher
      • Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
        pass useThreads=true, or a custom ES to the ctor)

      The patch is rough – I just ripped stuff out, did search/replace to
      IndexSearcher, etc. EG nothing is directly testing using threads with
      IndexSearcher, but before committing I think we should add a
      newSearcher to LuceneTestCase, which randomly chooses whether the
      searcher uses threads, and cutover tests to use this instead of making
      their own IndexSearcher.

      I think MultiSearcher has a useful purpose, but as it is today it's
      too low-level, eg it shouldn't be involved in rewriting queries: the
      Query.combine method is scary. Maybe in its place we make a higher
      level class, with limited API, that's able to federate search across
      multiple IndexSearchers? It'd also be able to optionally use thread
      per IndexSearcher.

      1. LUCENE-2837.patch
        2 kB
        Yonik Seeley
      2. LUCENE-2837.patch
        264 kB
        Michael McCandless
      3. LUCENE-2837.patch
        256 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          but before committing I think we should add a
          newSearcher to LuceneTestCase, which randomly chooses whether the
          searcher uses threads, and cutover tests to use this instead of making
          their own IndexSearcher.

          I did this on LUCENE-2751, but the tests won't all pass until we fix the FieldCache autodetect
          synchronization bug (the Numerics tests will fail with multiple threads)...

          Show
          Robert Muir added a comment - but before committing I think we should add a newSearcher to LuceneTestCase, which randomly chooses whether the searcher uses threads, and cutover tests to use this instead of making their own IndexSearcher. I did this on LUCENE-2751 , but the tests won't all pass until we fix the FieldCache autodetect synchronization bug (the Numerics tests will fail with multiple threads)...
          Hide
          Uwe Schindler added a comment -

          I like that! +1

          We should discuss about how many threads should be spawned. If you have an index with many segments, even small ones, I think only the larger segments should be separate threads, all others should be handled sequentially. So maybe add a maxThreads cound, then sort the IndexReaders by maxDoc and then only spawn maxThreads-1 threads for the bigger readers and then one additional thread for the rest?

          Show
          Uwe Schindler added a comment - I like that! +1 We should discuss about how many threads should be spawned. If you have an index with many segments, even small ones, I think only the larger segments should be separate threads, all others should be handled sequentially. So maybe add a maxThreads cound, then sort the IndexReaders by maxDoc and then only spawn maxThreads-1 threads for the bigger readers and then one additional thread for the rest?
          Hide
          Uwe Schindler added a comment -

          I think MultiSearcher has a useful purpose, but as it is today it's
          too low-level, eg it shouldn't be involved in rewriting queries: the
          Query.combine method is scary. Maybe in its place we make a higher
          level class, with limited API, that's able to federate search across
          multiple IndexSearchers? It'd also be able to optionally use thread
          per IndexSearcher.

          Query.combine is simply broken, this is another issue. It violates DeMorgans law...: LUCENE-2756

          Show
          Uwe Schindler added a comment - I think MultiSearcher has a useful purpose, but as it is today it's too low-level, eg it shouldn't be involved in rewriting queries: the Query.combine method is scary. Maybe in its place we make a higher level class, with limited API, that's able to federate search across multiple IndexSearchers? It'd also be able to optionally use thread per IndexSearcher. Query.combine is simply broken, this is another issue. It violates DeMorgans law...: LUCENE-2756
          Hide
          Robert Muir added a comment -

          Query.combine is simply broken, this is another issue.

          I agree, but with this issue we don't need Query.combine anymore, so its then fixed.
          This method only exists for MultiSearcher (and there is some other dead code in
          Query.java related to it, that we could even delete now, totally unused today!)

          Show
          Robert Muir added a comment - Query.combine is simply broken, this is another issue. I agree, but with this issue we don't need Query.combine anymore, so its then fixed. This method only exists for MultiSearcher (and there is some other dead code in Query.java related to it, that we could even delete now, totally unused today!)
          Hide
          Simon Willnauer added a comment -

          Mike, I like it! This issue together with LUCENE-2831 will make the API so much better and powerful IMO. IndexSearcher is the right class to be passed to Query#weight / Query#createWeight and Weight ctors should take it too. That would make all the instanceof checkes unnecessary and would always have access to the ReaderContext, yet I think we should like those issues as related though.

          Show
          Simon Willnauer added a comment - Mike, I like it! This issue together with LUCENE-2831 will make the API so much better and powerful IMO. IndexSearcher is the right class to be passed to Query#weight / Query#createWeight and Weight ctors should take it too. That would make all the instanceof checkes unnecessary and would always have access to the ReaderContext, yet I think we should like those issues as related though.
          Hide
          Michael McCandless added a comment -

          but before committing I think we should add a newSearcher to LuceneTestCase, which randomly chooses whether the searcher uses threads, and cutover tests to use this instead of making their own IndexSearcher.

          I did this on LUCENE-2751, but the tests won't all pass until we fix the FieldCache autodetect
          synchronization bug (the Numerics tests will fail with multiple threads)...

          Duh, I knew newSearcher() sounded familiar OK so we have to fix the multi-threaded bug in FC first and then I think commit the newSearcher cutover from LUCENE-2751, then commit this issue.

          Then, I think, separately create a new "higher level" MultiSearcher w/ a limited search API. I'll open a new issue for that.

          Show
          Michael McCandless added a comment - but before committing I think we should add a newSearcher to LuceneTestCase, which randomly chooses whether the searcher uses threads, and cutover tests to use this instead of making their own IndexSearcher. I did this on LUCENE-2751 , but the tests won't all pass until we fix the FieldCache autodetect synchronization bug (the Numerics tests will fail with multiple threads)... Duh, I knew newSearcher() sounded familiar OK so we have to fix the multi-threaded bug in FC first and then I think commit the newSearcher cutover from LUCENE-2751 , then commit this issue. Then, I think, separately create a new "higher level" MultiSearcher w/ a limited search API. I'll open a new issue for that.
          Hide
          Michael McCandless added a comment -

          We should discuss about how many threads should be spawned. If you have an index with many segments, even small ones, I think only the larger segments should be separate threads, all others should be handled sequentially. So maybe add a maxThreads cound, then sort the IndexReaders by maxDoc and then only spawn maxThreads-1 threads for the bigger readers and then one additional thread for the rest?

          That sounds like a great improvement – Uwe can you open a new issue for that? Let's try to leave this issue as a rote refactoring...

          Show
          Michael McCandless added a comment - We should discuss about how many threads should be spawned. If you have an index with many segments, even small ones, I think only the larger segments should be separate threads, all others should be handled sequentially. So maybe add a maxThreads cound, then sort the IndexReaders by maxDoc and then only spawn maxThreads-1 threads for the bigger readers and then one additional thread for the rest? That sounds like a great improvement – Uwe can you open a new issue for that? Let's try to leave this issue as a rote refactoring...
          Hide
          Uwe Schindler added a comment -

          That sounds like a great improvement - Uwe can you open a new issue for that? Let's try to leave this issue as a rote refactoring...

          Done: LUCENE-2840

          Show
          Uwe Schindler added a comment - That sounds like a great improvement - Uwe can you open a new issue for that? Let's try to leave this issue as a rote refactoring... Done: LUCENE-2840
          Hide
          Robert Muir added a comment -

          i noticed the comment about the shutting down of executorservice... can we just make the executorservice arg mandatory for parallel?

          in my opinion, whoever creates it should be responsible for shutting it down, no one else.

          so i don't like the dual mode where we sometimes make our own but you can set a different one.
          we don't clean up correctly at all wrt this in ParallelMultiShredder today in my opinion.

          Show
          Robert Muir added a comment - i noticed the comment about the shutting down of executorservice... can we just make the executorservice arg mandatory for parallel? in my opinion, whoever creates it should be responsible for shutting it down, no one else. so i don't like the dual mode where we sometimes make our own but you can set a different one. we don't clean up correctly at all wrt this in ParallelMultiShredder today in my opinion.
          Hide
          Robert Muir added a comment -

          OK so we have to fix the multi-threaded bug in FC first and then I think commit the newSearcher cutover from LUCENE-2751, then commit this issue.

          Well, you don't have to do all of that (you could commit this one, then chase down all the bugs). I was just warning you
          so you don't get surprised.

          Show
          Robert Muir added a comment - OK so we have to fix the multi-threaded bug in FC first and then I think commit the newSearcher cutover from LUCENE-2751 , then commit this issue. Well, you don't have to do all of that (you could commit this one, then chase down all the bugs). I was just warning you so you don't get surprised.
          Hide
          Robert Muir added a comment -

          Mike, also if you apply LUCENE-2751, tests randomly fails because of the LUCENE-2756 bug.

          For example TestBoolean2.testRandomQueries will fail because sometimes it uses a wildcard query,
          and if it then incorporates MUST_NOT, this will fail against the multisearcher/parallelmultisearcher
          because the combine() is wrong.

          So I'm thinking we should add the newSearcher tests after you committed this one
          (as long as this one has some reasonable standalone tests to show it works)

          Show
          Robert Muir added a comment - Mike, also if you apply LUCENE-2751 , tests randomly fails because of the LUCENE-2756 bug. For example TestBoolean2.testRandomQueries will fail because sometimes it uses a wildcard query, and if it then incorporates MUST_NOT, this will fail against the multisearcher/parallelmultisearcher because the combine() is wrong. So I'm thinking we should add the newSearcher tests after you committed this one (as long as this one has some reasonable standalone tests to show it works)
          Hide
          Michael McCandless added a comment -

          can we just make the executorservice arg mandatory for parallel?

          The thing is, I'd like for "use threads" to be easy for the app/user.

          But: we aren't near a good solution here (the threads are tied to segments, which I don't like).

          So I agree, for now, how about I remove the IS ctor that takes boolean useThreads, leaving only the one that takes an ES? And in the jdoc I can say "for example here's how to get an ES, but, be sure you properly shut it down when done, and then close the IS/IR"?

          Ideally, in the future, we make it very easy to use concurrency within a single search...

          Show
          Michael McCandless added a comment - can we just make the executorservice arg mandatory for parallel? The thing is, I'd like for "use threads" to be easy for the app/user. But: we aren't near a good solution here (the threads are tied to segments, which I don't like). So I agree, for now, how about I remove the IS ctor that takes boolean useThreads, leaving only the one that takes an ES? And in the jdoc I can say "for example here's how to get an ES, but, be sure you properly shut it down when done, and then close the IS/IR"? Ideally, in the future, we make it very easy to use concurrency within a single search...
          Hide
          Michael McCandless added a comment -

          Another iteration. I think it's ready to commit!

          I removed the IS ctors that took boolean useThreads – you must make your own ES instead.

          I fixed TestNRTThreads to pass an ES (and at least TestSort does as well), so we have some minimal testing until we commit newSearcher.

          Show
          Michael McCandless added a comment - Another iteration. I think it's ready to commit! I removed the IS ctors that took boolean useThreads – you must make your own ES instead. I fixed TestNRTThreads to pass an ES (and at least TestSort does as well), so we have some minimal testing until we commit newSearcher.
          Hide
          Simon Willnauer added a comment -

          Mike that patch looks good to me. I just have one small comment about the executor service. You stated that the user has to shutdown the service upon IS#close(). This is absolutely the recommended way but I see a little risk for people calling ExecutorService#shutdownNow() which interrupts the executing threads and can cause a AlreadyClosedException down in one of our NIO Directories if there are still searches going on etc. I don't think this is super important but I would point it out in the JDoc to give folks a heads-up. Thoughts?

          simon

          Show
          Simon Willnauer added a comment - Mike that patch looks good to me. I just have one small comment about the executor service. You stated that the user has to shutdown the service upon IS#close(). This is absolutely the recommended way but I see a little risk for people calling ExecutorService#shutdownNow() which interrupts the executing threads and can cause a AlreadyClosedException down in one of our NIO Directories if there are still searches going on etc. I don't think this is super important but I would point it out in the JDoc to give folks a heads-up. Thoughts? simon
          Hide
          Michael McCandless added a comment -

          Good point Simon – I'll add that warning to the jdoc.

          Show
          Michael McCandless added a comment - Good point Simon – I'll add that warning to the jdoc.
          Hide
          Doron Cohen added a comment -

          This broke ant target 'eclipse' - just fixed it (remove the 'remote' dir).
          Probably the same is needed also for "Idea" but I'm not sure how to do this.

          Show
          Doron Cohen added a comment - This broke ant target 'eclipse' - just fixed it (remove the 'remote' dir). Probably the same is needed also for "Idea" but I'm not sure how to do this.
          Hide
          Robert Muir added a comment -

          Hello, I think we should revisit branch_3x here. I'm not asking for a backport but i think we should do some targeted javadoc+deprecations:

          • I think we should deprecate Searcher, suggesting to use IndexSearcher instead. Searcher->IndexSearcher is probably the only way
            this change will affect 99% of users and so this fixes that case, as most users make a simple change to their code.
          • We could add some wordage to MultiSearcher, such as 'if you are making a MS of IS you might want to consider MR instead'. This would
            be nice since we are still going to have the lurking combine() bug, at least people then know that MR is recommended.
          • i think it would be nice to add something to contrib/remote so users expect to change their code? Personally I think we should deprecate.
            Deprecate doesn't mean there has to be a 1-1 replacement... sometimes we re-think and it wasnt the best idea all along. But deprecation
            helps alert anyone using it so they wont be surprised with 4.0
          Show
          Robert Muir added a comment - Hello, I think we should revisit branch_3x here. I'm not asking for a backport but i think we should do some targeted javadoc+deprecations: I think we should deprecate Searcher, suggesting to use IndexSearcher instead. Searcher->IndexSearcher is probably the only way this change will affect 99% of users and so this fixes that case, as most users make a simple change to their code. We could add some wordage to MultiSearcher, such as 'if you are making a MS of IS you might want to consider MR instead'. This would be nice since we are still going to have the lurking combine() bug, at least people then know that MR is recommended. i think it would be nice to add something to contrib/remote so users expect to change their code? Personally I think we should deprecate. Deprecate doesn't mean there has to be a 1-1 replacement... sometimes we re-think and it wasnt the best idea all along. But deprecation helps alert anyone using it so they wont be surprised with 4.0
          Hide
          Steve Rowe added a comment -

          This broke ant target 'eclipse' - just fixed it (remove the 'remote' dir).
          Probably the same is needed also for "Idea" but I'm not sure how to do this.

          Done: r1055474

          Show
          Steve Rowe added a comment - This broke ant target 'eclipse' - just fixed it (remove the 'remote' dir). Probably the same is needed also for "Idea" but I'm not sure how to do this. Done: r1055474
          Hide
          Michael McCandless added a comment -

          I agree we should fix 3.x, too.

          Also I didn't fix all the jdocs that reference Searcher/Searchable!

          I'll reopen...

          Show
          Michael McCandless added a comment - I agree we should fix 3.x, too. Also I didn't fix all the jdocs that reference Searcher/Searchable! I'll reopen...
          Hide
          Michael McCandless added a comment -

          Reopen to also backport merging of PMS into IS in 3.x.

          Show
          Michael McCandless added a comment - Reopen to also backport merging of PMS into IS in 3.x.
          Hide
          Yonik Seeley added a comment -

          The multithreaded stuff feels like it should be in a subclass of IndexSearcher.
          But barring that, perhaps make it so that the subSearcher array is only populated if there is an executor passed in (to try and keep IndexSearcher as light weight as possible)?

          Show
          Yonik Seeley added a comment - The multithreaded stuff feels like it should be in a subclass of IndexSearcher. But barring that, perhaps make it so that the subSearcher array is only populated if there is an executor passed in (to try and keep IndexSearcher as light weight as possible)?
          Hide
          Yonik Seeley added a comment -

          How about this little patch to avoid creation of IndexSearcher per-segment if not needed?

          Show
          Yonik Seeley added a comment - How about this little patch to avoid creation of IndexSearcher per-segment if not needed?
          Hide
          Michael McCandless added a comment -

          How about this little patch to avoid creation of IndexSearcher per-segment if not needed?

          Looks great!

          Show
          Michael McCandless added a comment - How about this little patch to avoid creation of IndexSearcher per-segment if not needed? Looks great!
          Hide
          Michael McCandless added a comment -

          3rd time's a charm?

          Show
          Michael McCandless added a comment - 3rd time's a charm?
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development