Lucene - Core
  1. Lucene - Core
  2. LUCENE-1630

Mating Collector and Scorer on doc Id orderness

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This is a spin off of LUCENE-1593. This issue proposes to expose appropriate API on Scorer and Collector such that one can create an optimized Collector based on a given Scorer's doc-id orderness and vice versa. Copied from LUCENE-1593, here is the list of changes:

      1. Deprecate Weight and create QueryWeight (abstract class) with a new scorer(reader, scoreDocsInOrder), replacing the current scorer(reader) method. QueryWeight implements Weight, while score(reader) calls score(reader, false /* out-of-order */) and scorer(reader, scoreDocsInOrder) is defined abstract.
        • Also add QueryWeightWrapper to wrap a given Weight implementation. This one will also be deprecated, as well as package-private.
        • Add to Query variants of createWeight and weight which return QueryWeight. For now, I prefer to add a default impl which wraps the Weight variant instead of overriding in all Query extensions, and in 3.0 when we remove the Weight variants - override in all extending classes.
      2. Add to Scorer isOutOfOrder with a default to false, and override in BS to true.
      3. Modify BooleanWeight to extend QueryWeight and implement the new scorer method to return BS2 or BS based on the number of required scorers and setAllowOutOfOrder.
      4. Add to Collector an abstract acceptsDocsOutOfOrder which returns true/false.
        • Use it in IndexSearcher.search methods, that accept a Collector, in order to create the appropriate Scorer, using the new QueryWeight.
        • Provide a static create method to TFC and TSDC which accept this as an argument and creates the proper instance.
        • Wherever we create a Collector (TSDC or TFC), always ask for out-of-order Scorer and check on the resulting Scorer isOutOfOrder(), so that we can create the optimized Collector instance.
      5. Modify IndexSearcher to use all of the above logic.

      The only class I'm worried about, and would like to verify with you, is Searchable. If we want to deprecate all the search methods on IndexSearcher, Searcher and Searchable which accept Weight and add new ones which accept QueryWeight, we must do the following:

      • Deprecate Searchable in favor of Searcher.
      • Add to Searcher the new QueryWeight variants. Here we have two choices: (1) break back-compat and add them as abstract (like we've done with the new Collector method) or (2) add them with a default impl to call the Weight versions, documenting these will become abstract in 3.0.
      • Have Searcher extend UnicastRemoteObject and have RemoteSearchable extend Searcher. That's the part I'm a little bit worried about - Searchable implements java.rmi.Remote, which means there could be an implementation out there which implements Searchable and extends something different than UnicastRemoteObject, like Activeable. I think there is very small chance this has actually happened, but would like to confirm with you guys first.
      • Add a deprecated, package-private, SearchableWrapper which extends Searcher and delegates all calls to the Searchable member.
      • Deprecate all uses of Searchable and add Searcher instead, defaulting the old ones to use SearchableWrapper.
      • Make all the necessary changes to IndexSearcher, MultiSearcher etc. regarding overriding these new methods.

      One other optimization that was discussed in LUCENE-1593 is to expose a topScorer() API (on Weight) which returns a Scorer that its score(Collector) will be called, and additionally add a start() method to DISI. That will allow Scorers to initialize either on start() or score(Collector). This was proposed mainly because of BS and BS2 which check if they are initialized in every call to next(), skipTo() and score(). Personally I prefer to see that in a separate issue, following that one (as it might add methods to QueryWeight).

      1. LUCENE-1630-2.patch
        1 kB
        Shai Erera
      2. LUCENE-1630.patch
        118 kB
        Shai Erera
      3. LUCENE-1630.patch
        118 kB
        Shai Erera
      4. LUCENE-1630.patch
        119 kB
        Shai Erera
      5. LUCENE-1630.patch
        144 kB
        Shai Erera
      6. LUCENE-1630.patch
        144 kB
        Shai Erera
      7. LUCENE-1630.patch
        145 kB
        Michael McCandless
      8. LUCENE-1630.patch
        4 kB
        Michael McCandless
      9. LUCENE-1630.patch
        0.8 kB
        Shai Erera

        Issue Links

          Activity

          Hide
          Michael McCandless added a comment -

          Super – I just committed this; thanks Shai.

          Show
          Michael McCandless added a comment - Super – I just committed this; thanks Shai.
          Hide
          Shai Erera added a comment -

          Added testcase to TestBooleanQuery

          Show
          Shai Erera added a comment - Added testcase to TestBooleanQuery
          Hide
          Michael McCandless added a comment -

          Super, I'll commit the current patch. Shai I'll commit the additional test case once you post that...

          Show
          Michael McCandless added a comment - Super, I'll commit the current patch. Shai I'll commit the additional test case once you post that...
          Hide
          Shalin Shekhar Mangar added a comment -

          Can you try out the attached patch? Sorry!

          That fixes the NPEs I was seeing. Thanks!

          Show
          Shalin Shekhar Mangar added a comment - Can you try out the attached patch? Sorry! That fixes the NPEs I was seeing. Thanks!
          Hide
          Shai Erera added a comment -

          can you add that (to the last patch)?

          Sure. I'll try to do it today or tomorrow.

          Show
          Shai Erera added a comment - can you add that (to the last patch)? Sure. I'll try to do it today or tomorrow.
          Hide
          Michael McCandless added a comment -

          Is there a test for BQ returning null scorer when one if its required clauses is required and returns a null scorer itself? If not, I suggest we cover that angle also, can't hurt. What do you think?

          Good idea - not sure if there is an existing test - can you add that (to the last patch)?

          Show
          Michael McCandless added a comment - Is there a test for BQ returning null scorer when one if its required clauses is required and returns a null scorer itself? If not, I suggest we cover that angle also, can't hurt. What do you think? Good idea - not sure if there is an existing test - can you add that (to the last patch)?
          Hide
          Shai Erera added a comment -

          Is there a test for BQ returning null scorer when one if its required clauses is required and returns a null scorer itself? If not, I suggest we cover that angle also, can't hurt. What do you think?

          Show
          Shai Erera added a comment - Is there a test for BQ returning null scorer when one if its required clauses is required and returns a null scorer itself? If not, I suggest we cover that angle also, can't hurt. What do you think?
          Hide
          Michael McCandless added a comment -

          I'm completely embarassed by these failures.

          Well this is tricky stuff!

          And one of the failures (DisjunctionMaxQuery) is pre-existing.

          About the test case, I'm not sure it catches the queryWeight.scorer() returning null

          The 2nd part of the test case (the DisjunctionMaxQuery) does give a null top-level scorer back to IndexSearcher, and results in this exception (before fixing IndexSearcher & DisjunctionMaxQuery):

              [junit] java.lang.NullPointerException
              [junit] 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:244)
              [junit] 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:171)
              [junit] 	at org.apache.lucene.search.Searcher.search(Searcher.java:183)
              [junit] 	at org.apache.lucene.search.Searcher.search(Searcher.java:193)
              [junit] 	at org.apache.lucene.search.TestBooleanQuery.testNullOrSubScorer(TestBooleanQuery.java:80)
              [junit] 	at org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)
          
          Show
          Michael McCandless added a comment - I'm completely embarassed by these failures. Well this is tricky stuff! And one of the failures (DisjunctionMaxQuery) is pre-existing. About the test case, I'm not sure it catches the queryWeight.scorer() returning null The 2nd part of the test case (the DisjunctionMaxQuery) does give a null top-level scorer back to IndexSearcher, and results in this exception (before fixing IndexSearcher & DisjunctionMaxQuery): [junit] java.lang.NullPointerException [junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:244) [junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:171) [junit] at org.apache.lucene.search.Searcher.search(Searcher.java:183) [junit] at org.apache.lucene.search.Searcher.search(Searcher.java:193) [junit] at org.apache.lucene.search.TestBooleanQuery.testNullOrSubScorer(TestBooleanQuery.java:80) [junit] at org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)
          Hide
          Shai Erera added a comment -

          Mike, thanks for taking care of that. I'm completely embarassed by these failures. Obviously when I moved scorer.score() in IndexSearcher out of doSearch() (now searchWithFilter), I forgot to check for the null Scorer. With that added test case, I won't be able to "forget" again.

          About the test case, I'm not sure it catches the queryWeight.scorer() returning null. It does check that optional null scorers are just ignored by BQ. Maybe if you add to BQ the same PhraseQuery as MUST, BQ will return null, which will test IndexSearcher handling of a null Scorer?

          Show
          Shai Erera added a comment - Mike, thanks for taking care of that. I'm completely embarassed by these failures. Obviously when I moved scorer.score() in IndexSearcher out of doSearch() (now searchWithFilter), I forgot to check for the null Scorer. With that added test case, I won't be able to "forget" again. About the test case, I'm not sure it catches the queryWeight.scorer() returning null. It does check that optional null scorers are just ignored by BQ. Maybe if you add to BQ the same PhraseQuery as MUST, BQ will return null, which will test IndexSearcher handling of a null Scorer?
          Hide
          Michael McCandless added a comment -

          Sorry Shalin, and thanks for digging into this.

          With trunk, the new BooleanQuery#scorer method returns null if the sub scorer is null without checking if the clause was required.

          This is clearly wrong. I've added a test case & fixed it.

          DisjunctionMaxWeight's scorer also returns null if any of its
          sub-scorers are null (on trunk & 2.4.1), which I think is not right?
          I think it should simply skip that sub scorer; I've included that in
          the patch too.

          Finally, I think I fixed your 2nd NPE, where IndexSearcher was failing
          to handle a null scorer returned by the top QueryWeight, though I'm
          not certain it was the same issue you hit.

          Can you try out the attached patch? Sorry!

          Show
          Michael McCandless added a comment - Sorry Shalin, and thanks for digging into this. With trunk, the new BooleanQuery#scorer method returns null if the sub scorer is null without checking if the clause was required. This is clearly wrong. I've added a test case & fixed it. DisjunctionMaxWeight's scorer also returns null if any of its sub-scorers are null (on trunk & 2.4.1), which I think is not right? I think it should simply skip that sub scorer; I've included that in the patch too. Finally, I think I fixed your 2nd NPE, where IndexSearcher was failing to handle a null scorer returned by the top QueryWeight, though I'm not certain it was the same issue you hit. Can you try out the attached patch? Sorry!
          Hide
          Shai Erera added a comment -

          I'll check it, and add some unit tests, which are obviously missing . Shalin - do you have a simple test case that reproduces the problem? (just for easier start)

          Show
          Shai Erera added a comment - I'll check it, and add some unit tests, which are obviously missing . Shalin - do you have a simple test case that reproduces the problem? (just for easier start)
          Hide
          Shalin Shekhar Mangar added a comment -

          Something is still wrong, I'm seeing:

          SEVERE: java.lang.NullPointerException
          at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:242)
          at org.apache.lucene.search.Searcher.search(Searcher.java:173)
          at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1103)
          at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880)
          at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
          at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:176)

          I did some digging around. It seems that before r787772 (LUCENE-1630), the BooleanQuery#scorer used to return a null scorer only if the subScorer was null and the clause was required. With trunk, the new BooleanQuery#scorer method returns null if the sub scorer is null without checking if the clause was required. But I'm not sure if that is the only problem. When I changed BooleanQuery to the old behavior, I still see the same NPE but this time it is coming from DisjunctionMaxQuery#scorer due to subScorer.nextDoc() != DocIdSetIterator.NO_MORE_DOCS always being false. The subScorer in this case was ExactPhraseScorer.

          Show
          Shalin Shekhar Mangar added a comment - Something is still wrong, I'm seeing: SEVERE: java.lang.NullPointerException at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:242) at org.apache.lucene.search.Searcher.search(Searcher.java:173) at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1103) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:176) I did some digging around. It seems that before r787772 ( LUCENE-1630 ), the BooleanQuery#scorer used to return a null scorer only if the subScorer was null and the clause was required. With trunk, the new BooleanQuery#scorer method returns null if the sub scorer is null without checking if the clause was required. But I'm not sure if that is the only problem. When I changed BooleanQuery to the old behavior, I still see the same NPE but this time it is coming from DisjunctionMaxQuery#scorer due to subScorer.nextDoc() != DocIdSetIterator.NO_MORE_DOCS always being false. The subScorer in this case was ExactPhraseScorer.
          Hide
          Shai Erera added a comment -

          oops .. . I've said it before - these two just confuse me. I ran into similar issues when I prepared the patch. Maybe we need a QueryTestImpl which impls just the deprecated methods, and we should run some searches with it. That way, our tests would catch those problems.

          Show
          Shai Erera added a comment - oops .. . I've said it before - these two just confuse me. I ran into similar issues when I prepared the patch. Maybe we need a QueryTestImpl which impls just the deprecated methods, and we should run some searches with it. That way, our tests would catch those problems.
          Hide
          Michael McCandless added a comment -

          Agreed – I'll commit shortly.

          Show
          Michael McCandless added a comment - Agreed – I'll commit shortly.
          Hide
          Uwe Schindler added a comment -

          I think the patch is wrong, it must be

             public QueryWeight createQueryWeight(Searcher searcher) throws IOException {
              return new QueryWeightWrapper(createWeight(searcher)); // and not weigth(searcher)
             }
          
          Show
          Uwe Schindler added a comment - I think the patch is wrong, it must be public QueryWeight createQueryWeight(Searcher searcher) throws IOException { return new QueryWeightWrapper(createWeight(searcher)); // and not weigth(searcher) }
          Hide
          Michael McCandless added a comment -

          Something is still amiss... from SOLR-940:

          java.lang.StackOverflowError
          at org.apache.solr.search.function.FunctionQuery.rewrite(FunctionQuery.java:50)
          at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:291)
          at org.apache.lucene.search.Query.queryWeight(Query.java:125)
          at org.apache.lucene.search.Query.weight(Query.java:117)
          at org.apache.lucene.search.Query.createQueryWeight(Query.java:108)
          at org.apache.lucene.search.Query.queryWeight(Query.java:126)
          at org.apache.lucene.search.Query.weight(Query.java:117)
          at org.apache.lucene.search.Query.createQueryWeight(Query.java:108)
          at org.apache.lucene.search.Query.queryWeight(Query.java:126)
          at org.apache.lucene.search.Query.weight(Query.java:117)
          
          Show
          Michael McCandless added a comment - Something is still amiss... from SOLR-940 : java.lang.StackOverflowError at org.apache.solr.search.function.FunctionQuery.rewrite(FunctionQuery.java:50) at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:291) at org.apache.lucene.search.Query.queryWeight(Query.java:125) at org.apache.lucene.search.Query.weight(Query.java:117) at org.apache.lucene.search.Query.createQueryWeight(Query.java:108) at org.apache.lucene.search.Query.queryWeight(Query.java:126) at org.apache.lucene.search.Query.weight(Query.java:117) at org.apache.lucene.search.Query.createQueryWeight(Query.java:108) at org.apache.lucene.search.Query.queryWeight(Query.java:126) at org.apache.lucene.search.Query.weight(Query.java:117)
          Hide
          Michael McCandless added a comment -

          I'll commit shortly! Thanks Shai.

          Show
          Michael McCandless added a comment - I'll commit shortly! Thanks Shai.
          Hide
          Uwe Schindler added a comment -

          Looks good, this is the correct way to do this.

          Show
          Uwe Schindler added a comment - Looks good, this is the correct way to do this.
          Hide
          Shai Erera added a comment -

          Very small patch, just one class ... I can't believe it's me

          Show
          Shai Erera added a comment - Very small patch, just one class ... I can't believe it's me
          Hide
          Michael McCandless added a comment -

          For these Tokenizers the cost to create the new TokenStreams is very high without this patch, too (because the new TokenStream API). The TokenStreams and Filters have to add all their attributes to a LinkedHashMap in their ctors. This is why I wanted to make NumericTokenStream reuseable when I invented it

          OK.

          at the moment it's a two-man-show between Michael and me with always showing up possible backwards-compatibility problems and so on!

          Yah nobody can keep up with you guys!!

          Show
          Michael McCandless added a comment - For these Tokenizers the cost to create the new TokenStreams is very high without this patch, too (because the new TokenStream API). The TokenStreams and Filters have to add all their attributes to a LinkedHashMap in their ctors. This is why I wanted to make NumericTokenStream reuseable when I invented it OK. at the moment it's a two-man-show between Michael and me with always showing up possible backwards-compatibility problems and so on! Yah nobody can keep up with you guys!!
          Hide
          Uwe Schindler added a comment -

          Whoa, powerful! But this sounds like a possible perf hit for analyzers that don't reuse their Tokenizers?

          For these Tokenizers the cost to create the new TokenStreams is very high without this patch, too (because the new TokenStream API). The TokenStreams and Filters have to add all their attributes to a LinkedHashMap in their ctors. This is why I wanted to make NumericTokenStream reuseable when I invented it

          And the reflection check is not too slow, as no classes are loaded with forName, its just a check to (getMethod().getDeclaringClass() != TokenStream.class)

          If there is more to discuss or some performance tests, see LUCENE-1693; at the moment it's a two-man-show between Michael and me with always showing up possible backwards-compatibility problems and so on!

          Show
          Uwe Schindler added a comment - Whoa, powerful! But this sounds like a possible perf hit for analyzers that don't reuse their Tokenizers? For these Tokenizers the cost to create the new TokenStreams is very high without this patch, too (because the new TokenStream API). The TokenStreams and Filters have to add all their attributes to a LinkedHashMap in their ctors. This is why I wanted to make NumericTokenStream reuseable when I invented it And the reflection check is not too slow, as no classes are loaded with forName, its just a check to (getMethod().getDeclaringClass() != TokenStream.class) If there is more to discuss or some performance tests, see LUCENE-1693 ; at the moment it's a two-man-show between Michael and me with always showing up possible backwards-compatibility problems and so on!
          Hide
          Michael McCandless added a comment -

          That sounds like the right approach Shai! And such legacy Query classes will receive deprecation warnings (that they are overriding a deprecated method), so I think that fixes the back-compat breakage.

          Show
          Michael McCandless added a comment - That sounds like the right approach Shai! And such legacy Query classes will receive deprecation warnings (that they are overriding a deprecated method), so I think that fixes the back-compat breakage.
          Hide
          Shai Erera added a comment -

          Previously creaetWeight() threw UOE, and I've changed it to call createQueryWeight and the latter to throw UOE. This was obviously the wrong way .

          I think we should:

          1. Revert createWeight to throw UOE
          2. Change createQueryWeight to call createWeight() and wrap w/ QueryWeightWrapper

          This should not cause infinite recursion, I think, since:

          • The Lucene code calls createQueryWeight.
          • All of the Lucene Query impls implement createQueryWeight.
          • Any external Query which did not impl createQueryWeight yet, will be fine too, since Query.createQueryWeight will call the given query's createWeight, which ought to be implemented. Otheriwse UOE was encountered long before.

          Makes sense? I can post a patch soon.

          Show
          Shai Erera added a comment - Previously creaetWeight() threw UOE, and I've changed it to call createQueryWeight and the latter to throw UOE. This was obviously the wrong way . I think we should: Revert createWeight to throw UOE Change createQueryWeight to call createWeight() and wrap w/ QueryWeightWrapper This should not cause infinite recursion, I think, since: The Lucene code calls createQueryWeight. All of the Lucene Query impls implement createQueryWeight. Any external Query which did not impl createQueryWeight yet, will be fine too, since Query.createQueryWeight will call the given query's createWeight, which ought to be implemented. Otheriwse UOE was encountered long before. Makes sense? I can post a patch soon.
          Hide
          Michael McCandless added a comment -

          With my newest TokenStream improvements (LUCENE-1693) this is no loger the case, it checks in the ctor, that one of the methods was overridden and throws UOE early (this reflection-based feature was added, because a "what method was overriden" was needed for correct delegation in now 3 different methods doing the same)

          Whoa, powerful! But this sounds like a possible perf hit for analyzers that don't reuse their Tokenizers?

          Show
          Michael McCandless added a comment - With my newest TokenStream improvements ( LUCENE-1693 ) this is no loger the case, it checks in the ctor, that one of the methods was overridden and throws UOE early (this reflection-based feature was added, because a "what method was overriden" was needed for correct delegation in now 3 different methods doing the same) Whoa, powerful! But this sounds like a possible perf hit for analyzers that don't reuse their Tokenizers?
          Hide
          Uwe Schindler added a comment -

          EG, we could have the default impl for createQueryWeight to call
          weight() and wrap it. The danger is if you subclass Query and fail to
          override either of these methods, we've created an infinite recursion.
          (Though, we have precedent here: TokenStream does the same thing).
          Too bad Java doesn't let you declare that of these 2 methods at least
          one must be overridden.

          +1

          (Though, we have precedent here: TokenStream does the same thing).

          With my newest TokenStream improvements (LUCENE-1693) this is no loger the case, it checks in the ctor, that one of the methods was overridden and throws UOE early (this reflection-based feature was added, because a "what method was overriden" was needed for correct delegation in now 3 different methods doing the same)

          But on the other hand the infinite loop is very fast failing (StackOverflow occurs mostly after <300 iterations).

          Show
          Uwe Schindler added a comment - EG, we could have the default impl for createQueryWeight to call weight() and wrap it. The danger is if you subclass Query and fail to override either of these methods, we've created an infinite recursion. (Though, we have precedent here: TokenStream does the same thing). Too bad Java doesn't let you declare that of these 2 methods at least one must be overridden. +1 (Though, we have precedent here: TokenStream does the same thing). With my newest TokenStream improvements ( LUCENE-1693 ) this is no loger the case, it checks in the ctor, that one of the methods was overridden and throws UOE early (this reflection-based feature was added, because a "what method was overriden" was needed for correct delegation in now 3 different methods doing the same) But on the other hand the infinite loop is very fast failing (StackOverflow occurs mostly after <300 iterations).
          Hide
          Michael McCandless added a comment -

          SOLR-940 is hitting an exception due to the addition of QueryWeight:
          because we've added default impl for Query.createQueryWeight, that
          throws UOE, users who pass their own Query impls, that haven't yet
          cutover to createQueryWeight, to BooleanQuery (within BooleanClause)
          will hit this exception:

          SEVERE: java.lang.UnsupportedOperationException
          	at org.apache.lucene.search.Query.createQueryWeight(Query.java:102)
          	at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:185)
          	at org.apache.lucene.search.BooleanQuery.createQueryWeight(BooleanQuery.java:401)
          	at org.apache.lucene.search.Query.queryWeight(Query.java:120)
          	at org.apache.lucene.search.Searcher.createQueryWeight(Searcher.java:237)
          	at org.apache.lucene.search.Searcher.search(Searcher.java:173)
          	at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1103)
          	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880)
          	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
          	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:176)
          	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
          	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
          	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1290)
          

          Because BooleanQuery is always calling createQueryWeight of its
          sub-queries. (Other queries, eg CustomScoreQuery, do so as well).

          I think this is a back-compat break. Shai, any ideas how to fix this
          so we somehow auto-wrap with QueryWeightWrapper?

          EG, we could have the default impl for createQueryWeight to call
          weight() and wrap it. The danger is if you subclass Query and fail to
          override either of these methods, we've created an infinite recursion.
          (Though, we have precedent here: TokenStream does the same thing).
          Too bad Java doesn't let you declare that of these 2 methods at least
          one must be overridden.

          Show
          Michael McCandless added a comment - SOLR-940 is hitting an exception due to the addition of QueryWeight: because we've added default impl for Query.createQueryWeight, that throws UOE, users who pass their own Query impls, that haven't yet cutover to createQueryWeight, to BooleanQuery (within BooleanClause) will hit this exception: SEVERE: java.lang.UnsupportedOperationException at org.apache.lucene.search.Query.createQueryWeight(Query.java:102) at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:185) at org.apache.lucene.search.BooleanQuery.createQueryWeight(BooleanQuery.java:401) at org.apache.lucene.search.Query.queryWeight(Query.java:120) at org.apache.lucene.search.Searcher.createQueryWeight(Searcher.java:237) at org.apache.lucene.search.Searcher.search(Searcher.java:173) at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1103) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:176) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1290) Because BooleanQuery is always calling createQueryWeight of its sub-queries. (Other queries, eg CustomScoreQuery, do so as well). I think this is a back-compat break. Shai, any ideas how to fix this so we somehow auto-wrap with QueryWeightWrapper? EG, we could have the default impl for createQueryWeight to call weight() and wrap it. The danger is if you subclass Query and fail to override either of these methods, we've created an infinite recursion. (Though, we have precedent here: TokenStream does the same thing). Too bad Java doesn't let you declare that of these 2 methods at least one must be overridden.
          Hide
          Michael McCandless added a comment -

          Thanks Shai!

          Yes, let's go look @ LUCENE-1652 now!

          Show
          Michael McCandless added a comment - Thanks Shai! Yes, let's go look @ LUCENE-1652 now!
          Hide
          Shai Erera added a comment -

          Mike, after you commit this we should take a look at LUCENE-1652 - I'm not sure if there was anything left to improve from what we had initially thought to do.

          Show
          Shai Erera added a comment - Mike, after you commit this we should take a look at LUCENE-1652 - I'm not sure if there was anything left to improve from what we had initially thought to do.
          Hide
          Michael McCandless added a comment -

          Patch looks good! I attached a new version w/ some small changes:

          • Tweaked javadocs, fixed javadocs warnings, CHANGES
          • It's nice that this change requires no changes to back-compat branch!
          • Pass "false" for topScorer arg in CustomScorer.scorer (instead of
            threading down the incoming topScorer).

          I think it's ready to commit. I'll wait a day or two...

          Show
          Michael McCandless added a comment - Patch looks good! I attached a new version w/ some small changes: Tweaked javadocs, fixed javadocs warnings, CHANGES It's nice that this change requires no changes to back-compat branch! Pass "false" for topScorer arg in CustomScorer.scorer (instead of threading down the incoming topScorer). I think it's ready to commit. I'll wait a day or two...
          Hide
          Michael McCandless added a comment -

          CustomScorer's nextDoc uses advance on its subScorers.

          Yeah I noticed that, but thought that out-of-order means a top-scorer usually, and then score(Collector) is called. But now I see CustomScorer does not implement score(Collector) which means it uses Scorer's, which calls nextDoc() and advance().

          Actually, Scorer's default impl of "score(Collector)" doesn't use
          advance() right?

          And so it's OK to use nextDoc() on a Scorer that returns out-of-order
          docs. But if you pass topScorer=true, you should not attempt to
          iterate through its docs (I think?).

          Ie, I think "scores docs in order" is somewhat orthogonal to "is top
          scorer". I can iterate through nextDocs() that come back out of
          order...

          But, right, CustomScorer's nextDoc() won't do the right thing if it's
          subQueryScorer return docs out of order.

          Regarding TopScorer, it'd need to get a Scorer as input, otherwise what would be its default impl for score(Collector)? I thought it should be the current one of Scorer's.

          Right I think it'd look roughly something like this:

          class Scorer() {
            ...
            TopScorer topScorer(boolean scoreDocsInOrder) {
              return new TopScorer(this.scorer(scoreDocInOrder));
            }
          }
          
          class TopScorer extends Scorer {
            TopScorer() {}
            TopScorer(Scorer sub) {
              collector.setScorer(this);
              int doc;
              while ((doc = nextDoc()) != NO_MORE_DOCS) {
                collector.collect(doc);
              }
            }
            void score(Collector) {
            }
          }
          

          But, I don't like having TopScorer extend Scorer; I think we'd have to
          factor out a base class that has the "float score()" method. Anyway,
          let's do this later.

          Show
          Michael McCandless added a comment - CustomScorer's nextDoc uses advance on its subScorers. Yeah I noticed that, but thought that out-of-order means a top-scorer usually, and then score(Collector) is called. But now I see CustomScorer does not implement score(Collector) which means it uses Scorer's, which calls nextDoc() and advance(). Actually, Scorer's default impl of "score(Collector)" doesn't use advance() right? And so it's OK to use nextDoc() on a Scorer that returns out-of-order docs. But if you pass topScorer=true, you should not attempt to iterate through its docs (I think?). Ie, I think "scores docs in order" is somewhat orthogonal to "is top scorer". I can iterate through nextDocs() that come back out of order... But, right, CustomScorer's nextDoc() won't do the right thing if it's subQueryScorer return docs out of order. Regarding TopScorer, it'd need to get a Scorer as input, otherwise what would be its default impl for score(Collector)? I thought it should be the current one of Scorer's. Right I think it'd look roughly something like this: class Scorer() { ... TopScorer topScorer( boolean scoreDocsInOrder) { return new TopScorer( this .scorer(scoreDocInOrder)); } } class TopScorer extends Scorer { TopScorer() {} TopScorer(Scorer sub) { collector.setScorer( this ); int doc; while ((doc = nextDoc()) != NO_MORE_DOCS) { collector.collect(doc); } } void score(Collector) { } } But, I don't like having TopScorer extend Scorer; I think we'd have to factor out a base class that has the "float score()" method. Anyway, let's do this later.
          Hide
          Shai Erera added a comment -

          Implemented the latest comments, except for TopScorer

          Show
          Shai Erera added a comment - Implemented the latest comments, except for TopScorer
          Hide
          Shai Erera added a comment -

          CustomScorer's nextDoc uses advance on its subScorers.

          Yeah I noticed that, but thought that out-of-order means a top-scorer usually, and then score(Collector) is called. But now I see CustomScorer does not implement score(Collector) which means it uses Scorer's, which calls nextDoc() and advance().

          Regarding TopScorer, it'd need to get a Scorer as input, otherwise what would be its default impl for score(Collector)? I thought it should be the current one of Scorer's.

          Will post a patch soon.

          Show
          Shai Erera added a comment - CustomScorer's nextDoc uses advance on its subScorers. Yeah I noticed that, but thought that out-of-order means a top-scorer usually, and then score(Collector) is called. But now I see CustomScorer does not implement score(Collector) which means it uses Scorer's, which calls nextDoc() and advance(). Regarding TopScorer, it'd need to get a Scorer as input, otherwise what would be its default impl for score(Collector)? I thought it should be the current one of Scorer's. Will post a patch soon.
          Hide
          Michael McCandless added a comment -

          Can you please next time give me a hint on where did you find it?

          OK It's a quick search through the patch file though

          Taking Scorer.score(Collector) out of Scorer and into TopScorer is a large re-factoring. Are you sure about this? I just think of all the Scorers we have, and out there, that need to impl a new class, and possible duplicate a lot of code that is today shared between the top-level-scorer and iterator-type-scorer.

          I'm definitely not sure about it...

          For Scorers that don't have anything special to do when they are "top", we'd have a default impl (get a non-top Scorer and iterate over it, like Scorer.score now does. So I think the only weight that'd do something interesting is BooleanQuery's.

          But I agree this is a big change, so let's hold off for now? With search specialization (LUCENE-1594) the difference between "being top" and "being sub" seems to be more important....

          I think CustomScoreQuery.scorer should actually always score docs in order?

          Why? I don't see that it relies on doc id orderness anywhere

          CustomScorer's nextDoc uses advance on its subScorers.

          Show
          Michael McCandless added a comment - Can you please next time give me a hint on where did you find it? OK It's a quick search through the patch file though Taking Scorer.score(Collector) out of Scorer and into TopScorer is a large re-factoring. Are you sure about this? I just think of all the Scorers we have, and out there, that need to impl a new class, and possible duplicate a lot of code that is today shared between the top-level-scorer and iterator-type-scorer. I'm definitely not sure about it... For Scorers that don't have anything special to do when they are "top", we'd have a default impl (get a non-top Scorer and iterate over it, like Scorer.score now does. So I think the only weight that'd do something interesting is BooleanQuery's. But I agree this is a big change, so let's hold off for now? With search specialization ( LUCENE-1594 ) the difference between "being top" and "being sub" seems to be more important.... I think CustomScoreQuery.scorer should actually always score docs in order? Why? I don't see that it relies on doc id orderness anywhere CustomScorer's nextDoc uses advance on its subScorers.
          Hide
          Shai Erera added a comment -

          I think CustomScoreQuery.scorer should actually always score docs in order?

          Why? I don't see that it relies on doc id orderness anywhere. What if its subWeight is a BooleanWeight and I use a Collector which accepts docs out-of-order? Will I have a problem if I ask for an out-of-order Scorer?

          Show
          Shai Erera added a comment - I think CustomScoreQuery.scorer should actually always score docs in order? Why? I don't see that it relies on doc id orderness anywhere. What if its subWeight is a BooleanWeight and I use a Collector which accepts docs out-of-order? Will I have a problem if I ask for an out-of-order Scorer?
          Hide
          Shai Erera added a comment -

          QyertWeight -> QueryWeight

          I'll fix. Can you please next time give me a hint on where did you find it?

          I wonder if we should have a separate TopScorer class

          I remember that at some point I suggested to have a score(Searcher, Collector) on QueryWeight, and make Scorer.score(Collector) package-private (of course we'd need to deprecate first and invent a new name). But then I realized that custom weights would still need access to Scorer.score(Collector) if they want to use an existing Scorer or something.

          Taking Scorer.score(Collector) out of Scorer and into TopScorer is a large re-factoring. Are you sure about this? I just think of all the Scorers we have, and out there, that need to impl a new class, and possible duplicate a lot of code that is today shared between the top-level-scorer and iterator-type-scorer.

          I understand what you say "so it really feels like it wants to be a different class than Scorer" - I feel that too. But I don't see a great ROI here.

          Show
          Shai Erera added a comment - QyertWeight -> QueryWeight I'll fix. Can you please next time give me a hint on where did you find it? I wonder if we should have a separate TopScorer class I remember that at some point I suggested to have a score(Searcher, Collector) on QueryWeight, and make Scorer.score(Collector) package-private (of course we'd need to deprecate first and invent a new name). But then I realized that custom weights would still need access to Scorer.score(Collector) if they want to use an existing Scorer or something. Taking Scorer.score(Collector) out of Scorer and into TopScorer is a large re-factoring. Are you sure about this? I just think of all the Scorers we have, and out there, that need to impl a new class, and possible duplicate a lot of code that is today shared between the top-level-scorer and iterator-type-scorer. I understand what you say "so it really feels like it wants to be a different class than Scorer" - I feel that too. But I don't see a great ROI here.
          Hide
          Michael McCandless added a comment -
          • I wonder if we should have a separate TopScorer class, that
            doesn't expose nextDoc/advance methods? And then a separate
            QueryWeight.topScorer method instead of a boolean arg to
            QueryWeight.scorer. (I'm torn...). EG, if you get a topScorer,
            you are not supposed to call nextDoc/advance on it, so it really
            feels like it wants to be a different class than Scorer...
          • Update CHANGES entry based on iterations on the patch
            (eg supportsDocsOutOfOrder --> acceptsDocsOutOfOrder)
          • Can we rename QW.scoresOutOfOrder -> QW.scoresDocsOutOfOrder?
          • In IndexSearcher ~line 221 shouldn't was pass "true" for
            scoresDocsInOrder in Scorer scorer = weight.scorer(reader, false, true)?
          • QyertWeight -> QueryWeight
          • I think CustomScoreQuery.scorer should actually always score docs
            in order? So CustomWeight.scoresOutOfOrder should return false?
            And CustomWeight.scorer should pass "true" for scoreDocsInOrder to
            all sub-weights?
          Show
          Michael McCandless added a comment - I wonder if we should have a separate TopScorer class, that doesn't expose nextDoc/advance methods? And then a separate QueryWeight.topScorer method instead of a boolean arg to QueryWeight.scorer. (I'm torn...). EG, if you get a topScorer, you are not supposed to call nextDoc/advance on it, so it really feels like it wants to be a different class than Scorer... Update CHANGES entry based on iterations on the patch (eg supportsDocsOutOfOrder --> acceptsDocsOutOfOrder) Can we rename QW.scoresOutOfOrder -> QW.scoresDocsOutOfOrder? In IndexSearcher ~line 221 shouldn't was pass "true" for scoresDocsInOrder in Scorer scorer = weight.scorer(reader, false, true) ? QyertWeight -> QueryWeight I think CustomScoreQuery.scorer should actually always score docs in order? So CustomWeight.scoresOutOfOrder should return false? And CustomWeight.scorer should pass "true" for scoreDocsInOrder to all sub-weights?
          Hide
          Shai Erera added a comment -
          • Collector's acceptDocsOutOfOrder is abstract - this was a really good change since I completely forgot to override it in all home brewed Collectors to return true where applicable. I also surprised to see that <5 collectors actually should return false (most of them in tests).
          • I added QueryWeight variants to Searchable and implemented in RemoteSearchable.
          • Mike - I'm afraid I did some more code cleanup (not much though) - that was before I saw your last comment. sorry
          • Handled the rest of the latest comments.

          All tests pass

          Show
          Shai Erera added a comment - Collector's acceptDocsOutOfOrder is abstract - this was a really good change since I completely forgot to override it in all home brewed Collectors to return true where applicable. I also surprised to see that <5 collectors actually should return false (most of them in tests). I added QueryWeight variants to Searchable and implemented in RemoteSearchable. Mike - I'm afraid I did some more code cleanup (not much though) - that was before I saw your last comment. sorry Handled the rest of the latest comments. All tests pass
          Hide
          Shai Erera added a comment -

          You forgot to fill in the "?" in CHANGES

          I guess you're looking at the previous patch. It already has your name in the latest

          Sorry, you're right - there are two sections in CHANGES which I've added text to, and I put your name in the second one only.

          Show
          Shai Erera added a comment - You forgot to fill in the "?" in CHANGES I guess you're looking at the previous patch. It already has your name in the latest Sorry, you're right - there are two sections in CHANGES which I've added text to, and I put your name in the second one only.
          Hide
          Shai Erera added a comment -

          You forgot to fill in the "?" in CHANGES

          I guess you're looking at the previous patch. It already has your name in the latest

          How come Document doc(int n, FieldSelector fieldSelector) throws CorruptIndexException, IOException is added to Searcher.java in your patch?

          It's leftover from when I first deprecated Searchable - I wanted to move all the methods from Searchable to Searcher so that we don't forget that later. Will remove it.

          Rethinking fixing Searchable now vs later

          Ok I will do that. Deprecate the current ones and add new ones. We need to keep the Weight-variant methods in, since someone might call it. If he doesn't extend Searcher or implement Searchable, there's no real break in back-compat for him.

          As much as I love all the little code cleanups

          Apologies ... I'll try to restrain myself. That's why I didn't want to make Collector.accepts..() abstract - it would force me to touch more files, which means more code cleanups . I'll do my best to stop.

          Show
          Shai Erera added a comment - You forgot to fill in the "?" in CHANGES I guess you're looking at the previous patch. It already has your name in the latest How come Document doc(int n, FieldSelector fieldSelector) throws CorruptIndexException, IOException is added to Searcher.java in your patch? It's leftover from when I first deprecated Searchable - I wanted to move all the methods from Searchable to Searcher so that we don't forget that later. Will remove it. Rethinking fixing Searchable now vs later Ok I will do that. Deprecate the current ones and add new ones. We need to keep the Weight-variant methods in, since someone might call it. If he doesn't extend Searcher or implement Searchable, there's no real break in back-compat for him. As much as I love all the little code cleanups Apologies ... I'll try to restrain myself. That's why I didn't want to make Collector.accepts..() abstract - it would force me to touch more files, which means more code cleanups . I'll do my best to stop.
          Hide
          Michael McCandless added a comment -

          Still working through the patch... here's what I found so far:

          • You forgot to fill in the "?" in CHANGES
          • Can you change the default for BooleanQuery.allowDocsOutOfOrder to
            true?
          • How come {{Document doc(int n, FieldSelector fieldSelector) throws
            CorruptIndexException, IOException}} is added to Searcher.java in
            your patch?
          • Rethinking fixing Searchable now vs later: first off, we've
            already changed the interface in 2.9 (added Collector); second
            off, in our changes with Fieldable we both changed our policy and
            the interface, in one release. Maybe we should in fact switch to
            QueryWeight? (I'm not sure). This patch already breaks back
            compat of Searcher (there are new abstract methods), anyway.
          • Instead of saying "there is a chance" in the javadoc in BQ, can
            you change it to say "BQ will return an out-of-order scorer if
            requested"? (There's no chance in the matter...).
          • In fact, DocumentsWriter very much needs for the docs to be scored
            in order (it breaks out of the loop on the first out-of-bounds
            doc). Can you put that back?
          • As much as I love all the little code cleanups, can you resist the
            temptation, especially in such large patches as this? I think a
            separate issue that does pure code cleanups would be a great way
            to fix these, going forward...
          • "not need anymore" --> "not needed anymore"
          • We can now make things final in BS2, like countingSumScorer,
            *Scorers, etc?
          Show
          Michael McCandless added a comment - Still working through the patch... here's what I found so far: You forgot to fill in the "?" in CHANGES Can you change the default for BooleanQuery.allowDocsOutOfOrder to true? How come {{Document doc(int n, FieldSelector fieldSelector) throws CorruptIndexException, IOException}} is added to Searcher.java in your patch? Rethinking fixing Searchable now vs later: first off, we've already changed the interface in 2.9 (added Collector); second off, in our changes with Fieldable we both changed our policy and the interface, in one release. Maybe we should in fact switch to QueryWeight? (I'm not sure). This patch already breaks back compat of Searcher (there are new abstract methods), anyway. Instead of saying "there is a chance" in the javadoc in BQ, can you change it to say "BQ will return an out-of-order scorer if requested"? (There's no chance in the matter...). In fact, DocumentsWriter very much needs for the docs to be scored in order (it breaks out of the loop on the first out-of-bounds doc). Can you put that back? As much as I love all the little code cleanups, can you resist the temptation, especially in such large patches as this? I think a separate issue that does pure code cleanups would be a great way to fix these, going forward... "not need anymore" --> "not needed anymore" We can now make things final in BS2, like countingSumScorer, *Scorers, etc?
          Hide
          Shai Erera added a comment -

          It isn't and that's what I expressed in the javadocs. If you plan to iterate on a Scorer, you should always ask for in-order one, and that's what IndexSearcher does. Mike suggested above to refine that documentation to say that if you plan to call nextDoc() only, you can still ask for an out-of-order scorer.

          Show
          Shai Erera added a comment - It isn't and that's what I expressed in the javadocs. If you plan to iterate on a Scorer, you should always ask for in-order one, and that's what IndexSearcher does. Mike suggested above to refine that documentation to say that if you plan to call nextDoc() only, you can still ask for an out-of-order scorer.
          Hide
          Earwin Burrfoot added a comment -

          I wasn't following the issue closely, so this question might by silly - how does out-of-order scoring/collection marry with filters?
          If I remember right, filter/scorer intersection relies on proper orderness.

          Show
          Earwin Burrfoot added a comment - I wasn't following the issue closely, so this question might by silly - how does out-of-order scoring/collection marry with filters? If I remember right, filter/scorer intersection relies on proper orderness.
          Hide
          Shai Erera added a comment -

          Ok I will change acceptsDocsOutOfOrder on Collector to abstract, and implement it in all core collectors.

          I've already changed BooleanWeight's impl, as I wrote above "I fixed BooleanWeight to return true if there is a chance it will return BS (i.e. there are no required clauses and <32 prohibited clauses)".

          I still don't think scoresOutOfOrder can live on Scorer. IndexSearcher's search methods all call eventually to search(QueryWeight, Filter, Collector), which means that by that time you should already have a Collector ready (note that the user may pass its own Collector). Therefore such a utility will not work for user provided collectors, and specifically this method creates a Scorer for a given reader, but never a Collector (and a Collector is created just once).

          So if we were to take your approach, it'd deviate the "fast search methods" from the other search methods. The others would call search(Weight, Filter, Collector) and the "fast ones" would not (since they don't have a Collector yet). This will complicate IndexSearcher's code, IMO unnecessarily. If we want to differentiate the two, I can do that w/o a helper class.

          Show
          Shai Erera added a comment - Ok I will change acceptsDocsOutOfOrder on Collector to abstract, and implement it in all core collectors. I've already changed BooleanWeight's impl, as I wrote above "I fixed BooleanWeight to return true if there is a chance it will return BS (i.e. there are no required clauses and <32 prohibited clauses)". I still don't think scoresOutOfOrder can live on Scorer. IndexSearcher's search methods all call eventually to search(QueryWeight, Filter, Collector), which means that by that time you should already have a Collector ready (note that the user may pass its own Collector). Therefore such a utility will not work for user provided collectors, and specifically this method creates a Scorer for a given reader, but never a Collector (and a Collector is created just once). So if we were to take your approach, it'd deviate the "fast search methods" from the other search methods. The others would call search(Weight, Filter, Collector) and the "fast ones" would not (since they don't have a Collector yet). This will complicate IndexSearcher's code, IMO unnecessarily. If we want to differentiate the two, I can do that w/o a helper class.
          Hide
          Michael McCandless added a comment -

          Can we make Collector.supportsDocsOutOfOrder abstract? Defaulting to false isn't great (I'd rather subclass think about the question).

          In general, I tried to avoid it since that would require changing all core Collectors. There aren't many, but still ...

          This goes for QueryWeight.scoresOutOfOrder - wanted to avoid changing all core Weights to impl the method w/ "return false". I actually think that many Weights/Scorers do score documents in-order, hence the default impl.

          OK... thinking more about it, I think having
          QueryWeight.scoresDocsOutOfOrder default to "false" is reasonable (I
          think most do in-order scoring). Also, I think the perf gains are
          relatively small if a QueryWeight returns "true", so, by defaulting to
          false we're not leaving much performance on the table.

          But for Collector it's a different story: the gains by allowing
          BooleanQuery to use its out-of-order scorer are sizable. And, I'd
          expect many custom Collectors would be fine with out-of-order
          collection.

          Since these are brand new classes, we have the chance to do it well.
          It's very much an expert thing already to make your own Collector...

          If a given Scorer.scoresOutOfOrder returns true, does that mean nextDoc is allowed to return docs out of order?

          When you deal with a Scorer which returns out-of-order, you can only call scorer.score(Collector). If you're going to iterate, you're going to have to create a Scorer in-order, and that's what IndexSearcher does. I'll spell it out clearly in the javadocs.

          That may be a bit too strong – eg BooleanScorer lets you nextDoc()
          your way through its out-of-order docs (just not advance()). Maybe
          state just that you can't use advance in the javadocs?

          Should scoresOutOfOrder() move from QueryWeight --> Scorer?

          We've discussed it few posts up. When this information in in Scorer, I should first ask for a Scorer, and only then I can create a Collector. If I'll use the Scorer immediately, then that'll be ok. However, that's not the case in IndexSearcher, and results in a bug in Spatial, and unless we want to uglify IndexSearcher code, it seemed that this can sit in QueryWeight.

          But I do think it's a problematic method in QW too, since if it returns false by default, I'll create a Collector which expects docs in-order, but then I'd lose the optimization in BooleanWeight which may return an out-of-order superior Scorer. If I return true, I'll create a Collector which expects out-of-order, and the Scorer (again, an example from BW) may be actually in-order, and I've wasted unnecessary 'if doc > topDoc' cycles.

          So I don't know what's better: make IndexSearcher code more complicated or sacrifice a potential loss of this optimization?

          Could we "invert" the logic in IndexSearcher that makes a collector,
          eg by creating a utility class that will on-demand provide a collector
          once told whether the docs will be in order? Basically, "curry" all
          the other details about the collector (sorting by score vs field, if
          by field whether to track scores & max score). Then inside doSearch
          when we finally know if the Scorer will be in-order, we ask that
          helper class for the collector? The first time the helper class is
          called, it makes the collector; subsequent times it returns the same
          one.

          There is a risk, though, if the Scorer returned for a given segment
          "changes its mind"... eg the first segment's scorer says the docs will
          be in order, and then some later segment's scorer says they will not
          be in order. So... that's risky.

          Maybe we leave it on QueryWeight, but fix BooleanWeight to return
          exactly the right thing? (It can be exact, right? Because we know
          the conditions under which BooleanWeight, if allowed to do so, would
          choose to return an out-of-order scorer).

          Shouldn't Searchable cutover to QueryWeight too? (We are keeping Searchable, but allowing changes to it)

          I wrote that above too - I don't think we can declare and execute right in 2.9 that Searchable can be changed unexpectedly. So I added a NOTE to its javadocs and thought to do the change post 2.9, when we remove Weight. We'd be forced to change these methods to QueryWeight, and fix RemoteSearchable too. And it will be consistent w/ our back-compat policy (at least the part where we declare on an upcoming change before it happens).

          But if you think otherwise, I don't mind deprecating and adding new methods (I've got used to it already, I almost do it blindly ).

          [Sorry, I'm losing track of all the comments]

          OK let's defer the changes to Searchable until 3.1. Make sure you
          open a follow-on issue so we remember

          Show
          Michael McCandless added a comment - Can we make Collector.supportsDocsOutOfOrder abstract? Defaulting to false isn't great (I'd rather subclass think about the question). In general, I tried to avoid it since that would require changing all core Collectors. There aren't many, but still ... This goes for QueryWeight.scoresOutOfOrder - wanted to avoid changing all core Weights to impl the method w/ "return false". I actually think that many Weights/Scorers do score documents in-order, hence the default impl. OK... thinking more about it, I think having QueryWeight.scoresDocsOutOfOrder default to "false" is reasonable (I think most do in-order scoring). Also, I think the perf gains are relatively small if a QueryWeight returns "true", so, by defaulting to false we're not leaving much performance on the table. But for Collector it's a different story: the gains by allowing BooleanQuery to use its out-of-order scorer are sizable. And, I'd expect many custom Collectors would be fine with out-of-order collection. Since these are brand new classes, we have the chance to do it well. It's very much an expert thing already to make your own Collector... If a given Scorer.scoresOutOfOrder returns true, does that mean nextDoc is allowed to return docs out of order? When you deal with a Scorer which returns out-of-order, you can only call scorer.score(Collector). If you're going to iterate, you're going to have to create a Scorer in-order, and that's what IndexSearcher does. I'll spell it out clearly in the javadocs. That may be a bit too strong – eg BooleanScorer lets you nextDoc() your way through its out-of-order docs (just not advance()). Maybe state just that you can't use advance in the javadocs? Should scoresOutOfOrder() move from QueryWeight --> Scorer? We've discussed it few posts up. When this information in in Scorer, I should first ask for a Scorer, and only then I can create a Collector. If I'll use the Scorer immediately, then that'll be ok. However, that's not the case in IndexSearcher, and results in a bug in Spatial, and unless we want to uglify IndexSearcher code, it seemed that this can sit in QueryWeight. But I do think it's a problematic method in QW too, since if it returns false by default, I'll create a Collector which expects docs in-order, but then I'd lose the optimization in BooleanWeight which may return an out-of-order superior Scorer. If I return true, I'll create a Collector which expects out-of-order, and the Scorer (again, an example from BW) may be actually in-order, and I've wasted unnecessary 'if doc > topDoc' cycles. So I don't know what's better: make IndexSearcher code more complicated or sacrifice a potential loss of this optimization? Could we "invert" the logic in IndexSearcher that makes a collector, eg by creating a utility class that will on-demand provide a collector once told whether the docs will be in order? Basically, "curry" all the other details about the collector (sorting by score vs field, if by field whether to track scores & max score). Then inside doSearch when we finally know if the Scorer will be in-order, we ask that helper class for the collector? The first time the helper class is called, it makes the collector; subsequent times it returns the same one. There is a risk, though, if the Scorer returned for a given segment "changes its mind"... eg the first segment's scorer says the docs will be in order, and then some later segment's scorer says they will not be in order. So... that's risky. Maybe we leave it on QueryWeight, but fix BooleanWeight to return exactly the right thing? (It can be exact, right? Because we know the conditions under which BooleanWeight, if allowed to do so, would choose to return an out-of-order scorer). Shouldn't Searchable cutover to QueryWeight too? (We are keeping Searchable, but allowing changes to it) I wrote that above too - I don't think we can declare and execute right in 2.9 that Searchable can be changed unexpectedly. So I added a NOTE to its javadocs and thought to do the change post 2.9, when we remove Weight. We'd be forced to change these methods to QueryWeight, and fix RemoteSearchable too. And it will be consistent w/ our back-compat policy (at least the part where we declare on an upcoming change before it happens). But if you think otherwise, I don't mind deprecating and adding new methods (I've got used to it already, I almost do it blindly ). [Sorry, I'm losing track of all the comments] OK let's defer the changes to Searchable until 3.1. Make sure you open a follow-on issue so we remember
          Hide
          Shai Erera added a comment -

          Fixed most of your comments Mike. I also noticed I did not document Collector.acceptsDocsOutOfOrder, so fixed that too.

          The remaining things we should agree on are:

          • deprecated Weight and add QueryWeight variants to Searchable. I prefer to do it post 2.9.
          • move scoresDocsOutOfOrder to Scorer instead of Weight. I fixed BooleanWeight to return true if there is a chance it will return BS (i.e. there are no required clauses and <32 prohibited clauses). I guess we'll need to discuss that one more.
          • Make Collector.acceptsDocsOutOfOrder and QueryWeight.scoresDocsOutOfOrder abstract - I think the default impl makes sense for most of the imps out there and the ones in core, but I don't have a strong feeling against making it abstract.

          All tests pass, and javadocs are good as well.

          Show
          Shai Erera added a comment - Fixed most of your comments Mike. I also noticed I did not document Collector.acceptsDocsOutOfOrder, so fixed that too. The remaining things we should agree on are: deprecated Weight and add QueryWeight variants to Searchable. I prefer to do it post 2.9. move scoresDocsOutOfOrder to Scorer instead of Weight. I fixed BooleanWeight to return true if there is a chance it will return BS (i.e. there are no required clauses and <32 prohibited clauses). I guess we'll need to discuss that one more. Make Collector.acceptsDocsOutOfOrder and QueryWeight.scoresDocsOutOfOrder abstract - I think the default impl makes sense for most of the imps out there and the ones in core, but I don't have a strong feeling against making it abstract. All tests pass, and javadocs are good as well.
          Hide
          Shai Erera added a comment -

          Thanks for the review Mike. Answers below. The comments which I did not answer will be fixed.

          Can we make Collector.supportsDocsOutOfOrder abstract? Defaulting to false isn't great (I'd rather subclass think about the question).

          In general, I tried to avoid it since that would require changing all core Collectors. There aren't many, but still ...

          This goes for QueryWeight.scoresOutOfOrder - wanted to avoid changing all core Weights to impl the method w/ "return false". I actually think that many Weights/Scorers do score documents in-order, hence the default impl.

          If a given Scorer.scoresOutOfOrder returns true, does that mean nextDoc is allowed to return docs out of order?

          When you deal with a Scorer which returns out-of-order, you can only call scorer.score(Collector). If you're going to iterate, you're going to have to create a Scorer in-order, and that's what IndexSearcher does. I'll spell it out clearly in the javadocs.

          Should scoresOutOfOrder() move from QueryWeight --> Scorer?

          We've discussed it few posts up. When this information in in Scorer, I should first ask for a Scorer, and only then I can create a Collector. If I'll use the Scorer immediately, then that'll be ok. However, that's not the case in IndexSearcher, and results in a bug in Spatial, and unless we want to uglify IndexSearcher code, it seemed that this can sit in QueryWeight.

          But I do think it's a problematic method in QW too, since if it returns false by default, I'll create a Collector which expects docs in-order, but then I'd lose the optimization in BooleanWeight which may return an out-of-order superior Scorer. If I return true, I'll create a Collector which expects out-of-order, and the Scorer (again, an example from BW) may be actually in-order, and I've wasted unnecessary 'if doc > topDoc' cycles.

          So I don't know what's better: make IndexSearcher code more complicated or sacrifice a potential loss of this optimization?

          Actually I think the way to factor the static setting in is backwards? Shouldn't it be scoreDocsInOrder |= !allowDocsOutOfOrder?

          Yes, nice catch

          Shouldn't Searchable cutover to QueryWeight too? (We are keeping Searchable, but allowing changes to it)

          I wrote that above too - I don't think we can declare and execute right in 2.9 that Searchable can be changed unexpectedly. So I added a NOTE to its javadocs and thought to do the change post 2.9, when we remove Weight. We'd be forced to change these methods to QueryWeight, and fix RemoteSearchable too. And it will be consistent w/ our back-compat policy (at least the part where we declare on an upcoming change before it happens).

          But if you think otherwise, I don't mind deprecating and adding new methods (I've got used to it already, I almost do it blindly ).

          I'll fix the other comments, and post a patch back after we resolve the remaining open issues.

          Show
          Shai Erera added a comment - Thanks for the review Mike. Answers below. The comments which I did not answer will be fixed. Can we make Collector.supportsDocsOutOfOrder abstract? Defaulting to false isn't great (I'd rather subclass think about the question). In general, I tried to avoid it since that would require changing all core Collectors. There aren't many, but still ... This goes for QueryWeight.scoresOutOfOrder - wanted to avoid changing all core Weights to impl the method w/ "return false". I actually think that many Weights/Scorers do score documents in-order, hence the default impl. If a given Scorer.scoresOutOfOrder returns true, does that mean nextDoc is allowed to return docs out of order? When you deal with a Scorer which returns out-of-order, you can only call scorer.score(Collector). If you're going to iterate, you're going to have to create a Scorer in-order, and that's what IndexSearcher does. I'll spell it out clearly in the javadocs. Should scoresOutOfOrder() move from QueryWeight --> Scorer? We've discussed it few posts up. When this information in in Scorer, I should first ask for a Scorer, and only then I can create a Collector. If I'll use the Scorer immediately, then that'll be ok. However, that's not the case in IndexSearcher, and results in a bug in Spatial, and unless we want to uglify IndexSearcher code, it seemed that this can sit in QueryWeight. But I do think it's a problematic method in QW too, since if it returns false by default, I'll create a Collector which expects docs in-order, but then I'd lose the optimization in BooleanWeight which may return an out-of-order superior Scorer. If I return true, I'll create a Collector which expects out-of-order, and the Scorer (again, an example from BW) may be actually in-order, and I've wasted unnecessary 'if doc > topDoc' cycles. So I don't know what's better: make IndexSearcher code more complicated or sacrifice a potential loss of this optimization? Actually I think the way to factor the static setting in is backwards? Shouldn't it be scoreDocsInOrder |= !allowDocsOutOfOrder? Yes, nice catch Shouldn't Searchable cutover to QueryWeight too? (We are keeping Searchable, but allowing changes to it) I wrote that above too - I don't think we can declare and execute right in 2.9 that Searchable can be changed unexpectedly. So I added a NOTE to its javadocs and thought to do the change post 2.9, when we remove Weight. We'd be forced to change these methods to QueryWeight, and fix RemoteSearchable too. And it will be consistent w/ our back-compat policy (at least the part where we declare on an upcoming change before it happens). But if you think otherwise, I don't mind deprecating and adding new methods (I've got used to it already, I almost do it blindly ). I'll fix the other comments, and post a patch back after we resolve the remaining open issues.
          Hide
          Michael McCandless added a comment -

          Patch looks good!

          • There are some new javadoc warnings (wrong links)
          • Maybe, in changes, add that BooleanQuery will now score docs out
            of order when used with a Collector that can accept docs out of
            order? Ie this is a good ootb perf gain for or queries, sorting
            by field or score.
            .
            Oh, ugh, it seems we don't actually do that today (because we
            respect the static setting). Hmm. Can't we change that static
            default (BooleanQuery.allowDocsOutOfOrder) to true? Because, our
            core collectors can handle it, and any custom collector will by
            default say it cannot handle it so there wouldn't be a break in
            back compat?
          • That's a nice cleanup, seeing BooleanQuery decide which scorer
            impl to return
          • We don't need the " = null" initializer on
            BooleanScorer2.countingSumScorer's decl
          • Can we make Collector.supportsDocsOutOfOrder abstract? Defaulting
            to false isn't great (I'd rather subclass think about the
            question).
          • Can we rename Collector.supportsDocsOutOfOrder ->
            acceptsDocsOutOfOrder?
          • Can we also make QueryWeight.scoresOutOfOrder abstract?
          • CustomScoreQuery.scoresOutOfOrder should only look at its
            subQueryWeight?
          • If a given Scorer.scoresOutOfOrder returns true, does that mean
            nextDoc is allowed to return docs out of order? How is advance
            defined for such scorers? (BooleanScore does this, and its
            advance throws UOE). Maybe we allow this but advance may not
            work for such scorers?
          • Should scoresOutOfOrder() move from QueryWeight --> Scorer? I may
            call QueryWeight.scorer with scoreDocsInOrder=false, but many will
            in fact return a scorer that does score docs in order (eg BQ does
            this depending on what kind fo clauses, and how many, it has), and
            we could then pick a faster collector in such cases?
          • Shouldn't DisjunctionMaxQuery pass true for scoreDocsInOrder when
            asking its sub-queries for their scorers? Ie even if its callee
            allow out-of-order scoring, it requires in-order of its children?
          • I think DisjunctionMaxQuery.scoresOutOfOrder should simply return
            false?
          • Actually I think the way to factor the static setting in is
            backwards? Shouldn't it be scoreDocsInOrder |= !allowDocsOutOfOrder?
          • Can you sharpen the javadocs for boolean topScorer param? Ie, "if
            true, score(Collector) will be called; if false, nextDoc/advance
            will be called". (I found myself momentarily wondering if
            DocumentWriter's usage of Scorer API was a topScorer).
          • Shouldn't Searchable cutover to QueryWeight too? (We are keeping
            Searchable, but allowing changes to it)
          • Nice catch, removing Searchable's now-wrong NOTE about scoring
            when sorting by field!
          Show
          Michael McCandless added a comment - Patch looks good! There are some new javadoc warnings (wrong links) Maybe, in changes, add that BooleanQuery will now score docs out of order when used with a Collector that can accept docs out of order? Ie this is a good ootb perf gain for or queries, sorting by field or score. . Oh, ugh, it seems we don't actually do that today (because we respect the static setting). Hmm. Can't we change that static default (BooleanQuery.allowDocsOutOfOrder) to true? Because, our core collectors can handle it, and any custom collector will by default say it cannot handle it so there wouldn't be a break in back compat? That's a nice cleanup, seeing BooleanQuery decide which scorer impl to return We don't need the " = null" initializer on BooleanScorer2.countingSumScorer's decl Can we make Collector.supportsDocsOutOfOrder abstract? Defaulting to false isn't great (I'd rather subclass think about the question). Can we rename Collector.supportsDocsOutOfOrder -> acceptsDocsOutOfOrder? Can we also make QueryWeight.scoresOutOfOrder abstract? CustomScoreQuery.scoresOutOfOrder should only look at its subQueryWeight? If a given Scorer.scoresOutOfOrder returns true, does that mean nextDoc is allowed to return docs out of order? How is advance defined for such scorers? (BooleanScore does this, and its advance throws UOE). Maybe we allow this but advance may not work for such scorers? Should scoresOutOfOrder() move from QueryWeight --> Scorer? I may call QueryWeight.scorer with scoreDocsInOrder=false, but many will in fact return a scorer that does score docs in order (eg BQ does this depending on what kind fo clauses, and how many, it has), and we could then pick a faster collector in such cases? Shouldn't DisjunctionMaxQuery pass true for scoreDocsInOrder when asking its sub-queries for their scorers? Ie even if its callee allow out-of-order scoring, it requires in-order of its children? I think DisjunctionMaxQuery.scoresOutOfOrder should simply return false? Actually I think the way to factor the static setting in is backwards? Shouldn't it be scoreDocsInOrder |= !allowDocsOutOfOrder ? Can you sharpen the javadocs for boolean topScorer param? Ie, "if true, score(Collector) will be called; if false, nextDoc/advance will be called". (I found myself momentarily wondering if DocumentWriter's usage of Scorer API was a topScorer). Shouldn't Searchable cutover to QueryWeight too? (We are keeping Searchable, but allowing changes to it) Nice catch, removing Searchable's now-wrong NOTE about scoring when sorting by field!
          Hide
          Shai Erera added a comment -

          Changed Query.createQueryWeight to public, as was suggested by Yonik. All tests pass.

          Still no volunteers?

          Show
          Shai Erera added a comment - Changed Query.createQueryWeight to public, as was suggested by Yonik. All tests pass. Still no volunteers?
          Hide
          Shai Erera added a comment -

          ok - let's start iterating on the patch. Anyone volunteer to accept it (and then I'll update CHANGES "via ?")?

          Patch include:

          • QueryWeight with the new scorer(IndexReader, soreDocsInOrder, topScorer) and scoresOutOfOrder().
          • Added methods to Searcher (this breaks back-compat, but it's already broken here because of 1575).
          • BooleanWeight now creates BS or BS2 up front, and therefore BS2's code is simplified.

          All tests pass.

          Show
          Shai Erera added a comment - ok - let's start iterating on the patch. Anyone volunteer to accept it (and then I'll update CHANGES "via ?")? Patch include: QueryWeight with the new scorer(IndexReader, soreDocsInOrder, topScorer) and scoresOutOfOrder(). Added methods to Searcher (this breaks back-compat, but it's already broken here because of 1575). BooleanWeight now creates BS or BS2 up front, and therefore BS2's code is simplified. All tests pass.
          Hide
          Earwin Burrfoot added a comment - - edited

          I like the last option (move scoresOutOfOrder to Weight) most. Creating dummy scorer looks ugly to me, and looks like it will cause more problems of the same kind in the future.

          Show
          Earwin Burrfoot added a comment - - edited I like the last option (move scoresOutOfOrder to Weight) most. Creating dummy scorer looks ugly to me, and looks like it will cause more problems of the same kind in the future.
          Hide
          Shai Erera added a comment -

          Ok I was just about to post the patch, when the Spatial tests failed. After some investigation, I found out the following, and would appreciate your suggestions. IndexSearcher.search(QueryWeight weight, Filter filter, final int nDocs, Sort sort, boolean fillFields) I wrote the following code:

              // try to create a Scorer in out-of-order mode, just to know which TFC
              // version to instantiate.
              boolean docsScoredInOrder = false;
              if (subReaders.length > 0) {
                docsScoredInOrder = !weight.scorer(subReaders[0], false, false).scoresOutOfOrder();
              }
              TopFieldCollector collector = TopFieldCollector.create(sort, nDocs,
                  fillFields, fieldSortDoTrackScores, fieldSortDoMaxScore, docsScoredInOrder);
              search(weight, filter, collector);
          

          For clarification - I need to know which TFC instance to create (in-order / out-of-order). For that, I need to first create a Scorer, asking for out-of-order one, and then check whether the Scorer is indeed an out-of-order or not. That's a dummy Scorer, as I never use it afterwards, but since we didn't want to add scoresOutOfOrder to Weight, but Scorer, I don't have any other choice.

          For Spatial, this creates a problem. One of the tests uses ConstantScoreQuery and passes in a Filter. CSQ.scorer() creates a new Scorer and uses the given Filter as reference. In Spatial, every time Filter.getDocIdSet() is called, the internal filter populates a WeakHashMap of distances (with the doc id as key), and doesn't clear it between invocations. It also updates the "base" of the key to handle multiple readers. Therefore the docs of the first reader are added twice - once for the dummy invocation and the second time since the "base" is updated (LatLongDistanceFilter.java, line 222) to reader.maxDoc().

          I tried to create a new "distances" map on every invocation, but then another test fails. I don't know this code very well, and I don't know which is the best solution:

          • Complicate the code in IndexSearcher to create a Scorer, then collect it and then proceed w/ iterating on the readers, from the 2nd forward. This is a real ugly change, I tried it and quickly reverted. It also breaks the current beauty of having all the search methods call search(Weight, Filter, Collector).
          • Fix LatLongDistanceFilter code to check if reader.maxDoc() == nextOffset, then do nextOffset -= reader.maxDoc(). This is not pretty either, since it assumes a certain implementation and use of it, which I don't like either.
          • Add scoresOutOfOrder to Weight, but I don't know if we want to add this "knowledge" to Weight, and it fits nicely in Scorer.

          Any suggestions? perhaps a different fix to Spatial?

          Show
          Shai Erera added a comment - Ok I was just about to post the patch, when the Spatial tests failed. After some investigation, I found out the following, and would appreciate your suggestions. IndexSearcher.search(QueryWeight weight, Filter filter, final int nDocs, Sort sort, boolean fillFields) I wrote the following code: // try to create a Scorer in out-of-order mode, just to know which TFC // version to instantiate. boolean docsScoredInOrder = false ; if (subReaders.length > 0) { docsScoredInOrder = !weight.scorer(subReaders[0], false , false ).scoresOutOfOrder(); } TopFieldCollector collector = TopFieldCollector.create(sort, nDocs, fillFields, fieldSortDoTrackScores, fieldSortDoMaxScore, docsScoredInOrder); search(weight, filter, collector); For clarification - I need to know which TFC instance to create (in-order / out-of-order). For that, I need to first create a Scorer, asking for out-of-order one, and then check whether the Scorer is indeed an out-of-order or not. That's a dummy Scorer, as I never use it afterwards, but since we didn't want to add scoresOutOfOrder to Weight, but Scorer, I don't have any other choice. For Spatial, this creates a problem. One of the tests uses ConstantScoreQuery and passes in a Filter. CSQ.scorer() creates a new Scorer and uses the given Filter as reference. In Spatial, every time Filter.getDocIdSet() is called, the internal filter populates a WeakHashMap of distances (with the doc id as key), and doesn't clear it between invocations. It also updates the "base" of the key to handle multiple readers. Therefore the docs of the first reader are added twice - once for the dummy invocation and the second time since the "base" is updated (LatLongDistanceFilter.java, line 222) to reader.maxDoc(). I tried to create a new "distances" map on every invocation, but then another test fails. I don't know this code very well, and I don't know which is the best solution: Complicate the code in IndexSearcher to create a Scorer, then collect it and then proceed w/ iterating on the readers, from the 2nd forward. This is a real ugly change, I tried it and quickly reverted. It also breaks the current beauty of having all the search methods call search(Weight, Filter, Collector). Fix LatLongDistanceFilter code to check if reader.maxDoc() == nextOffset, then do nextOffset -= reader.maxDoc(). This is not pretty either, since it assumes a certain implementation and use of it, which I don't like either. Add scoresOutOfOrder to Weight, but I don't know if we want to add this "knowledge" to Weight, and it fits nicely in Scorer. Any suggestions? perhaps a different fix to Spatial?
          Hide
          Shai Erera added a comment -

          If that's agreed, then I'll do it and put a NOTE in its javadocs. I'll also remove some deprecated methods (Collector vs. HitCollector). I don't know if I should keep the Weight methods around and change them to QueryWeight in 3.0, or do the change now. Just want to minimize the noise.

          Show
          Shai Erera added a comment - If that's agreed, then I'll do it and put a NOTE in its javadocs. I'll also remove some deprecated methods (Collector vs. HitCollector). I don't know if I should keep the Weight methods around and change them to QueryWeight in 3.0, or do the change now. Just want to minimize the noise.
          Hide
          Yonik Seeley added a comment -

          We could simply declare, going forward, that we may add methods to Searchable?

          I like this approach - there are super-expert classes like this that most don't override or implement, but some do. It seems better for all involved if it's allowed to stay public with suitable disclaimers and less stringent back compat.

          Show
          Yonik Seeley added a comment - We could simply declare, going forward, that we may add methods to Searchable? I like this approach - there are super-expert classes like this that most don't override or implement, but some do. It seems better for all involved if it's allowed to stay public with suitable disclaimers and less stringent back compat.
          Hide
          Shai Erera added a comment -

          My plan was to deprecate Searchable and make it package-private in 3.0. That way, we can enjoy both worlds - change Searchable as we want, and use it internally for RemoteSearchable as well as MultiSearcher. We can even not add the new QueryWeight variants to it at this point, since the class will be marked deprecated. We only need them in Searcher.

          Show
          Shai Erera added a comment - My plan was to deprecate Searchable and make it package-private in 3.0. That way, we can enjoy both worlds - change Searchable as we want, and use it internally for RemoteSearchable as well as MultiSearcher. We can even not add the new QueryWeight variants to it at this point, since the class will be marked deprecated. We only need them in Searcher.
          Hide
          Simon Willnauer added a comment -

          Keep in mind that it is easier to include a contrib jar than moving an existing infrastructure to Solr.
          We can still add methods to Searchable and define another interface in contrib/remote. still easier than moving to solr for some people. I'm not an RMI friend but based on all the back conpat discussions we should try to misplease everybody equally.

          simon

          Show
          Simon Willnauer added a comment - Keep in mind that it is easier to include a contrib jar than moving an existing infrastructure to Solr. We can still add methods to Searchable and define another interface in contrib/remote. still easier than moving to solr for some people. I'm not an RMI friend but based on all the back conpat discussions we should try to misplease everybody equally. simon
          Hide
          Michael McCandless added a comment -

          We could simply declare, going forward, that we may add methods to Searchable?

          Or... instead of moving RemoteSearchable out to contrib/remote, we could simply remove it, and state that apps should move to their own means of remoting searching (or, use Solr)?

          Show
          Michael McCandless added a comment - We could simply declare, going forward, that we may add methods to Searchable? Or... instead of moving RemoteSearchable out to contrib/remote, we could simply remove it, and state that apps should move to their own means of remoting searching (or, use Solr)?
          Hide
          Shai Erera added a comment -

          Given the changes in LUCENE-1407, I don't think we can deprecate Searchable, but instead change it to include the new QueryWeight methods. Either that or breaking Weight to include the new needed methods.

          I lean towards breaking Searchable, since I think there's a much larger chance that there are Weight implementations out there, but only few Searchable/Searcher. And the migration is really simple - just add another method which accepts QueryWeight, and have the Weight variant call the other method with QueryWeightWrapper.

          What do you think?

          MultiSearcher also expects Searchable ... Maybe we should just make Searchable package-private (in 3.0)? That way it will only be extended by RemoteSearchable. MultiSearcher can then move to work w/ Searcher, and have a package-private ctor which accepts Searchable, just in case someone wants to use RemoteSearchable.

          I'd like to post a patch soon, so any comments are welcome.

          Show
          Shai Erera added a comment - Given the changes in LUCENE-1407 , I don't think we can deprecate Searchable, but instead change it to include the new QueryWeight methods. Either that or breaking Weight to include the new needed methods. I lean towards breaking Searchable, since I think there's a much larger chance that there are Weight implementations out there, but only few Searchable/Searcher. And the migration is really simple - just add another method which accepts QueryWeight, and have the Weight variant call the other method with QueryWeightWrapper. What do you think? MultiSearcher also expects Searchable ... Maybe we should just make Searchable package-private (in 3.0)? That way it will only be extended by RemoteSearchable. MultiSearcher can then move to work w/ Searcher, and have a package-private ctor which accepts Searchable, just in case someone wants to use RemoteSearchable. I'd like to post a patch soon, so any comments are welcome.
          Hide
          Shai Erera added a comment -

          Over in LUCENE-1407 I made some comments about deprecating Searchable and moving RMI out from core. It looks like we have a problem - Today RemoteSearchable implements Searchalbe (and by inheritance java.rmi.Remote) as well as extend UnicastRemoteObject. That's because a class has to extend a concrete Remote class, or implement a concrete Remote sub-interface.

          After refactoring, it will need to extend both Searcher and UnicastRemoteObject. Unless we make Searcher extend UnicastRemoteObject, but that will bring the RMI stuff back into core.

          So either I leave Searchable around, and add a new QueryWeight variant method to it, or we make Searcher extend UnicastRemoteObject and at a later point have RemoteSearchable include some concrete implementation of RemoteServer (i.e., implement on our own, when it's in contrib) and then remove the extension of UnicastRemoteObject by Searcher.

          Regarding back-compat, due to the Collector work we already made changes to the interface as well as Searcher, so I don't think that should worry us.

          What do you think?

          Show
          Shai Erera added a comment - Over in LUCENE-1407 I made some comments about deprecating Searchable and moving RMI out from core. It looks like we have a problem - Today RemoteSearchable implements Searchalbe (and by inheritance java.rmi.Remote) as well as extend UnicastRemoteObject. That's because a class has to extend a concrete Remote class, or implement a concrete Remote sub-interface. After refactoring, it will need to extend both Searcher and UnicastRemoteObject. Unless we make Searcher extend UnicastRemoteObject, but that will bring the RMI stuff back into core. So either I leave Searchable around, and add a new QueryWeight variant method to it, or we make Searcher extend UnicastRemoteObject and at a later point have RemoteSearchable include some concrete implementation of RemoteServer (i.e., implement on our own, when it's in contrib) and then remove the extension of UnicastRemoteObject by Searcher. Regarding back-compat, due to the Collector work we already made changes to the interface as well as Searcher, so I don't think that should worry us. What do you think?
          Hide
          Earwin Burrfoot added a comment -

          Searcher is supposed to be a little cherry of userfriendliness atop a glass of Lucene murky internals, ain't it?
          I mean, even you had to be explained the ways of Query, Weight and Scorer, what would a Lucene neophyte do if we remove his beloved convenience methods?

          Show
          Earwin Burrfoot added a comment - Searcher is supposed to be a little cherry of userfriendliness atop a glass of Lucene murky internals, ain't it? I mean, even you had to be explained the ways of Query, Weight and Scorer, what would a Lucene neophyte do if we remove his beloved convenience methods?
          Hide
          Shai Erera added a comment -

          So with all its strange looks the trio of Q, W, S is still the best approach if you ask me.

          Ok, I think I'm convinced. Weight looks like it makes reusing a Query instance easier.

          So how about my other proposal, to get rid of Query-based search methods in Searcher, and stick w/ QueryWeight only? Maybe even add on Searcher a method createWeight(Query) if that's needed.

          Show
          Shai Erera added a comment - So with all its strange looks the trio of Q, W, S is still the best approach if you ask me. Ok, I think I'm convinced. Weight looks like it makes reusing a Query instance easier. So how about my other proposal, to get rid of Query-based search methods in Searcher, and stick w/ QueryWeight only? Maybe even add on Searcher a method createWeight(Query) if that's needed.
          Hide
          Earwin Burrfoot added a comment -

          You can't, because Weights produced from same Query are different for different indexes.
          You can probably modify Query inplace for a given index, produce some scorers, do scoring, then modify Query for another index, produce scorers, etc..
          But now your Query is no longer thread-safe, and I can't reuse it from different threads.

          So with all its strange looks the trio of Q, W, S is still the best approach if you ask me.

          Show
          Earwin Burrfoot added a comment - You can't, because Weights produced from same Query are different for different indexes. You can probably modify Query inplace for a given index, produce some scorers, do scoring, then modify Query for another index, produce scorers, etc.. But now your Query is no longer thread-safe, and I can't reuse it from different threads. So with all its strange looks the trio of Q, W, S is still the best approach if you ask me.
          Hide
          Shai Erera added a comment -

          Thanks for the explanation.

          Couldn't we can merge Weight with Query in a way that you could still reuse the Query instance? Weight has 6 methods:

          • getQuery will not be required anymore.
          • getValue/sumOfSquareWeigths/normalize - I think these can be part of Query.
          • scorer/explain - can be part of Query and we keep on passing an IndexReader.
            Though I admit I haven't checked all Weight implementations, I think this can work?

          If not, then I believe Marvin's idea to have QueryWeight subclass Query, was to simplify the Searcher API. We can still do that w/o having QW subclass Q, by for example expose only a Weight API on Searcher? Eventually, all the Query search() variants call a Weight variant.

          I just think that sub-classing Q by QW is not the right way, just for them to have the same super type ... So if we can't get rid of Weight and stuff all in Query, I think we should stay with the Query, QueryWeight, Scorer separation.

          Show
          Shai Erera added a comment - Thanks for the explanation. Couldn't we can merge Weight with Query in a way that you could still reuse the Query instance? Weight has 6 methods: getQuery will not be required anymore. getValue/sumOfSquareWeigths/normalize - I think these can be part of Query. scorer/explain - can be part of Query and we keep on passing an IndexReader. Though I admit I haven't checked all Weight implementations, I think this can work? If not, then I believe Marvin's idea to have QueryWeight subclass Query, was to simplify the Searcher API. We can still do that w/o having QW subclass Q, by for example expose only a Weight API on Searcher? Eventually, all the Query search() variants call a Weight variant. I just think that sub-classing Q by QW is not the right way, just for them to have the same super type ... So if we can't get rid of Weight and stuff all in Query, I think we should stay with the Query, QueryWeight, Scorer separation.
          Hide
          Earwin Burrfoot added a comment -

          To me, it'd be ok to get rid of Weight entirely and just have Query and Scorer, while Query.scorer() do whatever Weight does today. But Marvin mentions reusing Query objects (which I don't fully understand what that means - is it reusing the instance or the code), so I'd like to hear your thoughts.

          As far as my understanding goes:
          Query is reusable instance-wise. Weight is a Query primed with index-dependent stuff, like freqs that are used to calculate the score. Scorer is something that actually iterates on the docs and does the scoring.
          With per-segment collection you usually get a Query, produce a Weight from it using toplevel IR (most likely MSR), so you have freqs for the whole index, and then get a Scorer for each of the segments and iterate on them.
          I don't see how you can get freqs from toplevel reader and iterate on lower-level reader if you merge Weight and Score.
          Merging Weight and Query back is something I don't like too - I use a huge bunch of pre-parsed Queries that are run against ever-different indexes.

          Show
          Earwin Burrfoot added a comment - To me, it'd be ok to get rid of Weight entirely and just have Query and Scorer, while Query.scorer() do whatever Weight does today. But Marvin mentions reusing Query objects (which I don't fully understand what that means - is it reusing the instance or the code), so I'd like to hear your thoughts. As far as my understanding goes: Query is reusable instance-wise. Weight is a Query primed with index-dependent stuff, like freqs that are used to calculate the score. Scorer is something that actually iterates on the docs and does the scoring. With per-segment collection you usually get a Query, produce a Weight from it using toplevel IR (most likely MSR), so you have freqs for the whole index, and then get a Scorer for each of the segments and iterate on them. I don't see how you can get freqs from toplevel reader and iterate on lower-level reader if you merge Weight and Score. Merging Weight and Query back is something I don't like too - I use a huge bunch of pre-parsed Queries that are run against ever-different indexes.
          Hide
          Shai Erera added a comment -

          I want to start working on that once LUCENE-1614 is resolved. In the meantime, I'd like to discuss whether we should define a topScorer() method on Weight, or have Weight introduce a score(Collector) method.
          I think I prefer the latter since even if I call topScorer and get back a Scorer, nothing prevents me from iterating on the Scorer, rather than calling its score(Collector) method. Also, we can deprecate Scorer.score(Collector), and come up w/ another one which is package-private.

          Another thing that was brought up by Mavin:

          If you make QueryWeight a subclass of Query, do you need any new methods?

          Before Weight existed, only Query and Scorer existed. Compiling a Scorer
          involved "weighting the query", by factoring IDF etc, then calling
          query.Scorer(). To make Query objects reusable, Weight was introduced as an
          intermediate stage. Making QueryWeight a subclass of Query would be entirely
          within the spirit of the original design, since the role played by Weight was
          originally performed by a Query.

          Marvin Humphrey

          Today Weight exposes only few methods, and I don't want to have to implement some of Query methods, just because QueryWeight (the replacement of Weight) will subclass Query. But I don't know enough about what led to that decision.

          To me, it'd be ok to get rid of Weight entirely and just have Query and Scorer, while Query.scorer() do whatever Weight does today. But Marvin mentions reusing Query objects (which I don't fully understand what that means - is it reusing the instance or the code), so I'd like to hear your thoughts.

          If we do get rid of Weight, or make QueryWeight a subclass of Query, it will simplify the Searcher API.

          Show
          Shai Erera added a comment - I want to start working on that once LUCENE-1614 is resolved. In the meantime, I'd like to discuss whether we should define a topScorer() method on Weight, or have Weight introduce a score(Collector) method. I think I prefer the latter since even if I call topScorer and get back a Scorer, nothing prevents me from iterating on the Scorer, rather than calling its score(Collector) method. Also, we can deprecate Scorer.score(Collector), and come up w/ another one which is package-private. Another thing that was brought up by Mavin: If you make QueryWeight a subclass of Query, do you need any new methods? Before Weight existed, only Query and Scorer existed. Compiling a Scorer involved "weighting the query", by factoring IDF etc, then calling query.Scorer(). To make Query objects reusable, Weight was introduced as an intermediate stage. Making QueryWeight a subclass of Query would be entirely within the spirit of the original design, since the role played by Weight was originally performed by a Query. Marvin Humphrey Today Weight exposes only few methods, and I don't want to have to implement some of Query methods, just because QueryWeight (the replacement of Weight) will subclass Query. But I don't know enough about what led to that decision. To me, it'd be ok to get rid of Weight entirely and just have Query and Scorer, while Query.scorer() do whatever Weight does today. But Marvin mentions reusing Query objects (which I don't fully understand what that means - is it reusing the instance or the code), so I'd like to hear your thoughts. If we do get rid of Weight, or make QueryWeight a subclass of Query, it will simplify the Searcher API.

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Shai Erera
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development