Lucene - Core
  1. Lucene - Core
  2. LUCENE-2831

Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, boolean, boolean) we should / could revise the API and pass in a struct that has parent reader, sub reader, ord of that sub. The ord mapping plus the context with its parent would make several issues way easier. See LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.

      1. LUCENE-2831-recursion.patch
        1 kB
        Michael McCandless
      2. LUCENE-2831-recursion.patch
        3 kB
        Michael McCandless
      3. LUCENE-2831-recursion.patch
        4 kB
        Michael McCandless
      4. LUCENE-2831-recursion.patch
        2 kB
        Simon Willnauer
      5. LUCENE-2831-nuke-SolrIndexReader.patch
        141 kB
        Simon Willnauer
      6. LUCENE-2831-nuke-SolrIndexReader.patch
        141 kB
        Simon Willnauer
      7. LUCENE-2831-no_sub_searcher.patch
        14 kB
        Simon Willnauer
      8. LUCENE-2831-no_sub_searcher.patch
        15 kB
        Simon Willnauer
      9. LUCENE-2831.patch
        152 kB
        Simon Willnauer
      10. LUCENE-2831.patch
        161 kB
        Simon Willnauer
      11. LUCENE-2831.patch
        168 kB
        Simon Willnauer
      12. LUCENE-2831.patch
        169 kB
        Simon Willnauer
      13. LUCENE-2831.patch
        169 kB
        Simon Willnauer
      14. LUCENE-2831.patch
        170 kB
        Simon Willnauer
      15. LUCENE-2831_transition_to_atomicCtx.patch
        103 kB
        Simon Willnauer
      16. LUCENE-2831_transition_to_atomicCtx.patch
        78 kB
        Simon Willnauer
      17. LUCENE-2831_transition_to_atomicCtx.patch
        77 kB
        Simon Willnauer

        Issue Links

          Activity

          Hide
          Simon Willnauer added a comment -

          Committed revision 1066669.

          Show
          Simon Willnauer added a comment - Committed revision 1066669.
          Hide
          Simon Willnauer added a comment -

          another iteration - fixed / added some javadocs and marked the LeafSlice experimental. I will wait a bit an commit later. All tests pass even with LUCENE-2751

          Show
          Simon Willnauer added a comment - another iteration - fixed / added some javadocs and marked the LeafSlice experimental. I will wait a bit an commit later. All tests pass even with LUCENE-2751
          Hide
          Michael McCandless added a comment -

          Patch looks great! No more schitzo sub searchers!

          +1 to commit

          Show
          Michael McCandless added a comment - Patch looks great! No more schitzo sub searchers! +1 to commit
          Hide
          Simon Willnauer added a comment -

          I think we can really get rid of the sub searchers and do it all on the top level searcher. I just sketched something out how this should be done from my point of view. The executors should only specify the ARC slice they want to execute and use the top level searcher to do the searches.

          this patch is just for illustration purposes... Jdocs need to be fixed etc.

          I think if we do it that way the semantics are clear for IS.

          Show
          Simon Willnauer added a comment - I think we can really get rid of the sub searchers and do it all on the top level searcher. I just sketched something out how this should be done from my point of view. The executors should only specify the ARC slice they want to execute and use the top level searcher to do the searches. this patch is just for illustration purposes... Jdocs need to be fixed etc. I think if we do it that way the semantics are clear for IS.
          Hide
          Michael McCandless added a comment -

          Patch looks good!

          But, can you change the new assert to say something like "cannot access top context when IS is a leaf reader" or something? Right now if you trip that assert it's like not clear what's gone wrong...

          And I think either remove the jdoc on that method, or, clarify that it's only sugar when IS is not based on an leaf reader?

          I really don't like this schitzo IS though Sometimes it's top reader sometimes it's leaf reader. But, the schitzo IS's should never "escape" out of the top IS that has an ES.

          Show
          Michael McCandless added a comment - Patch looks good! But, can you change the new assert to say something like "cannot access top context when IS is a leaf reader" or something? Right now if you trip that assert it's like not clear what's gone wrong... And I think either remove the jdoc on that method, or, clarify that it's only sugar when IS is not based on an leaf reader? I really don't like this schitzo IS though Sometimes it's top reader sometimes it's leaf reader. But, the schitzo IS's should never "escape" out of the top IS that has an ES.
          Hide
          Simon Willnauer added a comment -

          here is a slightly different patch that makes the dangerous ctor private and uses the leaf's reader as the IS reader. I also put an assert into getTopReaderContext to assert that nobody pulls a toplevel context from the schizo IS.

          All tests pass with LUCENE-2751

          Show
          Simon Willnauer added a comment - here is a slightly different patch that makes the dangerous ctor private and uses the leaf's reader as the IS reader. I also put an assert into getTopReaderContext to assert that nobody pulls a toplevel context from the schizo IS. All tests pass with LUCENE-2751
          Hide
          Yonik Seeley added a comment -

          fixed QueryValueSource. Since that class directly controls when it invokes weight.scorer, I think it's safe to just pass the top context?

          Yep. Note that it's fallback code in any case - the normal path is that it should have already been weighted and the weight will be found in the context.

          Show
          Yonik Seeley added a comment - fixed QueryValueSource. Since that class directly controls when it invokes weight.scorer, I think it's safe to just pass the top context? Yep. Note that it's fallback code in any case - the normal path is that it should have already been weighted and the weight will be found in the context.
          Hide
          Michael McCandless added a comment -

          Another rev – fixed QueryValueSource. Since that class directly controls when it invokes weight.scorer, I think it's safe to just pass the top context?

          Show
          Michael McCandless added a comment - Another rev – fixed QueryValueSource. Since that class directly controls when it invokes weight.scorer, I think it's safe to just pass the top context?
          Hide
          Michael McCandless added a comment -

          Patch to fix recursion, take 2.

          Still not sure about it... had to comment out a different assert

          Show
          Michael McCandless added a comment - Patch to fix recursion, take 2. Still not sure about it... had to comment out a different assert
          Hide
          Simon Willnauer added a comment -

          Simon can you check this? Thanks.

          this seems wrong since you don't maintain the actual hierarchy between parent and child. We need this for certain assertions down the road. Yet, what's problematic here is that it takes more than one leaf which kills the sematics of methods like IS#docFreq() as you figured out on IRC. If the parents reader is used for a searcher that only operates on partial leaves we have a problem. I think we need to shrapen that ctor to only take one sub and use the subs reader as the IS's reader. This should make the semantics clear. Ideally we should also make this ctor private one the users of it are "fixed". But for now

          IndexSearcher(ReaderContext ctx, AtomicReaderContext leaf)
          ...
          

          should be fine ey?

          Show
          Simon Willnauer added a comment - Simon can you check this? Thanks. this seems wrong since you don't maintain the actual hierarchy between parent and child. We need this for certain assertions down the road. Yet, what's problematic here is that it takes more than one leaf which kills the sematics of methods like IS#docFreq() as you figured out on IRC. If the parents reader is used for a searcher that only operates on partial leaves we have a problem. I think we need to shrapen that ctor to only take one sub and use the subs reader as the IS's reader. This should make the semantics clear. Ideally we should also make this ctor private one the users of it are "fixed". But for now IndexSearcher(ReaderContext ctx, AtomicReaderContext leaf) ... should be fine ey?
          Hide
          Michael McCandless added a comment -

          I think something like this patch is needed to fix the two "recursion" problems? The thing is, I needed to comment out that assert!

          Simon can you check this? Thanks.

          Show
          Michael McCandless added a comment - I think something like this patch is needed to fix the two "recursion" problems? The thing is, I needed to comment out that assert! Simon can you check this? Thanks.
          Hide
          Simon Willnauer added a comment -

          some recursion bugs in IndexSearcher(IndexReader r, boolean closeReader, ExecutorService executor)

          what's the recursion bug here? I have no idea why we are passing "this" to the subs here - maybe to save a single instance?!

          i noticed problems with IndexSearcher(ReaderContext topLevel, AtomicReaderContext... leaves), the varargs causes this ctor to become ambiguous with other ctors (such as ReaderContext, ExecutorService). I don't understand why this ctor needs to be public, i noticed Solr's QueryValueSource is the only thing using it, and I think instead that should be fixed.

          can we disambiguate by using ARC[] instead of ARC... ?

          Show
          Simon Willnauer added a comment - some recursion bugs in IndexSearcher(IndexReader r, boolean closeReader, ExecutorService executor) what's the recursion bug here? I have no idea why we are passing "this" to the subs here - maybe to save a single instance?! i noticed problems with IndexSearcher(ReaderContext topLevel, AtomicReaderContext... leaves), the varargs causes this ctor to become ambiguous with other ctors (such as ReaderContext, ExecutorService). I don't understand why this ctor needs to be public, i noticed Solr's QueryValueSource is the only thing using it, and I think instead that should be fixed. can we disambiguate by using ARC[] instead of ARC... ?
          Hide
          Robert Muir added a comment -

          There are some issues here with IndexSearcher, mostly found with LUCENE-2751 (once we started randomly using the executor service in tests).

          1. some recursion bugs in IndexSearcher(IndexReader r, boolean closeReader, ExecutorService executor)
          2. i noticed problems with IndexSearcher(ReaderContext topLevel, AtomicReaderContext... leaves), the varargs causes this ctor to become ambiguous with other ctors (such as ReaderContext, ExecutorService). I don't understand why this ctor needs to be public, i noticed Solr's QueryValueSource is the only thing using it, and I think instead that should be fixed.
          Show
          Robert Muir added a comment - There are some issues here with IndexSearcher, mostly found with LUCENE-2751 (once we started randomly using the executor service in tests). some recursion bugs in IndexSearcher(IndexReader r, boolean closeReader, ExecutorService executor) i noticed problems with IndexSearcher(ReaderContext topLevel, AtomicReaderContext... leaves), the varargs causes this ctor to become ambiguous with other ctors (such as ReaderContext, ExecutorService). I don't understand why this ctor needs to be public, i noticed Solr's QueryValueSource is the only thing using it, and I think instead that should be fixed.
          Hide
          Simon Willnauer added a comment -

          I committed the latest patch in revision 1058431, I think we are done here - yay!

          Show
          Simon Willnauer added a comment - I committed the latest patch in revision 1058431, I think we are done here - yay!
          Hide
          Simon Willnauer added a comment -

          updated to trunk

          Show
          Simon Willnauer added a comment - updated to trunk
          Hide
          Michael McCandless added a comment -

          Simon - heads up, I'm renaming TermState.copy -> TermState.copyFrom in LUCENE-2857.

          this comment should be on LUCENE-2694, right?!

          Duh, right. But it looks like you got the message anyway

          Show
          Michael McCandless added a comment - Simon - heads up, I'm renaming TermState.copy -> TermState.copyFrom in LUCENE-2857 . this comment should be on LUCENE-2694 , right?! Duh, right. But it looks like you got the message anyway
          Hide
          Simon Willnauer added a comment -

          this patch cuts over all function query stuff to AtomicReaderContext in solr & lucene. It also nukes SolrIndexReader entirely - yay!!
          I thinks somebody should give this patch a glance though, especially from the solr perspective although all tests pass.

          I had to make the IndexSearcher(ReaderContext, AtomicContext...) ctor public which is ok I think and I added a new already deprecated method to ValueSource in lucene land to make transition easier.

          if nobody objects I will commit later today

          Show
          Simon Willnauer added a comment - this patch cuts over all function query stuff to AtomicReaderContext in solr & lucene. It also nukes SolrIndexReader entirely - yay!! I thinks somebody should give this patch a glance though, especially from the solr perspective although all tests pass. I had to make the IndexSearcher(ReaderContext, AtomicContext...) ctor public which is ok I think and I added a new already deprecated method to ValueSource in lucene land to make transition easier. if nobody objects I will commit later today
          Hide
          Simon Willnauer added a comment -

          Simon - heads up, I'm renaming TermState.copy -> TermState.copyFrom in LUCENE-2857.

          this comment should be on LUCENE-2694, right?!

          Show
          Simon Willnauer added a comment - Simon - heads up, I'm renaming TermState.copy -> TermState.copyFrom in LUCENE-2857 . this comment should be on LUCENE-2694 , right?!
          Hide
          Michael McCandless added a comment -

          Simon – heads up, I'm renaming TermState.copy -> TermState.copyFrom in LUCENE-2857.

          Show
          Michael McCandless added a comment - Simon – heads up, I'm renaming TermState.copy -> TermState.copyFrom in LUCENE-2857 .
          Hide
          Simon Willnauer added a comment -

          renamed setNextContext back to setNextReader - its really just a reader with context

          Show
          Simon Willnauer added a comment - renamed setNextContext back to setNextReader - its really just a reader with context
          Hide
          Simon Willnauer added a comment -

          next transition iteration - this one cuts over Collector#setNextReader to Collector#setNextContext(AtomicReaderContext) and FieldComparator#setNextContext(AtomicReaderContext) respectively. I also replaced several SolrIndexReader uses to the use the new API rather than SolrIndexReader directly. There are just a handful of SolrIndexReader uses left which are mainly due to ValueSource#docValues(IR). This seems to be the last one left - we are close! If nobody objects I will commit soon.

          Show
          Simon Willnauer added a comment - next transition iteration - this one cuts over Collector#setNextReader to Collector#setNextContext(AtomicReaderContext) and FieldComparator#setNextContext(AtomicReaderContext) respectively. I also replaced several SolrIndexReader uses to the use the new API rather than SolrIndexReader directly. There are just a handful of SolrIndexReader uses left which are mainly due to ValueSource#docValues(IR). This seems to be the last one left - we are close! If nobody objects I will commit soon.
          Hide
          Michael McCandless added a comment -

          Awesome – patch looks great! I love the simplification to IndexSearcher for the sub-searchers, the fact that we no longer have to rebase since the provided context sends the docBase down. This is cleaner, too (we no longer have to "get a topLevel context" for a sub reader). Great!

          This will also make the cutover to statically typed readers (atomic vs composite) easier.

          Show
          Michael McCandless added a comment - Awesome – patch looks great! I love the simplification to IndexSearcher for the sub-searchers, the fact that we no longer have to rebase since the provided context sends the docBase down. This is cleaner, too (we no longer have to "get a topLevel context" for a sub reader). Great! This will also make the cutover to statically typed readers (atomic vs composite) easier.
          Hide
          Simon Willnauer added a comment -

          this patch cuts over Weight#scorer, Weight#explain & Filter#getDocIDSet to AtomicReaderContext. I also somewhat fixed IndexSearcher to handle concurrent execution correctly with respect to the actual context. Currently each sub searcher was its own context now we are just using IS as an executor of one leaf but still being aware of the entire context.

          All tests pass. I will commit soon if nobody objects

          Show
          Simon Willnauer added a comment - this patch cuts over Weight#scorer, Weight#explain & Filter#getDocIDSet to AtomicReaderContext. I also somewhat fixed IndexSearcher to handle concurrent execution correctly with respect to the actual context. Currently each sub searcher was its own context now we are just using IS as an executor of one leaf but still being aware of the entire context. All tests pass. I will commit soon if nobody objects
          Hide
          Simon Willnauer added a comment -

          And also Collector?

          yeah I think that one can move to ARC too.

            ValueSource#getValues(IndexReader)
          

          is another one

          Show
          Simon Willnauer added a comment - And also Collector? yeah I think that one can move to ARC too. ValueSource#getValues(IndexReader) is another one
          Hide
          Michael McCandless added a comment -

          It seems we also need to migrate FieldComparator to use ReaderContext (eventually AtomicReaderContext)?

          +1

          And also Collector?

          Show
          Michael McCandless added a comment - It seems we also need to migrate FieldComparator to use ReaderContext (eventually AtomicReaderContext)? +1 And also Collector?
          Hide
          Yonik Seeley added a comment -

          It seems we also need to migrate FieldComparator to use ReaderContext (eventually AtomicReaderContext)?

          Show
          Yonik Seeley added a comment - It seems we also need to migrate FieldComparator to use ReaderContext (eventually AtomicReaderContext)?
          Hide
          Yonik Seeley added a comment -

          Soooo, I took a quick shot at a high level migration of Filter.getDocIDSet to AtomicReaderContext - and after a few IDE crashes and other roadblocks + headaches, I went ahead with a bottom-up approach (as you can see from some of my commits).

          Show
          Yonik Seeley added a comment - Soooo, I took a quick shot at a high level migration of Filter.getDocIDSet to AtomicReaderContext - and after a few IDE crashes and other roadblocks + headaches, I went ahead with a bottom-up approach (as you can see from some of my commits).
          Hide
          Simon Willnauer added a comment -

          But you previously indicated there was some issue in Solr that made that problematic?

          actually for Filter that should be fine as far as I can see. I had some cases where non-atomic reader ctx where passed to scorer but maybe those where a different problem, I try to remember though. If you feel like it you can just go ahead and cut over to ARCxt on getDocIdSet() or I will do next week though.

          Show
          Simon Willnauer added a comment - But you previously indicated there was some issue in Solr that made that problematic? actually for Filter that should be fine as far as I can see. I had some cases where non-atomic reader ctx where passed to scorer but maybe those where a different problem, I try to remember though. If you feel like it you can just go ahead and cut over to ARCxt on getDocIdSet() or I will do next week though.
          Hide
          Yonik Seeley added a comment -

          would you want to open a new issue to migrate solr parts or should we do that in this one

          This one is fine. It makes sense to actual cutover from ReaderContext to AtomicReaderContext before migration away from SolrIndexReader.
          But you previously indicated there was some issue in Solr that made that problematic?

          Show
          Yonik Seeley added a comment - would you want to open a new issue to migrate solr parts or should we do that in this one This one is fine. It makes sense to actual cutover from ReaderContext to AtomicReaderContext before migration away from SolrIndexReader. But you previously indicated there was some issue in Solr that made that problematic?
          Hide
          Simon Willnauer added a comment -

          So we should either change to AtomicReaderContext, or put a getBaseInTop() method on ReaderContext.

          We should move to AtomicReaderContext if possible. Would you want to open a new issue to migrate solr parts or should we do that in this one?
          Similarly, Weight#scorer should also take a AtomicReaderContext if possible...

          Show
          Simon Willnauer added a comment - So we should either change to AtomicReaderContext, or put a getBaseInTop() method on ReaderContext. We should move to AtomicReaderContext if possible. Would you want to open a new issue to migrate solr parts or should we do that in this one? Similarly, Weight#scorer should also take a AtomicReaderContext if possible...
          Hide
          Yonik Seeley added a comment -

          Do we have any good MultiReader tests?
          wrapUnderlyingReader() sort of does... but not enough to tell if someone accidentally used baseInParent as opposed to the global base.
          Perhaps it should construct a MultiReader with an arbitrary but equivalent structure based on children and leaves?

          Show
          Yonik Seeley added a comment - Do we have any good MultiReader tests? wrapUnderlyingReader() sort of does... but not enough to tell if someone accidentally used baseInParent as opposed to the global base. Perhaps it should construct a MultiReader with an arbitrary but equivalent structure based on children and leaves?
          Hide
          Yonik Seeley added a comment -

          > Should Filter.getDocIDSet take an AtomicReaderContext? We don't have
          > to do that in this patch, though... this patch is a big enough first
          > step!

          Yeah I would like to do so, similar to Weight#scorer but currently mainly solr prevents us from this.

          Which part? I was looking into migrating some SolrIndexSearcher to ReaderContext, and realized I needed the global base.
          I clould walk up to calculate, but then I realized that AtomicReaderContext already has that! So we should either change to AtomicReaderContext, or put a getBaseInTop() method on ReaderContext.

          Show
          Yonik Seeley added a comment - > Should Filter.getDocIDSet take an AtomicReaderContext? We don't have > to do that in this patch, though... this patch is a big enough first > step! Yeah I would like to do so, similar to Weight#scorer but currently mainly solr prevents us from this. Which part? I was looking into migrating some SolrIndexSearcher to ReaderContext, and realized I needed the global base. I clould walk up to calculate, but then I realized that AtomicReaderContext already has that! So we should either change to AtomicReaderContext, or put a getBaseInTop() method on ReaderContext.
          Hide
          Simon Willnauer added a comment -

          This is a bug in ReaderUtil.build() that when passed a segment reader, it sets isTopLevel to false.

          ah good catch! Thanks!

          It assumes that a reader is top level if it has leaves.

          that one is actually intentional. if it is a CompositeReaderContext it must have leaves since it is composed of at least on other reader, right? Otherwise it should be an atomic reader or do I miss something?

          I have to admit that I didn't try to hard to get the Solr part running altogether.

          Show
          Simon Willnauer added a comment - This is a bug in ReaderUtil.build() that when passed a segment reader, it sets isTopLevel to false. ah good catch! Thanks! It assumes that a reader is top level if it has leaves. that one is actually intentional. if it is a CompositeReaderContext it must have leaves since it is composed of at least on other reader, right? Otherwise it should be an atomic reader or do I miss something? I have to admit that I didn't try to hard to get the Solr part running altogether.
          Hide
          Yonik Seeley added a comment -

          I see another related bug I think:
          CompositeReaderContext does this:
          super(parent, reader, false, leaves != null, ordInParent, docbaseInParent);
          It assumes that a reader is top level if it has leaves.

          Show
          Yonik Seeley added a comment - I see another related bug I think: CompositeReaderContext does this: super(parent, reader, false, leaves != null, ordInParent, docbaseInParent); It assumes that a reader is top level if it has leaves.
          Hide
          Yonik Seeley added a comment -

          Regarding this assert in IndexSearcher:
          // TODO: eable this assert once SolrIndexReader and friends are refactored to use ReaderContext
          // We can't assert this here since SolrIndexReader will fail in some contexts - once solr is consistent we should be fine here
          // assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader;

          This is a bug in ReaderUtil.build() that when passed a segment reader, it sets isTopLevel to false.
          You got bit by those extra booleans

          When I hacked ReaderContext to just set isTopLevel to parent==null, all the solr tests passed w/ the assertion enabled.

          Show
          Yonik Seeley added a comment - Regarding this assert in IndexSearcher: // TODO: eable this assert once SolrIndexReader and friends are refactored to use ReaderContext // We can't assert this here since SolrIndexReader will fail in some contexts - once solr is consistent we should be fine here // assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader; This is a bug in ReaderUtil.build() that when passed a segment reader, it sets isTopLevel to false. You got bit by those extra booleans When I hacked ReaderContext to just set isTopLevel to parent==null, all the solr tests passed w/ the assertion enabled.
          Hide
          Simon Willnauer added a comment -

          committed in revision 1055636

          Show
          Simon Willnauer added a comment - committed in revision 1055636
          Hide
          Uwe Schindler added a comment -

          Go ahead, looks good, +1

          If there are smaller issues, let's fix them later. The patch is quite big, so its better to commit now and let everybody use it! I was also thinking about using ReaderContext in Query.rewrite() for consistency.

          Show
          Uwe Schindler added a comment - Go ahead, looks good, +1 If there are smaller issues, let's fix them later. The patch is quite big, so its better to commit now and let everybody use it! I was also thinking about using ReaderContext in Query.rewrite() for consistency.
          Hide
          Simon Willnauer added a comment -

          final patch, fixed the leafes problem and added changes.txt entry. I commit shortly

          Show
          Simon Willnauer added a comment - final patch, fixed the leafes problem and added changes.txt entry. I commit shortly
          Hide
          Simon Willnauer added a comment -

          But there's still a numLeafes in ReaderUtil

          bloddy dyslexic german

          I'll go ahead and commit

          Show
          Simon Willnauer added a comment - But there's still a numLeafes in ReaderUtil bloddy dyslexic german I'll go ahead and commit
          Hide
          Michael McCandless added a comment -

          +1!

          But there's still a numLeafes in ReaderUtil

          Show
          Michael McCandless added a comment - +1! But there's still a numLeafes in ReaderUtil
          Hide
          Simon Willnauer added a comment -

          fixed those little spelling issues & added a children() method to ReaderContext. I also revised the leaves() jdocs to be more clear now.

          I think we are good to go!

          Show
          Simon Willnauer added a comment - fixed those little spelling issues & added a children() method to ReaderContext. I also revised the leaves() jdocs to be more clear now. I think we are good to go!
          Hide
          Michael McCandless added a comment -

          Patch looks good!

          There's a numLeafs in ReaderUtil still, and s/docbaseInParent/docBaseInParent.

          I think children() would be good.

          Show
          Michael McCandless added a comment - Patch looks good! There's a numLeafs in ReaderUtil still, and s/docbaseInParent/docBaseInParent. I think children() would be good.
          Hide
          Simon Willnauer added a comment -

          ReaderContextBuilder.numLeafes uses an AtomicInt, but ReaderUtil.Gather doesn't do any threading.

          that way I can update it in the annonymous class - not do any threading doesn't really matter that operation is not time critical at all. Impl. detail IMO which is just convenient

          ReaderContext.leaves() is a method - shouldn't it just be a member for consistency? I don't really understand the javadoc on that method either, since I don't see how I could walk the tree myself - there are no child pointers.

          well they are in CompositeReaderContext but that jdoc is missleading. I added it to prevent a cast to check if there are leaves I don't see why this is problematic here though. I would rather add a children() method for consistency here though.

          Is ReaderContext.isTopLevel redundant (i.e. it will always be equal to parent==null)? Maybe the same thing for isAtomic and leaves==null?

          yeah we could do that but I would prefer the simple booleans since they are way more expressive and easier to understand.

          Show
          Simon Willnauer added a comment - ReaderContextBuilder.numLeafes uses an AtomicInt, but ReaderUtil.Gather doesn't do any threading. that way I can update it in the annonymous class - not do any threading doesn't really matter that operation is not time critical at all. Impl. detail IMO which is just convenient ReaderContext.leaves() is a method - shouldn't it just be a member for consistency? I don't really understand the javadoc on that method either, since I don't see how I could walk the tree myself - there are no child pointers. well they are in CompositeReaderContext but that jdoc is missleading. I added it to prevent a cast to check if there are leaves I don't see why this is problematic here though. I would rather add a children() method for consistency here though. Is ReaderContext.isTopLevel redundant (i.e. it will always be equal to parent==null)? Maybe the same thing for isAtomic and leaves==null? yeah we could do that but I would prefer the simple booleans since they are way more expressive and easier to understand.
          Hide
          Yonik Seeley added a comment -

          I'm browsing through this latest patch a bit...

          • ReaderContextBuilder.numLeafes uses an AtomicInt, but ReaderUtil.Gather doesn't do any threading.
          • ReaderContext.leaves() is a method - shouldn't it just be a member for consistency? I don't really understand the javadoc on that method either, since I don't see how I could walk the tree myself - there are no child pointers.
          • Is ReaderContext.isTopLevel redundant (i.e. it will always be equal to parent==null)? Maybe the same thing for isAtomic and leaves==null?
          Show
          Yonik Seeley added a comment - I'm browsing through this latest patch a bit... ReaderContextBuilder.numLeafes uses an AtomicInt, but ReaderUtil.Gather doesn't do any threading. ReaderContext.leaves() is a method - shouldn't it just be a member for consistency? I don't really understand the javadoc on that method either, since I don't see how I could walk the tree myself - there are no child pointers. Is ReaderContext.isTopLevel redundant (i.e. it will always be equal to parent==null)? Maybe the same thing for isAtomic and leaves==null?
          Hide
          Simon Willnauer added a comment -

          Updated to trunk and fixed some variable naming s/info/context

          all tests pass

          Show
          Simon Willnauer added a comment - Updated to trunk and fixed some variable naming s/info/context all tests pass
          Hide
          Simon Willnauer added a comment -

          next iteration, this time I think we are very close.

          • I renamed AtomicContext to AtomicReaderContext and likewise for CompositeContext
          • s/topLevelReaderContext/getTopReaderContext
          • updated to latest trunk and adopted the changes to IS in LUCENE-2837
          • Removed the dummy searcher in QueryWrapperFilter which now works just fine with a IS instance
          • added ReaderContext ctors to IS
          • replaced some members in IS in favor of AtomicReader[] leaves <--- leafs
          • s/leafs/leaves
          • Sharpened the JDocs in Weight - review please
          • added missing JDocs to IR, IS & ReaderContext + subs

          Should Filter.getDocIDSet take an AtomicReaderContext? We don't have
          to do that in this patch, though... this patch is a big enough first
          step!

          Yeah I would like to do so, similar to Weight#scorer but currently mainly solr prevents us from this. There is also the functionqueries that still operate on IR instead of ReaderContext but maybe this is a good usecase to consolidate them move them into a module and get them out of core?!
          Anyway, we should do this in a different issue - this has its purpose as you stated.
          Likewise I would do issues for CachingWrapperFilter & DuplcateFilter though.

          simon

          Show
          Simon Willnauer added a comment - next iteration, this time I think we are very close. I renamed AtomicContext to AtomicReaderContext and likewise for CompositeContext s/topLevelReaderContext/getTopReaderContext updated to latest trunk and adopted the changes to IS in LUCENE-2837 Removed the dummy searcher in QueryWrapperFilter which now works just fine with a IS instance added ReaderContext ctors to IS replaced some members in IS in favor of AtomicReader[] leaves <--- leafs s/leafs/leaves Sharpened the JDocs in Weight - review please added missing JDocs to IR, IS & ReaderContext + subs Should Filter.getDocIDSet take an AtomicReaderContext? We don't have to do that in this patch, though... this patch is a big enough first step! Yeah I would like to do so, similar to Weight#scorer but currently mainly solr prevents us from this. There is also the functionqueries that still operate on IR instead of ReaderContext but maybe this is a good usecase to consolidate them move them into a module and get them out of core?! Anyway, we should do this in a different issue - this has its purpose as you stated. Likewise I would do issues for CachingWrapperFilter & DuplcateFilter though. simon
          Hide
          Michael McCandless added a comment -

          Looks good Simon! Random comments...

          Maybe rename AtomicContext -> AtomicReaderContext? And same for
          CompositeContext?

          Should Filter.getDocIDSet take an AtomicReaderContext? We don't have
          to do that in this patch, though... this patch is a big enough first
          step!

          Leafes -> Leaves

          Maybe IR.getTopReaderContext() instead of IR.topLevelReaderContext()?
          (Or .getRootReaderContext()?).

          I agree this should eventually subsume
          .getSequentialReaders... though, we probably should change IR base
          method to return null not throw UOE, if so (until we succeed in
          statically typing composite vs atomic readers...).

          I think we can change the expert IndexSearcher ctor that takes the
          forced subReaders to instead take a root ReaderContext? In fact,
          maybe we can remove it altogether? It was added to avoid the
          "relatively costly" gatherSubReaders that IS does if you just pass it
          an IR, but, we are now fixing that w/ this issue, by having IR cache
          the root ReaderContext...

          If we did that could we go back to having QueryWrapperFilter just make
          an IndexSearcher?

          Do we really need forceLeafs()? Can't QueryWrapperFilter make a
          MultiReader holding just its atomic IR and pass that to IS? And then
          we can remove the AtomicContext ctor that takes a "naked" atomic
          reader?

          QueryWrapperFilter's WeightOnlyearcher should be WeightOnlySearcher.

          Show
          Michael McCandless added a comment - Looks good Simon! Random comments... Maybe rename AtomicContext -> AtomicReaderContext? And same for CompositeContext? Should Filter.getDocIDSet take an AtomicReaderContext? We don't have to do that in this patch, though... this patch is a big enough first step! Leafes -> Leaves Maybe IR.getTopReaderContext() instead of IR.topLevelReaderContext()? (Or .getRootReaderContext()?). I agree this should eventually subsume .getSequentialReaders... though, we probably should change IR base method to return null not throw UOE, if so (until we succeed in statically typing composite vs atomic readers...). I think we can change the expert IndexSearcher ctor that takes the forced subReaders to instead take a root ReaderContext? In fact, maybe we can remove it altogether? It was added to avoid the "relatively costly" gatherSubReaders that IS does if you just pass it an IR, but, we are now fixing that w/ this issue, by having IR cache the root ReaderContext... If we did that could we go back to having QueryWrapperFilter just make an IndexSearcher? Do we really need forceLeafs()? Can't QueryWrapperFilter make a MultiReader holding just its atomic IR and pass that to IS? And then we can remove the AtomicContext ctor that takes a "naked" atomic reader? QueryWrapperFilter's WeightOnlyearcher should be WeightOnlySearcher.
          Hide
          Simon Willnauer added a comment -

          Attaching my current state. There is still one testcase failing in solr land that violates the hierarchy (TestFunctionQuery) or rather the contract between Query#weight() and Weight#scorer() in not using the correct leaf readers etc.

          I added 2 structs CompositeContext and AtomicContext which made it somewhat simpler IMO. Yet, I try to fix the solr problems as far as i can - if that is not possible without major changes i will remove the assert in TermWeight since all tests pass without it though.

          if someone finds the time to look at the Context impls - comments very welcome!

          simon

          Show
          Simon Willnauer added a comment - Attaching my current state. There is still one testcase failing in solr land that violates the hierarchy (TestFunctionQuery) or rather the contract between Query#weight() and Weight#scorer() in not using the correct leaf readers etc. I added 2 structs CompositeContext and AtomicContext which made it somewhat simpler IMO. Yet, I try to fix the solr problems as far as i can - if that is not possible without major changes i will remove the assert in TermWeight since all tests pass without it though. if someone finds the time to look at the Context impls - comments very welcome! simon
          Hide
          Michael McCandless added a comment -

          I like ReaderContext as the name instead of ReaderInfo... Info is too "generic".

          Show
          Michael McCandless added a comment - I like ReaderContext as the name instead of ReaderInfo... Info is too "generic".
          Hide
          Michael McCandless added a comment -

          The same IR could be a sub-reader in a different context though.

          Right, but that's perfectly fine w/ the proposed API.

          Ie, any IR is able to be a top reader if you ask it to. So you can
          call IR.getTopReaderInfo(), and this will fill in the full tree as
          seen by that top reader only. Meaning, the resulting tree is
          "private" to that reader. If you go and ask another reader for its
          tree, a new tree (private to that reader's "view" (parent/child
          relationships)) is computed.

          The API is fully general: it allows for readers that are shared by
          more than one top reader.

          Really (as Yonik said before) this is a question of caching. It's
          gonna be the same tree, whether it's stored on IR or IS. Yet, the
          tree in no way depends on IS – it's entirely a function of the
          relationships b/w IR and its subs. An app should be able to store the
          top IR, and pull the tree from it on demand. And init'ing an IS from
          an IR should continue to be cheap.

          Also, looking towards the future... at some point we will split apart
          "composite reader" and "atomic reader". Ie, these really should be
          separate classes, but today in Lucene they are one and the same and so
          we have "dynamic typing" (you hit exceptions at runtime) instead of
          static typing (compilation errors). At some point we have to fix this
          and make these two IRs separately classes.

          But to say that this API should be moved to IS instead of IR because
          of this dynamic/static typing problem is a step in the wrong direction
          – ie once we split out these two IR classes we'll want to move the
          the getTopReaderInfo API back to the "composite" IR.

          BTW we will have to somehow fix CachingWrapperFilter to behave
          properly here. Ie, it needs to know whether the cache key is purely
          the sub-reader (as it is, always, today), or the combo of root reader
          + sub (as it is for eg DuplicateFilter).

          Show
          Michael McCandless added a comment - The same IR could be a sub-reader in a different context though. Right, but that's perfectly fine w/ the proposed API. Ie, any IR is able to be a top reader if you ask it to. So you can call IR.getTopReaderInfo(), and this will fill in the full tree as seen by that top reader only . Meaning, the resulting tree is "private" to that reader. If you go and ask another reader for its tree, a new tree (private to that reader's "view" (parent/child relationships)) is computed. The API is fully general: it allows for readers that are shared by more than one top reader. Really (as Yonik said before) this is a question of caching. It's gonna be the same tree, whether it's stored on IR or IS. Yet, the tree in no way depends on IS – it's entirely a function of the relationships b/w IR and its subs. An app should be able to store the top IR, and pull the tree from it on demand. And init'ing an IS from an IR should continue to be cheap. Also, looking towards the future... at some point we will split apart "composite reader" and "atomic reader". Ie, these really should be separate classes, but today in Lucene they are one and the same and so we have "dynamic typing" (you hit exceptions at runtime) instead of static typing (compilation errors). At some point we have to fix this and make these two IRs separately classes. But to say that this API should be moved to IS instead of IR because of this dynamic/static typing problem is a step in the wrong direction – ie once we split out these two IR classes we'll want to move the the getTopReaderInfo API back to the "composite" IR. BTW we will have to somehow fix CachingWrapperFilter to behave properly here. Ie, it needs to know whether the cache key is purely the sub-reader (as it is, always, today), or the combo of root reader + sub (as it is for eg DuplicateFilter).
          Hide
          Simon Willnauer added a comment -

          I'm confused - there'd be no restriction with this approach? Ie it would allow for a sub that's shared in different top readers.

          I think what yonik it saying it that you can not be sure that the topLevelReader is a TopLevelReader since it could have yet another parent.
          This will always be true and essentially only well defined in the context of an IndexSearcher. IndexSearcher takes an arbitrary IndexReader as a topLevel and this will be the one for this IS context. The same IR could be a sub-reader in a different context though.

          I'm tend to lean towards holding ReaderInfo on IndexSearcher rather than on IndexReader. It is really an information that 100% depends on a context which is given by IS. IS also defines the context for Query#weigth / createWeight and Weight ctors and with LUCENE-2837 we can eventually rely on the given Searcher is an IS.

          Another thing to think about is if this should be unified with the very similar ReaderUtil.Slice

          I agree this should be easy to obtain.

          SolrIndexReader could be removed altogether if this is solved correctly - it's only real purpose was really to solve this context problem.

          yeah this makes sense. I didn't look into it too deep during the first patch.

          I think we should give Yoniks full ReaderInfo a go an create it only in IS. This might make some Solr classes go away too and reduce complexity. I will go ahead and work on a new patch soon.

          Show
          Simon Willnauer added a comment - I'm confused - there'd be no restriction with this approach? Ie it would allow for a sub that's shared in different top readers. I think what yonik it saying it that you can not be sure that the topLevelReader is a TopLevelReader since it could have yet another parent. This will always be true and essentially only well defined in the context of an IndexSearcher. IndexSearcher takes an arbitrary IndexReader as a topLevel and this will be the one for this IS context. The same IR could be a sub-reader in a different context though. I'm tend to lean towards holding ReaderInfo on IndexSearcher rather than on IndexReader. It is really an information that 100% depends on a context which is given by IS. IS also defines the context for Query#weigth / createWeight and Weight ctors and with LUCENE-2837 we can eventually rely on the given Searcher is an IS. Another thing to think about is if this should be unified with the very similar ReaderUtil.Slice I agree this should be easy to obtain. SolrIndexReader could be removed altogether if this is solved correctly - it's only real purpose was really to solve this context problem. yeah this makes sense. I didn't look into it too deep during the first patch. I think we should give Yoniks full ReaderInfo a go an create it only in IS. This might make some Solr classes go away too and reduce complexity. I will go ahead and work on a new patch soon.
          Hide
          Michael McCandless added a comment -

          I'm confused – there'd be no restriction with this approach? Ie it
          would allow for a sub that's shared in different top readers.

          The IR has .getTopReaderInfo(). This returns a full ReaderInfo tree
          for all subs under that top. That info is stored, privately, in only
          that top reader (not on the subs). When you call .getTopReaderInfo(),
          no ReaderInfo is set on any of the subs. Only that top reader holds
          this tree.

          Show
          Michael McCandless added a comment - I'm confused – there'd be no restriction with this approach? Ie it would allow for a sub that's shared in different top readers. The IR has .getTopReaderInfo(). This returns a full ReaderInfo tree for all subs under that top. That info is stored, privately, in only that top reader (not on the subs). When you call .getTopReaderInfo(), no ReaderInfo is set on any of the subs. Only that top reader holds this tree.
          Hide
          Yonik Seeley added a comment -

          I think we should add the sub-ReaderInfos too, so the tree is
          doubly-linked?

          Yeah, sounds like a good idea. Solr currently has this functionality via SolrIndexReader (all lucene readers in the tree are wrapped on every reopen), and every SolrIndexReader has the list of children, and a parent pointer. So this is turning into the same thing, just with a parallel data structure (which mirrors the actual reader tree) instead of wrapping.

          Actually I think we can and should store this in IndexReader?

          But, only the top-level reader is allowed to hold ReaderInfo for all
          subs it [recurisvely] contains.

          A top level reader may also be a sub-reader of another top level reader... so it doesn't seem
          like we can make that restriction, and any ReaderInfo stored on an IndexReader would only be valid
          in some contexts. Think about simply trying to walk up to the top level reader, or using "offset"
          or "ord' which vary depending on the top-level reader.

          And if that info is passed down to weight() and scorer() anyway, what's the point of storing it on IR?
          I guess if we made the restriction that things that vary depending on top-level reader should be avoided
          in ReaderInfo if obtained directly from an IndexReader, it would be OK. But that's a severe and strange restriction.

          We can try approaching it from the other direction too... what is gained or made easier by storing ReaderInfo on IR? What are the usecases?

          Show
          Yonik Seeley added a comment - I think we should add the sub-ReaderInfos too, so the tree is doubly-linked? Yeah, sounds like a good idea. Solr currently has this functionality via SolrIndexReader (all lucene readers in the tree are wrapped on every reopen), and every SolrIndexReader has the list of children, and a parent pointer. So this is turning into the same thing, just with a parallel data structure (which mirrors the actual reader tree) instead of wrapping. Actually I think we can and should store this in IndexReader? But, only the top-level reader is allowed to hold ReaderInfo for all subs it [recurisvely] contains. A top level reader may also be a sub-reader of another top level reader... so it doesn't seem like we can make that restriction, and any ReaderInfo stored on an IndexReader would only be valid in some contexts. Think about simply trying to walk up to the top level reader, or using "offset" or "ord' which vary depending on the top-level reader. And if that info is passed down to weight() and scorer() anyway, what's the point of storing it on IR? I guess if we made the restriction that things that vary depending on top-level reader should be avoided in ReaderInfo if obtained directly from an IndexReader, it would be OK. But that's a severe and strange restriction. We can try approaching it from the other direction too... what is gained or made easier by storing ReaderInfo on IR? What are the usecases?
          Hide
          Michael McCandless added a comment -

          I like the full ReaderInfo!

          It holds details not only relative to top-level reader but also
          relative to its immediate parent. This way we don't lose any
          information on the full tree structure (in case there are apps that
          care).

          I think we should add the sub-ReaderInfos too, so the tree is
          doubly-linked?

          ReaderInfo (info about parents, and reader context in general) should not be kept on the IndexReader since a reader can be used in multiple contexts.

          Actually I think we can and should store this in IndexReader?

          But, only the top-level reader is allowed to hold ReaderInfo for all
          subs it [recurisvely] contains. Ie a given IR is not allowed to hold
          "its" ReaderInfo, since it can in general have more than one
          ReaderInfo if it belongs to multiple top-level readers.

          So eg IR would have a getTopReaderInfo(), to return its own top-level
          ReaderInfo. From there, eg, IS would walk the subs down to gather the
          leaf ReaderInfo array that are the subs visited for searching.

          Man I wish we had done this back in 2.9 w/ the per-segment cutover,
          instead of passing around naked readers this whole time!!

          Show
          Michael McCandless added a comment - I like the full ReaderInfo! It holds details not only relative to top-level reader but also relative to its immediate parent. This way we don't lose any information on the full tree structure (in case there are apps that care). I think we should add the sub-ReaderInfos too, so the tree is doubly-linked? ReaderInfo (info about parents, and reader context in general) should not be kept on the IndexReader since a reader can be used in multiple contexts. Actually I think we can and should store this in IndexReader? But, only the top-level reader is allowed to hold ReaderInfo for all subs it [recurisvely] contains. Ie a given IR is not allowed to hold "its" ReaderInfo, since it can in general have more than one ReaderInfo if it belongs to multiple top-level readers. So eg IR would have a getTopReaderInfo(), to return its own top-level ReaderInfo. From there, eg, IS would walk the subs down to gather the leaf ReaderInfo array that are the subs visited for searching. Man I wish we had done this back in 2.9 w/ the per-segment cutover, instead of passing around naked readers this whole time!!
          Hide
          Yonik Seeley added a comment -

          Another thing to think about is if this should be unified with the very similar ReaderUtil.Slice

          Show
          Yonik Seeley added a comment - Another thing to think about is if this should be unified with the very similar ReaderUtil.Slice
          Hide
          Yonik Seeley added a comment -

          Remember that we may be dealing with a tree with a depth > 2, so If we want to preserve maximum information, we may want something like this:

          public static final class ReaderInfo {
              public final IndexReader reader;
          
              /** the reader info for this reader's immediate parent, or null if none */
              public final ReaderInfo parent;
          
              /** the ord of this reader in the parent */
              public final int ordInParent;
          
              /** the offset of this reader in the parent */
             public final int offsetInParent;
          
             /** the ord of this reader in the top level reader */
             public final int ord;
          
            /** the offset of this reader in the top level reader */
            public final int offset;
          
          Show
          Yonik Seeley added a comment - Remember that we may be dealing with a tree with a depth > 2, so If we want to preserve maximum information, we may want something like this: public static final class ReaderInfo { public final IndexReader reader; /** the reader info for this reader's immediate parent, or null if none */ public final ReaderInfo parent; /** the ord of this reader in the parent */ public final int ordInParent; /** the offset of this reader in the parent */ public final int offsetInParent; /** the ord of this reader in the top level reader */ public final int ord; /** the offset of this reader in the top level reader */ public final int offset;
          Hide
          Yonik Seeley added a comment -

          # Removed all uses of SolrIndexReader#leaveReaders outside of SIR - IMO we can make those private to SIR now.

          SolrIndexReader could be removed altogether if this is solved correctly - it's only real purpose was really to solve this context problem.

          Show
          Yonik Seeley added a comment - # Removed all uses of SolrIndexReader#leaveReaders outside of SIR - IMO we can make those private to SIR now. SolrIndexReader could be removed altogether if this is solved correctly - it's only real purpose was really to solve this context problem.
          Hide
          Yonik Seeley added a comment -

          ReaderInfo (info about parents, and reader context in general) should not be kept on the IndexReader since a reader can be used in multiple contexts. For example, the idea that a reader has a single parent is false. This info should be keept at a higher level (like IndexSearcher) and passed down.

          Show
          Yonik Seeley added a comment - ReaderInfo (info about parents, and reader context in general) should not be kept on the IndexReader since a reader can be used in multiple contexts. For example, the idea that a reader has a single parent is false. This info should be keept at a higher level (like IndexSearcher) and passed down.
          Hide
          Simon Willnauer added a comment - - edited

          here is an initial patch that cuts over the API to use a ReaderInfo struct. I upload that patch to get initial feedback for this rather massive change. Before i fix all JavaDoc etc. some others should review that first. The patch contains the following changes:

          • Cut over to use ReaderInfo in Weight#scorer, Weight#explain & Filter.getDocIdSet
          • Added ReaderInfo[] to IndexReader and its subclasses as well as IndexSearcher and Searcher which just forwards if applicable.
          • Fixed several, IMO legacy implementations in Solr code that still fixed the doc offset in explain
          • Removed all uses of SolrIndexReader#leaveReaders outside of SIR - IMO we can make those private to SIR now.
          • Added a DummySearcher to QueryWrapperFilter since it uses the subreader to obtain the weight from a query. That didn't work anymore since I assert now that
            the parent readers are identical in Weight#scorer
          • cut over IndexSearcher to use ReaderInfo instead of seq. subreader

          What is definitly missing here from my point of view is:

          • sharpen the javadoc in weight, filter, and query to make clear how Weight, Rewrite & Scorer play together. For instance it should be clear that a query should only be executed against the same reader it was rewritten against.
          • possibly revise the getSeqSub & getSeqDocBase API in IR and maybe merge then in ReaderInfo
          • We should think about what we need in Scorer and Filter before we commit since this already changes the interface so we can do it here though. For instance could we include a needsScoring flag in either Query#weight() or in the Weight#scorer() method. One other way would be to add those flags to the struct
            that is passed in but that would mean that we can not simply bind them to the IR and use them from there.
          • maybe think about the name ReaderContext sounds also good instead of ReaderInfo

          comments welcome!!

          Show
          Simon Willnauer added a comment - - edited here is an initial patch that cuts over the API to use a ReaderInfo struct. I upload that patch to get initial feedback for this rather massive change. Before i fix all JavaDoc etc. some others should review that first. The patch contains the following changes: Cut over to use ReaderInfo in Weight#scorer, Weight#explain & Filter.getDocIdSet Added ReaderInfo[] to IndexReader and its subclasses as well as IndexSearcher and Searcher which just forwards if applicable. Fixed several, IMO legacy implementations in Solr code that still fixed the doc offset in explain Removed all uses of SolrIndexReader#leaveReaders outside of SIR - IMO we can make those private to SIR now. Added a DummySearcher to QueryWrapperFilter since it uses the subreader to obtain the weight from a query. That didn't work anymore since I assert now that the parent readers are identical in Weight#scorer cut over IndexSearcher to use ReaderInfo instead of seq. subreader What is definitly missing here from my point of view is: sharpen the javadoc in weight, filter, and query to make clear how Weight, Rewrite & Scorer play together. For instance it should be clear that a query should only be executed against the same reader it was rewritten against. possibly revise the getSeqSub & getSeqDocBase API in IR and maybe merge then in ReaderInfo We should think about what we need in Scorer and Filter before we commit since this already changes the interface so we can do it here though. For instance could we include a needsScoring flag in either Query#weight() or in the Weight#scorer() method. One other way would be to add those flags to the struct that is passed in but that would mean that we can not simply bind them to the IR and use them from there. maybe think about the name ReaderContext sounds also good instead of ReaderInfo comments welcome!!

            People

            • Assignee:
              Simon Willnauer
              Reporter:
              Simon Willnauer
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development