Lucene - Core
  1. Lucene - Core
  2. LUCENE-1771

Using explain may double ram reqs for fieldcaches when using ValueSourceQuery/CustomScoreQuery or for ConstantScoreQuerys that use a caching Filter.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available
    1. LUCENE-1771.bc-tests.patch
      21 kB
      Mark Miller
    2. LUCENE-1771.patch
      94 kB
      Mark Miller
    3. LUCENE-1771.patch
      90 kB
      Mark Miller
    4. LUCENE-1771.patch
      91 kB
      Mark Miller
    5. LUCENE-1771.patch
      83 kB
      Hoss Man
    6. LUCENE-1771.patch
      87 kB
      Mark Miller
    7. LUCENE-1771.patch
      22 kB
      Mark Miller
    8. LUCENE-1771.patch
      22 kB
      Mark Miller
    9. LUCENE-1771.patch
      20 kB
      Mark Miller
    10. LUCENE-1771.patch
      10 kB
      Mark Miller

      Issue Links

        Activity

        Hide
        Mark Miller added a comment -

        reminder to add warning for custom queries - you shouldn't use top level reader to access fieldcache values for explain

        Show
        Mark Miller added a comment - reminder to add warning for custom queries - you shouldn't use top level reader to access fieldcache values for explain
        Hide
        Mark Miller added a comment -

        Current state of fix (being iterated on in the fieldcache introspection issue - will be fully spun over to this issue after its shaken out)

        Show
        Mark Miller added a comment - Current state of fix (being iterated on in the fieldcache introspection issue - will be fully spun over to this issue after its shaken out)
        Hide
        Mark Miller added a comment -

        Unless there is an objection, and after another review, I will commit this soon.

        Show
        Mark Miller added a comment - Unless there is an objection, and after another review, I will commit this soon.
        Hide
        Michael McCandless added a comment -

        I think we should change the approach here, based on Yonik's last suggestion on LUCENE-1749, so that QueryWeight.explain receives both the top-level Searcher and the sub-reader, with the docID pre-resolved to that sub-reader. This way individual queries are not responsible for resolving to sub-readers.

        Actually, since QueryWeight is new in 2.9, we can simply make this change w/o deprecation.

        Show
        Michael McCandless added a comment - I think we should change the approach here, based on Yonik's last suggestion on LUCENE-1749 , so that QueryWeight.explain receives both the top-level Searcher and the sub-reader, with the docID pre-resolved to that sub-reader. This way individual queries are not responsible for resolving to sub-readers. Actually, since QueryWeight is new in 2.9, we can simply make this change w/o deprecation.
        Hide
        Mark Miller added a comment -

        okay, sounds good

        Show
        Mark Miller added a comment - okay, sounds good
        Hide
        Mark Miller added a comment - - edited

        searcher doesn't have access to numDocs (termqueryweight) at the moment ...

        edit

        N/M - the IndexSearcher will have getIndexReader

        Show
        Mark Miller added a comment - - edited searcher doesn't have access to numDocs (termqueryweight) at the moment ... edit N/M - the IndexSearcher will have getIndexReader
        Hide
        Yonik Seeley added a comment -

        searcher doesn't have access to numDocs (termqueryweight) at the moment ...

        Good... cause it's a bug and it shouldn't be using it When was that added?

        Show
        Yonik Seeley added a comment - searcher doesn't have access to numDocs (termqueryweight) at the moment ... Good... cause it's a bug and it shouldn't be using it When was that added?
        Hide
        Mark Miller added a comment -

        Actually, since QueryWeight is new in 2.9, we can simply make this change w/o deprecation.

        We do have to be back compat with Weight though - ugg - so QueryWeight would somehow need to expose a deprecated working explain(IndexReader, int) ? ...
        but if we could do that ...

        Show
        Mark Miller added a comment - Actually, since QueryWeight is new in 2.9, we can simply make this change w/o deprecation. We do have to be back compat with Weight though - ugg - so QueryWeight would somehow need to expose a deprecated working explain(IndexReader, int) ? ... but if we could do that ...
        Hide
        Yonik Seeley added a comment -

        Good... cause it's a bug and it shouldn't be using it When was that added?

        Tried out IntelliJ's "show history for selection" for the first time - cool stuff!
        Anyway, the bug was introduced in LUCENE-1066, Nov-2007

        Show
        Yonik Seeley added a comment - Good... cause it's a bug and it shouldn't be using it When was that added? Tried out IntelliJ's "show history for selection" for the first time - cool stuff! Anyway, the bug was introduced in LUCENE-1066 , Nov-2007
        Hide
        Yonik Seeley added a comment -

        We do have to be back compat with Weight though - ugg - so QueryWeight would somehow need to expose a deprecated working explain(IndexReader, int) ?

        QueryWeight doesn't need a explain(IndexReader, int)... but perhaps implementations of explain(IndexReader, Searcher, int) do need to handle a null searcher. And no, the output of explain won't be 100% compatible (or 100% accurate) but we're too far down the road of per-segment searching, and too close to a release to fix that now IMO.

        Show
        Yonik Seeley added a comment - We do have to be back compat with Weight though - ugg - so QueryWeight would somehow need to expose a deprecated working explain(IndexReader, int) ? QueryWeight doesn't need a explain(IndexReader, int)... but perhaps implementations of explain(IndexReader, Searcher, int) do need to handle a null searcher. And no, the output of explain won't be 100% compatible (or 100% accurate) but we're too far down the road of per-segment searching, and too close to a release to fix that now IMO.
        Hide
        Mark Miller added a comment -

        CustomScoreQuery has no access to the top level Searcher without docking it on the QueryWeight -

        /*(non-Javadoc) @see org.apache.lucene.search.Scorer#explain(int) */
        public Explanation explain(int doc) throws IOException {
        Explanation subQueryExpl = weight.subQueryWeight.explain(null, reader,doc); // nocommit: needs resolution

        Show
        Mark Miller added a comment - CustomScoreQuery has no access to the top level Searcher without docking it on the QueryWeight - /*(non-Javadoc) @see org.apache.lucene.search.Scorer#explain(int) */ public Explanation explain(int doc) throws IOException { Explanation subQueryExpl = weight.subQueryWeight.explain(null, reader,doc); // nocommit: needs resolution
        Hide
        Mark Miller added a comment -

        QueryWeight doesn't need a explain(IndexReader, int)...

        Why not? QueryWeight implements Weight for back compat ...

        Show
        Mark Miller added a comment - QueryWeight doesn't need a explain(IndexReader, int)... Why not? QueryWeight implements Weight for back compat ...
        Hide
        Yonik Seeley added a comment -

        Why not? QueryWeight implements Weight for back compat...

        Couldn't QueryWeight just have the following default implementation?
        public Explanation explain(IndexReader reader, int doc) throws IOException

        { explain(reader,doc,null); }

        And implementations would just need to be able to handle a null searcher (for now).

        Show
        Yonik Seeley added a comment - Why not? QueryWeight implements Weight for back compat... Couldn't QueryWeight just have the following default implementation? public Explanation explain(IndexReader reader, int doc) throws IOException { explain(reader,doc,null); } And implementations would just need to be able to handle a null searcher (for now).
        Hide
        Mark Miller added a comment -

        Yes - but then I think we have to use the top reader for that (impls already expect it) - so you could have double caching problems where Weight is still used. I think we will have to live with that though.

        Show
        Mark Miller added a comment - Yes - but then I think we have to use the top reader for that (impls already expect it) - so you could have double caching problems where Weight is still used. I think we will have to live with that though.
        Hide
        Mark Miller added a comment -

        This way individual queries are not responsible for resolving to sub-readers.

        ... so who is? The user I guess? The only place explain(IndexReader reader, int doc) appears to be called in Lucene code is with the customscore query. So with the current patch, you can call these methods and pass the top level reader like normal, but with this change, you have to get the subreader yourself.

        Also, the one internal place that explain(IndexReader reader, int doc) is called (CustomScoreQuery) would have to pass null for the IndexSearcher.

        Something doesn't seem right with this stuff ...

        Show
        Mark Miller added a comment - This way individual queries are not responsible for resolving to sub-readers. ... so who is? The user I guess? The only place explain(IndexReader reader, int doc) appears to be called in Lucene code is with the customscore query. So with the current patch, you can call these methods and pass the top level reader like normal, but with this change, you have to get the subreader yourself. Also, the one internal place that explain(IndexReader reader, int doc) is called (CustomScoreQuery) would have to pass null for the IndexSearcher. Something doesn't seem right with this stuff ...
        Hide
        Yonik Seeley added a comment -

        .. so who is? (responsible for resolving to sub-readers)

        I assumed IndexSearcher.explain().... which brings up another point... isn't a back-compat break too (since Weight is not a QueryWeight)?

        IndexSearcher.Explanation explain(QueryWeight weight, int doc) throws IOException;

        Show
        Yonik Seeley added a comment - .. so who is? (responsible for resolving to sub-readers) I assumed IndexSearcher.explain().... which brings up another point... isn't a back-compat break too (since Weight is not a QueryWeight)? IndexSearcher.Explanation explain(QueryWeight weight, int doc) throws IOException;
        Hide
        Mark Miller added a comment - - edited

        which brings up another point... isn't a back-compat break too (since Weight is not a QueryWeight)? IndexSearcher.Explanation explain(QueryWeight weight, int doc) throws IOException;

        nice catch.

        Hmmm - so scorer.explain(doc) would need to be deprecated cause it would need to take a Searcher (too avoid putting it on QueryWeight) and the sub-reader - and again this would need to be solved (like the current patch) a Query at a time.

        edit

        And then there is somehow handling the back compat of that deprecation ...

        Show
        Mark Miller added a comment - - edited which brings up another point... isn't a back-compat break too (since Weight is not a QueryWeight)? IndexSearcher.Explanation explain(QueryWeight weight, int doc) throws IOException; nice catch. Hmmm - so scorer.explain(doc) would need to be deprecated cause it would need to take a Searcher (too avoid putting it on QueryWeight) and the sub-reader - and again this would need to be solved (like the current patch) a Query at a time. edit And then there is somehow handling the back compat of that deprecation ...
        Hide
        Mark Miller added a comment -

        Exploratory test patch

        Show
        Mark Miller added a comment - Exploratory test patch
        Hide
        Mark Miller added a comment -

        Ignore my previous comment - was missing the IndexSearcher call because it was still calling explain(reader,doc) rather than explain(searcher, reader, doc)

        Show
        Mark Miller added a comment - Ignore my previous comment - was missing the IndexSearcher call because it was still calling explain(reader,doc) rather than explain(searcher, reader, doc)
        Hide
        Mark Miller added a comment -

        whoops - don't need to sub reader break out in boolean query

        Show
        Mark Miller added a comment - whoops - don't need to sub reader break out in boolean query
        Hide
        Mark Miller added a comment -

        Okay - here is patch that should be good enough for evaluating this method.

        My scorer.explain concern was invalid because Mike deprecated it anyway, so I just moved that logic up a level, and now none of the searcher,reader stuff needs to be passed to it. IndexSearcher.explain now does the reader resolve.

        Show
        Mark Miller added a comment - Okay - here is patch that should be good enough for evaluating this method. My scorer.explain concern was invalid because Mike deprecated it anyway, so I just moved that logic up a level, and now none of the searcher,reader stuff needs to be passed to it. IndexSearcher.explain now does the reader resolve.
        Hide
        Mark Miller added a comment -

        Too continue my JIRA spam on this issue:

        So the only way I can kind of get away with this without even more back compat issues is because back compat is already broken where it uses QueryWeight rather than Weight (as Yonik points out).

        Show
        Mark Miller added a comment - Too continue my JIRA spam on this issue: So the only way I can kind of get away with this without even more back compat issues is because back compat is already broken where it uses QueryWeight rather than Weight (as Yonik points out).
        Hide
        Michael McCandless added a comment -

        searcher doesn't have access to numDocs (termqueryweight) at the moment ...

        Good... cause it's a bug and it shouldn't be using it When was that added?

        I'm confused... why is that a bug?

        Show
        Michael McCandless added a comment - searcher doesn't have access to numDocs (termqueryweight) at the moment ... Good... cause it's a bug and it shouldn't be using it When was that added? I'm confused... why is that a bug?
        Hide
        Michael McCandless added a comment -

        isn't a back-compat break too (since Weight is not a QueryWeight)

        Searcher defines this:

          public Explanation explain(Weight weight, int doc) throws IOException {
            return explain(new QueryWeightWrapper(weight), doc);
          }
        

        But, yes, QueryWeightWrapper will need to forward explain to the Weight, somehow. This stuff is hard to think about!!

        Show
        Michael McCandless added a comment - isn't a back-compat break too (since Weight is not a QueryWeight) Searcher defines this: public Explanation explain(Weight weight, int doc) throws IOException { return explain( new QueryWeightWrapper(weight), doc); } But, yes, QueryWeightWrapper will need to forward explain to the Weight, somehow. This stuff is hard to think about!!
        Hide
        Mark Miller added a comment -

        Ah right - the wrapper call is on Searcher - tricky.

        So one issue with the current patch is that the old weight impls will get a sub-reader and they should still get a top level reader.

        Show
        Mark Miller added a comment - Ah right - the wrapper call is on Searcher - tricky. So one issue with the current patch is that the old weight impls will get a sub-reader and they should still get a top level reader.
        Hide
        Mark Miller added a comment -

        Also the doc passed to explain from IndexSearcher needs to be de-based

        Show
        Mark Miller added a comment - Also the doc passed to explain from IndexSearcher needs to be de-based
        Hide
        Mark Miller added a comment -

        with doc debase and top level reader to for weight explain

        Show
        Mark Miller added a comment - with doc debase and top level reader to for weight explain
        Hide
        Yonik Seeley added a comment -

        I'm confused... why is that a bug?

        Because it's maxDoc() not numDocs() that's used in the idf calculation... so if you tried to duplicate the idf calculation given the explain factors, it wouldn't match up for terms with deleted docs.

        Show
        Yonik Seeley added a comment - I'm confused... why is that a bug? Because it's maxDoc() not numDocs() that's used in the idf calculation... so if you tried to duplicate the idf calculation given the explain factors, it wouldn't match up for terms with deleted docs.
        Hide
        Michael McCandless added a comment -

        I'm confused... why is that a bug?

        Because it's maxDoc() not numDocs() that's used in the idf calculation

        Ahh, right. Mark do you want to fold that fix into your next patch here?

        Show
        Michael McCandless added a comment - I'm confused... why is that a bug? Because it's maxDoc() not numDocs() that's used in the idf calculation Ahh, right. Mark do you want to fold that fix into your next patch here?
        Hide
        Michael McCandless added a comment -

        Hmm... shouldn't explain take the top-level Searcher (not IndexSearcher)?

        I think we have to use the top reader for that (impls already expect it) - so you could have double caching problems where Weight is still used. I think we will have to live with that though.

        Maybe instead we should simply throw an exception if QueryWeight.explain(IndexReader, int) is called, stating that the Query impl must instead implement the new explain (that takes the top-level Searcher)? Would that be safer than risking accidental 2X mem usage due to a custom Query's explain?

        Show
        Michael McCandless added a comment - Hmm... shouldn't explain take the top-level Searcher (not IndexSearcher)? I think we have to use the top reader for that (impls already expect it) - so you could have double caching problems where Weight is still used. I think we will have to live with that though. Maybe instead we should simply throw an exception if QueryWeight.explain(IndexReader, int) is called, stating that the Query impl must instead implement the new explain (that takes the top-level Searcher)? Would that be safer than risking accidental 2X mem usage due to a custom Query's explain?
        Hide
        Mark Miller added a comment -

        Hmm... shouldn't explain take the top-level Searcher (not IndexSearcher)?

        Yeah, I can switch it - I only changed it to that to match the ability to get numDocs (by getting the IndexReader). If we fix that issue though, we don't need that and it can go back to Searcher.

        Maybe instead we should simply throw an exception if QueryWeight.explain(IndexReader, int) is called, stating that the Query impl must instead implement the new explain (that takes the top-level Searcher)? Would that be safer than risking accidental 2X mem usage due to a custom Query's explain?

        And break back compat?! Yeah, that seems reasonable to me. Anyone else?

        Show
        Mark Miller added a comment - Hmm... shouldn't explain take the top-level Searcher (not IndexSearcher)? Yeah, I can switch it - I only changed it to that to match the ability to get numDocs (by getting the IndexReader). If we fix that issue though, we don't need that and it can go back to Searcher. Maybe instead we should simply throw an exception if QueryWeight.explain(IndexReader, int) is called, stating that the Query impl must instead implement the new explain (that takes the top-level Searcher)? Would that be safer than risking accidental 2X mem usage due to a custom Query's explain? And break back compat?! Yeah, that seems reasonable to me. Anyone else?
        Hide
        Michael McCandless added a comment -

        And break back compat?! Yeah, that seems reasonable to me. Anyone else?

        Right. Though, we'd fix all core/contrib Query classes to cutover to the new API. It'd only be external (custom) Query impls that'd hit this. I think a hard break is better than a subtle and rather shocking "my explain method just consumed XXX MB" failure.

        Show
        Michael McCandless added a comment - And break back compat?! Yeah, that seems reasonable to me. Anyone else? Right. Though, we'd fix all core/contrib Query classes to cutover to the new API. It'd only be external (custom) Query impls that'd hit this. I think a hard break is better than a subtle and rather shocking "my explain method just consumed XXX MB" failure.
        Hide
        Mark Miller added a comment -

        I think a hard break is better than a subtle and rather shocking "my explain method just consumed XXX MB" failure.

        Yeah, I agreed But shouldn't we just make it a compile time error? What if explain is part of their current apps code? Should we wait to blow up on them?

        Show
        Mark Miller added a comment - I think a hard break is better than a subtle and rather shocking "my explain method just consumed XXX MB" failure. Yeah, I agreed But shouldn't we just make it a compile time error? What if explain is part of their current apps code? Should we wait to blow up on them?
        Hide
        Michael McCandless added a comment -

        But shouldn't we just make it a compile time error?

        +1, that's even better!

        Show
        Michael McCandless added a comment - But shouldn't we just make it a compile time error? +1, that's even better!
        Hide
        Mark Miller added a comment -

        But it would imply that we merge QueryWeight back into Weight? That seems like the logical break ...

        Show
        Mark Miller added a comment - But it would imply that we merge QueryWeight back into Weight? That seems like the logical break ...
        Hide
        Michael McCandless added a comment -

        Hmm, Weight is an interface, which we are wanting to get away from. We could simply change the explain in Weight to take the Searcher? That'd break compilation but let us keep the switch to abstract class QueryWeight?

        Show
        Michael McCandless added a comment - Hmm, Weight is an interface, which we are wanting to get away from. We could simply change the explain in Weight to take the Searcher? That'd break compilation but let us keep the switch to abstract class QueryWeight?
        Hide
        Mark Miller added a comment -

        That was my first consideration - but then I though, why not just make them switch to QueryWeight then? And then why not just make QueryWeight back to Weight (but abstract) - its not really much more work, and if you have to do work anyway, why not come all the way forward for a couple more lines?

        Show
        Mark Miller added a comment - That was my first consideration - but then I though, why not just make them switch to QueryWeight then? And then why not just make QueryWeight back to Weight (but abstract) - its not really much more work, and if you have to do work anyway, why not come all the way forward for a couple more lines?
        Hide
        Michael McCandless added a comment -

        why not just make QueryWeight back to Weight (but abstract)

        OK I think that's a good idea... so then the compilation errors a custom Query will see is 1) they have to change from "implements Weight" to "extends Weight", and 2) they have to add the Searcher arg to explain (and fix their explain code so it expects a single segment w/ the IndexReader, and the top-level Searcher).

        Sheesh we're really pulling on the little string, here

        Show
        Michael McCandless added a comment - why not just make QueryWeight back to Weight (but abstract) OK I think that's a good idea... so then the compilation errors a custom Query will see is 1) they have to change from "implements Weight" to "extends Weight", and 2) they have to add the Searcher arg to explain (and fix their explain code so it expects a single segment w/ the IndexReader, and the top-level Searcher). Sheesh we're really pulling on the little string, here
        Hide
        Mark Miller added a comment -

        Here it is basically.

        Core tests pass, but oddly TestCompoundWordTokenFilter took a real long time in contrib.

        [junit] Testsuite: org.apache.lucene.analysis.compound.TestCompoundWordTokenFilter
        [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 195.028 sec

        Benchmark also was pretty slow. I'm guessing because of autocommit. Similar symptoms.

        Perhaps my OS is messed up - would be odd though - fresh install from a week ago. Never noticed things being so slow before though. TestQualityRun is taking forever. So the benchmark thing could very well be an isolated issue.

        Show
        Mark Miller added a comment - Here it is basically. Core tests pass, but oddly TestCompoundWordTokenFilter took a real long time in contrib. [junit] Testsuite: org.apache.lucene.analysis.compound.TestCompoundWordTokenFilter [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 195.028 sec Benchmark also was pretty slow. I'm guessing because of autocommit. Similar symptoms. Perhaps my OS is messed up - would be odd though - fresh install from a week ago. Never noticed things being so slow before though. TestQualityRun is taking forever. So the benchmark thing could very well be an isolated issue.
        Hide
        Mark Miller added a comment - - edited

        Well nevermind - no auto commit in either of those. Looks like it was a network issue of some kind for downloading files. All of a sudden both are fast again, and when they were so slow, TestCompoundWordTokenFilter was hung up sun.net.www.http.KeepAliveCache.run(). Different issue, but had something of a similar look to it.

        Show
        Mark Miller added a comment - - edited Well nevermind - no auto commit in either of those. Looks like it was a network issue of some kind for downloading files. All of a sudden both are fast again, and when they were so slow, TestCompoundWordTokenFilter was hung up sun.net.www.http.KeepAliveCache.run(). Different issue, but had something of a similar look to it.
        Hide
        Robert Muir added a comment -

        it looks like it is downloading files due to licensing constraints (it checks compounding against a german file that is under the LaTeX license).

        however that project has apache 2 licensed files for danish and italian so we can look at testing in one of those languages instead in the future, that way test data could simply be included?

        the test being slow/downloading files has confused me before as well.

        Show
        Robert Muir added a comment - it looks like it is downloading files due to licensing constraints (it checks compounding against a german file that is under the LaTeX license). however that project has apache 2 licensed files for danish and italian so we can look at testing in one of those languages instead in the future, that way test data could simply be included? the test being slow/downloading files has confused me before as well.
        Hide
        Michael McCandless added a comment -

        however that project has apache 2 licensed files for danish and italian so we can look at testing in one of those languages instead in the future, that way test data could simply be included?

        +1, this would be awesome

        In general I'm wary of having our tests rely on external sites being available...

        TestCompoundWordTokenFilter frequently takes waaaay long for me, while it goes and downloads the test files. We must be generating too much traffic by running our unit tests so much

        Show
        Michael McCandless added a comment - however that project has apache 2 licensed files for danish and italian so we can look at testing in one of those languages instead in the future, that way test data could simply be included? +1, this would be awesome In general I'm wary of having our tests rely on external sites being available... TestCompoundWordTokenFilter frequently takes waaaay long for me, while it goes and downloads the test files. We must be generating too much traffic by running our unit tests so much
        Hide
        Michael McCandless added a comment -

        Patch looks good Mark! This is a nice cleanup (coming full circle, back to just Weight, again).

        You need to fix back-compat tests, too.

        Show
        Michael McCandless added a comment - Patch looks good Mark! This is a nice cleanup (coming full circle, back to just Weight, again). You need to fix back-compat tests, too.
        Hide
        Hoss Man added a comment -

        FWIW: the last patch was giving me compile errors because BoostingNearQuery still referenced the removed QueryWeight class.

        I think this is the correct fix.

        Show
        Hoss Man added a comment - FWIW: the last patch was giving me compile errors because BoostingNearQuery still referenced the removed QueryWeight class. I think this is the correct fix.
        Hide
        Mark Miller added a comment -

        Thanks - BoostingNearQuery was just added, so it wasn't around for the patch.

        Show
        Mark Miller added a comment - Thanks - BoostingNearQuery was just added, so it wasn't around for the patch.
        Hide
        Mark Miller added a comment -

        workarounds for the back compat test branch

        Show
        Mark Miller added a comment - workarounds for the back compat test branch
        Hide
        Michael McCandless added a comment -

        Patch looks good!:

        • Looks like you need to "svn rm
          src/java/org/apache/lucene/search/QueryWeight.java"
        • Some javadocs still reference QueryWeight
        • Why do we need this in Weight?
          public Explanation explain(IndexReader reader, int doc) throws IOException {
            return explain(null, reader, doc);
          }
          

          Ie, do we think there are places outside of Lucene that invoke
          Weight.explain directly?

        Show
        Michael McCandless added a comment - Patch looks good!: Looks like you need to "svn rm src/java/org/apache/lucene/search/QueryWeight.java" Some javadocs still reference QueryWeight Why do we need this in Weight? public Explanation explain(IndexReader reader, int doc) throws IOException { return explain( null , reader, doc); } Ie, do we think there are places outside of Lucene that invoke Weight.explain directly?
        Hide
        Mark Miller added a comment -

        The Changes for this one is kind of complicated ...

        Show
        Mark Miller added a comment - The Changes for this one is kind of complicated ...
        Hide
        Mark Miller added a comment -

        javadoc tweak/fix
        removed explain(reader,doc)

        how to handle all of the previous change entries this affects ... ?

        Show
        Mark Miller added a comment - javadoc tweak/fix removed explain(reader,doc) how to handle all of the previous change entries this affects ... ?
        Hide
        Mark Miller added a comment -

        One last patch - fix/doc null weight.explain null searcher issue
        remove a few unused imports

        Show
        Mark Miller added a comment - One last patch - fix/doc null weight.explain null searcher issue remove a few unused imports
        Hide
        Michael McCandless added a comment -

        how to handle all of the previous change entries this affects ... ?

        I think they should all be fixed?

        Ie, the CHANGES.txt is something users can read to see what's changed since the last release, not the blow-by-blow account of the series of iterations we went through before releasing.

        Show
        Michael McCandless added a comment - how to handle all of the previous change entries this affects ... ? I think they should all be fixed? Ie, the CHANGES.txt is something users can read to see what's changed since the last release, not the blow-by-blow account of the series of iterations we went through before releasing.
        Hide
        Mark Miller added a comment -

        updates to apply to trunk and adds a stab at reconciling Changes

        Show
        Mark Miller added a comment - updates to apply to trunk and adds a stab at reconciling Changes
        Hide
        Mark Miller added a comment -

        Hoss Man uses Chris Hostetter in Changes? Weak. I'll update it before committing.

        Show
        Mark Miller added a comment - Hoss Man uses Chris Hostetter in Changes? Weak. I'll update it before committing.
        Hide
        Mark Miller added a comment - - edited

        I'm going to give this one another day or two and then commit if nothing comes up.

        Show
        Mark Miller added a comment - - edited I'm going to give this one another day or two and then commit if nothing comes up.
        Hide
        Mark Miller added a comment -

        I think the Changes entries could still use some polish.

        Here are the originals:

        3. LUCENE-1630: Deprecate Weight in favor of QueryWeight: added
        matching methods to Searcher to take QueryWeight and deprecated
        those taking Weight. If you have a Weight implementation, you can
        turn it into a QueryWeight with QueryWeightWrapper (will be
        removed in 3.0). All of the Weight-based methods were implemented
        by calling the QueryWeight variants by wrapping the given Weight.
        Going forward Searchable will be kept for convenience only and may
        be changed between minor releases without any deprecation
        process. It is not recommended to implement it, but rather extend
        Searcher. (Shai Erera via Mike McCandless)

        24. LUCENE-1630: Deprecate Weight in favor of QueryWeight, which adds
        a scorer(IndexReader, boolean /* scoreDocsInOrder /, boolean /
        topScorer */) method instead of scorer(IndexReader) (now
        deprecated). The new method is used by IndexSearcher to mate
        between Collector and Scorer orderness of doc IDs. Some Scorers
        (like BooleanScorer) are much more efficient if out-of-order
        documents scoring is allowed by a Collector. Collector must now
        implement acceptsDocsOutOfOrder. If you write a Collector which
        does not care about doc ID orderness, it is recommended that you
        return true. QueryWeight has the scoresDocsOutOfOrder method,
        which by default returns false. If you create a QueryWeight which
        will score documents out of order if that's requested, you should
        override that method to return true. Also deprecated
        BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder
        as they are not needed anymore. BooleanQuery will now score docs
        out of order when used with a Collector that can accept docs out
        of order. (Shai Erera via Mike McCandless)

        And I've changed to:

        3. LUCENE-1630, LUCENE-1771: Weight, previously an interface, is now an abstract
        class. Some of the method signatures have changed, but it should be fairly
        easy to see what adjustments must be made to existing code to sync up
        with the new API. You can find more detail in the API Changes section.

        Going forward Searchable will be kept for convenience only and may
        be changed between minor releases without any deprecation
        process. It is not recommended that you implement it, but rather extend
        Searcher. (Shai Erera, Chris Hostetter, Mark Miller via Mike McCandless)

        24. LUCENE-1630, LUCENE-1771: Weight is now an abstract class, andd adds
        a scorer(IndexReader, boolean /* scoreDocsInOrder /, boolean /
        topScorer */) method instead of scorer(IndexReader). The new method
        is used by IndexSearcher to mate Collector and Scorer orderness
        of doc IDs. Some Scorers (like BooleanScorer) are much more efficient
        if out-of-order documents scoring is allowed by a Collector. Collector
        must now implement acceptsDocsOutOfOrder. If you write a Collector
        which does not care about doc ID orderness, it is recommended that you
        return true. Weight has a scoresDocsOutOfOrder method, which by default
        returns false. If you create a Weight which will score documents out of
        order if requested, you should override that method to return true.
        BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have
        been deprecated as they are not needed anymore. BooleanQuery will now
        score docs out of order when used with a Collector that can accept docs
        out of order. Finally, Weight#explain now also takes a Searcher.
        (Shai Erera, Chris Hostetter, Mark Miller via Mike McCandless)

        So one thing I don't like is

        The new method is used by IndexSearcher to mate Collector and Scorer orderness of doc IDs.

        I know what it means, but its a bit vague and awkward. I'm not sure how to best rewrite it though.

        Also, while the entry talks a lot about the orderedIds param, it doesn't really mention the topScorer param. I'm still not 100% clear on it after looking at the javadocs either (without looking at the code). I don't think top scorer can be defined using primarily the term top scorer - it might as well be the omega scorer unless you can point me to the Lucene dictionary where I can look it up:

          * @param topScorer
           *          specifies whether the returned {@link Scorer} will be used as a
           *          top scorer or as in iterator. I.e., if true,
           *          {@link Scorer#score(Collector)} will be called; if false,
           *          {@link Scorer#nextDoc()} and/or {@link Scorer#advance(int)} will
           *          be called.
        

        So I still want to beef up these changes a bit, and I think the javadoc needs to be a bit more clear - as we don't provide a convenience method that chooses defaults for those params, it should be very easy to quickly figure out whats best to pass. Right now, its just a little less than clear without some code browsing.

        Show
        Mark Miller added a comment - I think the Changes entries could still use some polish. Here are the originals: 3. LUCENE-1630 : Deprecate Weight in favor of QueryWeight: added matching methods to Searcher to take QueryWeight and deprecated those taking Weight. If you have a Weight implementation, you can turn it into a QueryWeight with QueryWeightWrapper (will be removed in 3.0). All of the Weight-based methods were implemented by calling the QueryWeight variants by wrapping the given Weight. Going forward Searchable will be kept for convenience only and may be changed between minor releases without any deprecation process. It is not recommended to implement it, but rather extend Searcher. (Shai Erera via Mike McCandless) 24. LUCENE-1630 : Deprecate Weight in favor of QueryWeight, which adds a scorer(IndexReader, boolean /* scoreDocsInOrder /, boolean / topScorer */) method instead of scorer(IndexReader) (now deprecated). The new method is used by IndexSearcher to mate between Collector and Scorer orderness of doc IDs. Some Scorers (like BooleanScorer) are much more efficient if out-of-order documents scoring is allowed by a Collector. Collector must now implement acceptsDocsOutOfOrder. If you write a Collector which does not care about doc ID orderness, it is recommended that you return true. QueryWeight has the scoresDocsOutOfOrder method, which by default returns false. If you create a QueryWeight which will score documents out of order if that's requested, you should override that method to return true. Also deprecated BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder as they are not needed anymore. BooleanQuery will now score docs out of order when used with a Collector that can accept docs out of order. (Shai Erera via Mike McCandless) And I've changed to: 3. LUCENE-1630 , LUCENE-1771 : Weight, previously an interface, is now an abstract class. Some of the method signatures have changed, but it should be fairly easy to see what adjustments must be made to existing code to sync up with the new API. You can find more detail in the API Changes section. Going forward Searchable will be kept for convenience only and may be changed between minor releases without any deprecation process. It is not recommended that you implement it, but rather extend Searcher. (Shai Erera, Chris Hostetter, Mark Miller via Mike McCandless) 24. LUCENE-1630 , LUCENE-1771 : Weight is now an abstract class, andd adds a scorer(IndexReader, boolean /* scoreDocsInOrder /, boolean / topScorer */) method instead of scorer(IndexReader). The new method is used by IndexSearcher to mate Collector and Scorer orderness of doc IDs. Some Scorers (like BooleanScorer) are much more efficient if out-of-order documents scoring is allowed by a Collector. Collector must now implement acceptsDocsOutOfOrder. If you write a Collector which does not care about doc ID orderness, it is recommended that you return true. Weight has a scoresDocsOutOfOrder method, which by default returns false. If you create a Weight which will score documents out of order if requested, you should override that method to return true. BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have been deprecated as they are not needed anymore. BooleanQuery will now score docs out of order when used with a Collector that can accept docs out of order. Finally, Weight#explain now also takes a Searcher. (Shai Erera, Chris Hostetter, Mark Miller via Mike McCandless) So one thing I don't like is The new method is used by IndexSearcher to mate Collector and Scorer orderness of doc IDs. I know what it means, but its a bit vague and awkward. I'm not sure how to best rewrite it though. Also, while the entry talks a lot about the orderedIds param, it doesn't really mention the topScorer param. I'm still not 100% clear on it after looking at the javadocs either (without looking at the code). I don't think top scorer can be defined using primarily the term top scorer - it might as well be the omega scorer unless you can point me to the Lucene dictionary where I can look it up: * @param topScorer * specifies whether the returned {@link Scorer} will be used as a * top scorer or as in iterator. I.e., if true , * {@link Scorer#score(Collector)} will be called; if false , * {@link Scorer#nextDoc()} and/or {@link Scorer#advance( int )} will * be called. So I still want to beef up these changes a bit, and I think the javadoc needs to be a bit more clear - as we don't provide a convenience method that chooses defaults for those params, it should be very easy to quickly figure out whats best to pass. Right now, its just a little less than clear without some code browsing.
        Hide
        Michael McCandless added a comment -

        Maybe say "Weight#explain now takes the top-level searcher, sub-reader and sub-docID"?

        Also there's a small typo (andd -> and).

        So one thing I don't like is

        The new method is used by IndexSearcher to mate Collector and Scorer orderness of doc IDs.

        How about "IndexSearcher uses this method to obtain a scorer matching the capabilities of the Collector wrt orderness of docIDs"?

        Also, while the entry talks a lot about the orderedIds param, it doesn't really mention the topScorer param

        How about simply removing the first sentence, ie, just jump straight into "if true..."?

        Show
        Michael McCandless added a comment - Maybe say "Weight#explain now takes the top-level searcher, sub-reader and sub-docID"? Also there's a small typo (andd -> and). So one thing I don't like is The new method is used by IndexSearcher to mate Collector and Scorer orderness of doc IDs. How about "IndexSearcher uses this method to obtain a scorer matching the capabilities of the Collector wrt orderness of docIDs"? Also, while the entry talks a lot about the orderedIds param, it doesn't really mention the topScorer param How about simply removing the first sentence, ie, just jump straight into "if true..."?
        Hide
        Mark Miller added a comment -

        Okay - just ran the tests one more time and looked over everything. I'm going to commit this later tonight unless anything comes up.

        Show
        Mark Miller added a comment - Okay - just ran the tests one more time and looked over everything. I'm going to commit this later tonight unless anything comes up.
        Hide
        Mark Miller added a comment -

        committed - thanks all!

        Show
        Mark Miller added a comment - committed - thanks all!

          People

          • Assignee:
            Mark Miller
            Reporter:
            Mark Miller
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development