Lucene - Core
  1. Lucene - Core
  2. LUCENE-1847

PhraseQuery/TermQuery/SpanQuery use IndexReader specific stats in their explains

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      PhraseQuery uses IndexReader in explainfor top level stats - as mentioned by Mike McCandless in LUCENE-1837.
      TermQuery uses IndexReader in explain for top level stats

      Always been a bug with MultiSearcher, but per segment search makes it worse.

      1. LUCENE-1847.patch
        8 kB
        Mark Miller
      2. LUCENE-1847.patch
        16 kB
        Mark Miller
      3. LUCENE-1847.patch
        18 kB
        Mark Miller
      4. LUCENE-1847.patch
        18 kB
        Mark Miller
      5. LUCENE-1847.patch
        16 kB
        Mark Miller
      6. LUCENE-1847.patch
        17 kB
        Mark Miller
      7. LUCENE-1847.patch
        17 kB
        Mark Miller

        Issue Links

          Activity

          Hide
          Mark Miller added a comment -

          Okay - I'm going to use the other issue just to revert the Searcher - more of a task.

          This issue can then be used to track the new work for this bug here.

          Show
          Mark Miller added a comment - Okay - I'm going to use the other issue just to revert the Searcher - more of a task. This issue can then be used to track the new work for this bug here.
          Hide
          Mark Miller added a comment -

          rough draft - works as the fix, but could use some polish.

          And what to do about back compat - because its just explain output, I don't think its a huge deal,
          but if you overrode idf(Collection terms, Searcher searcher) or idf(Term term, Searcher searcher) on
          Sim, right now they wouldn't be called for TermQuery and PhraseQuery. An override of float idf(int docFreq, int numDocs)
          would still be taken into account though.

          Show
          Mark Miller added a comment - rough draft - works as the fix, but could use some polish. And what to do about back compat - because its just explain output, I don't think its a huge deal, but if you overrode idf(Collection terms, Searcher searcher) or idf(Term term, Searcher searcher) on Sim, right now they wouldn't be called for TermQuery and PhraseQuery. An override of float idf(int docFreq, int numDocs) would still be taken into account though.
          Hide
          Mark Miller added a comment -

          TestSimilarity fails due to the back compat issue by the way - it overrides a sim to return hard coded values for the two above ref'd methods

          Show
          Mark Miller added a comment - TestSimilarity fails due to the back compat issue by the way - it overrides a sim to return hard coded values for the two above ref'd methods
          Hide
          Michael McCandless added a comment -

          Patch looks good. Can we name it IDFExplain instead of IdfExplain? Nit-picky I know, but I try to do all caps when the thing is already an acronym.

          Shouldn't we deprecate Similarity's idf methods that return float?

          You may want to defer calling IdfExplain.explain() until the Weight's explain is invoked? And, then, store the IdfExplain instance in the Weight, and then IdfExplain must implement Serializable, and maybe add a Javadoc stating that Weight instances may hold onto your IdfExplain so be careful about holding references to big/unserializable things if you need to serialize.

          On back-compat... we could check (using reflection) if the subclass has overridden it, and in that case forward to the subclass's method, with some canned explanation eg "inexplicable"

          Show
          Michael McCandless added a comment - Patch looks good. Can we name it IDFExplain instead of IdfExplain? Nit-picky I know, but I try to do all caps when the thing is already an acronym. Shouldn't we deprecate Similarity's idf methods that return float? You may want to defer calling IdfExplain.explain() until the Weight's explain is invoked? And, then, store the IdfExplain instance in the Weight, and then IdfExplain must implement Serializable, and maybe add a Javadoc stating that Weight instances may hold onto your IdfExplain so be careful about holding references to big/unserializable things if you need to serialize. On back-compat... we could check (using reflection) if the subclass has overridden it, and in that case forward to the subclass's method, with some canned explanation eg "inexplicable"
          Hide
          Mark Miller added a comment -

          Can we name it IDFExplain instead of IdfExplain? Nit-picky I know, but I try to do all caps when the thing is already an acronym.

          Yup, sounds good.

          Shouldn't we deprecate Similarity's idf methods that return float?

          Yes - I've done that locally - just didn't concentrate it on this patch because I hadn't solved the back compat problems yet.

          You may want to defer calling IdfExplain.explain() until the Weight's explain is invoked?

          +1

          On back-compat... we could check (using reflection) if the subclass has overridden it, and in that case forward to the subclass's method, with some canned explanation eg "inexplicable"

          Ugg ... at least this code isn't performance sensitive I'll work with that.

          Show
          Mark Miller added a comment - Can we name it IDFExplain instead of IdfExplain? Nit-picky I know, but I try to do all caps when the thing is already an acronym. Yup, sounds good. Shouldn't we deprecate Similarity's idf methods that return float? Yes - I've done that locally - just didn't concentrate it on this patch because I hadn't solved the back compat problems yet. You may want to defer calling IdfExplain.explain() until the Weight's explain is invoked? +1 On back-compat... we could check (using reflection) if the subclass has overridden it, and in that case forward to the subclass's method, with some canned explanation eg "inexplicable" Ugg ... at least this code isn't performance sensitive I'll work with that.
          Hide
          Mark Miller added a comment -

          Its a problem with SpanQuery as well

          Show
          Mark Miller added a comment - Its a problem with SpanQuery as well
          Hide
          Mark Miller added a comment -

          Okay, almost there. This one fixes SpanWeight as well and checks for back compat. Plus other little polish stuff.

          Show
          Mark Miller added a comment - Okay, almost there. This one fixes SpanWeight as well and checks for back compat. Plus other little polish stuff.
          Hide
          Michael McCandless added a comment -

          Looks good!

          Your isMethodOverridden is incorrectly checking against
          TokenStream.class (should be Similarity.class instead). Maybe add a
          test case that asserts "maxDocs=XXX" is present in TermQuery's explain?

          Small typo: explaination -> explanation

          Show
          Michael McCandless added a comment - Looks good! Your isMethodOverridden is incorrectly checking against TokenStream.class (should be Similarity.class instead). Maybe add a test case that asserts "maxDocs=XXX" is present in TermQuery's explain? Small typo: explaination -> explanation
          Hide
          Mark Miller added a comment -

          add test case for TermQuery explain over multisearcher.

          fixed spelling - my ubber eclipse spellcheck file likes that version ...

          nice eyes on the TokenStream - classic cut and paste folly.

          Show
          Mark Miller added a comment - add test case for TermQuery explain over multisearcher. fixed spelling - my ubber eclipse spellcheck file likes that version ... nice eyes on the TokenStream - classic cut and paste folly.
          Hide
          Michael McCandless added a comment -

          New patch has explaination in the javadoc

          Otherwise patch looks great! I like the new unit test

          Show
          Michael McCandless added a comment - New patch has explaination in the javadoc Otherwise patch looks great! I like the new unit test
          Hide
          Mark Miller added a comment -

          Okay, one final version I hope:

          fixed javadoc spelling of explanations
          removed unused imports
          minor javadoc tweaks

          Show
          Mark Miller added a comment - Okay, one final version I hope: fixed javadoc spelling of explanations removed unused imports minor javadoc tweaks
          Hide
          Michael McCandless added a comment -

          Looks good!

          Show
          Michael McCandless added a comment - Looks good!
          Hide
          Mark Miller added a comment -

          I'm wondering if we should cache the isMethodOverridden now. clazz.getMethod is painfully slow.

          If I keep teeing up a ton of new TermQuery/PhraseQuery/SpanQuery Weights for searching, its going to be much, much slower in a throughput benchmark.

          Show
          Mark Miller added a comment - I'm wondering if we should cache the isMethodOverridden now. clazz.getMethod is painfully slow. If I keep teeing up a ton of new TermQuery/PhraseQuery/SpanQuery Weights for searching, its going to be much, much slower in a throughput benchmark.
          Hide
          Mark Miller added a comment -

          Hmm - it would have to be a static cache - thats not great. But perhaps the lesser of two evils.

          Show
          Mark Miller added a comment - Hmm - it would have to be a static cache - thats not great. But perhaps the lesser of two evils.
          Hide
          Michael McCandless added a comment -

          You could cache it on the Similarity instance?

          Show
          Michael McCandless added a comment - You could cache it on the Similarity instance?
          Hide
          Mark Miller added a comment -

          of course. right where it belongs Darn brain is lazier than my aim.

          Show
          Mark Miller added a comment - of course. right where it belongs Darn brain is lazier than my aim.
          Hide
          Mark Miller added a comment -

          Okay, adds a cache and puts it on Similarity -

          This will let user queries that use the new explain stuff use Sims that overload those methods too - not a likely scerioro, but nice to know its covered now.

          I changed to inexplicable - thats a great word.

          Just finished, so merits another look over before commit ready.

          Show
          Mark Miller added a comment - Okay, adds a cache and puts it on Similarity - This will let user queries that use the new explain stuff use Sims that overload those methods too - not a likely scerioro, but nice to know its covered now. I changed to inexplicable - thats a great word. Just finished, so merits another look over before commit ready.
          Hide
          Mark Miller added a comment -

          Okay, I'm still awake, so one more:

          Removed some white space to clean up diff
          Changed IDFExplain to IDFExplanation
          Added Phrase and SpanQuery to the new test.

          Show
          Mark Miller added a comment - Okay, I'm still awake, so one more: Removed some white space to clean up diff Changed IDFExplain to IDFExplanation Added Phrase and SpanQuery to the new test.
          Hide
          Michael McCandless added a comment -

          I changed to inexplicable - thats a great word.

          Yay

          Small typo: explantion. This sure is one hard word to get right!

          Otherwise looks good to go!

          Show
          Michael McCandless added a comment - I changed to inexplicable - thats a great word. Yay Small typo: explantion. This sure is one hard word to get right! Otherwise looks good to go!
          Hide
          Mark Miller added a comment -

          One more -

          I killed my spellchecker trying to get explaination out of it

          Fixed the new misspelling though, and removed the java 1.5 code 'contains' - thanks to whoever committed that to the qp test as well - I wouldn't have noticed it was a problem here otherwise

          Show
          Mark Miller added a comment - One more - I killed my spellchecker trying to get explaination out of it Fixed the new misspelling though, and removed the java 1.5 code 'contains' - thanks to whoever committed that to the qp test as well - I wouldn't have noticed it was a problem here otherwise
          Hide
          Mark Miller added a comment -

          Thanks a lot Mike !

          r807595.

          Show
          Mark Miller added a comment - Thanks a lot Mike ! r807595.

            People

            • Assignee:
              Mark Miller
              Reporter:
              Mark Miller
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development