Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-903

FilteredQuery explanation inaccuracy with boost

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.2
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      The value of explanation is different than the product of its part if boost > 1.
      This is exposed after tightening the explanation check (part of LUCENE-446).

      1. lucene-903.patch
        7 kB
        Doron Cohen
      2. lucene-903.patch
        12 kB
        Hoss Man
      3. lucene-903.patch
        15 kB
        Doron Cohen

        Issue Links

          Activity

          Hide
          doronc Doron Cohen added a comment -

          committed the last patch, thanks for your help Hoss.

          Show
          doronc Doron Cohen added a comment - committed the last patch, thanks for your help Hoss.
          Hide
          doronc Doron Cohen added a comment -

          Thanks for the review Hoss!

          I like with your changes to the patch.
          I modified further:

          • do a "deep" check in TestBoostingTermQuery.java
          • remove the call to checkExplanations from TestExplanations.qtest() because
            that is also called in QueryUtils.checkQuery(). (otherwise it would be done twice.)

          I am ok with original checkExplanations() remaining "shallow", since all core tests now call the "deep" version.

          I intend to commit this shortly.

          Show
          doronc Doron Cohen added a comment - Thanks for the review Hoss! I like with your changes to the patch. I modified further: do a "deep" check in TestBoostingTermQuery.java remove the call to checkExplanations from TestExplanations.qtest() because that is also called in QueryUtils.checkQuery(). (otherwise it would be done twice.) I am ok with original checkExplanations() remaining "shallow", since all core tests now call the "deep" version. I intend to commit this shortly.
          Hide
          hossman Hoss Man added a comment -

          in general i'm not fond of the inspecting the string to determine whether the math is correct ... in an ideal world every query would have it's own unit test class and it would have a "testExplanation" method that would know exactly what structure (ie: how many sub-details) to expect from an artificially constructed query instance. the current approach of having big "explanation" test classes that validate an explanation by comparing them to the scores is a fairly big hack (and i say that as the guy that added it).

          But i'm okay with the string comparisons as a "Convenience" to make it easier for us to test the explanations of the core queries (since so many of them follow very clear patterns) ... but i think we should avoid having any sort of expectation on the string for custom queries (either that they match "product of" or that they match "other:")

          my main concern is purely with giving people a way to do the same kind of Explanation testing they could do before (that the root value equals the score) even if the details of their explanation don't fit any of the patterns of the existing core queries. (i'd hate for people to not even do shallow testing of their Explanations just because we don't give them a method for doing so)

          this refactoring...

          • makes verifyExplanations a public method in CheckHits that anyone can call
          • adds a "deep" boolean option to verifyExplanations
          • adds variants of CheckHits.checkExplanations and the ExplanationAsserter to make the "deep" testing optional
          • makes QueryUtils.checkExplanations use the "deep" option
          • makes TestExplanations explicitly use deep testing

          ...at the moment, the "old" APIs default to deep==false but i'm on board with changing those to "true" if you want to (in theory it's a non-backawards compatible API change, but since it's in src/test we can probably get away with it).

          Show
          hossman Hoss Man added a comment - in general i'm not fond of the inspecting the string to determine whether the math is correct ... in an ideal world every query would have it's own unit test class and it would have a "testExplanation" method that would know exactly what structure (ie: how many sub-details) to expect from an artificially constructed query instance. the current approach of having big "explanation" test classes that validate an explanation by comparing them to the scores is a fairly big hack (and i say that as the guy that added it). But i'm okay with the string comparisons as a "Convenience" to make it easier for us to test the explanations of the core queries (since so many of them follow very clear patterns) ... but i think we should avoid having any sort of expectation on the string for custom queries (either that they match "product of" or that they match "other:") my main concern is purely with giving people a way to do the same kind of Explanation testing they could do before (that the root value equals the score) even if the details of their explanation don't fit any of the patterns of the existing core queries. (i'd hate for people to not even do shallow testing of their Explanations just because we don't give them a method for doing so) this refactoring... makes verifyExplanations a public method in CheckHits that anyone can call adds a "deep" boolean option to verifyExplanations adds variants of CheckHits.checkExplanations and the ExplanationAsserter to make the "deep" testing optional makes QueryUtils.checkExplanations use the "deep" option makes TestExplanations explicitly use deep testing ...at the moment, the "old" APIs default to deep==false but i'm on board with changing those to "true" if you want to (in theory it's a non-backawards compatible API change, but since it's in src/test we can probably get away with it).
          Hide
          doronc Doron Cohen added a comment -

          > holding them to the same standard (when they can't easily modify
          > the string pattern rules in CheckHits) doesn't seem fair.

          > lemme try to refactor the current patch a bit so that the "deep"
          > Explanation testing is optional (and used by the core tests)

          I agree (and pointed that) the "deep" explanation test is imperfect in this regard. But it seems to me that it is the benefit of new query writers to have their new queries explanations tested as thorough as possible. So I am not sure that disabling this test by default (for non core queries) is the right thing to do.

          How do you feel about adding an escape way, allowing one to specify that a certain (part of an) explanation is not to be checked in this manner?

          I see two immediate ways to add this:

          • by the explanation text: if the description starts with "other:" the explanation is not subject to this "deep" check
          • by api: adding a property to Explanation class - disabledDeepCheck - false by default - allows new unique queries to deliberately disable this "deep" test.
          Show
          doronc Doron Cohen added a comment - > holding them to the same standard (when they can't easily modify > the string pattern rules in CheckHits) doesn't seem fair. > lemme try to refactor the current patch a bit so that the "deep" > Explanation testing is optional (and used by the core tests) I agree (and pointed that) the "deep" explanation test is imperfect in this regard. But it seems to me that it is the benefit of new query writers to have their new queries explanations tested as thorough as possible. So I am not sure that disabling this test by default (for non core queries) is the right thing to do. How do you feel about adding an escape way, allowing one to specify that a certain (part of an) explanation is not to be checked in this manner? I see two immediate ways to add this: by the explanation text: if the description starts with "other:" the explanation is not subject to this "deep" check by api: adding a property to Explanation class - disabledDeepCheck - false by default - allows new unique queries to deliberately disable this "deep" test.
          Hide
          hossman Hoss Man added a comment -

          I have some reservations about this patch.

          I think it's fine to set as a goal that all "core" query classes should have Explanations where the description accurately describes a mathematical calculation that can be performed on the details to arrive at the value of the Explanation – which is easy to do since we have the luxury of changing the CHeckHits class to suit our needs anytime we add a new class that has a more "interesting" mathematical calculation then we can current account for.

          But we should also try to maintain CheckHIts as a reusable class that clients can use to run basic sanity tests on their own custom Query classes, and holding them to the same standard (when they can't easily modify the string pattern rules in CheckHits) doesn't seem fair.

          lemme try to refactor the current patch a bit so that the "deep" Explanation testing is optional (and used by the core tests)

          Show
          hossman Hoss Man added a comment - I have some reservations about this patch. I think it's fine to set as a goal that all "core" query classes should have Explanations where the description accurately describes a mathematical calculation that can be performed on the details to arrive at the value of the Explanation – which is easy to do since we have the luxury of changing the CHeckHits class to suit our needs anytime we add a new class that has a more "interesting" mathematical calculation then we can current account for. But we should also try to maintain CheckHIts as a reusable class that clients can use to run basic sanity tests on their own custom Query classes, and holding them to the same standard (when they can't easily modify the string pattern rules in CheckHits) doesn't seem fair. lemme try to refactor the current patch a bit so that the "deep" Explanation testing is optional (and used by the core tests)
          Hide
          doronc Doron Cohen added a comment -

          (explanation validity including for FilteredQuery was handled there.)

          Show
          doronc Doron Cohen added a comment - (explanation validity including for FilteredQuery was handled there.)
          Hide
          doronc Doron Cohen added a comment -

          Attached lucene-903.patch change the explanation test in CheckHits to verify that the explanation value matches the explanation details. This new test makes assumptions on the description of an explanation that has multi sub-details, which might be controversial: the description is assumed to comply with one of:

          • end with "product of:"
          • end with "sum of:"
          • end with "sum of:"
          • contain "max plus <x> times others" (where <x> is float)
            While this seems to cover all the current scoring combinations, future scorers might not fit in.
            We can always expand this list of course.

          With this check, TestSimpleExplanations fails for FilteredQuery - testFQ6().
          Fix included.

          Also failing is TestBoostingTermQuery, just because its explanation had no "product of:".
          Fix included.

          All tests pass.

          Show
          doronc Doron Cohen added a comment - Attached lucene-903.patch change the explanation test in CheckHits to verify that the explanation value matches the explanation details. This new test makes assumptions on the description of an explanation that has multi sub-details, which might be controversial: the description is assumed to comply with one of: end with "product of:" end with "sum of:" end with "sum of:" contain "max plus <x> times others" (where <x> is float) While this seems to cover all the current scoring combinations, future scorers might not fit in. We can always expand this list of course. With this check, TestSimpleExplanations fails for FilteredQuery - testFQ6(). Fix included. Also failing is TestBoostingTermQuery, just because its explanation had no "product of:". Fix included. All tests pass.

            People

            • Assignee:
              doronc Doron Cohen
              Reporter:
              doronc Doron Cohen
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development