Lucene - Core
  1. Lucene - Core
  2. LUCENE-1302

explain should not mask negative scores

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: 2.4
    • Component/s: core/query/scoring
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      Explanation.isMatch() returns false for 0 or negative scores.
      Hence negative scores are omitted from the explanation.
      This causes, when using e.g. BoostingTermQuery with negative boosts, a difference between the collected doc score and the score shown by explain().

      A word on the usage of this - BTQ with negative boosts is useful for "punishing" documents for containing a term. It also allows all sorts of tricks with multiplying query boost by the BTQ boost, so you get a positive score if both boosts have the same sign but negative otherwise. - I am sure there other uses as well.

        Activity

        Hide
        Doron Cohen added a comment -

        Okay I am convinced, thanks for bearing with me Hoss and Paul.

        So I am closing this issue as "Invalid".

        Just to summarize: "A match is a match is a match" (quoting Hoss) seem the key concept:
        I always thought of "match" as a saying (by the query) - "that document is relevant for this query".
        Seems to me now that a more accurate interpretation is: "that document is related to this query",
        where positively related would be relevant and negatively related would be the opposite.

        Show
        Doron Cohen added a comment - Okay I am convinced, thanks for bearing with me Hoss and Paul. So I am closing this issue as "Invalid". Just to summarize: "A match is a match is a match" (quoting Hoss) seem the key concept: I always thought of "match" as a saying (by the query) - "that document is relevant for this query". Seems to me now that a more accurate interpretation is: "that document is related to this query", where positively related would be relevant and negatively related would be the opposite.
        Hide
        Hoss Man added a comment -

        minNrShouldMatch is all about matching, and a match is a match is a match ... it doesn't matter if the score is negative or not.

        if a sub query doesn't want to count as a match, it shouldn't match.

        having the notion of a wrapper query like BQ that cares about a "minScore" from subqueries in order for it to count a doc as a match would be something new.

        Show
        Hoss Man added a comment - minNrShouldMatch is all about matching, and a match is a match is a match ... it doesn't matter if the score is negative or not. if a sub query doesn't want to count as a match, it shouldn't match. having the notion of a wrapper query like BQ that cares about a "minScore" from subqueries in order for it to count a doc as a match would be something new.
        Hide
        Paul Elschot added a comment - - edited

        > Is it somewhat strange for BQ to count a negative score contributer for its minNrShouldMatch condition?

        The count of the number of matchers in BooleanQuery depends on the return values of next() and skipTo() only. It could be made to ignore scorers with a negative score value, but that would mean another option to BooleanQuery.

        Show
        Paul Elschot added a comment - - edited > Is it somewhat strange for BQ to count a negative score contributer for its minNrShouldMatch condition? The count of the number of matchers in BooleanQuery depends on the return values of next() and skipTo() only. It could be made to ignore scorers with a negative score value, but that would mean another option to BooleanQuery.
        Hide
        Doron Cohen added a comment -

        I can live with BTQ printing "match" although it actually just hurt a certain doc.

        Is it somewhat strange for BQ to count a negative score contributer for its minNrShouldMatch condition?

        Show
        Doron Cohen added a comment - I can live with BTQ printing "match" although it actually just hurt a certain doc. Is it somewhat strange for BQ to count a negative score contributer for its minNrShouldMatch condition?
        Hide
        Hoss Man added a comment -

        FWIW: Paul's comment is exactly what i was trying to get at.

        The only reason Explaination.isMatch returns true for positive scores without requiring that the match state be specified explicitly was is so it would be backwards compatible for older legacy query types (possibly created by lucene users) when the match/score semantics where clarified in order to allow/support negative scores for matching documents.

        Show
        Hoss Man added a comment - FWIW: Paul's comment is exactly what i was trying to get at. The only reason Explaination.isMatch returns true for positive scores without requiring that the match state be specified explicitly was is so it would be backwards compatible for older legacy query types (possibly created by lucene users) when the match/score semantics where clarified in order to allow/support negative scores for matching documents.
        Hide
        Paul Elschot added a comment -

        I'd like to have Explanation.isMatch() returning true mean that Scorer.next() or Scorer.skipTo() returned true for the explained document score, and to have the Scorer.score() value completely independent of that.

        The javadocs in of the Scorers in the core trunk are consistent about using 'match' this way.

        Also, I think there are currently no more backward compatibility issues for this.

        Show
        Paul Elschot added a comment - I'd like to have Explanation.isMatch() returning true mean that Scorer.next() or Scorer.skipTo() returned true for the explained document score, and to have the Scorer.score() value completely independent of that. The javadocs in of the Scorers in the core trunk are consistent about using 'match' this way. Also, I think there are currently no more backward compatibility issues for this.
        Hide
        Hoss Man added a comment -

        The one thing that bothers me still is that the BTQ sub-expl would print as MATCH when in fact it just hurts the current doc score (apparently "not enough"). Would it be more correct for the (negative) BTQ part to say "NON-MATCH"?

        Bottom Line: if "next()" stops on a doc, then the Explanation for that doc should indicate a "MATCH" ... regardless of what the score is.

        the the Explanation by BTQ indicates it's a "NON-MACH" even thought the BTQ Scorer would return it as a "hit" when searching, that would screw up the Explanation generated by a BooleanQuery. Consider a 3 clause BooleanQuery with minNrShouldMath set to 2. one of the clauses only matches doc#42, one of the clauses matches no docs, and the third clause is a BTQ – if the Explanation for the BTQ says NO_MATCH for doc#42 just because it has a negative score then the BooleanQuery Explanation is going to say the wrong thing.

        Show
        Hoss Man added a comment - The one thing that bothers me still is that the BTQ sub-expl would print as MATCH when in fact it just hurts the current doc score (apparently "not enough"). Would it be more correct for the (negative) BTQ part to say "NON-MATCH"? Bottom Line: if "next()" stops on a doc, then the Explanation for that doc should indicate a "MATCH" ... regardless of what the score is. the the Explanation by BTQ indicates it's a "NON-MACH" even thought the BTQ Scorer would return it as a "hit" when searching, that would screw up the Explanation generated by a BooleanQuery. Consider a 3 clause BooleanQuery with minNrShouldMath set to 2. one of the clauses only matches doc#42, one of the clauses matches no docs, and the third clause is a BTQ – if the Explanation for the BTQ says NO_MATCH for doc#42 just because it has a negative score then the BooleanQuery Explanation is going to say the wrong thing.
        Hide
        Doron Cohen added a comment -

        Hoss, thanks for the pointer to LUCENE-605.

        :: Still it would disturb to declare a "negative score" as a "match".

        ... no idea what you ment there.

        Assume a boolean query that contains a BTQ, among other things.
        The score of a doc is a sum over several scores, all elements are positive, except the BTQ element which is negative.
        The total sum is positive and hence (say) was accepted by the collector in effect.
        As the code is right now, the explanation for the boolean query would ignore the (negative) BTQ part and its value would differ that of the actual search.

        Fixing BTQ to return a complex explanation and calling setMatch(True) will fix the score difference, and now the BQ explanation would also contain the sub-expl of the BTQ. Great. This also makes perfect sense in BTQ, because the way the score is computed, it can be negative or even 0. I'll open a separate issue for fixing BTQ.

        The one thing that bothers me still is that the BTQ sub-expl would print as MATCH when in fact it just hurts the current doc score (apparently "not enough"). Would it be more correct for the (negative) BTQ part to say "NON-MATCH"?

        Show
        Doron Cohen added a comment - Hoss, thanks for the pointer to LUCENE-605 . :: Still it would disturb to declare a "negative score" as a "match". ... no idea what you ment there. Assume a boolean query that contains a BTQ, among other things. The score of a doc is a sum over several scores, all elements are positive, except the BTQ element which is negative. The total sum is positive and hence (say) was accepted by the collector in effect. As the code is right now, the explanation for the boolean query would ignore the (negative) BTQ part and its value would differ that of the actual search. Fixing BTQ to return a complex explanation and calling setMatch(True) will fix the score difference, and now the BQ explanation would also contain the sub-expl of the BTQ. Great. This also makes perfect sense in BTQ, because the way the score is computed, it can be negative or even 0. I'll open a separate issue for fixing BTQ. The one thing that bothers me still is that the BTQ sub-expl would print as MATCH when in fact it just hurts the current doc score (apparently "not enough"). Would it be more correct for the (negative) BTQ part to say "NON-MATCH"?
        Hide
        Hoss Man added a comment -

        Doron: i haven't looked at your patch, but Explanation.isMatch returns false for negative scores to be backwards compatible (see LUCENE-605). Any non-trivial query types where a match can get a non-positive score should be using a ComplexExplanation.

        these comments don't make sense to me...

        Another option is for BTQ to return a ComplexExplanation instead of Explanation and setMatch(true) if the score is non-zero.

        ...BTQ should call setMatch(true) if it's a "match" (ie: if the doc would be passed to a HItCollectors collect method) not conditionally based on the score.

        Still it would disturb to declare a "negative score" as a "match".

        ... no idea what you ment there.

        Show
        Hoss Man added a comment - Doron: i haven't looked at your patch, but Explanation.isMatch returns false for negative scores to be backwards compatible (see LUCENE-605 ). Any non-trivial query types where a match can get a non-positive score should be using a ComplexExplanation. these comments don't make sense to me... Another option is for BTQ to return a ComplexExplanation instead of Explanation and setMatch(true) if the score is non-zero. ...BTQ should call setMatch(true) if it's a "match" (ie: if the doc would be passed to a HItCollectors collect method) not conditionally based on the score. Still it would disturb to declare a "negative score" as a "match". ... no idea what you ment there.
        Hide
        Doron Cohen added a comment -

        Another option is for BTQ to return a ComplexExplanation instead of Explanation and setMatch(true) if the score is non-zero.
        Still it would disturb to declare a "negative score" as a "match".

        Show
        Doron Cohen added a comment - Another option is for BTQ to return a ComplexExplanation instead of Explanation and setMatch(true) if the score is non-zero. Still it would disturb to declare a "negative score" as a "match".
        Hide
        Doron Cohen added a comment -

        With this patch isMatch() returns false only for score == zero .
        All tests pass.
        It does disturb though to announce a negative score as a "match".
        So, while this patch easily makes things work with negative scores, a better fix may be to leave isMatch() as is, but modify calls to it that are meant to decide if the questioned explanation should be included in the constructed explanation by a test of the score, or perhaps a specific new method e.g. isInformative().

        Show
        Doron Cohen added a comment - With this patch isMatch() returns false only for score == zero . All tests pass. It does disturb though to announce a negative score as a "match". So, while this patch easily makes things work with negative scores, a better fix may be to leave isMatch() as is, but modify calls to it that are meant to decide if the questioned explanation should be included in the constructed explanation by a test of the score, or perhaps a specific new method e.g. isInformative().

          People

          • Assignee:
            Doron Cohen
            Reporter:
            Doron Cohen
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development