Lucy / LUCY-180

ORQuery, ANDQuery, RequiredOptionalQuery optimizations affect scoring

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.1.0 (incubating), 0.2.0 (incubating), 0.2.1 (incubating)
    • Component/s: None
    • Labels: None

      Description

      ORQuery, ANDQuery, and RequiredOptionalQuery all have optimizations which kick
      in when only one child Query can match: they all compile down to the inner
      Matcher.

      In the case of ORQuery and RequiredOptionalQuery, this optimization can kick
      in per-segment, resulting in an ORMatcher/RequiredOptionalMatcher for some
      segments and e.g. a child TermMatcher for others. This skews scoring because
      coord() affects the ORMatcher/RequiredOptionalMatcher, but not the TermMatcher
      – the ORMatcher/RequiredOptionalMatcher damps the score of the matching term
      by a coord() multiplier which is typically less than 1.0, but the TermMatcher
      contributes 100% of its score. The punchline is that two documents in
      different segments which present identical match criteria can produce
      different scores, depending on whether terms not present in the document are
      represented in the segment.
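
      The skew can be illustrated with a small model. This is only a sketch, not
      Lucy's C code: it assumes coord() is the fraction of clauses that matched
      (overlap / max_overlap), and the names coord, or_score and TERM_SCORE are
      hypothetical.

```python
def coord(overlap, max_overlap):
    """Assumed coord(): the fraction of clauses that matched."""
    return overlap / max_overlap if max_overlap else 1.0


def or_score(child_scores, num_clauses):
    """Sum the matching children, then damp the sum by coord()."""
    matching = [s for s in child_scores if s is not None]
    return sum(matching) * coord(len(matching), num_clauses)


TERM_SCORE = 2.0  # hypothetical score of term "a" for the document

# Segment 1: terms "a" and "b" both exist in the segment, so the OR-style
# matcher is kept. The document itself contains only "a".
score_seg1 = or_score([TERM_SCORE, None], num_clauses=2)

# Segment 2: "b" is absent from the segment, so the optimization hands
# down the bare term matcher and coord() is never applied.
score_seg2 = TERM_SCORE

print(score_seg1, score_seg2)  # 1.0 2.0
```

      Both documents match the query in exactly the same way, yet the one in
      segment 2 scores twice as high.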

      In addition, ORQuery may compile down to a smaller ORMatcher when
      e.g. 3 out of 5 OR'd terms are present. This skews scoring for similar
      reasons.
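
      A quick worked example of the reduced case, again assuming coord() is
      overlap / max_overlap (an assumption, not taken from the Lucy source): a
      document matching 2 terms is damped differently depending on whether the
      matcher was built for 5 clauses or for 3.

```python
from fractions import Fraction


def coord(overlap, max_overlap):
    """Assumed coord(): the fraction of clauses that matched, kept exact."""
    return Fraction(overlap, max_overlap)


full = coord(2, 5)     # matcher built for all 5 OR'd clauses
reduced = coord(2, 3)  # matcher built only for the 3 terms in the segment

print(full, reduced)  # 2/5 2/3
```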

      To present consistent scoring across all segments, Queries should always
      compile down to the same Matcher node structure for each segment. By the time
      you are compiling per-segment Matchers, it is too late to re-calculate the
      weighting, so you can't optimize the Matcher structure when you find that e.g.
      one of two terms doesn't exist in a given segment.

      In addition, when compiling down to a single child Matcher, ORQuery, ANDQuery
      and RequiredOptionalQuery all discard custom boosts. This is solvable by
      moving the optimization from Compiler_Make_Matcher() up into
      Query_Make_Compiler().

      1. LUCY-180-minimal.patch
        4 kB
        Marvin Humphrey
      2. LUCY-180.patch
        9 kB
        Marvin Humphrey

        Activity

        Marvin Humphrey added a comment -

        Issue description edited:

        Thankfully, a closer look at ANDQuery, ORQuery, and RequiredOptionalQuery has
        revealed that while they do not pass down custom boosts when compiling down to
        an "only child" Matcher, the boosts have not been discarded. Instead, the
        boosts have been propagated down into all child Compiler objects during the
        weighting phase.

        (By multiplying e.g. ANDQuery's boost into its children during the weighting
        phase, it frees ANDMatcher from the need to multiply the boost into the score
        for each document.)
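
        A minimal sketch of that idea, using hypothetical Python classes
        (TermCompiler, AndCompiler, apply_boost) rather than Lucy's real
        Compiler API: the parent multiplies its boost into each child once,
        during weighting, so a child handed down on its own still carries it.

```python
class TermCompiler:
    """Hypothetical stand-in for a child Compiler holding a weight."""

    def __init__(self, weight):
        self.weight = weight

    def apply_boost(self, boost):
        self.weight *= boost


class AndCompiler:
    """Hypothetical parent: pushes its boost into the children up front."""

    def __init__(self, children, boost):
        self.children = children
        # Weighting phase: done once per query, not once per document.
        for child in children:
            child.apply_boost(boost)


children = [TermCompiler(1.5), TermCompiler(0.5)]
AndCompiler(children, boost=2.0)

# Each child now carries the parent's boost on its own.
weights = [c.weight for c in children]
print(weights)  # [3.0, 1.0]
```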

        Marvin Humphrey added a comment -

        The attached file, LUCY-180-minimal.patch, makes minimally invasive changes to
        disable the "only child" optimizations in ORQuery.c and
        RequiredOptionalQuery.c.

        It should not be necessary to modify ANDQuery.c.

        • If no clauses match, ANDCompiler_Make_Matcher() returns NULL.
        • If a single child matches, ANDCompiler_Make_Matcher() returns the "only
          child" submatcher, but its scoring behavior is exactly equivalent to
          what it would be wrapped in ANDMatcher_Score(), since boost is already
          factored in and Sim_Coord() will be 1.0.
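
        The equivalence can be checked with a toy model. It reads "a single
        child" as an ANDQuery with exactly one clause and assumes coord() is
        overlap / max_overlap; both are assumptions, and and_score is a
        hypothetical helper, not ANDMatcher_Score() itself.

```python
def coord(overlap, max_overlap):
    """Assumed coord(): the fraction of clauses that matched."""
    return overlap / max_overlap if max_overlap else 1.0


def and_score(child_scores):
    """Every clause must match, so overlap equals the clause count."""
    n = len(child_scores)
    return sum(child_scores) * coord(n, n)


child_score = 3.0  # hypothetical score, boost already factored in
wrapped = and_score([child_score])

print(wrapped == child_score)  # True
```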
        Marvin Humphrey added a comment -

        Here is an improved patch, which keeps the optimizations whenever possible.

        • When an ORQuery only has one clause, the child matcher will be handed
          down unwrapped.
        • However, when an ORQuery has multiple clauses but only one clause
          matches in a segment, an ORMatcher will be handed down to keep scoring
          consistent across all segments.
        • A RequiredOptionalQuery will always produce either a
          RequiredOptionalMatcher or NULL. It will no longer return the unwrapped
          required child matcher when the optional clause cannot match.
        • A missing coord multiplier in RequiredOptionalMatcher has been fixed.
        • ORMatcher and RequiredOptionalMatcher have been hardened so that they
          can accept NULL child matchers.
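
        The rules above might be modelled roughly as follows. This is a sketch
        with hypothetical Python names, not the patch itself; in particular,
        returning None from an OR when no clause can match is an assumption,
        and each clause is simplified to a single term.

```python
class TermMatcher:
    def __init__(self, term):
        self.term = term


class ORMatcher:
    def __init__(self, children, num_clauses):
        # Hardened: tolerate None children, but remember the full clause
        # count so coord() stays the same in every segment.
        self.children = [c for c in children if c is not None]
        self.num_clauses = num_clauses


class RequiredOptionalMatcher:
    def __init__(self, required, optional):
        self.required = required
        self.optional = optional  # may be None


def term_matcher(term, segment_terms):
    return TermMatcher(term) if term in segment_terms else None


def make_or_matcher(clauses, segment_terms):
    children = [term_matcher(t, segment_terms) for t in clauses]
    if all(c is None for c in children):
        return None  # assumption: nothing can match in this segment
    if len(clauses) == 1:
        return children[0]  # single-clause query: safe to unwrap
    return ORMatcher(children, num_clauses=len(clauses))


def make_req_opt_matcher(required, optional, segment_terms):
    req = term_matcher(required, segment_terms)
    if req is None:
        return None
    # Always wrapped, even when the optional clause cannot match.
    return RequiredOptionalMatcher(req, term_matcher(optional, segment_terms))
```

        With a segment containing only the term "a", make_or_matcher(["a", "b"], {"a"})
        yields an ORMatcher with one child and num_clauses of 2, while
        make_or_matcher(["a"], {"a"}) yields the bare TermMatcher.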

          People

          • Assignee:
            Marvin Humphrey
            Reporter:
            Marvin Humphrey