Details
Bug
Status: Resolved
Major
Resolution: Fixed
0.1.0 (incubating), 0.2.0 (incubating), 0.2.1 (incubating)
None
None
Description
ORQuery, ANDQuery, and RequiredOptionalQuery all have optimizations which kick
in when only one child Query can match: they all compile down to the inner
Matcher.
In the case of ORQuery and RequiredOptionalQuery, this optimization can kick
in per-segment, resulting in an ORMatcher/RequiredOptionalMatcher for some
segments and e.g. a child TermMatcher for others. This skews scoring because
coord() affects the ORMatcher/RequiredOptionalMatcher, but not the TermMatcher
– the ORMatcher/RequiredOptionalMatcher damps the score of the matching term
by a coord() multiplier which is typically less than 1.0, but the TermMatcher
contributes 100% of its score. The punchline is that two documents in
different segments which present identical match criteria can produce
different scores, depending on whether terms not present in the document are
represented in the segment.
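
The skew can be reproduced with a toy model. The sketch below is not Lucy code: coord() is assumed to be the number of matching clauses divided by the number of clauses in the matcher, every matching term is assumed to contribute a raw score of 1.0, and score_or_query() is a hypothetical stand-in for the per-segment compile-then-score path.

```python
def coord(overlap, max_overlap):
    # Assumed coord formula: fraction of clauses that matched.
    return overlap / max_overlap


def score_or_query(query_terms, segment_terms, doc_terms):
    """Score one doc for an OR query, mimicking the per-segment optimization."""
    # Children whose term occurs somewhere in this segment.
    live = [t for t in query_terms if t in segment_terms]
    matched = [t for t in live if t in doc_terms]
    if not matched:
        return 0.0
    raw = float(len(matched))  # assumption: each matching term scores 1.0
    if len(live) == 1:
        # Optimized path: bare TermMatcher, coord() is never applied.
        return raw
    # ORMatcher path: score is damped by coord().
    return raw * coord(len(matched), len(live))


query = ["foo", "bar"]
doc = {"foo"}                # the same document content in both segments
segment_a = {"foo", "bar"}   # "bar" occurs in some other doc of the segment
segment_b = {"foo"}          # "bar" does not occur in the segment at all

score_a = score_or_query(query, segment_a, doc)
score_b = score_or_query(query, segment_b, doc)
print(score_a, score_b)  # prints: 0.5 1.0
```

The document and the query are identical in both calls; only the presence of "bar" elsewhere in the segment changes the score.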
In addition, ORQuery may compile down to a smaller ORMatcher when
e.g. 3 out of 5 OR'd terms are present. This skews scoring for similar
reasons.
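
As a rough worked example of the shrinking case, assume again that coord() is overlap divided by the matcher's child count and that each matching term contributes 1.0. Both assumptions are illustrative and are not taken from the Lucy source.

```python
def coord(overlap, max_overlap):
    # Assumed coord formula: fraction of clauses that matched.
    return overlap / max_overlap


matched = 2  # the document contains two of the OR'd terms

# ORMatcher built with all 5 children of the query.
full = matched * coord(matched, 5)

# ORMatcher built with only the 3 children present in this segment.
shrunk = matched * coord(matched, 3)

print(full, shrunk)
```

Under these assumptions the smaller ORMatcher yields the higher score for the same document, because its coord() denominator is 3 rather than 5.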
To present consistent scoring across all segments, Queries should always
compile down to the same Matcher node structure for each segment. By the time
you are compiling per-segment Matchers, it is too late to re-calculate the
weighting, so you can't optimize the Matcher structure when you find that e.g.
one of two terms doesn't exist in a given segment.
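
A minimal sketch of the fixed-structure idea, under the same toy assumptions (coord() as overlap over clause count, 1.0 per matching term): every query clause keeps its slot in the matcher, and a clause whose term is absent from the segment simply never matches. The function name score_fixed_structure() is hypothetical.

```python
def coord(overlap, max_overlap):
    # Assumed coord formula: fraction of clauses that matched.
    return overlap / max_overlap


def score_fixed_structure(query_terms, segment_terms, doc_terms):
    """Score one doc with one matcher slot per query clause in every segment."""
    # A term missing from the segment still occupies a slot; it never matches.
    slots = [(t in segment_terms and t in doc_terms) for t in query_terms]
    overlap = sum(slots)
    if overlap == 0:
        return 0.0
    # The coord() denominator is the query's clause count, not the segment's.
    return overlap * coord(overlap, len(slots))


query = ["foo", "bar"]
doc = {"foo"}
score_a = score_fixed_structure(query, {"foo", "bar"}, doc)  # "bar" in segment
score_b = score_fixed_structure(query, {"foo"}, doc)         # "bar" absent
print(score_a, score_b)  # prints: 0.5 0.5
```

Because the slot count depends only on the query, the two segments now agree on the score.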
In addition, when compiling down to a single child Matcher, ORQuery, ANDQuery
and RequiredOptionalQuery all discard custom boosts. This is solvable by
moving the optimization from Compiler_Make_Matcher() up into
Query_Make_Compiler().
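
The boost problem can be illustrated with a toy Query/Compiler/Matcher pipeline. All classes below are hypothetical Python stand-ins, not the Lucy API; the sketch only assumes that a compiler's weight is fixed when the compiler is created, and it covers just the single-child case.

```python
from dataclasses import dataclass


@dataclass
class TermCompiler:
    weight: float

    def make_matcher(self):
        # Stand-in matcher: calling it returns the score of a hit.
        return lambda: self.weight


@dataclass
class TermQuery:
    term: str
    boost: float = 1.0

    def make_compiler(self, boost=1.0):
        # Weighting is decided here, before any segment is seen.
        return TermCompiler(weight=self.boost * boost)


@dataclass
class ORQuery:
    children: list
    boost: float = 1.0

    def make_compiler(self, boost=1.0):
        if len(self.children) == 1:
            # Early collapse: fold the parent's boost into the child compiler.
            return self.children[0].make_compiler(boost=self.boost * boost)
        raise NotImplementedError("multi-child case omitted from this sketch")


def late_collapse(or_query):
    # Mimics collapsing at matcher-creation time: the child is compiled on
    # its own, so the parent's boost never reaches it.
    return or_query.children[0].make_compiler().make_matcher()


boosted = ORQuery([TermQuery("foo")], boost=3.0)
late = late_collapse(boosted)()
early = boosted.make_compiler().make_matcher()()
print(late, early)  # prints: 1.0 3.0
```

Collapsing while the compiler is being built keeps the custom boost in the weight; collapsing afterwards leaves nothing to apply it to.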