  1. Lucene - Core
  2. LUCENE-7151

Nested spanNear scoring error when inner clauses overlap positions

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.3.1, 5.5
    • Fix Version/s: None
    • Component/s: core/query/scoring
    • Labels:
    • Environment:

      Windows, Linux

    • Lucene Fields:
      New

      Description

      For the query

      spanNear([spanNear([contents:word1, contents:word3], 2, true), spanNear([contents:word2, contents:word3], 2, true)], 2, false)

      the scores for the following two documents should be the same, but they are not:
      doc1: [----- word1 word3 ----- word2 word3 ----- word1 word2 word3 -----]
      doc2: [----- word2 word3 ----- word1 word3 ----- word1 word2 word3 -----]
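To make the expectation concrete, here is a minimal, Lucene-free sketch that counts matches for the query in both documents. It is a simplified model, not Lucene's implementation: the class `SpanMatchSketch`, its methods, and the choice of 5 filler tokens per "-----" gap are all assumptions made for illustration. An outer match is counted when the combined width of the two inner spans, minus their summed lengths, is within the slop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SpanMatchSketch {
    // Assumption: each "-----" gap stands for 5 filler tokens, wider than any slop used here.
    static final int FILLER = 5;

    /** Builds a token list with filler before, between, and after the given word groups. */
    static List<String> doc(String... groups) {
        List<String> tokens = new ArrayList<>(Collections.nCopies(FILLER, "x"));
        for (String group : groups) {
            tokens.addAll(Arrays.asList(group.split(" ")));
            tokens.addAll(Collections.nCopies(FILLER, "x"));
        }
        return tokens;
    }

    /** Ordered near of two terms: spans [start, end) where t2 follows t1 with at most slop tokens between. */
    static List<int[]> orderedNear(List<String> doc, String t1, String t2, int slop) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < doc.size(); i++) {
            if (!doc.get(i).equals(t1)) continue;
            for (int j = i + 1; j < doc.size() && j - i - 1 <= slop; j++) {
                if (doc.get(j).equals(t2)) {
                    spans.add(new int[] {i, j + 1});
                    break;
                }
            }
        }
        return spans;
    }

    /** Unordered near of two span lists: counts pairs whose width minus total span length is within slop. */
    static int unorderedNearCount(List<int[]> as, List<int[]> bs, int slop) {
        int count = 0;
        for (int[] a : as) {
            for (int[] b : bs) {
                int width = Math.max(a[1], b[1]) - Math.min(a[0], b[0]);
                int totalLength = (a[1] - a[0]) + (b[1] - b[0]);
                if (width - totalLength <= slop) count++;
            }
        }
        return count;
    }

    /** Match count for the nested query from this report, under the simplified model above. */
    static int matchCount(List<String> doc) {
        List<int[]> inner1 = orderedNear(doc, "word1", "word3", 2);
        List<int[]> inner2 = orderedNear(doc, "word2", "word3", 2);
        return unorderedNearCount(inner1, inner2, 2);
    }

    public static void main(String[] args) {
        List<String> doc1 = doc("word1 word3", "word2 word3", "word1 word2 word3");
        List<String> doc2 = doc("word2 word3", "word1 word3", "word1 word2 word3");
        System.out.println("doc1 matches: " + matchCount(doc1) + ", doc2 matches: " + matchCount(doc2));
    }
}
```

Under these assumptions each document yields a single outer match, on the final 3-term phrase, and that match has the same width in both, which is why equal scores are expected.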

      The positions of the inner clauses affect the scoring of the final 3-term phrase. This appears to be a side effect of the span-scoring rewrite in 5.2.

      NearSpansUnordered's SpansCell.adjustMax() uses only end-position values to choose maxEndPositionCell, while the SpanPositionQueue sorts the SpansCell instances by both start position and end position. As a result, maxEndPositionCell can be set incorrectly, or not updated at all, depending on the previous positions.
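The following toy model sketches one way such a mismatch could arise. It does not reproduce Lucene's code: `MaxEndSketch`, `Cell`, `QUEUE_ORDER`, `BY_END`, and this version of `adjustMax` are hypothetical stand-ins, and the positions are invented. The cell that is smallest in (start, end) order is also the tracked max-end cell; when it advances to a span that ends earlier, an update that only compares end positions against the tracked cell leaves the tracked value stale.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class MaxEndSketch {
    /** Hypothetical stand-in for a span cell: a position range [start, end). */
    static final class Cell {
        final String name;
        int start;
        int end;

        Cell(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }

        void moveTo(int newStart, int newEnd) {
            start = newStart;
            end = newEnd;
        }
    }

    /** Queue order: start position first, end position as tie-breaker. */
    static final Comparator<Cell> QUEUE_ORDER =
            Comparator.<Cell>comparingInt(c -> c.start).thenComparingInt(c -> c.end);

    /** Order by end position only. */
    static final Comparator<Cell> BY_END = Comparator.comparingInt(c -> c.end);

    /** Incremental update: the moved cell takes over only if it ends strictly later. */
    static Cell adjustMax(Cell moved, Cell trackedMax) {
        return moved.end > trackedMax.end ? moved : trackedMax;
    }

    /** Returns {name of the incrementally tracked max, name of the max found by a full rescan}. */
    static String[] demo() {
        Cell a = new Cell("A", 3, 10);
        Cell b = new Cell("B", 5, 7);
        List<Cell> cells = Arrays.asList(a, b);

        Cell tracked = Collections.max(cells, BY_END);   // A (end 10)
        Cell min = Collections.min(cells, QUEUE_ORDER);  // also A (start 3)

        // Advance the minimum cell to its next span, which ends earlier than before.
        min.moveTo(5, 6);
        tracked = adjustMax(min, tracked);               // A is compared with itself: no change

        Cell rescanned = Collections.max(cells, BY_END); // B (end 7)
        return new String[] {tracked.name, rescanned.name};
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("tracked max-end cell: " + result[0] + ", actual max-end cell: " + result[1]);
    }
}
```

In this model the tracked cell remains A (now ending at 6) although B ends at 7, so any slop check based on the tracked cell would use the wrong end position. Whether this is the exact path taken in NearSpansUnordered is not established here.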

      I can provide example code illustrating the score error.

        Attachments

        1. SpanScore5Bug.java
          5 kB
          David Wendt

            People

            • Assignee:
              Unassigned
            • Reporter:
              David Wendt (davwendt)
            • Votes:
              0
            • Watchers:
              2
