Lucene - Core / LUCENE-2880

SpanQuery scoring inconsistencies

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Spinoff of LUCENE-2879.

      You can see a full description there, but the gist is that SpanQuery sums up freqs with "sloppyFreq".
      However, the slop passed in is simply spans.end() - spans.start().

      For a SpanTermQuery, for example, a single-term span has end - start = 1, so it's scoring 0.5 for TF versus TermQuery's 1.0.
      In practical situations, I think this makes it difficult for SpanQuery users to
      really use SpanQueries for effective ranking, especially in combination with non-SpanQueries (maybe via DisjunctionMaxQuery, etc.).

      The problem is more general than this simple example: SpanNearQuery, for instance, should be consistent with PhraseQuery's slop.
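
      The arithmetic behind the numbers above can be sketched in plain Java, without Lucene on the classpath. This is an illustration, not the actual scorer code: the class and method names are hypothetical, sloppyFreq is assumed to have the 1 / (distance + 1) shape of the default similarity, and the positions are made up.

```java
public class SpanFreqSketch {

    // Assumed shape of the default similarity's sloppyFreq: 1 / (distance + 1).
    public static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // The behaviour described in this issue: the span width (end - start)
    // is passed to sloppyFreq as if it were a slop distance.
    public static float spanWidthFreq(int start, int end) {
        return sloppyFreq(end - start);
    }

    public static void main(String[] args) {
        // TermQuery: one occurrence contributes a freq of 1.0.
        float termFreq = 1.0f;

        // SpanTermQuery: a one-term span at position 3 has start=3, end=4.
        float spanTermFreq = spanWidthFreq(3, 4);

        // An exact phrase match has a match distance of 0.
        float exactPhraseFreq = sloppyFreq(0);

        // SpanNearQuery over two adjacent terms: start=3, end=5, width 2.
        float spanNearFreq = spanWidthFreq(3, 5);

        System.out.println("TermQuery freq:      " + termFreq);
        System.out.println("SpanTermQuery freq:  " + spanTermFreq);
        System.out.println("Exact phrase freq:   " + exactPhraseFreq);
        System.out.println("SpanNearQuery freq:  " + spanNearFreq);
    }
}
```

      Under these assumptions the same single match is counted in full by the non-span query and only fractionally by its span equivalent, which is the inconsistency that shows up when the two kinds of query are combined.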

      Attachments

        1. LUCENE-2880.patch
          13 kB
          Robert Muir
        2. LUCENE-2880.patch
          10 kB
          Adrien Grand

            People

              Assignee: Unassigned
              Reporter: Robert Muir (rcmuir)
              Votes: 1
              Watchers: 7
