  1. Lucene - Core
  2. LUCENE-8633

Remove term weighting from interval scoring

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.0, master (9.0)
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      IntervalScorer currently uses the same scoring mechanism as SpanScorer: it sums the IDF of all possibly matching terms from its parent IntervalsSource and combines that sum with a sloppy frequency to produce a similarity-based score. This doesn't really make sense, however. It means that terms that don't appear in a document can still contribute to its score, and it makes scores from interval queries look comparable with scores from term or phrase queries when they really aren't.
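
      To illustrate the problem, here is a minimal standalone sketch, not Lucene code. The class and method names are hypothetical, the document frequencies are made up, and the BM25-style IDF formula is assumed only to have something concrete to sum. The point is that the summed weight is fixed per query, so a rare term inflates the score of documents that never contain it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of summing IDF over every term a source could match. */
public final class SummedIdfSketch {

    /** BM25-style IDF, assumed here for illustration: ln(1 + (N - df + 0.5) / (df + 0.5)). */
    public static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * Sums the IDF of all terms the source might match. The result is computed
     * once per query and does not depend on which terms a given document contains.
     */
    public static double summedIdf(Map<String, Long> docFreqs, long docCount) {
        double sum = 0.0;
        for (long docFreq : docFreqs.values()) {
            sum += idf(docFreq, docCount);
        }
        return sum;
    }

    public static void main(String[] args) {
        long docCount = 1000;                     // made-up index size
        Map<String, Long> docFreqs = new LinkedHashMap<>();
        docFreqs.put("common", 100L);             // term present in the document being scored
        docFreqs.put("rare", 1L);                 // term absent from that document

        // Weight that only counts the term actually present in the document.
        System.out.println("IDF of matching term only: " + idf(100, docCount));
        // Weight applied under the summed-IDF scheme, including the absent rare term.
        System.out.println("Summed IDF of all terms:   " + summedIdf(docFreqs, docCount));
    }
}
```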

      I'd like to explore a different scoring mechanism for intervals, based purely on sloppy frequency and ignoring term weighting. This should make the scores easier to reason about, as well as making them useful for things like proximity boosting on boolean queries.
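
      As a rough sketch of what a score based purely on sloppy frequency could look like (this is not the attached patch): each matching interval contributes the reciprocal of its width, so tighter matches count for more, and no term statistics are involved. The `Interval` class, the method names, and the saturating `freq / (freq + pivot)` mapping are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of interval scoring from sloppy frequency alone. */
public final class SloppyFrequencySketch {

    /** A matched interval with inclusive start and end positions. */
    public static final class Interval {
        final int start;
        final int end;

        public Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int width() {
            return end - start + 1;
        }
    }

    /** Each match contributes 1 / width, so narrow intervals weigh more than wide ones. */
    public static double sloppyFrequency(List<Interval> matches) {
        double freq = 0.0;
        for (Interval match : matches) {
            freq += 1.0 / match.width();
        }
        return freq;
    }

    /**
     * Maps the sloppy frequency to a bounded score: boost * freq / (freq + pivot).
     * The saturation step is an assumption of this sketch; pivot must be positive.
     */
    public static double score(List<Interval> matches, double boost, double pivot) {
        double freq = sloppyFrequency(matches);
        return boost * freq / (freq + pivot);
    }

    public static void main(String[] args) {
        List<Interval> matches = new ArrayList<>();
        matches.add(new Interval(0, 1));   // two adjacent positions, width 2
        matches.add(new Interval(5, 8));   // a looser match, width 4
        System.out.println("sloppy frequency = " + sloppyFrequency(matches));
        System.out.println("score            = " + score(matches, 1.0, 1.0));
    }
}
```

      Because a score like this never exceeds the boost and depends only on how many matches there are and how tight they are, it could be added to a boolean query's score as a proximity boost without competing with term weights.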

        Attachments

        1. LUCENE-8633.patch
          37 kB
          Alan Woodward
        2. LUCENE-8633.patch
          27 kB
          Alan Woodward

            People

            • Assignee:
              romseygeek Alan Woodward
              Reporter:
              romseygeek Alan Woodward
            • Votes:
              0
              Watchers:
              3