  1. Lucene - Core
  2. LUCENE-10229

Match offsets should be consistent for fields with positions and fields with offsets

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.2
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      This is a follow-up of LUCENE-10223 in which it was discovered that fields with
      offsets don't highlight some more complex interval queries properly. Alan says:

      It's because it returns the position of the inner match, but the offsets of the outer. And so if you're re-analyzing and retrieving offsets by looking at the positions, you get the 'right' thing. It's not obvious to me what the correct response is here, but thinking about it the current behaviour is kind of the worst of both worlds, and perhaps we should change it so that you get offsets of the inner match as standard, and then the outer match is returned as part of the sub matches.
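
      A minimal sketch of the mismatch described in the quote, using a toy whitespace tokenizer. Token, Match and the helper methods are hypothetical names introduced for illustration; they are not Lucene classes and do not reflect how Lucene implements matches. The "mixed" match carries the positions of the inner match and the offsets of the outer one, so the highlighted region depends on which of the two a highlighter reads.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: Token, Match and the helpers below are hypothetical, not Lucene classes.
final class MatchOffsetsSketch {

    /** A token with its position and character offsets, as an analyzer would record them. */
    static final class Token {
        final int position, startOffset, endOffset;
        Token(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** A match reported as a position range plus a character-offset range. */
    static final class Match {
        final int startPosition, endPosition, startOffset, endOffset;
        Match(int startPosition, int endPosition, int startOffset, int endOffset) {
            this.startPosition = startPosition;
            this.endPosition = endPosition;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits on single spaces and records each token's position and offsets. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && text.charAt(i) == ' ') i++;
            int start = i;
            while (i < text.length() && text.charAt(i) != ' ') i++;
            if (i > start) tokens.add(new Token(position++, start, i));
        }
        return tokens;
    }

    /** Positions-only field: re-analyze the text and derive offsets from the match positions. */
    static String highlightByPositions(String text, Match m) {
        List<Token> tokens = tokenize(text);
        int start = tokens.get(m.startPosition).startOffset;
        int end = tokens.get(m.endPosition).endOffset;
        return text.substring(start, end);
    }

    /** Field with stored offsets: trust the offsets reported by the match. */
    static String highlightByOffsets(String text, Match m) {
        return text.substring(m.startOffset, m.endOffset);
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps";
        // Inner (source) match: "brown", position 2, offsets 10..15.
        // Outer (enclosing) match: "quick brown fox", positions 1..3, offsets 4..19.
        Match mixed = new Match(2, 2, 4, 19);       // inner positions, outer offsets
        Match consistent = new Match(2, 2, 10, 15); // inner positions, inner offsets

        System.out.println(highlightByPositions(text, mixed));      // brown
        System.out.println(highlightByOffsets(text, mixed));        // quick brown fox
        System.out.println(highlightByPositions(text, consistent)); // brown
        System.out.println(highlightByOffsets(text, consistent));   // brown
    }
}
```

      With the mixed match the two highlighting paths disagree; once the match reports the offsets of the inner match, both paths highlight the same text.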

      Intervals are nicely separated into "basic intervals" and "filters", which restrict some other source of intervals. Here is the original documentation:

      https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/intervals/package-info.java#L29-L50

      In my experience, after an extended period of using interval queries in a frontend that highlights them, filters are restrictions that should not be highlighted; it's the source intervals that people care about. Filters express what to exclude or supply the context in which source intervals must occur.

      The test code contributed in LUCENE-10223 contains numerous query-highlight examples (on fields with positions) where this intuition is demonstrated on all kinds of interval functions:

      https://github.com/apache/lucene/blob/main/lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchHighlighter.java#L335-L542

      This issue is about making the internals work consistently for fields with positions and fields with offsets.

            People

              dweiss Dawid Weiss
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 50m