Lucene - Core: LUCENE-6334

Fast Vector Highlighter does not properly span neighboring term offsets

    Details

    • Lucene Fields:
      New

      Description

      If you use term vectors for fast vector highlighting on a multi-valued field and match a phrase that crosses two values, the phrase is not highlighted correctly, even though the highlighter finds the correct values to highlight.

      A good example is source code indexed one line per value, where you might have lines like:

      one two three five
      two three four
      five six five
      six seven eight nine eight nine eight nine eight nine eight nine eight nine
      eight nine
      ten eleven
      twelve thirteen
      

      Matching the phrase "four five" returns:

      two three four
      five six five
      six seven eight nine eight nine eight nine eight nine eight
      eight nine
      ten eleven
      

      However, it does not highlight "four" (on the first line) or "five" (on the second line), and it returns too many lines, but still not all of them.
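      To illustrate why both words are missed, the Java sketch below lays out the first three values of the example, assuming each value occupies its length plus one separator position (the "+1 per value" rule BaseFragmentsBuilder is assumed to use, which matches an offset gap of 1). It then checks whether a single "four five" span fits inside any one value. It assumes, as the issue title suggests, that the neighboring offsets of "four" and "five" are merged into one span; FieldValueOffsets, ranges and wordOffsets are hypothetical helpers, not Lucene API.

```java
/**
 * Hypothetical helper (not part of Lucene): computes the [fieldStart, fieldEnd)
 * range of each value of a multi-valued field, assuming one extra position
 * per value as a separator (an offset gap of 1).
 */
public class FieldValueOffsets {

    /** Returns one {fieldStart, fieldEnd} pair per field value. */
    public static int[][] ranges(String[] values) {
        int[][] result = new int[values.length][2];
        int fieldEnd = 0;
        for (int i = 0; i < values.length; i++) {
            int fieldStart = fieldEnd;
            fieldEnd += values[i].length() + 1; // +1 for the gap before the next value
            result[i][0] = fieldStart;
            result[i][1] = fieldEnd;
        }
        return result;
    }

    /** Global [start, end) offsets of the first occurrence of word in values[index]. */
    public static int[] wordOffsets(String[] values, int index, String word) {
        int base = ranges(values)[index][0];
        int local = values[index].indexOf(word);
        return new int[] {base + local, base + local + word.length()};
    }

    public static void main(String[] args) {
        String[] values = {"one two three five", "two three four", "five six five"};
        int[][] r = ranges(values);
        int[] four = wordOffsets(values, 1, "four");
        int[] five = wordOffsets(values, 2, "five");
        // Assumption: the neighboring terms are merged into one phrase span
        int spanStart = four[0];
        int spanEnd = five[1];
        for (int i = 0; i < r.length; i++) {
            // The containment-only check: the span must fit entirely inside one value
            boolean contained = spanStart >= r[i][0] && spanEnd <= r[i][1];
            System.out.printf("value %d [%d, %d): contains span [%d, %d)? %b%n",
                    i, r[i][0], r[i][1], spanStart, spanEnd, contained);
        }
    }
}
```

      Every line prints false: the span [29, 38) starts in the second value [19, 34) and ends in the third [34, 48), so a check that only accepts fully contained offsets rejects it for every value.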

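      The problem lies in BaseFragmentsBuilder at line 269: it only keeps term offsets that fall entirely within a single field value and does not check for offsets that cross from one value into the next, so such an offset is never assigned to either value. Here is a possible solution: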

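      // Clip each term offset (toffs) to the current field value [fieldStart, fieldEnd]
      int start = toffs.getStartOffset();
      int end = toffs.getEndOffset();
      boolean started = start >= fieldStart; // begins at or after the start of this value
      boolean ended = end <= fieldEnd;       // ends at or before the end of this value
      
      if (started && ended) {
          // existing behavior: the term lies entirely within this value
          toffsList.add(toffs);
          toffsIterator.remove();
      }
      else if (started && start < fieldEnd) {
          // begins in this value and continues into the next one
          // (start < fieldEnd skips terms that lie entirely after this value)
          toffsList.add(new Toffs(start, fieldEnd));
          // toffsIterator.remove(); // is this necessary?
      }
      else if (ended && end > fieldStart) {
          // began in an earlier value and ends in this one
          // (end > fieldStart skips terms that lie entirely before this value)
          toffsList.add(new Toffs(fieldStart, end));
          // toffsIterator.remove(); // is this necessary?
      }
      else if (!started && !ended) {
          // ie the toff spans the whole field value
          toffsList.add(new Toffs(fieldStart, fieldEnd));
          // toffsIterator.remove(); // is this necessary?
      }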
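      The clipping rule can also be exercised outside Lucene. The sketch below is a standalone approximation of the proposal, not Lucene code: Span is a hypothetical stand-in for Toffs, and clip returns the part of a span that overlaps one field value, or null when there is no overlap. The offsets in main are illustrative: a span [29, 38) crossing the boundary between values [19, 34) and [34, 48).

```java
/**
 * Standalone sketch of the proposed clipping rule. Span is a hypothetical
 * stand-in for Lucene's Toffs (a start/end offset pair).
 */
public class ClipToFieldValue {

    public static final class Span {
        public final int start;
        public final int end;

        public Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Returns the part of span that overlaps [fieldStart, fieldEnd), or null if none. */
    public static Span clip(Span span, int fieldStart, int fieldEnd) {
        boolean started = span.start >= fieldStart;
        boolean ended = span.end <= fieldEnd;
        if (started && ended) {
            return span;                              // fully inside this value
        } else if (started && span.start < fieldEnd) {
            return new Span(span.start, fieldEnd);    // starts here, runs into the next value
        } else if (ended && span.end > fieldStart) {
            return new Span(fieldStart, span.end);    // started earlier, ends here
        } else if (!started && !ended) {
            return new Span(fieldStart, fieldEnd);    // covers the whole value
        }
        return null;                                  // entirely before or after this value
    }

    public static void main(String[] args) {
        Span fourFive = new Span(29, 38);
        int[][] values = {{0, 19}, {19, 34}, {34, 48}};
        for (int[] v : values) {
            System.out.println("[" + v[0] + ", " + v[1] + ") -> " + clip(fourFive, v[0], v[1]));
        }
    }
}
```

      With these numbers the span is ignored for the first value and split into [29, 34) for the second value and [34, 38) for the third, so each value gets its own highlight.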
      

            People

            • Assignee:
              Unassigned
              Reporter:
              pickypg Chris Earle
            • Votes:
              0
              Watchers:
              6
