Details
Bug
Status: Closed
Major
Resolution: Fixed
None
New
Description
If you use term vectors for fast vector highlighting on a multi-valued field and match a phrase that crosses two values, the phrase is not highlighted properly, even though the correct values to highlight are found.
A good example of this is when matching source code, where you might have lines like:
one two three five two three four five six five six seven eight nine eight nine eight nine eight nine eight nine eight nine eight nine ten eleven twelve thirteen
Matching the phrase "four five" returns:
two three four five six five six seven eight nine eight nine eight nine eight nine eight eight nine ten eleven
However, "four" (on the first line) and "five" (on the second line) are not highlighted properly, and the result contains more lines than it should while still leaving some out.
The problem lies in BaseFragmentsBuilder at line 269, which does not check for cross-coverage, that is, for a match whose offsets extend past the boundaries of a single value. Here is a possible solution:
    boolean started = toffs.getStartOffset() >= fieldStart;
    boolean ended = toffs.getEndOffset() <= fieldEnd;

    if (started && ended) {
      // existing behavior: the match lies entirely inside this value
      toffsList.add(toffs);
      toffsIterator.remove();
    } else if (started) {
      // the match starts in this value and runs past its end
      toffsList.add(new Toffs(toffs.getStartOffset(), fieldEnd));
      // toffsIterator.remove(); // is this necessary?
    } else if (ended) {
      // the match starts before this value and ends inside it
      toffsList.add(new Toffs(fieldStart, toffs.getEndOffset()));
      // toffsIterator.remove(); // is this necessary?
    } else if (toffs.getEndOffset() > fieldEnd) {
      // i.e. the match spans the whole value
      toffsList.add(new Toffs(fieldStart, fieldEnd));
      // toffsIterator.remove(); // is this necessary?
    }
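To make the idea easier to check in isolation, here is a minimal standalone sketch of the clipping step, independent of Lucene. The class name CrossValueClip, the clip method, and the example offsets are hypothetical and chosen only for illustration; ranges are treated as half-open [start, end), and a one-character gap between values is assumed. Unlike the proposal above, the sketch also returns nothing for a match that does not touch the value at all, which is an added assumption and not something the proposal states.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not Lucene code.
final class CrossValueClip {

    /**
     * Clips the match [start, end) to the value [fieldStart, fieldEnd).
     * Returns {clippedStart, clippedEnd}, or null when the match does not
     * overlap the value at all (an assumption added for this sketch).
     */
    static int[] clip(int start, int end, int fieldStart, int fieldEnd) {
        if (end <= fieldStart || start >= fieldEnd) {
            return null; // no overlap with this value
        }
        // Covers all four cases: fully inside, runs past the end,
        // starts before the beginning, and spans the whole value.
        return new int[] { Math.max(start, fieldStart), Math.min(end, fieldEnd) };
    }

    public static void main(String[] args) {
        // Assumed layout: value 1 "one two three four" occupies [0, 18),
        // value 2 "five six" occupies [19, 27).
        // The phrase "four five" then matches the range [14, 23).
        System.out.println(Arrays.toString(clip(14, 23, 0, 18)));  // [14, 18]
        System.out.println(Arrays.toString(clip(14, 23, 19, 27))); // [19, 23]
    }
}
```

With these assumed offsets, the part of the phrase in the first value ("four") and the part in the second value ("five") each get their own highlight range, which is the behavior the report expects.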