The default highlighter has a TokenGroup class whose instances are passed to Formatter.highlightTerm(). TokenGroup also has getStartOffset() and getEndOffset() methods that ostensibly return the start and end offsets, in the original text, of the current term. These getters aren't called by Lucene or Solr, but they are public and are useful to me. The problem is that they return the wrong offsets when there are multiple tokens at the same position. I believe this was an oversight in LUCENE-627: these getters should have been updated there but weren't.
The fix is simple: return matchStartOffset and matchEndOffset from these getters instead of startOffset and endOffset. I think this oversight would not have occurred if Highlighter didn't have package-level access to TokenGroup's fields.
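
To illustrate the difference, here is a minimal sketch using a simplified stand-in for TokenGroup. It is not the real Lucene class. TokenGroupSketch, its addToken(int, int, float) signature, and the "Buggy" getter names are hypothetical. It also assumes that startOffset/endOffset cover every token in the group while matchStartOffset/matchEndOffset cover only the tokens with a score above zero. Only the four field names come from the description above.

```java
// Hypothetical, simplified model of the offset bookkeeping; NOT Lucene's TokenGroup.
final class TokenGroupSketch {
    private int startOffset, endOffset;           // assumed: span of every token in the group
    private int matchStartOffset, matchEndOffset; // assumed: span of the scoring tokens only
    private int numTokens;
    private float totalScore;

    // Hypothetical signature: offsets and score are passed in directly.
    void addToken(int tokenStart, int tokenEnd, float score) {
        if (numTokens == 0) {
            startOffset = matchStartOffset = tokenStart;
            endOffset = matchEndOffset = tokenEnd;
        } else {
            // The group span always widens to cover the new token.
            startOffset = Math.min(startOffset, tokenStart);
            endOffset = Math.max(endOffset, tokenEnd);
            if (score > 0) {
                if (totalScore == 0) {
                    // First scoring token: the match span starts from it.
                    matchStartOffset = tokenStart;
                    matchEndOffset = tokenEnd;
                } else {
                    matchStartOffset = Math.min(matchStartOffset, tokenStart);
                    matchEndOffset = Math.max(matchEndOffset, tokenEnd);
                }
            }
        }
        totalScore += score;
        numTokens++;
    }

    // Behavior described as wrong: exposes the whole-group span.
    int getStartOffsetBuggy() { return startOffset; }
    int getEndOffsetBuggy()   { return endOffset; }

    // Proposed behavior: exposes the span of the matching tokens.
    int getStartOffset() { return matchStartOffset; }
    int getEndOffset()   { return matchEndOffset; }

    public static void main(String[] args) {
        TokenGroupSketch group = new TokenGroupSketch();
        group.addToken(0, 5, 0f); // e.g. "wi-fi", not a query term
        group.addToken(3, 5, 1f); // e.g. "fi" at the same position, a query term
        System.out.println(group.getStartOffsetBuggy() + "-" + group.getEndOffsetBuggy()); // prints 0-5
        System.out.println(group.getStartOffset() + "-" + group.getEndOffset());           // prints 3-5
    }
}
```

In this made-up example, two tokens share a position but have different offsets, and only the second one matches the query. The whole-group getters report 0-5, while the match-based getters report 3-5, which is the span a Formatter would presumably want.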