I propose that TokenGroup's fields become private and that Highlighter access them via its getters. It already has the ones needed, so no new getters are required.
This raises the question of whether the distinction between "matchStartOffset" and "startOffset" (and the "end" variants) serves any purpose. That is, toss startOffset (& endOffset), then rename matchStartOffset (& matchEndOffset) to startOffset (& endOffset). startOffset & endOffset aren't used, and I doubt others use them, because I think the offset info, when needed, is accessed at the end via TextFragment (populated from TokenGroup.matchStartOffset & matchEndOffset). FYI I didn't go the TextFragment route because I want all matches, and from an efficiency standpoint I found the custom Formatter approach more appealing than passing a very large numFragments.
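To make the proposal concrete, here is a minimal sketch of what the slimmed-down class could look like. TokenGroupSketch is a hypothetical, simplified stand-in and not the real Lucene class: the method signatures are illustrative, and the rule that only scoring tokens widen the offset range is my assumption about the intended semantics.

```java
// Hypothetical, simplified stand-in for TokenGroup; not the real Lucene class.
final class TokenGroupSketch {
    // A single pair of offsets, taking over the role of matchStartOffset/matchEndOffset.
    private int startOffset = -1;
    private int endOffset = -1;
    private float totalScore;
    private int numTokens;

    /** Records one token. Assumption: only tokens with a positive score widen the range. */
    void addToken(int tokenStart, int tokenEnd, float score) {
        if (score > 0) {
            if (startOffset < 0) {
                // First scoring token initializes the range.
                startOffset = tokenStart;
                endOffset = tokenEnd;
            } else {
                startOffset = Math.min(startOffset, tokenStart);
                endOffset = Math.max(endOffset, tokenEnd);
            }
        }
        totalScore += score;
        numTokens++;
    }

    // Callers such as Highlighter would read state only through getters.
    int getStartOffset() { return startOffset; }
    int getEndOffset() { return endOffset; }
    float getTotalScore() { return totalScore; }
    int getNumTokens() { return numTokens; }
}
```

With the fields private, a rename like this would stay internal to the class as long as callers go through the getters.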
Unrelated questions about Highlighter
Not directly related to the above, I have a couple of burning questions about Highlighter:
- Why oh why does Highlighter call formatter.highlightTerm for essentially every token? If TokenGroup.getTotalScore() is 0, I argue it shouldn't. All the built-in Formatters (and one I just wrote) start with a zero-score short-circuit.
- Why does a 0-score fragment remain a part of the fragments priority queue? Why isn't it tossed out when the fragment closes out? One might argue this is needless when numFragments (which is the size of the PQ) is small, but it'd be nice to ask for 'all' fragments/matches without a huge PQ, even if there is just one real match.
- Why is all text run through the encoder and appended to a "newText" StringBuilder, even when the fragment has no score? That work is wasted, since the text won't be a part of any returned fragment. Again, I think 0-score fragments should be immediately dropped, and newText should only hold the current fragment.
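
The behavior I'm arguing for in the three points above can be sketched roughly as follows. This is not Highlighter's actual code: ZeroScoreSketch, Group and TermFormatter are hypothetical stand-ins, the encoder step is omitted, and fragments are assumed to arrive already split into their token groups.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are real Lucene classes.
final class ZeroScoreSketch {
    /** Stand-in for a token group: its original text and its total score. */
    static final class Group {
        final String text;
        final float totalScore;
        Group(String text, float totalScore) {
            this.text = text;
            this.totalScore = totalScore;
        }
    }

    /** Stand-in for a formatter's highlightTerm callback. */
    interface TermFormatter {
        String highlightTerm(String originalText, Group group);
    }

    /** Returns the text of scoring fragments only; each inner list is one fragment. */
    static List<String> highlight(List<List<Group>> fragments, TermFormatter formatter) {
        List<String> kept = new ArrayList<>();
        for (List<Group> fragment : fragments) {
            // Buffer holds the current fragment only, not the whole document.
            StringBuilder fragmentText = new StringBuilder();
            float fragmentScore = 0;
            for (Group group : fragment) {
                fragmentScore += group.totalScore;
                // Zero-score groups bypass the formatter entirely.
                fragmentText.append(group.totalScore > 0
                        ? formatter.highlightTerm(group.text, group)
                        : group.text);
            }
            // A fragment that closes with no score is dropped and never queued.
            if (fragmentScore > 0) {
                kept.add(fragmentText.toString());
            }
        }
        return kept;
    }
}
```

In this sketch the list of kept fragments grows only with the number of real matches, which is the property I'd want from the PQ when asking for 'all' fragments.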