I agree on the caching point, that is, the case I described where you ask for Terms for the same document again. Never mind that part; after thinking about it, I realized I don't need it after all.
But I don't think it should be in the default codec. I also happen to think term vectors aren't a good data structure for highlighting anyway.
The default highlighter fully respects the positions and other aspects of the user's query, unlike the other highlighters. Some applications demand that a highlight be accurate to the query, even if the query uses custom span queries that do tricks with payloads, etc. It would be nice if the other highlighters supported accurate highlights for such queries, but they don't, so today this is the one to use for accurate highlights of complex queries. The default highlighter requires a Terms instance reflecting the current document. It currently gets one by re-inverting the text into a MemoryIndex, but it can be hacked to accept a Terms directly from term vectors.
So you don't like the idea of enhancing the performance of term vector seekCeil in the default codec? Is that a -1 or a -0? The change I'm proposing seems harmless: the code would not allocate and build the new offset array unless the consuming code calls seekCeil or the ord methods.
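
To make the proposal concrete, here is a rough, self-contained sketch of the lazy-offsets idea. It is not the actual term vectors reader code and it uses no Lucene classes: `PackedTerms`, `scanAll`, `termAt`, `offsetsBuilt`, and the one-byte length prefix are all made up for illustration. The only point is that the offset array stays null during sequential iteration and is built on the first seekCeil or ord-style lookup.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for one document's term vector: sorted terms packed as [len][bytes]. */
final class PackedTerms {
    private final byte[] data;   // length-prefixed terms, in sorted order
    private final int termCount;
    private int[] offsets;       // start of each term record; stays null until first needed

    /** Assumes the caller passes terms already sorted in UTF-8 byte order. */
    PackedTerms(String... sortedTerms) {
        byte[][] encoded = new byte[sortedTerms.length][];
        int size = 0;
        for (int i = 0; i < sortedTerms.length; i++) {
            encoded[i] = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            size += 1 + encoded[i].length;
        }
        data = new byte[size];
        int pos = 0;
        for (byte[] term : encoded) {
            data[pos++] = (byte) term.length; // sketch only: assumes terms shorter than 128 bytes
            System.arraycopy(term, 0, data, pos, term.length);
            pos += term.length;
        }
        termCount = sortedTerms.length;
    }

    boolean offsetsBuilt() {
        return offsets != null;
    }

    /** Sequential iteration: walks the length prefixes and never touches the offset array. */
    String[] scanAll() {
        String[] out = new String[termCount];
        int pos = 0;
        for (int i = 0; i < termCount; i++) {
            int len = data[pos];
            out[i] = new String(data, pos + 1, len, StandardCharsets.UTF_8);
            pos += 1 + len;
        }
        return out;
    }

    /** One linear pass, done only on the first random-access call. */
    private void ensureOffsets() {
        if (offsets != null) {
            return;
        }
        offsets = new int[termCount];
        int pos = 0;
        for (int i = 0; i < termCount; i++) {
            offsets[i] = pos;
            pos += 1 + data[pos];
        }
    }

    /** Ord-style lookup: returns the term at the given ordinal. */
    String termAt(int ord) {
        ensureOffsets();
        int start = offsets[ord];
        return new String(data, start + 1, data[start], StandardCharsets.UTF_8);
    }

    /** Returns the ordinal of the smallest term >= target, or -1 if there is none. */
    int seekCeil(String target) {
        ensureOffsets();
        byte[] t = target.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = termCount - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int start = offsets[mid];
            int cmp = Arrays.compareUnsigned(
                data, start + 1, start + 1 + data[start], t, 0, t.length);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return lo < termCount ? lo : -1;
    }
}
```

A consumer that only iterates (the `scanAll` path here) pays nothing extra. A consumer that seeks pays one linear pass up front and gets binary search after that, instead of a linear scan per seek.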