I would say we already have all of the functionality of this patch, and more. I have not checked how well it handles all of the corner cases, but it looks like Mark H did a bit of that. I would say it currently offers no functional value, though it may be faster than what we have for PhraseQuery (it does not support Spans). The patch uses the offsets from the TokenStream for highlighting and just makes sure the PhraseQuery's terms are next to each other (I am not sure how exactly this emulates slop), so it can be rather fast on larger docs.
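To make the technique concrete, here is a minimal sketch of the general idea, not the patch's actual code. It assumes exact adjacency only (slop 0) and uses a hypothetical Tok class and a toy whitespace tokenizer in place of Lucene's TokenStream, so none of the names below are Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetPhraseSketch {

    /** Hypothetical stand-in for a token: term text, position, and character offsets. */
    static final class Tok {
        final String term;
        final int position;
        final int start;
        final int end;

        Tok(String term, int position, int start, int end) {
            this.term = term;
            this.position = position;
            this.start = start;
            this.end = end;
        }
    }

    /** Toy tokenizer: splits on whitespace, lowercases, and records offsets. */
    static List<Tok> whitespaceTokens(String text) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        int i = 0;
        int n = text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                out.add(new Tok(text.substring(start, i).toLowerCase(), pos++, start, i));
            }
        }
        return out;
    }

    /**
     * Returns {startOffset, endOffset} pairs for every place where the phrase
     * terms occur at consecutive positions. Slop is not emulated here.
     */
    static List<int[]> findPhraseOffsets(List<Tok> tokens, String[] phrase) {
        List<int[]> hits = new ArrayList<>();
        if (phrase.length == 0) return hits;
        for (int i = 0; i + phrase.length <= tokens.size(); i++) {
            Tok first = tokens.get(i);
            boolean match = true;
            for (int j = 0; j < phrase.length; j++) {
                Tok t = tokens.get(i + j);
                if (!t.term.equals(phrase[j]) || t.position != first.position + j) {
                    match = false;
                    break;
                }
            }
            if (match) {
                Tok last = tokens.get(i + phrase.length - 1);
                hits.add(new int[] {first.start, last.end});
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox and the quick red fox";
        for (int[] hit : findPhraseOffsets(whitespaceTokens(text), new String[] {"quick", "brown"})) {
            // The offsets point straight back into the original text.
            System.out.println(text.substring(hit[0], hit[1]));
        }
    }
}
```

A single pass over the tokens is all that is needed, which is why this kind of approach can be quick on large documents. It also shows the cost: the adjacency check is a small reimplementation of phrase matching that lives outside the Query code.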
I analyzed all of the old Highlight code in JIRA when considering how best to do the SpanScorer, and passed on each of them for one reason or another. The main reasons for passing on this one were the lack of Span support, the loss of current Highlighter features/API, and the pseudo-duplication of Lucene's phrase query searching in the Highlighter code. I think a solution that doesn't duplicate Query code is much cleaner.
So I don't think this is very useful with regard to the general Highlighter. The idea of using Token offset info to do the highlighting was also tried in Ronnie's JIRA issue (though in that case it was done through TermVectors and not from the TokenStream), and while it proves to be faster on large documents, it doesn't appear easy to retain the speed when working with Spans, and it doesn't fit well with the old API.
If we do ditch the old API some day, though, this technique could be worth revisiting: I have been playing around with it in my LargeDocHighlighter, and I still have hope that will go somewhere. I just don't see the old token scoring API being thrown away in the near future.