Elaborating on the description:
This patch includes a tweak to the TokenLL array size initialization so that the new limit is considered when guessing a good initial size.
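To illustrate the idea (this is a sketch with made-up names, not the patch's actual code), the size guess just gets clamped by the limit:

```java
/** Illustrative sketch only: clamp the initial array-size guess by the limit. */
static int guessInitialArraySize(int expectedTokenCount, int limit) {
  // A negative limit (e.g. -1) means "no limit was supplied".
  return limit < 0 ? expectedTokenCount : Math.min(expectedTokenCount, limit);
}
```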
This patch also includes memory-saving optimizations to the information it accumulates. Before the patch, each TokenLL held its term in its own char[], so there were a total of 2 objects per token (including the token itself). Now I use a shared CharsRefBuilder with a pointer & length into it, so there's just 1 object per token, plus byte savings from avoiding a per-token char[] header. I also reduced the bytes needed for a TokenLL instance from 40 to 32. It does assume that the char offset delta (endOffset - startOffset) can fit within a short, which seems like a reasonable assumption to me; for safety I guard against overflow and substitute Short.MAX_VALUE.
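Roughly, the layout looks like this (field and method names here are illustrative guesses at the shape of the change, not the patch's exact code):

```java
import org.apache.lucene.util.CharsRefBuilder;

/**
 * Sketch of the per-token record after the patch: the term chars for all
 * tokens live in one shared CharsRefBuilder, and each token just points
 * into it, instead of carrying its own char[].
 */
class TokenLL {
  int termCharsOff;     // start of this token's chars in the shared buffer
  short termCharsLen;   // length of this token's chars
  int startOffset;      // absolute char start offset
  short endOffsetDelta; // endOffset - startOffset, clamped to fit a short
  // ... position / linked-list bookkeeping omitted

  void setOffsets(int startOffset, int endOffset) {
    this.startOffset = startOffset;
    int delta = endOffset - startOffset;
    // Guard against overflow: a token spanning more than Short.MAX_VALUE
    // chars is pathological, so substitute Short.MAX_VALUE rather than wrap.
    this.endOffsetDelta = (short) Math.min(delta, Short.MAX_VALUE);
  }

  void appendTermChars(CharsRefBuilder sharedChars, char[] term, int len) {
    this.termCharsOff = sharedChars.length();
    this.termCharsLen = (short) Math.min(len, Short.MAX_VALUE);
    sharedChars.append(term, 0, len);
  }
}
```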
Finally, to encourage users to supply a limit (even if "-1" to mean no limit), I decided to deprecate many of the methods in TokenSources in favor of new ones that take a limit parameter. But for those methods that fall back to a provided Analyzer, I now wonder whether it makes sense for them to apply the limit to the analyzer-produced token stream as well. I think it does: if you want to limit the tokens, it shouldn't matter where they came from. I haven't added that yet; I'm looking for feedback first.
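If we do go that route, one way to do it (a sketch of the idea, not part of the patch) would be to wrap the re-analyzed stream in Lucene's existing LimitTokenCountFilter:

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilter;

class TokenSourcesSketch {
  /**
   * Sketch of how the Analyzer fall-back could honor the limit: re-analyze
   * the stored text, then cap the stream with LimitTokenCountFilter.
   * A negative maxTokens means "no limit" (matching the "-1" convention).
   */
  static TokenStream limitedTokenStream(Analyzer analyzer, String field,
                                        String text, int maxTokens) throws IOException {
    TokenStream stream = analyzer.tokenStream(field, text);
    return maxTokens < 0 ? stream : new LimitTokenCountFilter(stream, maxTokens);
  }
}
```

That would make the limit apply uniformly whether the tokens came from a term vector or from re-analysis.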