First draft of a patch that updates the various TokenFilters, etc. in Solr to use the new Lucene reuse API. Notes on implementation below:
Also, cleans up some of the javadocs in various files
Added Test for the Porter stemmer.
Cleaned up some string literals to be constants so that they can be safely referred to in the tests.
In the PatternTokenFilter, it would be cool if there was a way to just operate on the char array, but I don't see that the Pattern/Matcher API supports it.
Same goes for PhoneticTokenFilter
I'm not sure yet if the BufferedTokenStream can take advantage of reuse, so I have left them alone for now, other than some minor doc fixes. I will think about this some more.
In RemoveDuplicatesTF, I only converted to using termBuffer, not Token reuse. I removed the "IN" and "OUT" loop labels, as I don't see what functionality they provide.
Added ArraysUtils class and test to provide a bit more functionality than Arrays.java offers in terms of comparing two char arrays. This could be expanded at some point to cover other primitive comparisons.
My understanding of the new reusableTokenStream means we can't use it in the SolrAnalyzer
On the TrimFilter, it is not clear to me that there would be a token that is ever all whitespace. However, since the test handles it, I wonder why the a Token of " ", when update offsets are on, reports the offsets as the end and not the start. Just a minor nit, but it seems like the start/end offsets should be 0, not the end of the token.
I'm not totally sure on the WordDelimiterFilter, as there is a fair amount of new token creation, Also, I think, the newTok() method doesn't set the position increment based on the original position increment, so I added that.
I'm also not completely sure how to handle FieldType DefaultAnalyzer.next(). It seems like it could reuse the token
Also not sure why the duplicate code for the MultiValueTokenStream in HighlighterUtils and SolrHighlighter, so I left the highlighter TokenStreams alone.