Just to conclude:
This bug is not as serious as it appears (otherwise someone would have noticed it before): it can never happen with run-of-the-mill TokenStreams used the way IndexWriter uses them.
This bug only appears if you have TokenFilters and you add Attributes on the top-level filter later (after using the TokenStream for the first time). Using the TokenStream means the states get computed, so every Filter/Tokenizer ends up with its own cached state. Adding a new Attribute to the last filter then never invalidates the Tokenizer's cache.
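To make the mechanism concrete, here is a minimal sketch of the stale-cache pattern. This is NOT actual Lucene code: `SharedAttributes`, `Stage`, and `captureState()` below are simplified stand-ins for the shared attribute map and the per-instance state caching of AttributeSource, just to show why clearing only one instance's cache leaves the Tokenizer's cache stale.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model (not Lucene API): all stages in a chain share one
// attribute map, but each stage keeps its OWN cached state snapshot.
class SharedAttributes {
    final Map<String, Object> attributes = new LinkedHashMap<>();
}

class Stage {
    final SharedAttributes shared; // shared by every stage in the chain
    List<Object> cachedState;      // per-stage cache of the computed state

    Stage(SharedAttributes shared) { this.shared = shared; }

    void addAttribute(String name) {
        shared.attributes.put(name, new Object());
        cachedState = null;        // only THIS stage's cache is invalidated
    }

    // captureState-like call: builds and caches a snapshot of all attributes
    List<Object> captureState() {
        if (cachedState == null) {
            cachedState = new ArrayList<>(shared.attributes.values());
        }
        return cachedState;
    }
}

public class StaleCacheDemo {
    public static void main(String[] args) {
        SharedAttributes shared = new SharedAttributes();
        Stage tokenizer = new Stage(shared); // bottom of the chain
        Stage topFilter = new Stage(shared); // top of the chain

        topFilter.addAttribute("term");
        tokenizer.captureState();            // tokenizer caches a 1-attr state
        topFilter.addAttribute("payload");   // added AFTER first use

        // The tokenizer's cached state is stale: it still sees 1 attribute,
        // while the top filter (whose cache was cleared) sees 2.
        System.out.println(tokenizer.captureState().size()); // prints 1
        System.out.println(topFilter.captureState().size()); // prints 2
    }
}
```

The point of the sketch: `addAttribute` on the top filter mutates the shared map but only resets that filter's own cache, so any stage that already captured its state keeps serving the old snapshot.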
This bug could affect:
- Analyzers that partly reuse TokenStreams and plug filters on top of them in the reusableTokenStream() method, reusing the partially cached TokenStream. For example, Analyzers that always add a non-cacheable TokenFilter on top of a base TokenStream.
- TokenStreams that add attributes on the fly in one of their filters.
We should backport this patch to the 3.x and 3.1.1 branches, and maybe even to the 2.9.x and 3.0.x branches (if somebody wants to patch 3.0). In general this is a serious issue in the new TokenStream API introduced in 2.9.