Have you looked at TeeSinkTokenFilter?
Yes, and from my current understanding, it is similar to our current implementation. The problem with this approach is that attributes are exchanged through the AttributeSource.State API (AttributeSource#captureState and AttributeSource#restoreState), which copies the values of every attribute implementation that the state contains. This is very inefficient, as it has to copy arrays and other objects (e.g., char term arrays) for every single token.
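To make the cost concrete, here is a minimal toy sketch of the two ways of handing a token from one stream to another. It does not use Lucene: TermAttr, exchangeByStateCopy and exchangeBySharing are hypothetical names, and the capture/restore steps only mimic what I understand captureState and restoreState to do (clone the attribute contents, then copy them into the target).

```java
import java.util.Arrays;

/** Toy model only: these classes are hypothetical stand-ins, not Lucene's API. */
public class StateCopyDemo {

    /** Stand-in for a term attribute backed by a reusable char buffer. */
    static final class TermAttr {
        char[] buffer = new char[16];
        int length = 0;

        /** The producer writes the current token into the buffer. */
        void set(String term) {
            if (term.length() > buffer.length) {
                buffer = new char[term.length()];
            }
            term.getChars(0, term.length(), buffer, 0);
            length = term.length();
        }

        /** Analogue of restoring a captured state into this attribute. */
        void restoreFrom(char[] state) {
            if (state.length > buffer.length) {
                buffer = new char[state.length];
            }
            System.arraycopy(state, 0, buffer, 0, state.length);
            length = state.length;
        }

        String get() {
            return new String(buffer, 0, length);
        }
    }

    /** Capture/restore style: two separate attributes, contents copied per token. */
    static long exchangeByStateCopy(String[] tokens, StringBuilder seen) {
        TermAttr producer = new TermAttr();
        TermAttr consumer = new TermAttr();
        long charsCopied = 0;
        for (String token : tokens) {
            producer.set(token);
            // "capture": snapshot the producer's contents into a fresh array
            char[] state = Arrays.copyOf(producer.buffer, producer.length);
            charsCopied += state.length;
            // "restore": copy the snapshot into the consumer's attribute
            consumer.restoreFrom(state);
            charsCopied += state.length;
            seen.append(consumer.get()).append(' ');
        }
        return charsCopied;
    }

    /** Shared style: producer and consumer use the very same attribute instance. */
    static long exchangeBySharing(String[] tokens, StringBuilder seen) {
        TermAttr shared = new TermAttr();
        for (String token : tokens) {
            shared.set(token);
            // the consumer reads the shared instance directly, nothing is exchanged
            seen.append(shared.get()).append(' ');
        }
        return 0;
    }

    public static void main(String[] args) {
        String[] tokens = {"quick", "brown", "fox"};
        StringBuilder copied = new StringBuilder();
        StringBuilder shared = new StringBuilder();
        System.out.println("chars copied with state copy: " + exchangeByStateCopy(tokens, copied));
        System.out.println("chars copied with sharing:    " + exchangeBySharing(tokens, shared));
        System.out.println("same tokens seen: " + copied.toString().equals(shared.toString()));
    }
}
```

Both variants deliver the same tokens to the consumer, but in this toy model the state-copy variant moves every character twice per token, while the shared variant moves nothing. Real attribute states hold more than a term buffer, so the actual overhead depends on which attributes are in use.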
Concerning the problem of UOEs, Steve's new patch reduces the number of UOEs to just one, which is much more reasonable than my first approach. I have looked at the current state of the Lucene trunk, and there are already UOEs in many places. So I would suggest that this problem need not be a blocker (but I might be wrong).
Concerning the problem of constructor explosion, maybe we can reach a consensus. Your proposal to remove Tokenizer(AttributeSource) cannot work for us, as we need it to share the same AttributeSource across multiple streams. However, as I proposed, removing Tokenizer(AttributeFactory) could work, since it can be emulated with Tokenizer(AttributeSource).
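A rough sketch of the emulation I have in mind, again as a toy model rather than Lucene code: Source and Tok are hypothetical stand-ins for AttributeSource and Tokenizer, and a plain Function plays the role of the AttributeFactory. The assumption is that a source can be built from a factory, so a factory-taking constructor on the tokenizer adds nothing that the source-taking one cannot already express.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model only: Source and Tok are hypothetical stand-ins, not Lucene classes. */
public class ConstructorSketch {

    /** Stand-in for an attribute source that creates attributes through a factory. */
    static final class Source {
        private final Function<String, Object> factory;
        private final Map<String, Object> attributes = new HashMap<>();

        Source(Function<String, Object> factory) {
            this.factory = factory;
        }

        /** Returns the existing attribute for this name, creating it on first use. */
        Object addAttribute(String name) {
            return attributes.computeIfAbsent(name, factory);
        }
    }

    /** Stand-in for a tokenizer with a single constructor taking a source. */
    static final class Tok {
        final Source source;

        Tok(Source source) {
            this.source = source;
        }

        Object termAttribute() {
            return source.addAttribute("term");
        }
    }

    public static void main(String[] args) {
        Function<String, Object> factory = name -> new StringBuilder(name);

        // Sharing: two streams built on the same source see the same attribute.
        Source shared = new Source(factory);
        Tok first = new Tok(shared);
        Tok second = new Tok(shared);
        System.out.println("shared attribute: " + (first.termAttribute() == second.termAttribute()));

        // Emulating a factory-taking constructor: wrap the factory in a new source.
        Tok standalone = new Tok(new Source(factory));
        System.out.println("own attribute: " + (standalone.termAttribute() != first.termAttribute()));
    }
}
```

The reverse emulation is not possible in this model: a constructor that only takes a factory always ends up with its own source, so it cannot express sharing across streams.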