The issue here is mostly that we need to create a new TokenStream (a StringTokenStream) for normalization, and it needs to use the same attribute types as the main analysis chain.
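For context, here is a rough sketch of what such a StringTokenStream might look like (hypothetical, not taken from an actual patch, and not compile-tested): it wraps a single pre-tokenized string and, crucially, is constructed from the analyzer's own AttributeFactory, so that downstream filters see the same attribute implementations they would see in the main chain.

```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.AttributeFactory;

/** Hypothetical sketch: a TokenStream over exactly one string value. */
final class StringTokenStream extends TokenStream {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final String value;
  private boolean used;

  /**
   * Passing the analyzer's AttributeFactory is the important part:
   * it guarantees the normalization chain uses the same attribute
   * impls (e.g. a custom term attribute) as the indexing chain.
   */
  StringTokenStream(AttributeFactory factory, String value) {
    super(factory);
    this.value = value;
  }

  @Override
  public boolean incrementToken() {
    if (used) {
      return false;
    }
    clearAttributes();
    termAtt.append(value); // emit the whole string as a single token
    used = true;
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    used = false;
  }
}
```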
Exactly. For instance, if a term attribute produces UTF-16 encoded tokens, the normalization chain needs to produce UTF-16 tokens as well.
Although this is sometimes broken for use-cases where TokenStreams create binary tokens. But those would never be normalized, I think (!?)
Do you mean that you cannot think of a use-case for combining a non-default term attribute with token filters in the same analysis chain? I am wondering about CJK analyzers: I think UTF-16 stores CJK characters a bit more efficiently on average than UTF-8 (I may be completely wrong, in which case please let me know), so users might be tempted to use a different term attribute impl?
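On the encoding question, a quick self-contained check (plain JDK, nothing Lucene-specific) confirms the intuition for BMP CJK text: UTF-8 spends 3 bytes per CJK code point where UTF-16 spends 2, while for ASCII the relation flips. The sample strings are arbitrary; UTF_16BE is used to avoid counting a BOM.

```java
import java.nio.charset.StandardCharsets;

public class CjkEncodingSize {
  public static void main(String[] args) {
    String cjk = "東京都渋谷区";     // 6 BMP CJK code points
    String latin = "tokyo shibuya";  // 13 ASCII characters

    // BMP CJK: 3 bytes each in UTF-8, 2 bytes each in UTF-16.
    System.out.println(cjk.getBytes(StandardCharsets.UTF_8).length);    // 18
    System.out.println(cjk.getBytes(StandardCharsets.UTF_16BE).length); // 12

    // ASCII: 1 byte each in UTF-8, 2 bytes each in UTF-16.
    System.out.println(latin.getBytes(StandardCharsets.UTF_8).length);    // 13
    System.out.println(latin.getBytes(StandardCharsets.UTF_16BE).length); // 26
  }
}
```

So for predominantly CJK terms UTF-16 is indeed more compact, which is why someone might plug in a term attribute impl that stores chars rather than UTF-8 bytes.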