A new plugin which can be placed in front of <tokenizer/>.
<charFilter/> can be multiple (chained). I'll post a JPEG file to show character normalization sample soon.
In Japan, there are two types of tokenizers – N-gram (CJKTokenizer) and Morphological Analyzer.
When we use morphological analyzer, because the analyzer uses Japanese dictionary to detect terms,
we need to normalize characters before tokenization.
I'll post a patch soon, too.