Lucene 4.10.3 correctly tokenizes a syllable into one token. However, in Lucene 5.5.0 the same syllable ends up as two tokens, which is incorrect. Could you please let me know whether the segmentation rules are implemented by native speakers of each particular language? In this example, the language is Myanmar. My understanding is that Lao, Khmer and Myanmar fall into the ICU category. Thanks a lot.