Currently SASI offers only two tokenizer options:
The latter is built upon Snowball, which is powerful for natural-language text but overkill for simple tokenization.
A simple tokenizer is proposed here. The need for it arose as a workaround for CASSANDRA-11182, and to avoid the explosion in disk usage that comes from having to resort to CONTAINS mode. See https://github.com/openzipkin/zipkin/issues/1861
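To make "simple tokenization" concrete, here is a minimal Java sketch. It assumes the tokenizer only splits input on one configurable literal delimiter and optionally lowercases the pieces. The class name `SimpleDelimiterTokenizer` and its constructor parameters are hypothetical. This is not Cassandra's SASI analyzer interface and not the proposed implementation. Unlike a Snowball-based analyzer, it performs no stemming or language-specific processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration only: not the SASI analyzer API.
final class SimpleDelimiterTokenizer {
    private final Pattern splitter;
    private final boolean caseSensitive;

    SimpleDelimiterTokenizer(String delimiter, boolean caseSensitive) {
        if (delimiter == null || delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must be non-empty");
        }
        // Pattern.quote makes the delimiter literal, so "|" or "." are not treated as regex syntax.
        this.splitter = Pattern.compile(Pattern.quote(delimiter));
        this.caseSensitive = caseSensitive;
    }

    // Splits the input on the delimiter, drops empty pieces, and lowercases when case-insensitive.
    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        if (input == null) {
            return tokens;
        }
        for (String part : splitter.split(input)) {
            if (part.isEmpty()) {
                continue; // consecutive or leading delimiters produce empty pieces
            }
            tokens.add(caseSensitive ? part : part.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        SimpleDelimiterTokenizer t = new SimpleDelimiterTokenizer("|", false);
        // Prints [get, /api/users, error]
        System.out.println(t.tokenize("GET|/api/users||Error"));
    }
}
```

In this sketch, each delimited entry becomes a separate whole term. The idea is that such entries can then be looked up individually rather than matched as substrings of one long value.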
Example use of this would be:
Original credit for this work goes to https://github.com/zuochangan