Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Operating System: All
Platform: All
27182
Description
Unlike many other languages, Thai does not mark word boundaries within a
sentence: words are written consecutively without a delimiter. The Lucene
StandardTokenizer currently cannot tokenize a Thai sentence and returns the
whole sentence as a single token. A special tokenizer that breaks Thai
sentences into words is required.
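
To make the request concrete, here is a minimal sketch of one common way to segment unspaced text: greedy longest matching against a word list. It is only an illustration, not Lucene code and not a proposed patch. The class name ThaiLongestMatch, the segment method, and the five-word DICTIONARY are all hypothetical; a real tokenizer would need a full Thai lexicon and better handling of unknown words and ambiguous splits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy greedy longest-match segmenter (hypothetical, not part of Lucene). */
public class ThaiLongestMatch {

    // Hypothetical five-word dictionary, just enough for the example below.
    static final Set<String> DICTIONARY =
            Set.of("ฉัน", "รัก", "ภาษา", "ไทย", "ภาษาไทย");

    /**
     * Splits text into tokens by repeatedly taking the longest dictionary
     * word that starts at the current position. Characters that start no
     * dictionary word are emitted as single-character tokens.
     */
    static List<String> segment(String text, Set<String> dictionary) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = -1;
            // Try the longest candidate first and shrink until a word matches.
            for (int candidate = text.length(); candidate > pos; candidate--) {
                if (dictionary.contains(text.substring(pos, candidate))) {
                    end = candidate;
                    break;
                }
            }
            if (end < 0) {
                // No dictionary word starts here: fall back to one UTF-16 unit
                // (sufficient for Thai, which lies in the Basic Multilingual Plane).
                end = pos + 1;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "ฉันรักภาษาไทย" ("I love the Thai language") contains no spaces.
        System.out.println(segment("ฉันรักภาษาไทย", DICTIONARY));
    }
}
```

With this toy dictionary the unspaced sentence is split into three tokens (ฉัน, รัก, ภาษาไทย) rather than returned whole. Because the match is greedy, ภาษาไทย is preferred over the shorter ภาษา followed by ไทย; whether that is the desired granularity for search is a design decision a real Thai tokenizer would have to make.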