Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Duplicate
- 4.3.1
- None
- None
- all
- New
Description
The whitespace tokenizer splits only on Java whitespace, as defined by Character.isWhitespace(char): http://docs.oracle.com/javase/6/docs/api/java/lang/Character.html#isWhitespace(char)
A useful improvement would be to also support Unicode whitespace, as defined by the White_Space property in the Unicode property list: http://www.unicode.org/Public/UCD/latest/ucd/PropList.txt
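
To illustrate the gap, here is a minimal sketch in plain Java (it does not use or reproduce any Lucene API). The class UnicodeWhitespaceSketch and its methods isUnicodeWhitespace and tokenize are hypothetical names introduced only for this example. The code point ranges are transcribed by hand from the White_Space entries in PropList.txt and should be checked against the current file before being relied on. The two definitions differ in both directions: for example, Character.isWhitespace rejects the no-break space U+00A0 but accepts the control character U+001C, while White_Space does the opposite.

```java
import java.util.ArrayList;
import java.util.List;

public class UnicodeWhitespaceSketch {

    // Hypothetical helper: true if the code point has the Unicode White_Space
    // property. Ranges are hand-copied from PropList.txt (an assumption to verify).
    static boolean isUnicodeWhitespace(int cp) {
        return (cp >= 0x0009 && cp <= 0x000D)
            || cp == 0x0020 || cp == 0x0085 || cp == 0x00A0 || cp == 0x1680
            || (cp >= 0x2000 && cp <= 0x200A)
            || cp == 0x2028 || cp == 0x2029 || cp == 0x202F
            || cp == 0x205F || cp == 0x3000;
    }

    // Hypothetical tokenizer: splits on White_Space code points and drops
    // empty tokens. Iterates by code point so surrogate pairs stay intact.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isUnicodeWhitespace(cp)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Compare the two definitions on the no-break space.
        System.out.println(Character.isWhitespace(0x00A0));
        System.out.println(isUnicodeWhitespace(0x00A0));
        // Tokenize a string containing a no-break space and a regular space.
        System.out.println(tokenize("foo\u00A0bar baz"));
    }
}
```

A tokenizer built on Character.isWhitespace would keep "foo" and "bar" together as one token in the input above, which is the behaviour this issue asks to change.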
Issue Links
- is duplicated by
  - LUCENE-6874 WhitespaceTokenizer should tokenize on NBSP (Closed)