Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Component/s: modules/analysis
Labels: None
Lucene Fields: New, Patch Available
StandardTokenizer currently supports only the BMP (Basic Multilingual Plane).
If it encounters characters outside the BMP, it simply discards them.
It should instead fully implement UAX#29 across all of Unicode.
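
For anyone unfamiliar with the problem, here is a minimal, self-contained sketch of why characters outside the BMP need special care in Java. It is not the StandardTokenizer code or the patch for this issue. The class `CodePointDemo` and its methods `codePoints` and `dropNonBmp` are hypothetical names made up for illustration, and `dropNonBmp` only mimics the "discard" symptom described above. Java strings are UTF-16, so a supplementary character such as U+1D538 is stored as two `char` values (a surrogate pair), and code that works one `char` at a time can lose it.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDemo {

    /** Walks the string by code point, so a surrogate pair yields one value. */
    public static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            // Advance by 1 for BMP characters, by 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return out;
    }

    /**
     * Hypothetical illustration of the symptom only: a char-at-a-time loop
     * that skips surrogates silently drops every supplementary character.
     */
    public static String dropNonBmp(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isSurrogate(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a', U+1D538 (written as the surrogate pair D835 DD38), 'b'
        String text = "a\uD835\uDD38b";
        System.out.println("UTF-16 length: " + text.length());
        System.out.println("code points:   " + codePoints(text).size());
        System.out.println("after drop:    " + dropNonBmp(text));
    }
}
```

The code-point loop sees three characters in the sample string, while the char-based filter returns only "ab". A tokenizer that handles all of Unicode would need to classify the full code point rather than each UTF-16 unit.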
is part of: LUCENE-1689 supplementary character handling (Resolved)