Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Lucene Fields: New, Patch Available
Description
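CharTokenizer is the abstract base class for all tokenizers that operate at the character level. However, these tokenizers still work on char primitives rather than int code points, so supplementary characters (code points above U+FFFF, which Java stores as a surrogate pair of two chars) are not treated as single characters. CharTokenizer should operate on code points while preserving backwards compatibility.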
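The difference between testing chars and testing code points can be shown with a small standalone sketch. CodePointTokenizerSketch, tokenize, and tokenizeChars are hypothetical names, not Lucene's CharTokenizer API; the token test is passed in as an IntPredicate purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical, standalone sketch (not Lucene's CharTokenizer API) contrasting
 * char-based and code-point-based splitting of text into tokens.
 */
public final class CodePointTokenizerSketch {

    private CodePointTokenizerSketch() {}

    /** Emits maximal runs of code points accepted by isTokenChar. */
    public static List<String> tokenize(CharSequence text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Decode one full code point; a valid surrogate pair becomes one value.
            int cp = Character.codePointAt(text, i);
            i += Character.charCount(cp);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);
            } else {
                flush(current, tokens);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    /** Old-style behavior for comparison: each UTF-16 char is tested on its own. */
    public static List<String> tokenizeChars(CharSequence text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i); // may be only half of a surrogate pair
            if (isTokenChar.test(c)) {
                current.append(c);
            } else {
                flush(current, tokens);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Moves a non-empty pending token into the result list.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}
```

For example, with Character::isLetter as the predicate, the string "a" + U+10400 (DESERET CAPITAL LETTER LONG I) + "b" comes out as one token from tokenize but as two tokens, "a" and "b", from tokenizeChars, because neither surrogate half is a letter on its own.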
Attachments
Issue Links
- depends upon: LUCENE-2188 A handy utility class for tracking deprecated overridden methods (Closed)
- is part of: LUCENE-1689 supplementary character handling (Resolved)
- relates to: LUCENE-2240 SimpleAnalyzer and WhitespaceAnalyzer should have Version ctors (Closed)
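The dependency on LUCENE-2188 reflects the backwards-compatibility side of this change: a base class has to notice when an existing subclass still overrides the old char-based method. The sketch below shows one way to do that with plain java.lang.reflect instead of the LUCENE-2188 utility. OverrideCheckSketch, Base, accepts, and the rule of treating supplementary code points as separators for legacy subclasses are all hypothetical; Lucene's actual CharTokenizer may handle this differently.

```java
/**
 * Hypothetical sketch of keeping old subclasses working after a char-based
 * hook is replaced by an int (code point) one. Uses plain reflection, not
 * the LUCENE-2188 utility.
 */
public final class OverrideCheckSketch {

    private OverrideCheckSketch() {}

    /** True if some class from clazz up to (but excluding) base declares the method. */
    static boolean overriddenBelow(Class<?> clazz, Class<?> base,
                                   String name, Class<?>... params) {
        for (Class<?> c = clazz; c != null && c != base; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod(name, params);
                return true;
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up the hierarchy.
            }
        }
        return false;
    }

    /** Hypothetical base class with an old char hook and its code point replacement. */
    public abstract static class Base {
        // Checked once per instance: does the concrete class still override the old hook?
        private final boolean legacy =
            overriddenBelow(getClass(), Base.class, "isTokenChar", char.class);

        /** Old hook, kept so existing subclasses still compile and run. */
        @Deprecated
        protected boolean isTokenChar(char c) {
            return isTokenChar((int) c);
        }

        /** New hook operating on full code points. */
        protected boolean isTokenChar(int c) {
            return !Character.isWhitespace(c);
        }

        /** Entry point a tokenizer loop would call for each code point. */
        public final boolean accepts(int codePoint) {
            if (legacy) {
                // Assumption: legacy subclasses only see BMP chars; supplementary
                // code points are treated as separators in this sketch.
                return Character.isBmpCodePoint(codePoint) && isTokenChar((char) codePoint);
            }
            return isTokenChar(codePoint);
        }
    }
}
```

The override check runs once per instance rather than on every character, so the per-character cost of the compatibility path is a single boolean test.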