Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Lucene Fields: New, Patch Available
Description
The current NGramTokenizer can't handle a character stream longer than 1024 characters. This is too short for languages that are not whitespace-separated.
I created a patch for this issue.
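To illustrate the limitation being addressed, here is a minimal standalone sketch of character n-gram generation that reads the entire Reader instead of capping the input at a fixed 1024-character buffer. The class name `NGramSketch` and the `ngrams` helper are hypothetical and are not the attached patch or Lucene's actual NGramTokenizer implementation; this only shows the general idea of handling arbitrarily long streams.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: character n-gram generation over a stream of arbitrary
 * length. The whole Reader is consumed instead of a fixed 1024-character
 * buffer, which is the limitation described in this issue. Not the
 * attached patch and not Lucene's NGramTokenizer.
 */
public class NGramSketch {

    /** Returns all n-grams of lengths minGram..maxGram from the Reader. */
    static List<String> ngrams(Reader input, int minGram, int maxGram) throws IOException {
        // Read the entire stream; no fixed-size cap on the input length.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int read;
        while ((read = input.read(buf)) != -1) {
            sb.append(buf, 0, read);
        }
        String text = sb.toString();

        // Emit every n-gram of each requested length.
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= text.length(); start++) {
                grams.add(text.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) throws IOException {
        // Build an input well beyond 1024 characters, standing in for
        // unsegmented (non-whitespace-separated) text.
        StringBuilder longText = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            longText.append((char) ('a' + (i % 26)));
        }
        List<String> grams = ngrams(new StringReader(longText.toString()), 1, 2);
        // With 2000 chars: 2000 unigrams + 1999 bigrams = 3999 grams.
        System.out.println("total n-grams: " + grams.size());
    }
}
```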
Attachments
Issue Links
- incorporates: LUCENE-1225 NGramTokenizer creates bad TokenStream (Resolved)