Lucene - Core
LUCENE-1227

NGramTokenizer to handle more than 1024 chars

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

    The current NGramTokenizer can't handle a character stream longer than 1024 characters. This is too short for non-whitespace-separated languages.

    I created a patch for this issue.
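
    The 1024 limit presumably comes from the tokenizer filling a single fixed-size char buffer from the Reader once and never refilling it, so anything past the buffer is silently dropped. The sketch below is not the attached patch, just a minimal illustration of the refill-and-carry-over idea a fix could use: keep refilling the buffer as grams are emitted, carrying over the last n-1 chars so grams spanning a refill boundary aren't lost. Class and method names are hypothetical.

{code:java}
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the attached patch: emits all n-grams from a
// Reader of arbitrary length using a fixed 1024-char buffer that is
// refilled as input is consumed (assumes n is far smaller than the buffer).
public class SlidingNGramSketch {

    public static List<String> ngrams(Reader input, int n) throws IOException {
        List<String> grams = new ArrayList<>();
        char[] buf = new char[1024];
        int filled = 0;
        while (true) {
            int read = input.read(buf, filled, buf.length - filled);
            if (read == -1) {
                break; // end of stream; fewer than n chars remain
            }
            filled += read;
            // Emit every complete n-gram that starts in the buffer.
            int start = 0;
            for (; start + n <= filled; start++) {
                grams.add(new String(buf, start, n));
            }
            // Carry over the trailing n-1 chars so grams spanning the
            // refill boundary are not lost; a single read into one
            // 1024-char buffer is what truncates long input.
            System.arraycopy(buf, start, buf, 0, filled - start);
            filled -= start;
        }
        return grams;
    }

    public static void main(String[] args) throws IOException {
        // 2000 chars of input, well past the 1024-char buffer.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append((char) ('a' + i % 26));
        }
        List<String> bigrams = ngrams(new StringReader(sb.toString()), 2);
        System.out.println(bigrams.size()); // prints 1999
    }
}
{code}

    The carry-over of the last n-1 characters is the essential point. A real Tokenizer would of course emit tokens one at a time through the tokenizer API rather than materializing a list, but the buffer-refill logic is the same idea.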

    Attachments

    1. NGramTokenizer.patch (3 kB, Hiroaki Kawai)
    2. NGramTokenizer.patch (3 kB, Hiroaki Kawai)
    3. LUCENE-1227.patch (12 kB, Hiroaki Kawai)


    People

    • Assignee: Unassigned
    • Reporter: Hiroaki Kawai (kawai)
    • Votes: 3
    • Watchers: 7

    Dates

    • Created:
    • Updated:
    • Resolved: