Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 4.3
- None
- New
Description
NGramTokenFilter increments the position for each gram rather than once for the actual token, which can lead to rather funny problems, especially with highlighting. Whether this filter should be used for highlighting is a different story, but today it seems to be common practice in many situations to highlight sub-term matches.
I have a highlighting test that uses ngrams and fails with a StringIOOB (StringIndexOutOfBoundsException): tokens are sorted by position, which causes offsets to be mixed up because of the positions assigned by the ngram token filter.
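To make the difference concrete, here is a minimal sketch in plain Java. It is a simplified model and not Lucene's actual code: the class `NGramPositionSketch`, the `Gram` holder, and the `perGram` switch are all hypothetical names introduced for illustration, and the gram ordering (all grams of one size, then the next size) is an assumption. It compares giving every gram its own position increment with letting all grams of a token share the token's position.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only; none of these names come from Lucene.
final class NGramPositionSketch {

    /** One emitted gram: text, position increment, and offsets into the original input. */
    static final class Gram {
        final String text;
        final int posInc;
        final int start;
        final int end;

        Gram(String text, int posInc, int start, int end) {
            this.text = text;
            this.posInc = posInc;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Emits all grams of sizes minGram..maxGram for a single token.
     * perGram == true: every gram gets a position increment of 1 (the behaviour reported here).
     * perGram == false: only the first gram carries the token's increment, the rest get 0,
     * so all grams stay at the position of the token they came from.
     */
    static List<Gram> grams(String token, int tokenStart, int tokenPosInc,
                            int minGram, int maxGram, boolean perGram) {
        List<Gram> out = new ArrayList<>();
        boolean first = true;
        for (int size = minGram; size <= maxGram; size++) {
            for (int i = 0; i + size <= token.length(); i++) {
                int inc = perGram ? 1 : (first ? tokenPosInc : 0);
                out.add(new Gram(token.substring(i, i + size), inc,
                                 tokenStart + i, tokenStart + i + size));
                first = false;
            }
        }
        return out;
    }

    /** Sums the increments to get the absolute position of the last gram (first position is 0). */
    static int lastPosition(List<Gram> grams) {
        int pos = -1;
        for (Gram g : grams) {
            pos += g.posInc;
        }
        return pos;
    }
}
```

In this model, calling `grams("abc", 0, 1, 1, 2, true)` yields five grams at five distinct positions, and the offsets step backwards partway through the stream ("c" starts at offset 2, the following "ab" starts at offset 0). With `perGram` set to `false`, the same five grams all sit at the single position of the original token, so position order and offset order no longer pull apart across tokens.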
Issue Links
- is related to: LUCENE-3920 ngram tokenizer/filters create nonsense offsets if followed by a word combiner (Resolved)
- relates to: LUCENE-3920 ngram tokenizer/filters create nonsense offsets if followed by a word combiner (Resolved)