[LUCENE-3920] ngram tokenizer/filters create nonsense offsets if followed by a word combiner

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.6, 4.0-ALPHA
Fix Version/s: None
Component/s: None
Labels: None
Lucene Fields: New

Description

It looks like the offsets may be taken from the wrong token.

After shingling, the resulting token has a startOffset that is greater than its endOffset.
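The following is a minimal, self-contained sketch of one way such inverted offsets could arise. It does not use Lucene's API: `Tok`, `ngramsBySize`, `bigramShingles`, and `invalid` are hypothetical stand-ins. It assumes (this is not stated in the report) that the n-gram stage emits all grams of one size before moving to the next size, and that the combiner takes the start offset from its first token and the end offset from its last.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Lucene classes.
final class OffsetSketch {

    /** A token with character offsets into the original text. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Assumed ordering: emit every gram of size n before any gram of size n + 1. */
    static List<Tok> ngramsBySize(String text, int minGram, int maxGram) {
        List<Tok> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                out.add(new Tok(text.substring(i, i + n), i, i + n));
            }
        }
        return out;
    }

    /** Combine each adjacent pair: start offset from the first token, end offset from the second. */
    static List<Tok> bigramShingles(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i + 1 < in.size(); i++) {
            Tok a = in.get(i);
            Tok b = in.get(i + 1);
            out.add(new Tok(a.term + " " + b.term, a.start, b.end));
        }
        return out;
    }

    /** Return the tokens that violate the invariant start <= end. */
    static List<Tok> invalid(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            if (t.start > t.end) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Under these assumptions, the input "abcd" with gram sizes 1 to 2 yields the gram "d" (offsets 3 to 4) immediately followed by "ab" (offsets 0 to 2), so the combined token "d ab" gets a start offset of 3 and an end offset of 2.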

Attachments

LUCENE-3920_test.patch
26/Mar/12 01:25
1 kB
Robert Muir

Issue Links

is related to

LUCENE-4955 NGramTokenFilter increments positions for each gram (Closed)

LUCENE-4641 Fix analyzer bugs documented in TestRandomChains (Patch Available)

relates to

LUCENE-4955 NGramTokenFilter increments positions for each gram (Closed)

LUCENE-3907 Improve the Edge/NGramTokenizer/Filters (Closed)

People

Assignee: Adrien Grand

Reporter: Robert Muir

Votes: 0

Watchers: 2

Dates

Created: 26/Mar/12 01:24

Updated: 28/Aug/22 13:12

Resolved: 26/Apr/13 14:35