Lucene - Core
LUCENE-4810

Positions are incremented for each ngram in EdgeNGramTokenFilter

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3, 6.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

      Description

      Edge ngrams should behave like synonyms, with all the ngrams generated from a token sharing the position of that original token. The current code increments the position for each ngram.

      For the text "molecular biology", the query "mol bio" should match as a phrase in neighboring positions. It does not.

      You can see this in the Analysis page in the admin UI.
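
      As a concrete check of the expected behaviour, here is a minimal sketch (assuming Lucene 6.x-era analysis APIs; constructor signatures differ in other versions, and the 3/5 gram sizes are arbitrary) that prints each emitted gram with its resolved position. After this fix, all grams of "molecular" share position 0 and all grams of "biology" share position 1, so the phrase "mol bio" matches in adjacent positions.

      import java.io.StringReader;

      import org.apache.lucene.analysis.TokenStream;
      import org.apache.lucene.analysis.Tokenizer;
      import org.apache.lucene.analysis.core.WhitespaceTokenizer;
      import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
      import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
      import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

      public class EdgeNGramPositionsDemo {
        public static void main(String[] args) throws Exception {
          Tokenizer tokenizer = new WhitespaceTokenizer();
          tokenizer.setReader(new StringReader("molecular biology"));
          TokenStream ts = new EdgeNGramTokenFilter(tokenizer, 3, 5);

          CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
          PositionIncrementAttribute posIncr = ts.addAttribute(PositionIncrementAttribute.class);

          ts.reset();
          int position = -1;
          while (ts.incrementToken()) {
            position += posIncr.getPositionIncrement();
            // Expected after the fix: mol/mole/molec at position 0, bio/biol/biolo at position 1.
            System.out.println(term + " @ " + position);
          }
          ts.end();
          ts.close();
        }
      }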

      Attachments

      1. LUCENE-4810.diff
        2 kB
        Walter Underwood
      2. LUCENE-4810.patch
        6 kB
        Michael McCandless
      3. LUCENE-4810.patch
        4 kB
        Michael McCandless
      4. LUCENE-4810-first-token-position-increment.patch
        4 kB
        Steve Rowe

          Activity

          Walter Underwood added a comment -

          Patch based on the 4.x source. The filenames are a bit odd because I was developing on 3.3.0.

          Robert Muir added a comment -

          At a glance I think I like the idea myself. I don't like tokenfilters that 'retokenize' by changing up the positions; I think it causes all kinds of havoc.

          Instead, this patch simplifies what the filter is doing conceptually: for each word, all of its prefixes are added as synonyms.
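
          To make the "prefixes as synonyms" idea concrete, here is a toy, hypothetical filter (it is not the real EdgeNGramTokenFilter; it ignores offsets, drops tokens shorter than the minimum gram size, and hard-codes a minimum prefix length of 3) that emits every prefix of each input token, carrying the original position increment on the first prefix and an increment of 0 on the rest.

          import java.io.IOException;

          import org.apache.lucene.analysis.TokenFilter;
          import org.apache.lucene.analysis.TokenStream;
          import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
          import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

          public final class ToyPrefixSynonymFilter extends TokenFilter {
            private static final int MIN_GRAM = 3;  // arbitrary choice for this sketch

            private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
            private final PositionIncrementAttribute posIncrAtt =
                addAttribute(PositionIncrementAttribute.class);

            private char[] curTerm;    // token currently being expanded, null if none
            private int curGramSize;   // length of the next prefix to emit
            private int savedPosIncr;  // increment restored on the first prefix

            public ToyPrefixSynonymFilter(TokenStream input) {
              super(input);
            }

            @Override
            public boolean incrementToken() throws IOException {
              while (true) {
                if (curTerm == null) {
                  if (!input.incrementToken()) {
                    return false;
                  }
                  curTerm = new char[termAtt.length()];
                  System.arraycopy(termAtt.buffer(), 0, curTerm, 0, termAtt.length());
                  curGramSize = MIN_GRAM;
                  savedPosIncr = posIncrAtt.getPositionIncrement();
                }
                if (curGramSize <= curTerm.length) {
                  clearAttributes();
                  termAtt.copyBuffer(curTerm, 0, curGramSize);
                  // First prefix keeps the original increment; later prefixes stack on it.
                  posIncrAtt.setPositionIncrement(curGramSize == MIN_GRAM ? savedPosIncr : 0);
                  curGramSize++;
                  return true;
                }
                curTerm = null;  // done with this token; pull the next one
                // (Tokens shorter than MIN_GRAM are silently dropped in this sketch.)
              }
            }

            @Override
            public void reset() throws IOException {
              super.reset();
              curTerm = null;
            }
          }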

          David Smiley added a comment -

          +1

          Michael McCandless added a comment -

          Patch looks good, thanks Walter!

          I added a CHANGES entry and a test case ... I think it's ready.

          Walter Underwood added a comment -

          Thanks for the test case. I noticed that the change didn't cause a test failure. Oops.

          Michael McCandless added a comment -

          I noticed that the change didn't cause a test failure.

          Yeah, spooky

          Michael McCandless added a comment -

          New patch, also fixing EdgeNGramTokenizer to be consistent (set posInc=0 for all but the first gram).

          Commit Tag Bot added a comment -

          [branch_4x commit] Michael McCandless
          http://svn.apache.org/viewvc?view=revision&revision=1453934

          LUCENE-4810: don't increment position on every gram (only the first, for a given input token) in EdgeNGramTokenizer/Filter

          Michael McCandless added a comment -

          Thanks Walter!

          Commit Tag Bot added a comment -

          [trunk commit] Michael McCandless
          http://svn.apache.org/viewvc?view=revision&revision=1453937

          LUCENE-4810: don't increment position on every gram (only the first, for a given input token) in EdgeNGramTokenizer/Filter

          Steve Rowe added a comment -

          On the 4.3 vote thread, Karol Sikora reported seeing an exception when EdgeNGramTokenFilter followed the Morfologik lemmatizing filter. My summary of the new EdgeNGramTokenFilter bug is on the thread here: http://markmail.org/message/7dkd6edz2p7fat2h.

          As of the changes committed on this issue, EdgeNGramTokenFilter passes through as-is the position increment on its first emitted token for a given input token. When the input token has a position increment of 0, e.g. for synonyms, and this is the first output token, EdgeNGramTokenFilter is then guilty of producing a stream whose first token has a position increment of 0.
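
          The fixup this implies can be illustrated with a small, hypothetical standalone filter (not the committed patch, which applies the guard inside EdgeNGramTokenFilter itself): whatever increments the upstream stream produced, the first token a stream emits must have a position increment of at least 1.

          import java.io.IOException;

          import org.apache.lucene.analysis.TokenFilter;
          import org.apache.lucene.analysis.TokenStream;
          import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

          public final class FirstTokenPosIncFixupFilter extends TokenFilter {
            private final PositionIncrementAttribute posIncrAtt =
                addAttribute(PositionIncrementAttribute.class);
            private boolean first = true;

            public FirstTokenPosIncFixupFilter(TokenStream input) {
              super(input);
            }

            @Override
            public boolean incrementToken() throws IOException {
              if (!input.incrementToken()) {
                return false;
              }
              if (first && posIncrAtt.getPositionIncrement() == 0) {
                // A token stream must not start with a position increment of 0.
                posIncrAtt.setPositionIncrement(1);
              }
              first = false;
              return true;
            }

            @Override
            public void reset() throws IOException {
              super.reset();
              first = true;
            }
          }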

          Steve Rowe added a comment -

          Patch fixing the first-token-zero-position-increment problem, along with a test (using PositionFilter) that fails without the patch and succeeds with it.

          Steve Rowe added a comment -

          I see FilteringTokenFilter has a similar first token pos incr fixup, so this problem is handled in at least one other filter. I'll take a broader look tomorrow.

          Michael McCandless added a comment -

          Ugh. Nice catch, Karol and Steve.

          TestRandomChains would have caught this ... we really need to fix the issues (e.g. LUCENE-3920) in Edge/NGramTokenizer/Filter so they aren't excluded from the TestRandomChains rotation ...

          Commit Tag Bot added a comment -

          [trunk commit] sarowe
          http://svn.apache.org/viewvc?view=revision&revision=1470496

          LUCENE-4810: first output token from EdgeNGramTokenFilter must be > 0

          Commit Tag Bot added a comment -

          [branch_4x commit] sarowe
          http://svn.apache.org/viewvc?view=revision&revision=1470497

          LUCENE-4810: first output token from EdgeNGramTokenFilter must be > 0 (merged trunk r1470496)

          Commit Tag Bot added a comment -

          [lucene_solr_4_3 commit] sarowe
          http://svn.apache.org/viewvc?view=revision&revision=1470502

          LUCENE-4810: first output token from EdgeNGramTokenFilter must be > 0 (merged trunk r1470496)

          Steve Rowe added a comment -

          FWIW, I fixed the svn:log property on the 1470496, 1470497, and 1470502 revisions to be "LUCENE-4810: position increment for first output token from EdgeNGramTokenFilter must be > 0".

          Steve Rowe added a comment -

          Committed fix to ensure first token position increment > 0, to trunk, branch_4x, and lucene_solr_4_3.

          Michael McCandless added a comment -

          Thanks Steve!

          Uwe Schindler added a comment -

          Closed after release.


            People

            • Assignee: Michael McCandless
            • Reporter: Walter Underwood
            • Votes: 0
            • Watchers: 4
