[LUCENE-1224] NGramTokenFilter creates bad TokenStream - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.3
Component/s: modules/analysis
Labels: None
Lucene Fields: New, Patch Available

Description

With NGramTokenFilter(min=2,max=4) on current trunk, I indexed the string "abcdef", but I cannot find it with the query "abc". If I query with "ab", I do get a hit.

The reason is that NGramTokenFilter generates a badly ordered TokenStream. Queries depend on the order of Tokens in the TokenStream: how stemmed variants or phrases are analyzed is determined by that order (Token.positionIncrement).

With the current filter, the query string "abc" is tokenized to: ab bc abc
meaning "query a string that has ab bc abc in this order".
The expected filter would generate: ab abc(positionIncrement=0) bc
meaning "query a string that has (ab|abc) bc in this order".
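
The two orderings can be sketched in a few lines of Python. This is only an illustration of the token order described above, not Lucene code and not the attached patch; the function names and the (term, positionIncrement) tuple representation are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical representation of a token: (term, positionIncrement).
Token = Tuple[str, int]


def ngrams_by_length(text: str, min_gram: int, max_gram: int) -> List[Token]:
    """Ordering reported as the current behavior: all grams of one size
    are emitted before any gram of the next size, each with increment 1."""
    tokens: List[Token] = []
    for n in range(min_gram, max_gram + 1):
        for start in range(len(text) - n + 1):
            tokens.append((text[start:start + n], 1))
    return tokens


def ngrams_by_position(text: str, min_gram: int, max_gram: int) -> List[Token]:
    """Ordering the report asks for: grams are grouped by start offset.
    The first gram at each offset advances the position by 1; longer grams
    at the same offset get increment 0, so they share that position."""
    tokens: List[Token] = []
    for start in range(len(text)):
        first_at_offset = True
        for n in range(min_gram, max_gram + 1):
            if start + n > len(text):
                break  # no longer gram fits at this offset
            tokens.append((text[start:start + n], 1 if first_at_offset else 0))
            first_at_offset = False
    return tokens


if __name__ == "__main__":
    print(ngrams_by_length("abc", 2, 4))
    print(ngrams_by_position("abc", 2, 4))
```

For "abc" with min=2 and max=4, the first function yields ab, bc, abc, all with increment 1, while the second yields ab, then abc with increment 0, then bc.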

I'd like to submit a patch for this issue.

Attachments

LUCENE-1224.patch (16/Mar/08 14:46, 9 kB, Hiroaki Kawai)
NGramTokenFilter.patch (13/Mar/08 10:54, 1 kB, Hiroaki Kawai)
NGramTokenFilter.patch (12/Mar/08 09:30, 1 kB, Hiroaki Kawai)

Issue Links

is related to: LUCENE-1225 NGramTokenizer creates bad TokenStream (Resolved)

People

Assignee: Unassigned

Reporter: Hiroaki Kawai

Votes: 2

Watchers: 6

Dates

Created: 12/Mar/08 09:30

Updated: 28/Aug/22 11:47

Resolved: 26/Apr/13 22:20