[LUCENE-2035] TokenSources.getTokenStream() does not assign positionIncrement - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4, 2.4.1, 2.9
Fix Version/s: 3.1, 4.0-ALPHA
Component/s: modules/highlighter
Labels: None
Lucene Fields: New

Description

TokenSources.StoredTokenStream does not assign positionIncrement information, so every token in the stream is treated as adjacent to the one before it. This breaks phrase highlighting in QueryScorer whenever the original tokens are not contiguous.

For example:
Consider a token stream that emits both the stemmed and unstemmed version of each word at the same position - the fox (jump|jumped)
When retrieved from the index using TokenSources.getTokenStream(tpv,false), the token stream becomes - the fox jump jumped

Now search and highlight with the phrase query "fox jumped". The search correctly finds the document, but the highlighter fails to highlight the phrase because it sees an additional word ("jump") between "fox" and "jumped". With the original token stream from the analyzer, the highlighter works.

The converse also occurs. Consider the text - the fox did not jump
"not" is a stop word, and there is an option to increment the position to account for removed stop words - (the,0) (fox,1) (did,2) (jump,4)
When retrieved from the index using TokenSources.getTokenStream(tpv,false), the token stream becomes - (the,0) (fox,1) (did,2) (jump,3)

So the phrase query "did jump" incorrectly highlights the "did" and "jump" terms in the text "did not jump", even though the two terms are not adjacent in the analyzer's token stream. With the original token stream from the analyzer, the highlighter works correctly.
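Both cases can be reproduced with a small plain-Java model. This is only a sketch and does not use the Lucene API: the class PositionIncrementDemo, its Token type, and the helper methods are hypothetical names introduced for illustration, and the phrase check is simplified to two terms with no slop. It computes absolute positions from position increments and compares the analyzer's stream against a copy in which every increment has been reset to 1, which is the effect described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PositionIncrementDemo {

    /** Hypothetical stand-in for a token: a term plus its gap from the previous token. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Turns increments into absolute positions; a first token with increment 1 lands at 0. */
    static int[] positions(List<Token> tokens) {
        int[] result = new int[tokens.size()];
        int position = -1;
        for (int i = 0; i < tokens.size(); i++) {
            position += tokens.get(i).positionIncrement;
            result[i] = position;
        }
        return result;
    }

    /** Simplified exact phrase check: does `second` occur one position after `first`? */
    static boolean matchesPhrase(List<Token> tokens, String first, String second) {
        int[] pos = positions(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).term.equals(first)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (tokens.get(j).term.equals(second) && pos[j] == pos[i] + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Mimics the reported behaviour: every token is treated as adjacent (increment 1). */
    static List<Token> withoutIncrements(List<Token> tokens) {
        List<Token> result = new ArrayList<>();
        for (Token token : tokens) {
            result.add(new Token(token.term, 1));
        }
        return result;
    }

    /** "the fox (jump|jumped)": the unstemmed form shares a position with the stemmed form. */
    static List<Token> stemmedExample() {
        return Arrays.asList(new Token("the", 1), new Token("fox", 1),
                new Token("jump", 1), new Token("jumped", 0));
    }

    /** "the fox did not jump": the removed stop word leaves a gap before "jump". */
    static List<Token> stopWordExample() {
        return Arrays.asList(new Token("the", 1), new Token("fox", 1),
                new Token("did", 1), new Token("jump", 2));
    }

    public static void main(String[] args) {
        // Analyzer stream: "fox jumped" matches. Without increments: it does not.
        System.out.println(matchesPhrase(stemmedExample(), "fox", "jumped"));
        System.out.println(matchesPhrase(withoutIncrements(stemmedExample()), "fox", "jumped"));
        // Analyzer stream: "did jump" does not match. Without increments: it does.
        System.out.println(matchesPhrase(stopWordExample(), "did", "jump"));
        System.out.println(matchesPhrase(withoutIncrements(stopWordExample()), "did", "jump"));
    }
}
```

In this model the first pair of calls corresponds to the missed highlight for "fox jumped" and the second pair to the spurious highlight for "did jump".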

Attachments

- LUCENE-2035.patch, 16/Dec/09 23:53, 40 kB, Mark Miller
- LUCENE-2035.patch, 16/Dec/09 02:49, 20 kB, Mark Miller
- LUCENE-2305.patch, 05/Nov/09 13:43, 20 kB, Christopher Morris

People

Assignee: Mark Miller
Reporter: Christopher Morris
Votes: 0
Watchers: 0

Dates

Created: 05/Nov/09 13:40
Updated: 28/Aug/22 12:13
Resolved: 27/Nov/10 23:25

Time Tracking

Estimated: 24h
Remaining: 24h
Logged: Not Specified