[LUCENE-3849] position increments should be implemented by TokenStream.end()

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 3.6, 4.0-ALPHA
Fix Version/s: 4.5, 6.0
Component/s: None
Labels: None
Lucene Fields: New

Description

If you index the pages of a book as a multivalued field, with the default position increment
gap of Analyzer.java (0), phrase queries won't work across pages if a page ends with stopword(s).

This is because the 'trailing holes' (positions of tokens removed at the end of a value) are
not taken into account in end(). So I think that in their override of TokenStream.end(),
subclasses of FilteringTokenFilter (e.g. StopFilter) should do:

super.end();
posIncAtt.setPositionIncrement(posIncAtt.getPositionIncrement() + skippedPositions);

One problem is that these filters need to add to the position increment, but currently nothing
clears the attributes before end() is called (they are dirty, except for the offset, which is
set by the tokenizer).

Also, the indexer should be changed to read posIncAtt after calling end().
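
The effect described above can be illustrated with a small, self-contained sketch. It does not use Lucene; it is a simplified model of how positions accumulate across the values of a multivalued field with a position increment gap of 0. The class name TrailingHolesSketch, the index method, the stopword set and the sample pages are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model only: not Lucene code and not the committed patch.
final class TrailingHolesSketch {
    // Hypothetical stopword set, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("of", "the");

    /**
     * Assigns positions to the kept tokens of each value in turn and returns
     * "term@position" entries. The position increment gap between values is 0.
     */
    static List<String> index(List<String[]> values, boolean countTrailingHoles) {
        List<String> postings = new ArrayList<>();
        int position = -1; // so that the first kept token lands on position 0
        for (String[] value : values) {
            int skippedPositions = 0;
            for (String token : value) {
                if (STOP_WORDS.contains(token)) {
                    skippedPositions++; // a hole left by a removed stopword
                    continue;
                }
                position += skippedPositions + 1;
                skippedPositions = 0;
                postings.add(token + "@" + position);
            }
            // Models the proposal: the filter reports trailing holes at end(),
            // and the indexer adds them before moving to the next value.
            if (countTrailingHoles) {
                position += skippedPositions;
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        List<String[]> pages = List.of(
            new String[] {"end", "of", "the"},
            new String[] {"story", "begins"});
        System.out.println(index(pages, false)); // [end@0, story@1, begins@2]
        System.out.println(index(pages, true));  // [end@0, story@3, begins@4]
    }
}
```

In this model, a phrase query for "end of the story" is analyzed to "end" followed by "story" three positions later. That only matches the second result, where the trailing holes of the first page are counted.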

Attachments

LUCENE-3849.patch, 20/Aug/12 21:18, 7 kB, Robert Muir
LUCENE-3849.patch, 20/Aug/12 23:23, 27 kB, Robert Muir
LUCENE-3849.patch, 17/Aug/13 14:45, 31 kB, Michael McCandless
LUCENE-3849.patch, 19/Aug/13 17:01, 34 kB, Michael McCandless

Issue Links

blocks: LUCENE-5180 ShingleFilter should make shingles from trailing holes (Resolved)

People

Assignee: Michael McCandless
Reporter: Robert Muir
Votes: 1
Watchers: 3

Dates

Created: 05/Mar/12 01:24
Updated: 28/Aug/22 13:10
Resolved: 20/Aug/13 18:14