[LUCENE-2668] offset gap should be added regardless of existence of tokens in DocInverterPerField - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.9.3, 3.0.2, 3.1, 4.0-ALPHA
Fix Version/s: 3.1, 4.0-ALPHA
Component/s: core/index
Labels: None
Lucene Fields: New

Description

Problem: If one value of a multiValued field consists only of a stop word (e.g. "will" in the following sample) and the field is analyzed by StopAnalyzer when indexing, the offsets of the tokens in the subsequent values are not correct.

indexing a multiValued field

doc.add( new Field( F, "Mike", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
doc.add( new Field( F, "will", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
doc.add( new Field( F, "use", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
doc.add( new Field( F, "Lucene", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );

In this program (soon to be attached), if you use WhitespaceAnalyzer, the offsets (start,end) for "use" and "Lucene" are use(10,13) and Lucene(14,20). But if you use StopAnalyzer, the offsets are use(9,12) and lucene(13,19). Since the searcher cannot know which analyzer was used at indexing time, this discrepancy puts FVH (FastVectorHighlighter) out of alignment when searching.
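To make the misalignment concrete, here is a minimal sketch in plain Java (no Lucene classes). It assumes, for illustration only, that a highlighter rebuilds the field text by joining the stored values with a one-character separator and then cuts the fragment out by offset. The class `HighlightSliceSketch` and its `slice` helper are hypothetical names, not part of Lucene or of the attached test program.

```java
public class HighlightSliceSketch {

    // Hypothetical helper: joins the stored values with a single space
    // (an assumption about how the text is rebuilt) and returns the
    // characters in the half-open range [start, end).
    static String slice(String[] values, int start, int end) {
        String joined = String.join(" ", values); // "Mike will use Lucene"
        return joined.substring(start, end);
    }

    public static void main(String[] args) {
        String[] values = {"Mike", "will", "use", "Lucene"};

        // Offsets reported with WhitespaceAnalyzer: use(10,13)
        System.out.println("[" + slice(values, 10, 13) + "]"); // prints "[use]"

        // Offsets reported with StopAnalyzer: use(9,12)
        System.out.println("[" + slice(values, 9, 12) + "]");  // prints "[ us]"
    }
}
```

Under that assumption, the StopAnalyzer offsets select a fragment shifted one character to the left.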

Cause of the problem: StopAnalyzer filters out "will", so the anyToken flag is set to false for that value and the offset gap is not added in DocInverterPerField:

DocInverterPerField.java

if (anyToken)
  fieldState.offset += docState.analyzer.getOffsetGap(field);

I don't understand why the condition is there. If the gap is always added, I think things become simpler.
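The arithmetic behind the reported offsets can be reproduced with a small stand-alone simulation. This is a sketch under stated assumptions, not the actual DocInverterPerField code: each value holds a single word, the offset gap is 1 (which matches the WhitespaceAnalyzer numbers above), and lowercasing by StopAnalyzer is ignored. `OffsetGapSketch`, `index` and `alwaysAddGap` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OffsetGapSketch {

    static final int OFFSET_GAP = 1; // assumed gap between field values

    // Simulates per-value offset accumulation. When alwaysAddGap is false,
    // the gap is added only if the value produced a token (the reported
    // behavior); when true, it is added unconditionally (the proposal).
    static List<String> index(String[] values, Set<String> stopWords, boolean alwaysAddGap) {
        List<String> tokens = new ArrayList<>();
        int base = 0; // plays the role of fieldState.offset
        for (String value : values) {
            boolean anyToken = !stopWords.contains(value.toLowerCase());
            if (anyToken) {
                tokens.add(value + "(" + base + "," + (base + value.length()) + ")");
            }
            base += value.length(); // end offset of this value
            if (alwaysAddGap || anyToken) {
                base += OFFSET_GAP;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] values = {"Mike", "will", "use", "Lucene"};
        Set<String> none = Collections.emptySet();
        Set<String> stop = new HashSet<>(Arrays.asList("will"));

        // No stop words: prints [Mike(0,4), will(5,9), use(10,13), Lucene(14,20)]
        System.out.println(index(values, none, false));
        // Stop word, conditional gap: prints [Mike(0,4), use(9,12), Lucene(13,19)]
        System.out.println(index(values, stop, false));
        // Stop word, gap always added: prints [Mike(0,4), use(10,13), Lucene(14,20)]
        System.out.println(index(values, stop, true));
    }
}
```

With the gap added unconditionally, the offsets of "use" and "Lucene" no longer depend on whether "will" was filtered out.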

Attachments


- LUCENE-2668.patch, 27/Sep/10 03:04, 13 kB, Koji Sekiguchi
- LUCENE-2668.patch, 26/Sep/10 04:21, 11 kB, Koji Sekiguchi
- LUCENE-2668.patch, 26/Sep/10 01:22, 1 kB, Koji Sekiguchi
- Test.java, 25/Sep/10 18:05, 3 kB, Koji Sekiguchi

Issue Links

is related to LUCENE-2529 always apply position increment gap between values (Closed)

People

Assignee: Koji Sekiguchi

Reporter: Koji Sekiguchi

Votes: 0

Watchers: 0

Dates

Created: 25/Sep/10 18:02

Updated: 28/Aug/22 12:33

Resolved: 28/Sep/10 02:20