Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
2.9.3, 3.0.2, 3.1, 4.0-ALPHA
-
None
-
New
Description
Problem: If a multiValued field which contains a stop word (e.g. "will" in the following sample) only value is analyzed by StopAnalyzer when indexing, the offsets of the subsequent tokens are not correct.
doc.add( new Field( F, "Mike", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) ); doc.add( new Field( F, "will", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) ); doc.add( new Field( F, "use", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) ); doc.add( new Field( F, "Lucene", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
In this program (soon to be attached), if you use WhitespaceAnalyzer, you'll get the offset(start,end) for "use" and "Lucene" will be use(10,13) and Lucene(14,20). But if you use StopAnalyzer, the offsets will be use(9,12) and lucene(13,19). When searching, since searcher cannot know what analyzer was used at indexing time, this problem causes out of alignment of FVH.
Cause of the problem: StopAnalyzer filters out "will", anyToken flag set to false then offset gap is not added in DocInverterPerField:
if (anyToken)
fieldState.offset += docState.analyzer.getOffsetGap(field);
I don't understand why the condition is there... If always the gap is added, I think things are simple.
Attachments
Attachments
Issue Links
- is related to
-
LUCENE-2529 always apply position increment gap between values
- Closed