Lucene - Core
LUCENE-2668

offset gap should be added regardless of existence of tokens in DocInverterPerField

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.9.3, 3.0.2, 3.1, 4.0-ALPHA
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Problem: If one value of a multiValued field consists only of a stop word (e.g. "will" in the following sample) and the field is analyzed by StopAnalyzer when indexing, the offsets of the subsequent tokens are not correct.

      indexing a multiValued field
      doc.add( new Field( F, "Mike", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
      doc.add( new Field( F, "will", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
      doc.add( new Field( F, "use", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
      doc.add( new Field( F, "Lucene", Store.YES, Index.ANALYZED, TermVector.WITH_OFFSETS ) );
      

      In this program (soon to be attached), if you use WhitespaceAnalyzer, the offsets (start,end) for "use" and "Lucene" are use(10,13) and Lucene(14,20). But if you use StopAnalyzer, the offsets are use(9,12) and lucene(13,19). Since the searcher cannot know which analyzer was used at indexing time, this problem throws the highlighting done by FastVectorHighlighter (FVH) out of alignment.
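
      To illustrate what "out of alignment" means, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the highlighter rebuilds the field text by joining the stored values with a one-character separator, which matches an offset gap of 1; the class HighlightAlignmentSketch and the helper joinValues are hypothetical names used only for this illustration.

```java
// Sketch only: models how term-vector offsets index into the joined stored values.
// Assumption: values are joined with a single separator character (offset gap of 1).
class HighlightAlignmentSketch {

    // Joins the stored values of a multiValued field with one separator character.
    static String joinValues(String[] values, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = joinValues(new String[] {"Mike", "will", "use", "Lucene"}, ' ');

        // Offsets reported with WhitespaceAnalyzer select the intended words.
        System.out.println("[" + text.substring(10, 13) + "]"); // prints "[use]"
        System.out.println("[" + text.substring(14, 20) + "]"); // prints "[Lucene]"

        // Offsets reported with StopAnalyzer are shifted left by one character.
        System.out.println("[" + text.substring(9, 12) + "]");  // prints "[ us]"
        System.out.println("[" + text.substring(13, 19) + "]"); // prints "[ Lucen]"
    }
}
```

      Under that assumption, the StopAnalyzer offsets select " us" and " Lucen" instead of "use" and "Lucene", which is the one-character shift described above.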

      Cause of the problem: StopAnalyzer filters out "will", so the anyToken flag stays false and the offset gap is not added in DocInverterPerField:

      DocInverterPerField.java
      if (anyToken)
        fieldState.offset += docState.analyzer.getOffsetGap(field);
      

      I don't understand why the condition is there... If the gap were always added, I think things would be simpler.
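
      The effect of the condition can be sketched with a simplified model of the offset accumulation. This is not the DocInverterPerField code: it assumes each field value is a single token, an offset gap of 1, and a stop filter that drops a value entirely. OffsetGapSketch, offsets and alwaysAddGap are hypothetical names, and unlike StopAnalyzer the model does not lowercase the emitted tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of per-field offset accumulation across the values of a multiValued field.
class OffsetGapSketch {

    static final int OFFSET_GAP = 1; // assumed offset gap between values

    // Returns "token(start,end)" for every value that survives the stop filter.
    static List<String> offsets(String[] values, Set<String> stopWords, boolean alwaysAddGap) {
        List<String> result = new ArrayList<>();
        int base = 0; // running offset, playing the role of fieldState.offset
        for (String value : values) {
            boolean anyToken = !stopWords.contains(value.toLowerCase());
            if (anyToken) {
                result.add(value + "(" + base + "," + (base + value.length()) + ")");
            }
            base += value.length(); // advance past the text of this value
            if (anyToken || alwaysAddGap) {
                base += OFFSET_GAP; // the gap is skipped when no token was emitted and alwaysAddGap is false
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] values = {"Mike", "will", "use", "Lucene"};
        Set<String> none = Collections.emptySet();
        Set<String> stop = new HashSet<>(Arrays.asList("will"));

        // No stop words, conditional gap: [Mike(0,4), will(5,9), use(10,13), Lucene(14,20)]
        System.out.println(offsets(values, none, false));
        // "will" removed, conditional gap: [Mike(0,4), use(9,12), Lucene(13,19)]
        System.out.println(offsets(values, stop, false));
        // "will" removed, gap always added: [Mike(0,4), use(10,13), Lucene(14,20)]
        System.out.println(offsets(values, stop, true));
    }
}
```

      In this model, adding the gap unconditionally makes the offsets independent of whether a value produced any tokens, so both analyzers agree on use(10,13) and Lucene(14,20).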

      1. LUCENE-2668.patch
        13 kB
        Koji Sekiguchi
      2. LUCENE-2668.patch
        11 kB
        Koji Sekiguchi
      3. LUCENE-2668.patch
        1 kB
        Koji Sekiguchi
      4. Test.java
        3 kB
        Koji Sekiguchi

        Issue Links

          Activity

          Koji Sekiguchi created issue -
          Koji Sekiguchi made changes -
          Field Original Value New Value
          Attachment Test.java [ 12455571 ]
          Koji Sekiguchi made changes -
          Attachment LUCENE-2668.patch [ 12455584 ]
          Koji Sekiguchi made changes -
          Attachment LUCENE-2668.patch [ 12455586 ]
          Robert Muir made changes -
          Link This issue is related to LUCENE-2529 [ LUCENE-2529 ]
          Koji Sekiguchi made changes -
          Assignee Koji Sekiguchi [ koji ]
          Koji Sekiguchi made changes -
          Attachment LUCENE-2668.patch [ 12455632 ]
          Koji Sekiguchi made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 3.1 [ 12314822 ]
          Fix Version/s 4.0 [ 12314025 ]
          Resolution Fixed [ 1 ]
          Mark Thomas made changes -
          Workflow jira [ 12521519 ] Default workflow, editable Closed status [ 12564317 ]
          Mark Thomas made changes -
          Workflow Default workflow, editable Closed status [ 12564317 ] jira [ 12584847 ]
          Grant Ingersoll made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Koji Sekiguchi
              Reporter:
              Koji Sekiguchi
            • Votes:
              0
              Watchers:
              0
