Lucene - Core / LUCENE-3849

position increments should be implemented by TokenStream.end()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.6, 4.0-ALPHA
    • Fix Version/s: 4.5, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      If you index the pages of a book as the values of a multivalued field, with the default
      position increment gap of Analyzer.java (0), phrase queries won't work across pages when a
      page ends with stopword(s).

      This is because the 'trailing holes' are not taken into account in end(). So I think that, in
      TokenStream.end(), subclasses of FilteringTokenFilter (e.g. StopFilter) should do:

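      super.end();
      // add the trailing holes to whatever increment super.end() left in the attribute
      posIncAtt.setPositionIncrement(posIncAtt.getPositionIncrement() + skippedPositions);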
      

      One problem is that these filters need to 'add' to the position increment, but currently
      nothing clears the attributes before end() [they are dirty, except for the offset, which is
      set by the tokenizer].

      Also, the indexer should be changed to pull posIncAtt from end().
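
To make the position arithmetic concrete, here is a small stand-alone sketch. It is a toy model, not Lucene code: the class `TrailingHoles`, the method `index`, the stopword set, and the sample pages are all hypothetical, and the model assumes every kept term occurs only once. It only shows how positions shift when the trailing holes of one field value are, or are not, added before the next value starts.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are Lucene APIs.
class TrailingHoles {
    // Hypothetical stopword list for the illustration.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "of"));

    /**
     * Assigns a position to every kept token across all pages (field values).
     * Assumes each kept term is unique, so a term -> position map is enough.
     *
     * @param gap                position increment gap added between pages
     * @param countTrailingHoles whether stopwords skipped at the end of a page
     *                           advance the position before the next page starts
     */
    static Map<String, Integer> index(List<String> pages, int gap, boolean countTrailingHoles) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        int position = -1; // the first token, with increment 1, lands on position 0
        for (int p = 0; p < pages.size(); p++) {
            if (p > 0) {
                position += gap;
            }
            int skipped = 0; // stopwords dropped since the last kept token
            for (String token : pages.get(p).trim().split("\\s+")) {
                if (STOP_WORDS.contains(token)) {
                    skipped++;
                    continue;
                }
                position += 1 + skipped; // increment of 1 plus the holes before this token
                skipped = 0;
                positions.put(token, position);
            }
            if (countTrailingHoles) {
                position += skipped; // what the proposed end() change would contribute
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("the end of the", "story begins");
        System.out.println(index(pages, 0, false)); // {end=1, story=2, begins=3}
        System.out.println(index(pages, 0, true));  // {end=1, story=4, begins=5}
    }
}
```

A phrase query for "end of the story" that preserves the holes left by its own stopwords expects "story" three positions after "end". In this model only the second layout satisfies that; in the first, "story" sits directly after "end".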

        Attachments

        1. LUCENE-3849.patch
          34 kB
          Michael McCandless
        2. LUCENE-3849.patch
          31 kB
          Michael McCandless
        3. LUCENE-3849.patch
          27 kB
          Robert Muir
        4. LUCENE-3849.patch
          7 kB
          Robert Muir

              People

              • Assignee:
                mikemccand Michael McCandless
                Reporter:
                rcmuir Robert Muir