Lucene - Core / LUCENE-4065

FilteringTokenFilter should never corrupt the tokenstream graph

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently, removers like StopFilter have an option (true/false) to enable position increments.

      If it's true: it both inserts gaps where necessary AND propagates gaps down the stream.
      If it's false: it does neither, which can corrupt the tokenstream graph (e.g. move synonyms onto another word).
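
      To make the two behaviours concrete, here is a minimal stand-alone sketch. It does not use Lucene's API: the class `PosIncDemo`, the `Tok` type, and the `filter` and `render` helpers are hypothetical names invented for illustration, and a token is reduced to its term plus position increment. The input assumes a synonym "television" stacked (posinc=0) on "tv", and a filter that removes "tv".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of a token-removing filter. NOT Lucene's API; all names are hypothetical.
final class PosIncDemo {

    /** A token reduced to its term text and position increment. */
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    /** Drops every token whose term is in {@code remove}. */
    static List<Tok> filter(List<Tok> in, Set<String> remove, boolean enablePosInc) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // increments of dropped tokens, only tracked when enabled
        for (Tok t : in) {
            if (remove.contains(t.term)) {
                if (enablePosInc) skipped += t.posInc;
                continue;
            }
            int inc = t.posInc + skipped;
            // Models the fix mentioned under LUCENE-3848: never begin with posinc=0.
            if (!enablePosInc && out.isEmpty() && inc == 0) inc = 1;
            out.add(new Tok(t.term, inc));
            skipped = 0;
        }
        return out;
    }

    /** Renders tokens as "term@absolutePosition", with the first position being 0. */
    static String render(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.posInc;
            if (sb.length() > 0) sb.append(' ');
            sb.append(t.term).append('@').append(pos);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> in = Arrays.asList(
                new Tok("big", 1), new Tok("tv", 1), new Tok("television", 0));
        Set<String> remove = new HashSet<>(Arrays.asList("tv"));
        System.out.println(render(filter(in, remove, true)));  // big@0 television@1
        System.out.println(render(filter(in, remove, false))); // big@0 television@0
    }
}
```

      With the option disabled, "television" keeps its posinc=0 and ends up stacked on "big", which is the "synonym moved to another word" corruption described above.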

      There are perfectly valid use cases for false, where you don't want gaps because you want phrase queries to act as if the word was never there.

      But 'not inserting gaps' is separate from proper propagation of existing gaps.

      So I think we should provide an option (either fix 'false' or make it an enum) where you still get a legitimate tokenstream rather than a corrupted one, but simply omit gaps.
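
      One possible reading of such an option, sketched stand-alone and without Lucene's API, is shown here. This is an assumption about what "omit gaps but keep the graph intact" could mean, not the fix adopted in Lucene; `GraphSafeRemovalSketch`, `Tok`, `filter`, and `render` are hypothetical names. The idea: a position occupied only by removed tokens is collapsed, but gaps that already existed upstream are kept, and a token stacked on a removed token keeps that position alive instead of sliding onto the previous word. The sketch assumes the input starts with posinc >= 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical "omit gaps, keep the graph" mode. NOT Lucene's API or its actual fix.
final class GraphSafeRemovalSketch {

    /** A token reduced to its term text and position increment. */
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    static List<Tok> filter(List<Tok> in, Set<String> remove) {
        List<Tok> out = new ArrayList<>();
        int carried = 0;   // increments of dropped tokens since the last kept token
        int collapsed = 0; // positions in that span that were opened by a dropped token
        for (Tok t : in) {
            if (remove.contains(t.term)) {
                if (t.posInc > 0) { carried += t.posInc; collapsed++; }
                continue;
            }
            int inc;
            if (t.posInc == 0 && carried > 0) {
                // Stacked on a dropped token: that position survives, so un-collapse it.
                inc = carried - collapsed + 1;
            } else {
                // Keep upstream gaps, but leave no hole for the dropped tokens themselves.
                inc = t.posInc + carried - collapsed;
            }
            out.add(new Tok(t.term, inc));
            carried = 0;
            collapsed = 0;
        }
        return out;
    }

    /** Renders tokens as "term@absolutePosition", with the first position being 0. */
    static String render(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.posInc;
            if (sb.length() > 0) sb.append(' ');
            sb.append(t.term).append('@').append(pos);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<String> remove = new HashSet<>(Arrays.asList("tv", "the", "of"));
        // Synonym stacked on a removed token stays on its own position.
        System.out.println(render(filter(Arrays.asList(
                new Tok("big", 1), new Tok("tv", 1), new Tok("television", 0)), remove)));
        // prints: big@0 television@1
        // Removed word leaves no gap.
        System.out.println(render(filter(Arrays.asList(
                new Tok("the", 1), new Tok("quick", 1), new Tok("fox", 1)), remove)));
        // prints: quick@0 fox@1
        // Gap that already existed before the removed word is preserved.
        System.out.println(render(filter(Arrays.asList(
                new Tok("a", 1), new Tok("of", 2), new Tok("b", 1)), remove)));
        // prints: a@0 b@2
    }
}
```

      In the last case the input positions are a@0, an upstream gap at 1, of@2, b@3; only the position held by "of" is collapsed, so phrase queries behave as if the word was never there while the earlier gap survives.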

      See LUCENE-3848 for more information (where we at least fixed this case so that the tokenstream does not begin with posinc=0).

            People

            • Assignee:
              Unassigned
            • Reporter:
              Robert Muir
            • Votes:
              0
            • Watchers:
              1