Lucene - Core / LUCENE-3848

BaseTokenStreamTestCase should fail if a TokenStream starts with posInc=0

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      It is meaningless for a TokenStream to start with posInc=0.

      It has also caused problems and hairiness in the indexer (LUCENE-1255, LUCENE-1542),
      and it produces senseless TokenStreams. We should add a check and fix any that do this.
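
A minimal sketch of the kind of check described here, written without any Lucene dependency. The class `PosIncCheck` and the method `assertValidPosIncs` are hypothetical names for illustration; the stream is modeled as a plain array of position increments, which is an assumption and not how BaseTokenStreamTestCase consumes a TokenStream.

```java
/** Hypothetical helper, not part of Lucene: validates a sequence of position increments. */
final class PosIncCheck {

    /**
     * Throws IllegalStateException if the first increment is 0
     * or if any increment is negative. An empty stream is accepted.
     */
    static void assertValidPosIncs(int[] posIncs) {
        for (int i = 0; i < posIncs.length; i++) {
            if (posIncs[i] < 0) {
                throw new IllegalStateException(
                    "token " + i + " has negative posInc=" + posIncs[i]);
            }
            if (i == 0 && posIncs[i] == 0) {
                // A leading 0 would place the first token "on top of" nothing.
                throw new IllegalStateException(
                    "first token has posInc=0; it must be >= 1");
            }
        }
    }
}
```

With this sketch, `assertValidPosIncs(new int[] {1, 0, 1})` returns normally, while `assertValidPosIncs(new int[] {0, 1})` throws.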

      Furthermore, the same bug can exist in removing filters if they have enablePositionIncrements=false.
      I think this option is useful, but it shouldn't mean 'allow a broken TokenStream'; it just means we
      don't add gaps.

      If you remove tokens with enablePositionIncrements=false, it should not cause the TokenStream to start with
      positionIncrement=0, and it shouldn't 'restructure' the TokenStream (e.g. moving synonyms on top of a different word).
      It should just not add any 'holes'.
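
One possible reading of this behavior, sketched without any Lucene dependency. `NoGapRemover` and `remove` are hypothetical names, tokens are modeled as parallel arrays of terms and position increments, and the output is rendered as `term:posInc` strings purely for readability. This is an illustration of the rule stated above, not the behavior of any Lucene filter or of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a removing filter with enablePositionIncrements=false. */
final class NoGapRemover {

    /** Drops every term in stopWords and returns the survivors as "term:posInc". */
    static List<String> remove(String[] terms, int[] posIncs, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        // True once a surviving token has been emitted at the current input position.
        boolean positionOccupied = false;
        for (int i = 0; i < terms.length; i++) {
            if (posIncs[i] > 0) {
                positionOccupied = false; // the input advanced to a new position
            }
            if (stopWords.contains(terms[i])) {
                continue; // removed; its increment is not carried over, so no hole
            }
            int outInc;
            if (posIncs[i] > 0) {
                outInc = posIncs[i];
            } else {
                // A stacked token stays stacked only if its position still has a
                // surviving token; otherwise it takes over the position itself.
                outInc = positionOccupied ? 0 : 1;
            }
            out.add(terms[i] + ":" + outInc);
            positionOccupied = true;
        }
        return out;
    }
}
```

For the input `fox:1 the:1 a:0 runs:1` with `the` removed, this sketch yields `fox:1 a:1 runs:1`. A naive filter that simply passes the increment through would emit `a:0`, stacking it on top of `fox`.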

      1. LUCENE-3848.patch
        9 kB
        Robert Muir
      2. LUCENE-3848-MockGraphTokenFilter.patch
        21 kB
        Michael McCandless
      3. LUCENE-3848.patch
        2 kB
        Robert Muir

        Activity

        Robert Muir added a comment -

        I opened LUCENE-3873 to integrate MockGraphTokenFilter into tests.

        Robert Muir added a comment -

        I think this is ready to go in; I'll wait a bit.

        I didn't make any changes re: "graph-restructuring", though I still feel we should fix this, but it means
        dealing with backwards compatibility, etc.

        The changes in this patch are backwards compatible, in the sense that consumers are already correcting
        'initial posInc=0' to posInc=1 anyway.
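
To illustrate why that correction makes the change compatible, here is a small sketch of a consumer that turns increments into absolute positions and treats a leading 0 as 1. `PositionAssigner` and `positions` are hypothetical names, and this is an assumed simplification, not the indexer's actual code.

```java
/** Hypothetical consumer: maps position increments to absolute positions. */
final class PositionAssigner {

    /** Returns the 0-based position of each token, correcting a leading posInc=0 to 1. */
    static int[] positions(int[] posIncs) {
        int[] positions = new int[posIncs.length];
        int current = -1; // so that a first increment of 1 lands on position 0
        for (int i = 0; i < posIncs.length; i++) {
            int inc = posIncs[i];
            if (i == 0 && inc == 0) {
                inc = 1; // the correction consumers already apply
            }
            current += inc;
            positions[i] = current;
        }
        return positions;
    }
}
```

Under this model the increments `{0, 1, 0}` and `{1, 1, 0}` produce the same positions, so a stream that is fixed to start with posInc=1 is consumed exactly as before.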

        Michael McCandless added a comment -

        +1

        Robert Muir added a comment -

        Updated patch: I think it's ready to commit.

        I didn't integrate Mike's nice MockGraphTokenFilter yet, but will do this under a separate issue: it's likely to expose a few bugs.

        Michael McCandless added a comment -

        Patch, adding a MockGraphTokenFilter we can use to randomly insert fake graph arcs...

        Robert Muir added a comment -

        Patch fixing the bug in WikipediaTokenizer.

        But I think we just don't have good tests for the removers.

        Ideally, for tests, I think we should have a simple 'MockSynonymsFilter' that is just stupid and slow and makes certain synonyms (maybe some multi-word) to use in testing.

        Then we can write tests to find and fix the bugs in the removing filters.
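
A rough sketch of what such a deliberately simple synonym injector could look like, without any Lucene dependency. `MockSynonymsSketch` and `expand` are hypothetical names; tokens are modeled as parallel arrays, output is rendered as `term:posInc` strings, and only single-word synonyms are handled (multi-word synonyms are left out of this sketch).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, deliberately naive synonym injector for building test inputs. */
final class MockSynonymsSketch {

    /** Emits each token unchanged, followed by its synonym (if any) stacked at posInc=0. */
    static List<String> expand(String[] terms, int[] posIncs, Map<String, String> synonyms) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            out.add(terms[i] + ":" + posIncs[i]);
            String synonym = synonyms.get(terms[i]);
            if (synonym != null) {
                out.add(synonym + ":0"); // same position as the original term
            }
        }
        return out;
    }
}
```

Feeding the output of such an injector into a removing filter gives streams with stacked tokens, which is the case the removers are not currently tested against.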


          People

          • Assignee: Unassigned
          • Reporter: Robert Muir