Lucene - Core / LUCENE-3873

tie MockGraphTokenFilter into all analyzers tests

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA, 3.6.1
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Mike made a MockGraphTokenFilter on LUCENE-3848.

      Many filters currently aren't tested with anything but a simple tokenstream.
      We should test them with this too; it might find bugs (zero-length terms,
      stacked terms/synonyms, etc.).

      1. LUCENE-3873.patch
        50 kB
        Michael McCandless
      2. LUCENE-3873.patch
        49 kB
        Michael McCandless

        Activity

        Uwe Schindler added a comment -

        Bulk close for 3.6.1

        Michael McCandless added a comment -

        New patch, fixing all nocommits. I think it's ready...

        Michael McCandless added a comment -

        Patch... I think it's close, but there are still some nocommits...

        I had to rework the original MockGraphTokenFilter to sometimes buffer tokens so
        it can set the correct offsets.

        I added a few test cases to existing analyzers (SynFilter, Japanese,
        Standard), and new direct test cases.

        I also created a new MockHoleInjectingTokenFilter...

        Tests seem to pass... but it wouldn't surprise me if beasting/jenkins
        uncovers something...

        Michael McCandless added a comment -

        I agree we can use it in specific places for starters...

        The patch on LUCENE-3848 mixes in "TokenStream to Automaton" and MockGraphTokenFilter; I'll split that apart and only commit MockGraphTokenFilter here.

        One problem is that MockGraphTokenFilter isn't setting offsets currently. I think to do this "correctly" it needs to buffer up pending input tokens until it has reached the posLength it wants to output for a random token, and then set the offset accordingly.
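
The buffering idea in the comment above can be illustrated with a minimal, Lucene-free sketch. It is not the code from the patch: `GraphOffsetSketch`, `Tok`, and `spanning` are hypothetical names, and the sketch assumes a simple linear input where every buffered token occupies exactly one position. Under those assumptions, a random token spanning `posLength` positions takes its start offset from the first token it covers and its end offset from the last one, and cannot be emitted until that last token has been buffered.

```java
import java.util.ArrayList;
import java.util.List;

public class GraphOffsetSketch {

    /** Simplified stand-in for a token's attributes; not Lucene's own Token class. */
    public static final class Tok {
        public final String term;
        public final int startOffset;
        public final int endOffset;
        public final int posInc;
        public final int posLength;

        public Tok(String term, int startOffset, int endOffset, int posInc, int posLength) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.posInc = posInc;
            this.posLength = posLength;
        }
    }

    /**
     * Builds a synthetic token covering posLength buffered tokens, starting at index from.
     * Assumes each buffered token occupies exactly one position.
     * Returns null when too few tokens are buffered, meaning the caller must
     * pull more input before the end offset can be known.
     */
    public static Tok spanning(List<Tok> buffered, int from, int posLength, String term) {
        if (from + posLength > buffered.size()) {
            return null; // not enough input buffered yet
        }
        Tok first = buffered.get(from);
        Tok last = buffered.get(from + posLength - 1);
        // Assumption: the synthetic token is stacked on the first covered token (posInc 0).
        return new Tok(term, first.startOffset, last.endOffset, 0, posLength);
    }

    public static void main(String[] args) {
        List<Tok> buffered = new ArrayList<>();
        buffered.add(new Tok("a", 0, 1, 1, 1));
        buffered.add(new Tok("b", 2, 3, 1, 1));
        Tok graphTok = spanning(buffered, 0, 2, "x");
        System.out.println(graphTok.startOffset + "-" + graphTok.endOffset); // prints 0-3
    }
}
```

Returning `null` when the buffer is too short mirrors the point made in the comment: the filter cannot know the end offset of a multi-position token until it has seen the last input token that token covers.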

        Robert Muir added a comment -

        One way we can tie this in is via LUCENE-3919.

        But: I think we can use this filter in some individual tests immediately?

        E.g. we can just add a method testRandomGraphs to the tests for filters
        that do lots of crazy state-capturing, putting this thing in front of or
        behind them in the analyzer and calling checkRandomData?

        Michael McCandless added a comment -

        LUCENE-3848 has the MockGraphTokenFilter patch...


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
