Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-5235

throw illegalstate from Tokenizer (instead of NPE/IIOBE) if reset not called

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6, 6.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We added these best effort checks, but it would be much better if we somehow gave a clear exception... this comes up often

      1. LUCENE-5235_test.patch
        3 kB
        Robert Muir
      2. LUCENE-5235.patch
        134 kB
        Robert Muir
      3. LUCENE-5235.patch
        133 kB
        Robert Muir
      4. LUCENE-5235.patch
        21 kB
        Uwe Schindler
      5. LUCENE-5235.patch
        13 kB
        Uwe Schindler
      6. LUCENE-5235.patch
        10 kB
        Robert Muir
      7. LUCENE-5235.patch
        9 kB
        Uwe Schindler

        Activity

        Hide
        mikemccand Michael McCandless added a comment -

        +1

        Show
        mikemccand Michael McCandless added a comment - +1
        Hide
        rcmuir Robert Muir added a comment -

        modification of BaseTokenSTreamTestCase to check for this... I think its correct...

        Show
        rcmuir Robert Muir added a comment - modification of BaseTokenSTreamTestCase to check for this... I think its correct...
        Hide
        thetaphi Uwe Schindler added a comment -

        First stab with fixing common Tokenizers. Robert will upload a test patch soon.

        Show
        thetaphi Uwe Schindler added a comment - First stab with fixing common Tokenizers. Robert will upload a test patch soon.
        Hide
        rcmuir Robert Muir added a comment -

        here's an improved patch. I think this is easier and works for custom tokenizers too.

        some tests should fail: not all tokenizers fixed yet (we have to revert the null/AIOOBE hacks and call super.reset)

        Show
        rcmuir Robert Muir added a comment - here's an improved patch. I think this is easier and works for custom tokenizers too. some tests should fail: not all tokenizers fixed yet (we have to revert the null/AIOOBE hacks and call super.reset)
        Hide
        mikemccand Michael McCandless added a comment -

        +1, what a delightful solution!!

        Show
        mikemccand Michael McCandless added a comment - +1, what a delightful solution!!
        Hide
        thetaphi Uwe Schindler added a comment -

        I fixed some more Tokenizers + improved tests. We can now also detect double reset() or things like missing close().

        I am now stuck on errors with MockTokenizer sometimes complaining with IllegalState but on other times failing to report IllegalState. Just run tests in analyzers/common. Need more debugging but the corrumtion beer was too heavy... https://twitter.com/rcmuir/status/381508554798018560

        Show
        thetaphi Uwe Schindler added a comment - I fixed some more Tokenizers + improved tests. We can now also detect double reset() or things like missing close(). I am now stuck on errors with MockTokenizer sometimes complaining with IllegalState but on other times failing to report IllegalState. Just run tests in analyzers/common. Need more debugging but the corrumtion beer was too heavy... https://twitter.com/rcmuir/status/381508554798018560
        Hide
        thetaphi Uwe Schindler added a comment -

        More Tokenizers fixed.

        Show
        thetaphi Uwe Schindler added a comment - More Tokenizers fixed.
        Hide
        rcmuir Robert Muir added a comment -

        current patch, now tests are passing in lucene, but there are problems in solr tests.

        The test is big because we got confused with the assertAnalyzesToReuse vs assertAnalyzesTo (which is now useless) so we merged these.

        Show
        rcmuir Robert Muir added a comment - current patch, now tests are passing in lucene, but there are problems in solr tests. The test is big because we got confused with the assertAnalyzesToReuse vs assertAnalyzesTo (which is now useless) so we merged these.
        Hide
        rcmuir Robert Muir added a comment -

        updated patch: all tests pass

        patch here: http://pastebin.com/raw.php?i=SyvM6dCZ

        Show
        rcmuir Robert Muir added a comment - updated patch: all tests pass patch here: http://pastebin.com/raw.php?i=SyvM6dCZ
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1525362 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1525362 ]

        LUCENE-5235: Tokenizers now throw an IllegalStateException if the consumer does not call reset() before consuming the stream. Previous versions throwed NullPointerException or ArrayIndexOutOfBoundsException on best effort which was not user-friendly.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1525362 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1525362 ] LUCENE-5235 : Tokenizers now throw an IllegalStateException if the consumer does not call reset() before consuming the stream. Previous versions throwed NullPointerException or ArrayIndexOutOfBoundsException on best effort which was not user-friendly.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1525377 from Uwe Schindler in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1525377 ]

        Merged revision(s) 1525376 from lucene/dev/trunk:
        LUCENE-5235: Add more information to backwards break changes section.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1525377 from Uwe Schindler in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1525377 ] Merged revision(s) 1525376 from lucene/dev/trunk: LUCENE-5235 : Add more information to backwards break changes section.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1525825 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1525825 ]

        LUCENE-5235: Update Javadocs to make it explicit that super.reset(), super.close() and super.end() must be called.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1525825 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1525825 ] LUCENE-5235 : Update Javadocs to make it explicit that super.reset(), super.close() and super.end() must be called.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1525826 from Uwe Schindler in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1525826 ]

        Merged revision(s) 1525825 from lucene/dev/trunk:
        LUCENE-5235: Update Javadocs to make it explicit that super.reset(), super.close() and super.end() must be called.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1525826 from Uwe Schindler in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1525826 ] Merged revision(s) 1525825 from lucene/dev/trunk: LUCENE-5235 : Update Javadocs to make it explicit that super.reset(), super.close() and super.end() must be called.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1526155 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1526155 ]

        LUCENE-5235: This is according to Clover never hit, so be more strict, may help Robert

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1526155 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1526155 ] LUCENE-5235 : This is according to Clover never hit, so be more strict, may help Robert
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1526158 from Uwe Schindler in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1526158 ]

        Merged revision(s) 1526155 from lucene/dev/trunk:
        LUCENE-5235: This is according to Clover never hit, so be more strict, may help Robert

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1526158 from Uwe Schindler in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1526158 ] Merged revision(s) 1526155 from lucene/dev/trunk: LUCENE-5235 : This is according to Clover never hit, so be more strict, may help Robert

          People

          • Assignee:
            thetaphi Uwe Schindler
            Reporter:
            rcmuir Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development