  1. Lucene - Core
  2. LUCENE-6979

Tokenizer input state detection should reset state before throwing

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Lucene Fields: New

    Description

      The Tokenizer will helpfully let you know when you're using it wrong in certain cases, such as forgetting to close it. However, it detects this lazily (after the fact), and worse, it leaves the Tokenizer in a cranky state: if you try to use it again, you'll get the exception again. What makes this issue insidious is that Tokenizers are reused via a ReuseStrategy in a ThreadLocal. So once you hit this bug, your thread is, in a word, "poisoned". And what makes the stack trace a real head-scratcher is that it does not point at the original "guilty" party that didn't close; it likely points at some other caller, perhaps an indexing thread that isn't going to misuse the TokenStream, or at least hasn't yet. The error message could make that clearer.
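
      To make the "poisoned thread" effect concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's actual Tokenizer code: ToyTokenizer, its resetBeforeThrowing switch, and PoisonedThreadDemo are all hypothetical names invented for illustration, and a plain ThreadLocal stands in for the ReuseStrategy. One caller forgets to close, then two well-behaved callers on the same thread reuse the cached instance.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a reusable tokenizer; NOT Lucene's actual Tokenizer. */
class ToyTokenizer {
    // Hypothetical switch that models the behaviour proposed in this issue.
    private final boolean resetBeforeThrowing;
    // null means "closed and ready for reuse".
    private String input;

    ToyTokenizer(boolean resetBeforeThrowing) {
        this.resetBeforeThrowing = resetBeforeThrowing;
    }

    void setReader(String newInput) {
        if (input != null) {
            if (resetBeforeThrowing) {
                input = null; // clear the stale state so the next caller starts clean
            }
            throw new IllegalStateException("contract violation: close() call missing");
        }
        input = newInput;
    }

    void close() {
        input = null;
    }
}

public class PoisonedThreadDemo {
    /** One guilty caller, then two well-behaved callers, all on the same cached instance. */
    static List<String> run(boolean resetBeforeThrowing) {
        // One instance per thread, standing in for ThreadLocal-backed reuse.
        ThreadLocal<ToyTokenizer> cache =
                ThreadLocal.withInitial(() -> new ToyTokenizer(resetBeforeThrowing));
        List<String> outcomes = new ArrayList<>();

        cache.get().setReader("guilty caller"); // the close() call is forgotten here

        for (String caller : new String[] {"innocent caller 1", "innocent caller 2"}) {
            ToyTokenizer tokenizer = cache.get();
            try {
                tokenizer.setReader(caller);
                tokenizer.close();
                outcomes.add("ok");
            } catch (IllegalStateException e) {
                outcomes.add("failed"); // the stack trace would blame this innocent caller
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println("current behaviour:  " + run(false)); // [failed, failed]
        System.out.println("reset before throw: " + run(true));  // [failed, ok]
    }
}
```

      In this model the first innocent caller still sees the exception either way, so a clearer error message would still matter, but with the reset in place the thread recovers on the next use instead of failing forever.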

          People

            Assignee: Unassigned
            Reporter: David Smiley (dsmiley)