  1. Lucene - Core
  2. LUCENE-4343

Clear up more Tokenizer.setReader/TokenStream.reset issues


    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0, 6.0
    • Component/s: modules/analysis


      Spinoff from a user-list thread.

      I think the rename helps, but the javadocs still have problems: they seem to only describe a totally wacky case (CachingTokenFilter) and not the normal case.

      Ideally setReader would be final I think, but there are a few crazy tokenstreams to fix before I could make that work. Would also need something hackish so MockTokenizer's state machine is still functional.

      But I worked on fixing up the mess in our various tokenstreams, which is easy for the most part.

      As part of this I found that throwing best-effort exceptions when the consumer is broken, wherever that costs nothing, was really useful for flushing out test bugs (in tests that don't use MockTokenizer, which they really should).

      For example:

      -  private int offset = 0, bufferIndex = 0, dataLen = 0, finalOffset = 0;
      +  // note: bufferIndex is -1 here to best-effort AIOOBE consumers that don't call reset()
      +  private int offset = 0, bufferIndex = -1, dataLen = 0, finalOffset = 0;

      I think this is worth exploring more... this was really effective at finding broken tests etc. We should see if we can be more thorough, and ideally throw better exceptions, when consumers are broken and it's free.
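
To make the bufferIndex trick above concrete, here is a minimal self-contained sketch. It does not use Lucene's classes: SketchCharTokenizer, its next() method, and the whitespace splitting are all hypothetical stand-ins, assumed only for illustration. The point is that starting the index at -1 makes the first buffer read fail with an ArrayIndexOutOfBoundsException when the consumer never called reset(), at no extra cost on the normal path.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical toy tokenizer (not Lucene's API): splits input on whitespace. */
final class SketchCharTokenizer {
    private final char[] ioBuffer = new char[16];
    private Reader input;
    // -1 on purpose: a consumer that skips reset() indexes ioBuffer[-1]
    // on its first read and gets an ArrayIndexOutOfBoundsException.
    private int bufferIndex = -1;
    private int dataLen = 0;

    void setReader(Reader reader) {
        this.input = reader;
    }

    /** Must be called before the first next(); makes the index valid. */
    void reset() {
        bufferIndex = 0;
        dataLen = 0;
    }

    /** Returns the next whitespace-delimited token, or null at end of input. */
    String next() {
        StringBuilder term = new StringBuilder();
        while (true) {
            if (bufferIndex >= dataLen) { // buffer exhausted: refill it
                int read;
                try {
                    read = input.read(ioBuffer, 0, ioBuffer.length);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                if (read <= 0) { // end of input: emit any pending token
                    dataLen = 0;
                    bufferIndex = 0;
                    return term.length() > 0 ? term.toString() : null;
                }
                dataLen = read;
                bufferIndex = 0;
            }
            // Without reset(), bufferIndex is -1 here and this line throws.
            char c = ioBuffer[bufferIndex++];
            if (!Character.isWhitespace(c)) {
                term.append(c);
            } else if (term.length() > 0) {
                return term.toString();
            }
        }
    }

    public static void main(String[] args) {
        SketchCharTokenizer ok = new SketchCharTokenizer();
        ok.setReader(new StringReader("ab cd"));
        ok.reset(); // correct consumer
        for (String t = ok.next(); t != null; t = ok.next()) {
            System.out.println(t);
        }

        SketchCharTokenizer broken = new SketchCharTokenizer();
        broken.setReader(new StringReader("ab cd"));
        try {
            broken.next(); // broken consumer: reset() was never called
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: consumer forgot reset()");
        }
    }
}
```

The check is only best-effort: the failing read still increments the index, so a second call on the same instance would proceed as if reset() had been called. That is the trade-off of keeping it free; a dedicated state flag could throw a clearer exception, but adds a branch per call.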


        1. LUCENE-4343.patch
          23 kB
          Robert Muir
        2. LUCENE-4343.patch
          23 kB
          Robert Muir
        3. LUCENE-4343.patch
          19 kB
          Robert Muir
        4. LUCENE-4343.patch
          17 kB
          Robert Muir



            • Assignee:
              rcmuir Robert Muir
            • Votes:
              0
            • Watchers:
              2


              • Created: