Lucene - Core
  1. Lucene - Core
  2. LUCENE-2191

rename Tokenizer.reset(Reader) to Tokenizer.setReader(Reader)

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-BETA, 5.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      in TokenStream there is a reset() method, but the method in Tokenizer used to set a new Reader is called reset(Reader).

      in my opinion this name overloading creates a lot of confusion, and we see things like reset(Reader) calling reset() even in StandardTokenizer...

      So I think this would be some work to fulfill all the backwards compatibility, but worth it because when you look at the existing reset(Reader) and reset() code in various tokenizers, or the javadocs for Tokenizer, its pretty confusing and inconsistent.

      1. LUCENE-2191.patch
        23 kB
        Robert Muir
      2. LUCENE-2191.patch
        25 kB
        Robert Muir

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment - - edited

          The transition to the new name still has a problem even with VirtualMethod from LUCENE-2188:
          If we delegate from the new method to the deprecated one when a subclass overrides the deprecated one, we produce a endless loop:

          • Consumer calls setReader()
          • setReader() detects that reset() was overridden and delegates to reset()
          • reset() calls super.reset();
          • the default impl of reset() in the base class delegates to setReader() for backwards compatibility (and it must do this in all cases, because the overridden reset() must call the super class' setReader like super.setReader() would do).

          This problem can only be solved by a thread local boolean that prevents delegation if setReader() was called twice in the same thread. At the moment, I have no other solution without inspecting the call stack...

          Show
          Uwe Schindler added a comment - - edited The transition to the new name still has a problem even with VirtualMethod from LUCENE-2188 : If we delegate from the new method to the deprecated one when a subclass overrides the deprecated one, we produce a endless loop: Consumer calls setReader() setReader() detects that reset() was overridden and delegates to reset() reset() calls super.reset(); the default impl of reset() in the base class delegates to setReader() for backwards compatibility (and it must do this in all cases, because the overridden reset() must call the super class' setReader like super.setReader() would do). This problem can only be solved by a thread local boolean that prevents delegation if setReader() was called twice in the same thread. At the moment, I have no other solution without inspecting the call stack...
          Hide
          Uwe Schindler added a comment -

          I think we should simply do this in trunk (4.0), no backwards problems! reset(Reader) is the wrong name and confuses people (especially if it should include reset() or not - correct is that it should not also call reset() - I fixed lots of Tokenizers like StandardTokenizer to conform to this)

          Show
          Uwe Schindler added a comment - I think we should simply do this in trunk (4.0), no backwards problems! reset(Reader) is the wrong name and confuses people (especially if it should include reset() or not - correct is that it should not also call reset() - I fixed lots of Tokenizers like StandardTokenizer to conform to this)
          Hide
          Robert Muir added a comment -

          Uwe: I'm starting to agree. as much as I would like to minimize user-facing API changes for 4.0, it seems that this might be worth the break.

          Show
          Robert Muir added a comment - Uwe: I'm starting to agree. as much as I would like to minimize user-facing API changes for 4.0, it seems that this might be worth the break.
          Hide
          Robert Muir added a comment -

          patch for trunk, just doing a rename

          Show
          Robert Muir added a comment - patch for trunk, just doing a rename
          Hide
          Uwe Schindler added a comment -

          Thanks. 3.x is more complicated, as we need very sophisticated VirtualMethod...

          Show
          Uwe Schindler added a comment - Thanks. 3.x is more complicated, as we need very sophisticated VirtualMethod...
          Hide
          Robert Muir added a comment -

          updated patch to trunk: I think we should do this while we can still change APIs.

          Show
          Robert Muir added a comment - updated patch to trunk: I think we should do this while we can still change APIs.
          Hide
          Uwe Schindler added a comment -

          +1 Commit thats, in 4.0 we have no backwards issues, so this is easy.

          Tokenizer.setReader(Reader) is the only reasonable method name. reset() is something you have to call in all cases!

          Show
          Uwe Schindler added a comment - +1 Commit thats, in 4.0 we have no backwards issues, so this is easy. Tokenizer.setReader(Reader) is the only reasonable method name. reset() is something you have to call in all cases!
          Hide
          Michael McCandless added a comment -

          +1

          Show
          Michael McCandless added a comment - +1
          Hide
          Chris Male added a comment -

          +1

          Show
          Chris Male added a comment - +1

            People

            • Assignee:
              Unassigned
              Reporter:
              Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development