Lucene - Core

LUCENE-5999: Add backcompat support for StandardTokenizer before 4.7

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      The merge from trunk for branch_5x removed the backcompat support in StandardTokenizer for previous unicode versions. With 5x, we only need to add back support for 4.0-4.6 (unicode 6.1).

      1. LUCENE-5999.patch
        1.73 MB
        Ryan Ernst
      2. LUCENE-5999.patch
        59 kB
        Ryan Ernst
      3. LUCENE-5999.patch
        62 kB
        Ryan Ernst

        Activity

        Ryan Ernst added a comment -

        Here's a patch which adds back support in the same way other backcompat is done: with a copy of the old code and versioning in the class name. This is different from how it was done before, but it means we don't have to have Version-based constructors, and there is no possibility of some other impl detail changing in StandardTokenizer itself that could silently break compat.

        The only thing I'm not happy about with this patch is having to add AbstractStandardTokenizer so that setMaxTokenLength could be available for both impls. If someone has an idea for a cleaner approach, I'm happy to drop that.
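
        For readers following along, a rough sketch of the "copy of the old code and versioning in the class name" approach mentioned above; the package, field and scanner names here are assumptions for illustration, not the actual patch contents.

        package org.apache.lucene.analysis.standard.std40;

        import org.apache.lucene.analysis.Tokenizer;

        /** Backcompat copy of StandardTokenizer as of 4.6, i.e. unicode 6.1 word breaks. */
        public final class StandardTokenizer40 extends Tokenizer {

          private int maxTokenLength = 255; // same default as the current tokenizer

          public void setMaxTokenLength(int length) {
            this.maxTokenLength = length;
          }

          @Override
          public boolean incrementToken() {
            // In the real class this body is a verbatim copy of the pre-4.7 tokenizer,
            // driven by its own generated JFlex scanner; stubbed here so the sketch compiles.
            return false;
          }
        }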

        Robert Muir added a comment -

        Instead of:

        final AbstractStandardTokenizer src;
        if (getVersion().onOrAfter(Version.LUCENE_4_7_0)) {
          src = new StandardTokenizer();
        } else {
          src = new StandardTokenizer40();
        }
        src.setMaxTokenLength(maxTokenLength);
        

        you can do:

        final Tokenizer src;
        if (getVersion().onOrAfter(Version.LUCENE_4_7_0)) {
          StandardTokenizer t = new StandardTokenizer();
          t.setMaxTokenLength(maxTokenLength);
          src = t;
        } else {
          StandardTokenizer40 t = new StandardTokenizer40();
          t.setMaxTokenLength(maxTokenLength);
          src = t;
        }
        
        Ryan Ernst added a comment -

        If it was just that, I would have done it in the first place. I was trying to avoid casting inside the anonymous setReader at the end of the function. Here's a patch which does that.
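
        For context, the cast in question would show up roughly like this inside the anonymous TokenStreamComponents at the end of createComponents (an illustrative fragment with assumed signatures, not the actual patch; src and tok are as in the snippet above):

        return new TokenStreamComponents(src, tok) {
          @Override
          protected void setReader(final Reader reader) {
            // src is declared as a plain Tokenizer, so refreshing the max token
            // length when the analyzer is reused needs a cast to the concrete impl.
            int max = StandardAnalyzer.this.maxTokenLength;
            if (src instanceof StandardTokenizer) {
              ((StandardTokenizer) src).setMaxTokenLength(max);
            } else {
              ((StandardTokenizer40) src).setMaxTokenLength(max);
            }
            super.setReader(reader);
          }
        };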

        Uwe Schindler added a comment -

        The only thing I'm not happy about with this patch is having to add AbstractStandardTokenizer so that setMaxTokenLength could be available for both impls. If someone has an idea for a cleaner approach, I'm happy to drop that.

        Make the abstract base class package-private, so it is invisible in Javadocs. You solved it already in another way, but if you want to prevent the instanceof checks, this is the way to go.
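
        A minimal sketch of that suggestion (names assumed from the discussion; note it only works if both tokenizer impls live in the same package as the base class):

        package org.apache.lucene.analysis.standard;

        import org.apache.lucene.analysis.Tokenizer;

        // Package-private, so it never appears in the public Javadocs, yet both
        // StandardTokenizer and StandardTokenizer40 could extend it and the analyzer
        // could call setMaxTokenLength without instanceof checks or casts.
        abstract class AbstractStandardTokenizer extends Tokenizer {
          public abstract void setMaxTokenLength(int length);
        }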

        Ryan Ernst added a comment -

        This new patch keeps the casts (upon seeing the change for real instead of in my head, the casts didn't look so bad, and were better than adding abstractions). It adds tests both for unicode 6.1 (thanks to Steve's generation script) and for each analyzer, ensuring that passing a Version before 4.7 gets the unicode 6.1 behavior. I also moved the standard analyzer et al. tests from analysis/core to analysis/standard so they sit alongside the actual implementations of the things being tested.
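
        For illustration, the general shape such a per-analyzer backcompat test could take (a hypothetical sketch: the class name, input and expected tokens are placeholders, not the actual test data, which would use text that tokenizes differently under unicode 6.1 vs 6.3):

        import org.apache.lucene.analysis.BaseTokenStreamTestCase;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.util.Version;

        public class TestStandardAnalyzerBackCompat extends BaseTokenStreamTestCase {
          public void testPre47VersionGetsUnicode61Behavior() throws Exception {
            StandardAnalyzer a = new StandardAnalyzer();
            // Any version before 4.7 should make createComponents pick the old,
            // unicode 6.1 tokenizer copy instead of the current one.
            a.setVersion(Version.LUCENE_4_6_0);
            assertAnalyzesTo(a, "Hello World", new String[] { "hello", "world" });
          }
        }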

        Robert Muir added a comment -

        +1. Thanks for setting up the testing especially.

        Can we open a followup issue to remove FooTokenizerImpl and FooTokenizerInterface abstractions since we don't use that method for back compat anymore? I can help.

        Ryan Ernst added a comment -

        Sure, created LUCENE-6000.

        ASF subversion and git services added a comment -

        Commit 1630189 from Ryan Ernst in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1630189 ]

        LUCENE-5999: Fix backcompat support for StandardTokenizer

        ASF subversion and git services added a comment -

        Commit 1630223 from Ryan Ernst in branch 'dev/trunk'
        [ https://svn.apache.org/r1630223 ]

        LUCENE-5999: Move standard analyzer tests on trunk to match location on branch_5x

        ASF subversion and git services added a comment -

        Commit 1630224 from Ryan Ernst in branch 'dev/trunk'
        [ https://svn.apache.org/r1630224 ]

        LUCENE-5999: Move classic analyzer test next to the classic analyzer

        ASF subversion and git services added a comment -

        Commit 1630225 from Ryan Ernst in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1630225 ]

        LUCENE-5999: Move classic analyzer test next to the classic analyzer (merged 1630223)

        Anshum Gupta added a comment -

        Bulk close after 5.0 release.


          People

          • Assignee: Ryan Ernst
          • Reporter: Ryan Ernst
          • Votes: 0
          • Watchers: 4

