Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Updated Snowball contrib package

      • New org.tartarus.snowball java package with patched SnowballProgram to be abstract to avoid using reflection.
      • Introducing Hungarian, Turkish and Romanian stemmers
      • Introducing constructor SnowballFilter(SnowballProgram)
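
The first bullet can be illustrated with a minimal, self-contained sketch. Every class and method name below is a hypothetical stand-in, not the actual org.tartarus.snowball or Lucene code, and the toy stemmer is not a Snowball algorithm. It only contrasts calling an abstract stem() method directly with looking the same method up by name through reflection:

```java
public class StemmerDispatchSketch {

    // Hypothetical stand-in for a stemmer base class. Because stem() is
    // abstract, callers can invoke it on any subclass without reflection.
    public abstract static class SketchProgram {
        private String current = "";
        public void setCurrent(String value) { current = value; }
        public String getCurrent() { return current; }
        public abstract boolean stem();
    }

    // Toy stemmer for illustration only: strips one trailing "s".
    public static class ToyPluralStemmer extends SketchProgram {
        @Override
        public boolean stem() {
            String word = getCurrent();
            if (word.length() > 1 && word.endsWith("s")) {
                setCurrent(word.substring(0, word.length() - 1));
                return true;
            }
            return false;
        }
    }

    // Direct dispatch: the compiler checks that stem() exists.
    public static String stemDirect(SketchProgram program, String term) {
        program.setCurrent(term);
        program.stem();
        return program.getCurrent();
    }

    // Reflective dispatch: methods are resolved by name at run time,
    // so a missing or misspelled method only fails when this code runs.
    public static String stemReflective(Object program, String term) {
        try {
            Class<?> type = program.getClass();
            type.getMethod("setCurrent", String.class).invoke(program, term);
            type.getMethod("stem").invoke(program);
            return (String) type.getMethod("getCurrent").invoke(program);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("stemmer lacks the expected methods", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stemDirect(new ToyPluralStemmer(), "stemmers"));
        System.out.println(stemReflective(new ToyPluralStemmer(), "stemmers"));
    }
}
```

Both calls return the same stem; the difference is that the direct variant is type-checked at compile time and avoids the reflective lookup on every call.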

      Some of the stemmer algorithms may have changed between the version in the current SVN trunk of Lucene and the version in this patch, so an existing index might not be compatible with the new stemmers!

      The API is backwards compatible and the tests pass.

      1. LUCENE-1142.txt
        1.32 MB
        Karl Wettin
      2. snowball.tartarus.txt
        1.32 MB
        Karl Wettin

          Activity

          Karl Wettin added a comment -

          I propose for this patch to be included in Lucene 3.0.0 (3.0.1?)

          Grant Ingersoll added a comment -

          Karl,

          This is marked as 2.4, but your comment suggests 3.0. Which do you prefer? We are, again, in that tricky spot of how to handle changes where the API is compatible, but the index can change.

          One thing to check, maybe, is how many terms are going to be affected? Maybe we put this in now and we mark it in CHANGES.txt that users of Snowball will need to re-index, or at least thoroughly test to see if there are any issues in their setup.

          I can go either way.
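
One way to approach the "how many terms are going to be affected" check is to run the old and the new stemmer over the same term list and collect the terms whose stems differ. The sketch below is only an assumption about how such a check could look: the two toy stemmers are hypothetical placeholders for the old and new Snowball stemmers, and the term list would in practice come from the existing index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class StemmerDiffSketch {

    // Returns the terms for which the two stemmers produce different stems.
    public static List<String> affectedTerms(List<String> terms,
                                             UnaryOperator<String> oldStemmer,
                                             UnaryOperator<String> newStemmer) {
        List<String> affected = new ArrayList<>();
        for (String term : terms) {
            if (!oldStemmer.apply(term).equals(newStemmer.apply(term))) {
                affected.add(term);
            }
        }
        return affected;
    }

    // Hypothetical "old" stemmer: strips one trailing "s".
    public static String toyOldStem(String term) {
        return term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
    }

    // Hypothetical "new" stemmer: strips a trailing "es" first, else a trailing "s".
    public static String toyNewStem(String term) {
        if (term.endsWith("es")) {
            return term.substring(0, term.length() - 2);
        }
        return toyOldStem(term);
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("index", "boxes", "cats");
        List<String> affected = affectedTerms(
                terms, StemmerDiffSketch::toyOldStem, StemmerDiffSketch::toyNewStem);
        // prints "1 of 3 terms affected: [boxes]"
        System.out.println(affected.size() + " of " + terms.size()
                + " terms affected: " + affected);
    }
}
```

A term listed as affected would be indexed under one stem and queried under another after the upgrade, which is the case that calls for re-indexing.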

          Karl Wettin added a comment -

          I can try to trace the changes of the snowball code, but probably not anytime soon. That could be one reason to wait for 3.0. I also think that people are more likely to check for changes when they switch to 3.0 than when upgrading to 2.4.

          Grant Ingersoll added a comment -

          One other thought on this, which I think is consistent with how we've handled other token changes: do the upgrade, mark it clearly in CHANGES.txt per http://wiki.apache.org/lucene-java/BackwardsCompatibility, and also note that to retain the old behavior, one needs to drop in the old jars.

          Could this work? I haven't tried it, but it seems like it could, except for your one comment above about the use of reflection. Of course, there may be a way around that too.

          Additionally, we have an out in that the back compat. link above says:

          "All contribs are not created equal."

          The compatibility commitments of a contrib package can vary based on its maturity and intended usage. The README.txt file for each contrib should identify its approach to compatibility. If the README.txt file for a contrib package does not address its backwards compatibility commitments, users should assume it does not make any compatibility commitments.

          Thus, I think we should include this for 2.4 and we should note it in CHANGES.txt and in the Snowball README.

          Karl Wettin added a comment -
          • Fixed some conflicts against trunk
          • Backwards compatibility messages

          This is what I did with README:

          +IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY!
          +
          +An index created using the Snowball module in Lucene 2.3.2 and below
          +might not be compatible with the Snowball module in Lucene 2.4 or greater.
          +
          +For more information about this issue see:
          +https://issues.apache.org/jira/browse/LUCENE-1142
          

          I also added this text to the package javadocs.

          About LUCENE-740: all changes in the patches of that issue originate from the Snowball repository and are thus also available in this patch.

          Should I go ahead and commit this for 2.4?

          Grant Ingersoll added a comment -
          +1

          Karl Wettin added a comment -

          Committed in rev 688420

          Yonik Seeley added a comment (edited) -

          I just tried updating Solr... I guess this isn't so backward compatible (code-wise) because of the package change?

              [javac] f:\code\solr\src\java\org\apache\solr\analysis\EnglishPorterFilterFactory.java:78: package net.sf.snowball.ext does not exist
              [javac]   private net.sf.snowball.ext.EnglishStemmer stemmer;
          

          Oh, never mind... SnowballFilter should be the public interface that this is accessed through.


            People

            • Assignee:
              Karl Wettin
              Reporter:
              Karl Wettin