Lucene - Core / LUCENE-2055

Fix buggy stemmers and Remove duplicate analysis functionality

    Details

    • Lucene Fields:
      New

      Description

      I would like to remove the stemmers in the following packages and have their analyzers use a SnowballStemFilter instead.

      • analyzers/fr
      • analyzers/nl
      • analyzers/ru

      Below are excerpts from this code where they proudly proclaim they use the Snowball algorithm.
      I think we should delete all of this custom stemming code in favor of the actual Snowball package.

      /**
       * A stemmer for French words. 
       * <p>
       * The algorithm is based on the work of
       * Dr Martin Porter on his snowball project<br>
       * refer to http://snowball.sourceforge.net/french/stemmer.html<br>
       * (French stemming algorithm) for details
       * </p>
       */
      
      public class FrenchStemmer {
      
      /**
       * A stemmer for Dutch words. 
       * <p>
       * The algorithm is an implementation of
       * the <a href="http://snowball.tartarus.org/algorithms/dutch/stemmer.html">dutch stemming</a>
       * algorithm in Martin Porter's snowball project.
       * </p>
       */
      public class DutchStemmer {
      
      /**
       * Russian stemming algorithm implementation (see http://snowball.sourceforge.net for detailed description).
       */
      class RussianStemmer
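
      The proposal above amounts to one shared stem filter with a pluggable per-language stemmer, instead of a hand-written stemmer class in each analyzer package. The following is only a minimal sketch of that wiring in plain Java, with no Lucene dependency. Every name in it (SharedStemFilterSketch, stemFilter, stripSuffix, stemmerFor, analyze) is hypothetical, and the suffix strippers are toy placeholders, not the Snowball algorithms. Russian is left out only to keep the toy small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/**
 * Sketch only: per-language analyzers share one generic stem filter and
 * differ solely in the stemmer they plug in. All names are hypothetical.
 */
final class SharedStemFilterSketch {

    /** Generic filter: applies whichever stemmer it is given to every token. */
    static List<String> stemFilter(List<String> tokens, UnaryOperator<String> stemmer) {
        List<String> stemmed = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            stemmed.add(stemmer.apply(token));
        }
        return stemmed;
    }

    /**
     * Placeholder stemmer (NOT Snowball): strips one fixed suffix, but only
     * when at least three characters of stem would remain.
     */
    static UnaryOperator<String> stripSuffix(String suffix) {
        return token -> {
            int stemLength = token.length() - suffix.length();
            return token.endsWith(suffix) && stemLength >= 3
                    ? token.substring(0, stemLength)
                    : token;
        };
    }

    /** Single lookup point that replaces one custom stemmer class per package. */
    static UnaryOperator<String> stemmerFor(String language) {
        switch (language) {
            case "fr": return stripSuffix("ement");
            case "nl": return stripSuffix("en");
            default:
                throw new IllegalArgumentException("no stemmer registered for: " + language);
        }
    }

    /** A language "analyzer" reduces to: lowercase, split on whitespace, shared filter. */
    static List<String> analyze(String language, String text) {
        List<String> tokens =
                Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
        return stemFilter(tokens, stemmerFor(language));
    }

    public static void main(String[] args) {
        System.out.println(analyze("fr", "Rapidement lentement")); // [rapid, lent]
        System.out.println(analyze("nl", "Boeken lopen"));         // [boek, lop]
    }
}
```

      With this shape, fixing a stemming bug means fixing the one shared stemmer implementation rather than a separate copy in each analyzer package, which is the duplication this issue wants to remove.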
      
      1. LUCENE-2055.patch
        163 kB
        Robert Muir
      2. LUCENE-2055.patch
        163 kB
        Robert Muir
      3. LUCENE-2055.patch
        169 kB
        Robert Muir
      4. LUCENE-2055.patch
        168 kB
        Robert Muir
      5. LUCENE-2055.patch
        179 kB
        Robert Muir

        Issue Links

          Activity

          Robert Muir created issue -
          Robert Muir made changes -
          Field Original Value New Value
          Fix Version/s 3.1 [ 12314025 ]
          Robert Muir made changes -
          Link This issue depends on LUCENE-2206 [ LUCENE-2206 ]
          Robert Muir made changes -
          Link This issue depends on LUCENE-2198 [ LUCENE-2198 ]
          Robert Muir made changes -
          Summary
            Original Value: Remove duplicate analysis functionality
            New Value: Fix buggy stemmers and Remove duplicate analysis functionality
          Issue Type
            Original Value: Task [ 3 ]
            New Value: Bug [ 1 ]
          Description
            Original Value:
              would like to mark the following code deprecated, so it can be removed.

              * analyzers/fr: all except ElisionFilter, this is unrelated and standalone.
              * analyzers/nl: entire package
              * analyzers/ru: entire package

              below are excerpts from this code where they proudly proclaim they use the snowball algorithm.
              I think we should delete all of this code in favor of the actual snowball package.

              (followed by the same FrenchStemmer, DutchStemmer and RussianStemmer Javadoc excerpts quoted in the Description above)
            New Value:
              the current Description, as shown above.

          Robert Muir made changes -
          Attachment LUCENE-2055.patch [ 12434371 ]
          Robert Muir made changes -
          Attachment LUCENE-2055.patch [ 12434379 ]
          Robert Muir made changes -
          Attachment LUCENE-2055.patch [ 12434678 ]
          Robert Muir made changes -
          Attachment LUCENE-2055.patch [ 12434694 ]
          Robert Muir made changes -
          Attachment LUCENE-2055.patch [ 12434863 ]
          Robert Muir made changes -
          Assignee Robert Muir [ rcmuir ]
          Robert Muir made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Robert Muir made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Robert Muir made changes -
          Fix Version/s 2.9.4 [ 12315148 ]
          Fix Version/s 3.0.3 [ 12315147 ]
          Fix Version/s 3.1 [ 12314822 ]
          Robert Muir made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Uwe Schindler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Mark Thomas made changes -
          Workflow jira [ 12481775 ] Default workflow, editable Closed status [ 12564013 ]
          Mark Thomas made changes -
          Workflow Default workflow, editable Closed status [ 12564013 ] jira [ 12585485 ]
          Shai Erera made changes -
          Component/s modules/analysis [ 12310230 ]
          Component/s contrib/analyzers [ 12312333 ]
          Gavin made changes -
          Link This issue depends on LUCENE-2206 [ LUCENE-2206 ]
          Gavin made changes -
          Link This issue depends upon LUCENE-2206 [ LUCENE-2206 ]
          Gavin made changes -
          Link This issue depends on LUCENE-2198 [ LUCENE-2198 ]
          Gavin made changes -
          Link This issue depends upon LUCENE-2198 [ LUCENE-2198 ]

            People

            • Assignee:
              Robert Muir
              Reporter:
              Robert Muir