Lucene - Core / LUCENE-2165

SnowballAnalyzer lacks a constructor that takes a Set of Stop Words

    Details

    • Lucene Fields:
      New, Patch Available

      Description

      As discussed on the java-user list, the SnowballAnalyzer has been updated to use a Set of stop words. However, there is no constructor that accepts a Set; there's only the original String[] one.

      This is an issue because most of the common sources of stop words (e.g. StopAnalyzer) have deprecated their String[] stop word lists and moved over to Sets (e.g. StopAnalyzer.ENGLISH_STOP_WORDS_SET). So, for now, you either have to use a deprecated field on StopAnalyzer, or manually turn the Set into an array so you can pass it to the SnowballAnalyzer.
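      The manual workaround described above amounts to copying the Set into a String[]. A minimal sketch using only the JDK follows; the class and method names are hypothetical, the set contents stand in for StopAnalyzer.ENGLISH_STOP_WORDS_SET, and it assumes every element of the set is a String.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class StopWordsWorkaround {

    // Copies a stop-word Set into a String[] so it can be passed to an
    // array-only API such as the original String[] constructor.
    // Assumes every element is a String; any other element type makes
    // toArray throw ArrayStoreException.
    static String[] toStopWordArray(Set<?> stopWords) {
        return stopWords.toArray(new String[stopWords.size()]);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for StopAnalyzer.ENGLISH_STOP_WORDS_SET.
        Set<String> stopSet = new HashSet<>(Arrays.asList("a", "an", "the"));
        String[] stopArray = toStopWordArray(stopSet);
        // With Lucene 2.9 on the classpath, the array could then be handed to
        // new SnowballAnalyzer(Version.LUCENE_29, "English", stopArray);
        System.out.println(stopArray.length); // 3
    }
}
```

      The extra copy is exactly what a Set-accepting constructor would make unnecessary.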

      I would suggest adding a constructor to SnowballAnalyzer that accepts a Set. I'm not sure whether the old String[] one should be deprecated.

      A sample patch against 2.9.1 to add the constructor is:

      --- SnowballAnalyzer.java.orig  2009-12-15 11:14:08.000000000 +0000
      +++ SnowballAnalyzer.java  2009-12-14 12:58:37.000000000 +0000
      @@ -67,6 +67,12 @@
           stopSet = StopFilter.makeStopSet(stopWords);
         }
       
      +  /** Builds the named analyzer with the given stop words. */
      +  public SnowballAnalyzer(Version matchVersion, String name, Set stopWordsSet) {
      +    this(matchVersion, name);
      +    stopSet = stopWordsSet;
      +  }
      +
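      The constructor wiring in the patch can be sketched without a Lucene dependency. The class below is a hypothetical stand-in, not SnowballAnalyzer itself: the Version parameter is dropped, a HashSet copy stands in for StopFilter.makeStopSet, and Set<?> replaces the raw Set of the 2.9.1 patch to keep it warning-free. As in the patch, the Set constructor chains to the name-only constructor and then stores the caller's Set directly.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for SnowballAnalyzer: only the constructor
// wiring is modelled; no tokenizing or stemming happens here.
class SnowballAnalyzerSketch {
    private final String name; // Snowball stemmer name, e.g. "English"
    private Set<?> stopSet;    // null means "no stop words"

    SnowballAnalyzerSketch(String name) {
        this.name = name;
    }

    // Original String[] form: copy the array into a new set
    // (stand-in for StopFilter.makeStopSet).
    SnowballAnalyzerSketch(String name, String[] stopWords) {
        this(name);
        stopSet = new HashSet<>(Arrays.asList(stopWords));
    }

    // Form added by the patch: chain to the name-only constructor,
    // then keep a reference to the caller's set (no copy is made).
    SnowballAnalyzerSketch(String name, Set<?> stopWordsSet) {
        this(name);
        stopSet = stopWordsSet;
    }

    String name() {
        return name;
    }

    boolean isStopWord(String term) {
        return stopSet != null && stopSet.contains(term);
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList("the", "a"));
        SnowballAnalyzerSketch analyzer = new SnowballAnalyzerSketch("English", words);
        System.out.println(analyzer.isStopWord("the")); // true
        words.add("fox"); // the analyzer holds the same set, so it sees this change
        System.out.println(analyzer.isStopWord("fox")); // true
    }
}
```

      Storing the reference avoids a copy but means later changes to the caller's Set are visible to the analyzer; a defensive copy would be the alternative design.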

        Activity

        Uwe Schindler added a comment -

        And in 3.0 it's simply Set<?>.
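        Why a Set<?> parameter is sufficient can be shown with plain Java. The sketch below uses hypothetical method names and does not reproduce Lucene's signatures: a Set<?> parameter admits a set of any element type, including one declared over Object, whereas a Set<String> parameter does not, and lookups still work because Set.contains takes an Object.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class WildcardStopSetDemo {

    // Accepts a set of any element type: contains() takes an Object,
    // so a lookup never needs to know the declared element type.
    static boolean isStopWord(Set<?> stopWords, String term) {
        return stopWords.contains(term);
    }

    // Accepts only Set<String>.
    static boolean isStopWordStrict(Set<String> stopWords, String term) {
        return stopWords.contains(term);
    }

    public static void main(String[] args) {
        Set<String> strings = new HashSet<>(Arrays.asList("a", "the"));
        Set<Object> objects = new HashSet<Object>(Arrays.asList("a", "the"));

        System.out.println(isStopWord(strings, "the"));       // true
        System.out.println(isStopWord(objects, "the"));       // true
        System.out.println(isStopWordStrict(strings, "the")); // true
        // isStopWordStrict(objects, "the"); // compile error: Set<Object> is not a Set<String>
    }
}
```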

        Robert Muir added a comment -

        If no one objects, I will commit tomorrow.

        Uwe Schindler added a comment -

        +1, looks good. In my opinion a backport to 2.9 is not needed, but maybe one to 3.0 is, because in 3.0 there is no longer a STOP_WORD array in StopFilter.

        Simon Willnauer added a comment -

        Robert, I wonder if you want to make stopSet final and assign the empty set to it in the constructor without stop words. I always prefer an empty collection over null, so you can simply replace the null checks with stopSet.isEmpty(). This is kind of unrelated, so we can do it in a separate issue if you want.
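        The final-field, empty-set suggestion can be sketched with a hypothetical stand-in class (not the committed Lucene code, and without the Version parameter). One consequence is worth noting: a final field may be assigned only once, so the Set constructor in the patch could not call this(matchVersion, name) and then assign stopSet. The delegation has to run the other way, with the constructor that has no stop words passing Collections.emptySet().

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in: the stop set is final and never null.
class FinalStopSetSketch {
    private final String name;
    private final Set<?> stopSet; // empty set instead of null

    // No stop words: delegate with an empty, immutable set.
    FinalStopSetSketch(String name) {
        this(name, Collections.emptySet());
    }

    // The only place the final fields are assigned.
    FinalStopSetSketch(String name, Set<?> stopWords) {
        this.name = name;
        this.stopSet = stopWords;
    }

    String name() {
        return name;
    }

    // Replaces a "stopSet != null" check.
    boolean hasStopWords() {
        return !stopSet.isEmpty();
    }

    // No null guard needed: an empty set simply contains nothing.
    boolean isStopWord(String term) {
        return stopSet.contains(term);
    }

    public static void main(String[] args) {
        FinalStopSetSketch plain = new FinalStopSetSketch("English");
        System.out.println(plain.hasStopWords()); // false
    }
}
```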

        Robert Muir added a comment -

        Committed revision 891209.

        Uwe Schindler added a comment -

        backport


          People

          • Assignee:
            Robert Muir
            Reporter:
            Nick Burch
          • Votes:
            0
            Watchers:
            1
