  1. Lucene - Core
  2. LUCENE-2206

integrate snowball stopword lists

Details

    • Type: New Feature
    • Status: Reopened
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      The Snowball project publishes stopword lists as well as stemmers. For example: http://svn.tartarus.org/snowball/trunk/website/algorithms/english/stop.txt?view=markup

      This patch includes the following:

      • snowball stopword lists for 13 languages in contrib/snowball/resources
      • all stoplists are otherwise unmodified: I only added a license header and converted each one from its original encoding to UTF-8
      • added getSnowballWordSet to WordListLoader, because the format of these files is very different from our existing word lists: for example, it supports multiple words per line and embedded comments.
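
      To illustrate the format difference mentioned in the last item, here is a minimal sketch of how such a file could be parsed. It is not the code from the attached patch: the class name SnowballStopwordSketch and the method parse are hypothetical, and it assumes that '|' starts a comment running to the end of the line and that words are separated by whitespace, as in the stop.txt files on the Snowball website.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not the code from LUCENE-2206.patch.
final class SnowballStopwordSketch {

    // Assumption: '|' begins a comment that runs to the end of the line.
    private static final char COMMENT_CHAR = '|';

    // Reads a Snowball-style stopword list and returns the words in file order.
    static Set<String> parse(Reader in) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            // Drop a whole-line or trailing comment, if present.
            int comment = line.indexOf(COMMENT_CHAR);
            if (comment >= 0) {
                line = line.substring(0, comment);
            }
            // A single line may carry several whitespace-separated words.
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

      A caller would pass a Reader opened with the UTF-8 charset, since the lists in this patch are converted to UTF-8. Blank lines and comment-only lines contribute no words.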

      I have not changed SnowballAnalyzer to use these lists automatically yet. I would like us to discuss that in a future issue that proposes integrating snowball with contrib/analyzers.

      Attachments

        1. LUCENE-2206.patch
          69 kB
          Robert Muir
        2. LUCENE-2206-checkout-fixes.patch
          2 kB
          Uwe Schindler

            People

              Assignee: Robert Muir (rcmuir)
              Reporter: Robert Muir (rcmuir)