SPARK-9578

Stemmer feature transformer

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML

    Description

      This transformer was first mentioned in SPARK-5571, based on a suggestion from aloknsingh. Stemming is a very standard NLP preprocessing task.

      From aloknsingh:

      We have one Scala stemmer in scalanlp%chalk, https://github.com/scalanlp/chalk/tree/master/src/main/scala/chalk/text/analyze, which can easily be copied (as it is an Apache project) and is in Scala too.
      I think this will be a better alternative than Lucene's EnglishAnalyzer or OpenNLP.
      Note: we already use scalanlp%breeze via the Maven dependency, so I think adding a scalanlp%chalk dependency is also an option. But as you said, we can copy the code, as it is small.
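
For readers unfamiliar with the task, here is a minimal sketch of what a stemming transformer does: it reads a column of tokens and appends a column of stemmed tokens. This is not the chalk stemmer and not a Spark API. The suffix list, the `stem` and `transform` helpers, and the column names are all hypothetical, and the rules are a toy suffix stripper rather than a real algorithm such as Porter's.

```python
from typing import Dict, List

# Hypothetical, deliberately tiny suffix list (an assumption for illustration);
# a real stemmer such as Porter's applies many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token: str) -> str:
    """Lowercase the token and strip the first matching suffix,
    keeping at least three characters of stem."""
    word = token.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def transform(
    rows: List[Dict[str, List[str]]], input_col: str, output_col: str
) -> List[Dict[str, List[str]]]:
    """Mimic a feature transformer: read a token column and
    return new rows with an added stemmed column."""
    return [{**row, output_col: [stem(t) for t in row[input_col]]} for row in rows]


rows = [{"tokens": ["Jumping", "jumped", "jumps", "boxes", "is"]}]
result = transform(rows, "tokens", "stemmed")
print(result[0]["stemmed"])
```

In this sketch the input column is left untouched and the output is written to a new column, which mirrors how tokenizer-style transformers are usually chained before a stemming step.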

            People

              Assignee: Unassigned
              Reporter: Joseph K. Bradley (josephkb)
              Votes: 1
              Watchers: 8