  1. Spark
  2. SPARK-9578

Stemmer feature transformer

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML
    • Labels:

      Description

      This transformer was first mentioned in SPARK-5571, based on a suggestion from Alok Singh. Stemming is a very standard NLP preprocessing task.

      From Alok Singh:

      We have one Scala stemmer in scalanlp%chalk https://github.com/scalanlp/chalk/tree/master/src/main/scala/chalk/text/analyze which can easily be copied (as it is an Apache project), and it is in Scala too.
      I think this will be a better alternative than Lucene's EnglishAnalyzer or OpenNLP.
      Note: we already use scalanlp%breeze via a Maven dependency, so I think adding a scalanlp%chalk dependency is also an option. But as you said, we can copy the code since it is small.
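
      For readers unfamiliar with the request, the following is a minimal sketch of what a stemming transformer does: it reads a column of tokens and writes a column of stems. It is not the chalk stemmer and not Spark's API. The names `stem`, `StemmerSketch`, `input_col`, and `output_col` are hypothetical, and the suffix list is a toy stand-in for a real algorithm such as Porter's.

```python
from typing import Dict, List

# Assumption: a deliberately tiny suffix list for illustration only.
# A real stemmer (for example Porter's) applies many more rules.
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM_LEN = 3  # never strip a suffix if the remaining stem would be shorter


def stem(token: str) -> str:
    """Lowercase the token and strip the first matching suffix, if any."""
    word = token.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


class StemmerSketch:
    """Hypothetical transformer: maps a token column to a stem column."""

    def __init__(self, input_col: str, output_col: str) -> None:
        self.input_col = input_col
        self.output_col = output_col

    def transform(self, rows: List[Dict]) -> List[Dict]:
        # Return new rows; the input rows are left unmodified.
        return [
            {**row, self.output_col: [stem(t) for t in row[self.input_col]]}
            for row in rows
        ]


if __name__ == "__main__":
    stemmer = StemmerSketch(input_col="words", output_col="stems")
    print(stemmer.transform([{"words": ["Jumping", "cats", "is"]}]))
```

      In a real implementation the per-token function would be replaced by the copied or imported stemmer, and the class would follow the same input-column/output-column pattern as the existing ML feature transformers.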

              People

              • Assignee: Unassigned
              • Reporter: Joseph K. Bradley (josephkb)
              • Votes: 1
              • Watchers: 8
