[SPARK-9578] Stemmer feature transformer - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: ML
Labels:
- bulk-closed

Description

Stemmer transformer, first mentioned in ~~SPARK-5571~~, based on a suggestion from aloknsingh. Stemming is a very standard NLP preprocessing task.

From aloknsingh:

We have a Scala stemmer in scalanlp%chalk (https://github.com/scalanlp/chalk/tree/master/src/main/scala/chalk/text/analyze) which can easily be copied (as it is an Apache project) and is written in Scala too.
I think this will be a better alternative than Lucene's EnglishAnalyzer or OpenNLP.
Note: we already use scalanlp%breeze via the Maven dependency, so I think adding a scalanlp%chalk dependency is also an option. But as you said, we can copy the code, as it is small.
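
For illustration only, here is a rough sketch of what a stemming step does to a token array, which is the per-row operation such a transformer would apply. Everything in it is an assumption made for the example: `ToyStemmer`, its suffix list, and the minimum stem length are hypothetical, and the logic is a naive suffix stripper, not the Porter algorithm and not the chalk code linked above. The Spark ML wrapper (input/output columns, schema validation) is deliberately left out.

```scala
// Toy suffix-stripping stemmer. Illustrative stand-in only: this is NOT the
// Porter algorithm and NOT the scalanlp chalk implementation.
object ToyStemmer {
  // Hypothetical suffix list, tried in order; the first match is stripped.
  private val suffixes = Seq("ing", "ed", "es", "s")

  // Lowercase the token, then strip the first matching suffix,
  // but only if at least three characters would remain.
  def stem(token: String): String = {
    val lower = token.toLowerCase
    suffixes
      .find(s => lower.endsWith(s) && lower.length - s.length >= 3)
      .map(s => lower.dropRight(s.length))
      .getOrElse(lower)
  }

  // Token array in, token array out: each token is stemmed independently.
  def transform(tokens: Seq[String]): Seq[String] = tokens.map(stem)
}
```

With this sketch, `ToyStemmer.transform(Seq("handles", "handling", "Topics"))` maps both "handles" and "handling" to the same stem, "handl", which is the effect that makes stemming useful before vocabulary-based models such as LDA. A real implementation would need a proper algorithm to avoid over- and under-stemming.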

Issue Links

is required by

SPARK-5571 LDA should handle text as well

Resolved

links to

[Github] Pull Request #10272 (hhbyyh)

People

Assignee: Unassigned

Reporter: Joseph K. Bradley

Votes: 1

Watchers: 8

Dates

Created: 03/Aug/15 23:34

Updated: 21/May/19 04:35

Resolved: 21/May/19 04:35