Details
- Type: New Feature
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
Description
A stemmer Transformer, first mentioned in SPARK-5571 based on a suggestion from aloknsingh. Stemming is a very standard NLP preprocessing task.
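For illustration only, here is a minimal sketch of the kind of per-token function such a transformer would apply. It is a toy suffix stripper, not the Porter algorithm and not the chalk code discussed in this issue; the object name `ToyStemmer`, the suffix list, and the minimum stem length of 3 are all assumptions made for the example.

```scala
// Hypothetical toy stemmer for illustration. It is NOT the Porter
// algorithm and NOT the scalanlp chalk implementation.
object ToyStemmer {
  // Suffixes are tried in this order, so "ies" is checked before "es" and "s".
  private val suffixes = Seq("ing", "ies", "ed", "es", "s")

  // Lowercase the token, then strip the first matching suffix,
  // but only if at least 3 characters of stem remain.
  def stem(token: String): String = {
    val lower = token.toLowerCase
    suffixes
      .find(s => lower.endsWith(s) && lower.length - s.length >= 3)
      .map(s => lower.dropRight(s.length))
      .getOrElse(lower)
  }

  // Apply the stemmer to an already tokenized document.
  def stemAll(tokens: Seq[String]): Seq[String] = tokens.map(stem)
}
```

A real transformer would presumably apply a function like `stem` to every token of a tokenized text column; how that is wired into the pipeline API is not specified in this issue.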
From aloknsingh:
We have one Scala stemmer in scalanlp%chalk (https://github.com/scalanlp/chalk/tree/master/src/main/scala/chalk/text/analyze), which can easily be copied (as it is an Apache-licensed project) and is in Scala too.
I think this will be a better alternative than Lucene's EnglishAnalyzer or OpenNLP.
Note: we already use scalanlp%breeze via the Maven dependency, so I think adding a scalanlp%chalk dependency is also an option. But as you said, we can copy the code, as it is small.
Issue Links
- is required by: SPARK-5571 LDA should handle text as well (Resolved)