Description
Tokenizer classes currently exist in two places: the spark.ml.feature package and the LDAExample in the spark.examples.mllib package. The tokenizer in the LDAExample is the more advanced of the two and should be promoted to a full-fledged public class in spark.mllib.feature. The spark.ml.feature.Tokenizer class should then become a wrapper around the new Tokenizer.
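The layering proposed here (a standalone tokenizer, plus a thin pipeline-side wrapper that delegates to it) can be sketched roughly as follows. This is an illustration in plain Python, not Spark code: the class names `CoreTokenizer` and `PipelineTokenizer`, the parameters `pattern`, `min_token_length` and `stop_words`, and the list-of-dicts stand-in for a dataset are all hypothetical and are not taken from the LDAExample or from any Spark API.

```python
import re
from typing import Dict, Iterable, List, Optional, Set


class CoreTokenizer:
    """Hypothetical stand-in for the standalone tokenizer (the spark.mllib.feature role)."""

    def __init__(self, pattern: str = r"\W+", min_token_length: int = 1,
                 stop_words: Iterable[str] = ()) -> None:
        # All three options are assumptions made for this sketch.
        self._splitter = re.compile(pattern)
        self.min_token_length = min_token_length
        self.stop_words: Set[str] = {w.lower() for w in stop_words}

    def tokenize(self, text: str) -> List[str]:
        # Lowercase, split on the pattern, then drop empty, short, and stop-word tokens.
        tokens = self._splitter.split(text.lower())
        return [t for t in tokens
                if t and len(t) >= self.min_token_length and t not in self.stop_words]


class PipelineTokenizer:
    """Hypothetical wrapper (the spark.ml.feature.Tokenizer role): holds no
    tokenization logic of its own and only maps an input column to an output column."""

    def __init__(self, input_col: str, output_col: str,
                 core: Optional[CoreTokenizer] = None) -> None:
        self.input_col = input_col
        self.output_col = output_col
        self.core = core if core is not None else CoreTokenizer()

    def transform(self, rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
        # Delegate every row to the core tokenizer and attach the result as a new column.
        return [{**row, self.output_col: self.core.tokenize(str(row[self.input_col]))}
                for row in rows]


core = CoreTokenizer(min_token_length=3, stop_words=["the"])
stage = PipelineTokenizer("text", "words", core)
result = stage.transform([{"text": "The quick, brown fox is an animal"}])
```

The point of the split is that any improvement to the core tokenizer reaches the wrapper automatically, since the wrapper only forwards to it.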
Issue Links
- is required by: SPARK-5572 LDA improvement listing (Resolved)
- relates to: SPARK-7583 User guide update for RegexTokenizer (Resolved)
- links to