Description
Converts a text document to a sparse vector of token counts, similar to scikit-learn's CountVectorizer: http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
I can also add an estimator that extracts the vocabulary from a corpus, if that's appropriate.
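To make the intended behavior concrete, here is a rough sketch in plain Python rather than against the Spark API. The names `fit_vocabulary`, `count_vectorize`, `vocab_size`, and `min_df` are hypothetical and only illustrate the two pieces: an estimator-like step that extracts a vocabulary from a corpus, and a transformer-like step that maps a tokenized document to a sparse vector of counts. It assumes documents are already tokenized and that out-of-vocabulary tokens are dropped.

```python
from collections import Counter


def fit_vocabulary(corpus, vocab_size=None, min_df=1):
    """Hypothetical estimator step: choose vocabulary terms from a tokenized corpus.

    corpus is an iterable of token lists. Terms appearing in fewer than
    min_df documents are dropped; at most vocab_size terms are kept.
    """
    term_freq = Counter()  # total occurrences of each term
    doc_freq = Counter()   # number of documents containing each term
    for tokens in corpus:
        term_freq.update(tokens)
        doc_freq.update(set(tokens))

    candidates = [t for t in term_freq if doc_freq[t] >= min_df]
    # Most frequent terms first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda t: (-term_freq[t], t))
    if vocab_size is not None:
        candidates = candidates[:vocab_size]
    return {term: index for index, term in enumerate(candidates)}


def count_vectorize(tokens, vocabulary):
    """Transformer step: map one tokenized document to a sparse count vector.

    Returns (size, indices, values), where indices are sorted vocabulary
    positions and values are the matching counts. Tokens that are not in
    the vocabulary are ignored (an assumption of this sketch).
    """
    counts = Counter(vocabulary[t] for t in tokens if t in vocabulary)
    indices = sorted(counts)
    values = [float(counts[i]) for i in indices]
    return len(vocabulary), indices, values


corpus = [["a", "b", "c"], ["a", "b", "b", "c", "a"]]
vocabulary = fit_vocabulary(corpus)
vector = count_vectorize(["a", "b", "b", "d"], vocabulary)
print(vocabulary)
print(vector)
```

The (size, indices, values) triple mirrors the usual sparse-vector layout, so only the non-zero counts are stored. Whether the vocabulary should be supplied by the user or learned by an estimator is the open question raised above; the sketch shows both pieces so either option can be discussed.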
Issue Links
- relates to: SPARK-9890 User guide for CountVectorizer (Resolved)