Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Incomplete
- 2.1.0
- None
Description
It would be nice to be able to specify a range (or maybe a list) of n-gram sizes, as sklearn does with the ngram_range parameter of TfidfVectorizer; see
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer
This shouldn't be difficult to add: the code is very straightforward, and I can implement it. The only open question is the NGram API: should it just accept a number, a tuple, or a list?
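To make the API question concrete, here is a minimal plain-Python sketch of one possible answer: accept all three forms and normalize them to a list of sizes. It is not Spark code and not an existing API. The names NGramSizes, resolve_sizes and ngrams are hypothetical, the tuple is assumed to be an inclusive (min_n, max_n) range as in sklearn's ngram_range, and the output is assumed to be space-joined strings.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical argument type: a single size, an inclusive (min_n, max_n)
# range, or an explicit list of sizes.
NGramSizes = Union[int, Tuple[int, int], List[int]]


def resolve_sizes(n: NGramSizes) -> List[int]:
    """Normalize the three accepted forms into an ascending list of sizes."""
    if isinstance(n, int):
        sizes = [n]
    elif isinstance(n, tuple):
        if len(n) != 2:
            raise ValueError("a range must be a (min_n, max_n) pair")
        min_n, max_n = n
        # Inclusive on both ends (assumption, mirroring sklearn's ngram_range).
        sizes = list(range(min_n, max_n + 1))
    else:
        sizes = sorted(set(n))
    if not sizes or any(s < 1 for s in sizes):
        raise ValueError("n-gram sizes must be positive and non-empty")
    return sizes


def ngrams(tokens: Sequence[str], n: NGramSizes) -> List[str]:
    """Return all n-grams of the requested sizes as space-joined strings."""
    result = []
    for size in resolve_sizes(n):
        # When size exceeds len(tokens) the range is empty, so nothing is added.
        for start in range(len(tokens) - size + 1):
            result.append(" ".join(tokens[start:start + size]))
    return result


tokens = ["the", "quick", "fox"]
print(ngrams(tokens, 2))       # single size
print(ngrams(tokens, (1, 2)))  # inclusive range
print(ngrams(tokens, [1, 3]))  # explicit list
```

One caveat with this design: a tuple and a two-element list mean different things here ((1, 3) is sizes 1 through 3, while [1, 3] is sizes 1 and 3 only), which may be too subtle for a public API. Separate min/max parameters would avoid that ambiguity.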
Issue Links
- is duplicated by
- SPARK-20838 Spark ML ngram feature extractor should support ngram range like scikit (Resolved)