Description
N-grams are an NLP feature representation that generalizes bag-of-words by including local context (the n-1 preceding words). We can implement N-grams in Spark ML as a feature transformer, most likely applied directly after tokenization.
For example, "this is a test" should tokenize to ["this","is","a","test"], which, after applying a 2-gram feature transform, should yield [["this","is"],["is","a"],["a","test"]].
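For illustration, a minimal Scala sketch of the core computation (a sliding window of length n over the token sequence); the function name ngrams and its signature are assumptions for this sketch, not the final transformer API:

// Sketch only: slide a window of length n over the tokens.
// Inputs shorter than n produce no n-grams.
def ngrams(tokens: Seq[String], n: Int): Seq[Seq[String]] =
  if (tokens.length < n) Seq.empty
  else tokens.sliding(n).toSeq

// Example:
// ngrams(Seq("this", "is", "a", "test"), 2)
// => Seq(Seq("this", "is"), Seq("is", "a"), Seq("a", "test"))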
Issue Links
- is depended upon by
  - SPARK-8456 Python API for N-Gram Feature Transformer (Resolved)
  - SPARK-8457 Documentation for N-Gram feature transformer (Resolved)
- links to