Description
N-grams are an NLP feature representation that generalizes bag-of-words by including local context (the n-1 preceding words). We can implement N-grams in Spark ML as a feature transformer, most likely applied directly after tokenization.
For example, "this is a test" should tokenize to ["this","is","a","test"], which, after applying a 2-gram feature transform, should yield [["this","is"],["is","a"],["a","test"]].
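For illustration, a minimal Scala sketch of the core computation (a sliding window of length n over the token sequence); the function name ngrams and its signature are assumptions for this sketch, not the final transformer API:

// Sketch only: slide a window of length n over the tokens.
// Inputs shorter than n produce no n-grams.
def ngrams(tokens: Seq[String], n: Int): Seq[Seq[String]] =
  if (tokens.length < n) Seq.empty
  else tokens.sliding(n).toSeq

// Example:
// ngrams(Seq("this", "is", "a", "test"), 2)
// => Seq(Seq("this", "is"), Seq("is", "a"), Seq("a", "test"))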
Issue Links
- is depended upon by
  - SPARK-8456 Python API for N-Gram Feature Transformer (Resolved)
  - SPARK-8457 Documentation for N-Gram feature transformer (Resolved)
- links to