  1. Spark
  2. SPARK-8521 Feature Transformers in 1.5
  3. SPARK-8455

Implement N-Gram Feature Transformer

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: ML
    • Labels: None

    Description

      N-grams are an NLP feature representation that generalizes bag of words to include local context (the n-1 preceding words). We can implement N-grams in ML as a feature transformer, most likely applied directly after tokenization.

      For example, "this is a test" should tokenize to ["this","is","a","test"], which upon applying a 2-gram feature transform should yield [["this","is"],["is","a"],["a","test"]].
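
      The example above can be sketched as a sliding window over the token list. This is a minimal plain-Python illustration, not the Spark ML transformer itself; the function name `ngrams` and its parameters are hypothetical and chosen only for this sketch.

```python
from typing import List


def ngrams(tokens: List[str], n: int = 2) -> List[List[str]]:
    """Return every run of n consecutive tokens, in order (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of m tokens has m - n + 1 windows; range() is empty when m < n,
    # so inputs shorter than n produce an empty list.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


# Whitespace splitting stands in for the tokenization step in this sketch.
tokens = "this is a test".split()
bigrams = ngrams(tokens, 2)
print(bigrams)
```

      Returning an empty list for inputs shorter than n is a choice made for this sketch; the issue does not specify that case.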

            People

              Assignee: Feynman Liang (fliang)
              Reporter: Feynman Liang (fliang)
              Votes: 0
              Watchers: 6