  1. Spark
  2. SPARK-8521 Feature Transformers in 1.5
  3. SPARK-8703

Add CountVectorizer as an ML transformer to convert a document to a word-count vector


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: ML
    • Labels: None

    Description

      Converts a text document to a sparse vector of token counts, similar to scikit-learn's CountVectorizer: http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html

      I can also add an estimator that extracts the vocabulary from a corpus, if that's appropriate.
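
To make the proposal concrete, here is a minimal sketch in plain Python of the two pieces described above: a transformer that maps a tokenized document to a sparse count vector against a fixed vocabulary, and an optional estimator that picks the vocabulary from a corpus. This is not the Spark implementation; the function names `count_vectorize` and `fit_vocabulary`, the `vocab_size` parameter, and the `(size, indices, values)` sparse layout are assumptions made for illustration.

```python
from collections import Counter


def count_vectorize(tokens, vocabulary):
    """Map a tokenized document to a sparse count vector.

    Returns (size, indices, values), where `indices` are the sorted
    vocabulary positions that occur in the document and `values` are
    the matching counts. Tokens outside the vocabulary are ignored.
    """
    index = {term: i for i, term in enumerate(vocabulary)}
    counts = Counter(index[t] for t in tokens if t in index)
    indices = sorted(counts)
    values = [float(counts[i]) for i in indices]
    return len(vocabulary), indices, values


def fit_vocabulary(corpus, vocab_size):
    """Hypothetical estimator step: keep the `vocab_size` most frequent
    terms across all documents in the corpus."""
    totals = Counter(token for doc in corpus for token in doc)
    return [term for term, _ in totals.most_common(vocab_size)]


if __name__ == "__main__":
    corpus = [["a", "b", "a"], ["a", "c", "b"]]
    vocab = fit_vocabulary(corpus, vocab_size=2)
    print(vocab)
    print(count_vectorize(["a", "b", "a", "d"], vocab))
```

Keeping the vocabulary as an explicit argument mirrors the split in the description: the transformer works with any supplied vocabulary, and the estimator is only one way to produce it.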

            People

              Assignee: yuhao yang (yuhaoyan)
              Reporter: yuhao yang (yuhaoyan)
              Shepherd: Joseph K. Bradley
              Votes: 0
              Watchers: 5


                Time Tracking

                  Estimated: 24h
                  Remaining: 24h
                  Logged: Not Specified