[SPARK-8703] Add CountVectorizer as an ML transformer to convert a document to a word count vector - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.0
Component/s: ML
Labels: None
Target Version/s: 1.5.0

Description

Converts a text document to a sparse vector of token counts. Similar to http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
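To make the intended output concrete, here is a minimal sketch in plain Python of turning a tokenized document into a sparse count vector over a fixed vocabulary. It is not the Spark implementation from this issue: the function name `count_vector`, the `(size, indices, values)` return shape, and the choice to ignore out-of-vocabulary tokens are all assumptions made for illustration.

```python
from collections import Counter


def count_vector(tokens, vocabulary):
    """Hypothetical helper: sparse token counts over a fixed vocabulary.

    Returns (size, indices, values), where indices are the vocabulary
    positions that occur in the document and values are their counts.
    Tokens missing from the vocabulary are ignored (an assumption).
    """
    # Map each vocabulary term to its position in the output vector.
    index = {term: i for i, term in enumerate(vocabulary)}
    # Count only the tokens that appear in the vocabulary.
    counts = Counter(index[t] for t in tokens if t in index)
    indices = sorted(counts)
    values = [float(counts[i]) for i in indices]
    return len(vocabulary), indices, values


vocab = ["a", "b", "c"]
doc = ["a", "b", "b", "d", "a", "b"]
size, indices, values = count_vector(doc, vocab)
print(size, indices, values)
```

Only the non-zero entries are stored, which is what keeps the representation compact when the vocabulary is much larger than any single document.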

I can further add an estimator to extract vocabulary from corpus if that's appropriate.
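As a rough illustration of what such a vocabulary-extracting estimator might do, the following plain Python sketch collects terms from a tokenized corpus and orders them by frequency. The name `build_vocabulary` and the parameters `vocab_size` and `min_count` are hypothetical; the issue does not specify how the estimator would select or order terms.

```python
from collections import Counter


def build_vocabulary(corpus, vocab_size=None, min_count=1):
    """Hypothetical helper: pick vocabulary terms from a tokenized corpus.

    corpus is an iterable of token lists. Terms seen fewer than min_count
    times are dropped; the rest are ordered by descending corpus count
    (ties broken alphabetically) and truncated to vocab_size if given.
    """
    term_counts = Counter(token for doc in corpus for token in doc)
    kept = [(t, c) for t, c in term_counts.items() if c >= min_count]
    kept.sort(key=lambda tc: (-tc[1], tc[0]))
    # Slicing with None keeps every term.
    return [t for t, _ in kept[:vocab_size]]


corpus = [["a", "b", "c"], ["a", "b", "b", "c", "a"]]
print(build_vocabulary(corpus))
print(build_vocabulary(corpus, vocab_size=2))
```

The resulting list can then serve as the fixed vocabulary against which each document's tokens are counted.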

Issue Links

relates to: SPARK-9890 User guide for CountVectorizer (Resolved)
links to: [Github] Pull Request #7084 (hhbyyh)

People

Assignee: yuhao yang
Reporter: yuhao yang
Shepherd: Joseph K. Bradley
Votes: 0
Watchers: 5

Dates

Created: 29/Jun/15 11:50
Updated: 12/Aug/15 20:25
Resolved: 09/Jul/15 17:26

Time Tracking

Estimated: 24h
Remaining: 24h
Logged: Not Specified