Details
- Type: New Feature
- Status: Closed
- Priority: Minor
- Resolution: Later
Description
A set of classes that build matrices from text.
The API currently consists of TokenMatrixBuilder and TokenInstanceBuilder. The API should be thread safe.
PostReader imports the 20news-bydate dataset, which takes several GB of heap. It would be nice to bounce the data through JDBM, or perhaps the PersistentHashMap from MAHOUT-19, instead of holding it all in memory.
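To make the idea in the description concrete, here is a minimal sketch of a thread-safe builder that turns text into a sparse document-by-term count matrix. It is not the TokenMatrixBuilder from this issue: the class name SketchTokenMatrixBuilder, its methods, and the regex tokenizer are all assumptions made for illustration, and the patch may work quite differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the TokenMatrixBuilder API from this issue.
final class SketchTokenMatrixBuilder {
    // Maps each distinct token to its column index.
    private final Map<String, Integer> termIndex = new HashMap<>();
    // One sparse row per document: column index -> token count.
    private final List<Map<Integer, Integer>> rows = new ArrayList<>();

    /** Tokenizes one document, appends it as a sparse row, and returns the row index. */
    synchronized int addDocument(String text) {
        Map<Integer, Integer> row = new HashMap<>();
        // Assumed tokenizer: lowercase, split on runs of non-letter, non-digit characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a delimiter
            }
            Integer col = termIndex.get(token);
            if (col == null) {
                col = termIndex.size();
                termIndex.put(token, col);
            }
            row.merge(col, 1, Integer::sum);
        }
        rows.add(row);
        return rows.size() - 1;
    }

    /** Returns how often the term occurs in the given document, or 0 if it never does. */
    synchronized int count(int doc, String term) {
        Integer col = termIndex.get(term.toLowerCase(Locale.ROOT));
        return col == null ? 0 : rows.get(doc).getOrDefault(col, 0);
    }

    synchronized int numDocuments() {
        return rows.size();
    }

    synchronized int numTerms() {
        return termIndex.size();
    }

    /** Materializes the full matrix; only sensible for small corpora. */
    synchronized int[][] toDenseMatrix() {
        int[][] matrix = new int[rows.size()][termIndex.size()];
        for (int r = 0; r < rows.size(); r++) {
            for (Map.Entry<Integer, Integer> e : rows.get(r).entrySet()) {
                matrix[r][e.getKey()] = e.getValue();
            }
        }
        return matrix;
    }
}
```

The sketch gets its thread safety from a single coarse lock (every method is synchronized), which is simple to reason about but serializes concurrent callers. Rows are kept sparse because a dense matrix grows with documents times vocabulary. The sketch still holds everything on the heap, so it does not address the memory concern raised for PostReader.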
Attachments
Issue Links
- is related to: MAHOUT-126 Prepare document vectors from the text (Closed)