Description
The MapReduce and DSL Naive Bayes implementations currently cannot classify a new document (one outside of the training/holdout corpus). This new feature will do the following:
1. Vectorize a new text document using the dictionary and document frequencies from the training/holdout corpus
- Assume the original corpus was vectorized using `seq2sparse`; step (1) will reuse all of the same parameters.
2. Score and label a new document using a previously trained model.
This work will need to be done in parallel for the MRLegacy and DSL implementations. Neither should be too much work.
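The two steps above can be sketched in a few lines. This is only an illustration in plain Python, not Mahout code: the function names `vectorize` and `classify`, the toy dictionary, document frequencies, and model weights are all hypothetical, and the `tf * log(N / df)` weighting is an assumed stand-in that may differ from what `seq2sparse` actually computes.

```python
import math
from collections import Counter


def vectorize(text, dictionary, doc_freq, num_docs):
    """Step 1: map a new document to a sparse TF-IDF vector {term_index: weight}.

    Only terms present in the training dictionary are kept, because the
    trained model has no weights for unseen terms.
    """
    counts = Counter(tok for tok in text.lower().split() if tok in dictionary)
    vector = {}
    for term, tf in counts.items():
        idx = dictionary[term]
        # Assumed weighting (hypothetical): tf * log(N / df).
        vector[idx] = tf * math.log(num_docs / doc_freq[idx])
    return vector


def classify(vector, model):
    """Step 2: score the vector against each label and return the best label.

    `model` maps label -> (log_prior, {term_index: log_weight}).
    """
    scores = {}
    for label, (log_prior, weights) in model.items():
        scores[label] = log_prior + sum(
            w * weights.get(idx, 0.0) for idx, w in vector.items()
        )
    best = max(scores, key=scores.get)
    return best, scores


# Toy artifacts standing in for the training-time dictionary, document
# frequencies, and trained model (all values are made up for illustration).
dictionary = {"spark": 0, "hadoop": 1, "goal": 2, "match": 3}
doc_freq = {0: 2, 1: 2, 2: 2, 3: 2}
num_docs = 4
model = {
    "tech": (math.log(0.5), {0: math.log(0.4), 1: math.log(0.4),
                             2: math.log(0.1), 3: math.log(0.1)}),
    "sports": (math.log(0.5), {0: math.log(0.1), 1: math.log(0.1),
                               2: math.log(0.4), 3: math.log(0.4)}),
}

vector = vectorize("Spark and Hadoop match", dictionary, doc_freq, num_docs)
label, scores = classify(vector, model)
print(label)
```

The key point the sketch tries to capture is that the new document is vectorized with the dictionary and document frequencies saved from training, never with statistics recomputed from the new document itself, so its term indices line up with the model's weights.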
Issue Links
- relates to: MAHOUT-1493 Port Naive Bayes to the Spark DSL (Closed)