[MAHOUT-1564] Naive Bayes Classifier for New Text Documents - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Implemented
Affects Version/s: 0.9
Fix Version/s: 0.10.0
Component/s: None
Labels:
- DSL
- legacy
- scala
- spark

Description

MapReduce and DSL Naive Bayes implementations currently lack the ability to classify a new document (one outside of the training/holdout corpus). This new feature will do the following:

1. Vectorize a new text document using the dictionary and document frequencies from the training/holdout corpus.

   Assuming the original corpus was vectorized with `seq2sparse`, step (1) will use all of the same parameters.

2. Score and label a new document using a previously trained model.
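Step (1) can be sketched roughly as follows. This is a minimal, hypothetical Java sketch, not the implementation delivered for this issue: `NewDocVectorizer` and its methods are illustrative names, the regex tokenizer is a simplified stand-in for whichever Lucene analyzer `seq2sparse` was run with, and the weighting assumes Lucene-style TF-IDF (tf = sqrt(count), idf = 1 + ln(N / (df + 1))) with no vector normalization. Terms missing from the training dictionary are dropped, because they have no feature index in the trained model.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: vectorize a new document against the dictionary and
 * document-frequency table saved from the training corpus.
 */
class NewDocVectorizer {

    private final Map<String, Integer> dictionary;      // term -> feature index (from training)
    private final Map<Integer, Integer> docFrequencies; // feature index -> document frequency
    private final int numDocs;                           // number of documents in the training corpus

    NewDocVectorizer(Map<String, Integer> dictionary,
                     Map<Integer, Integer> docFrequencies,
                     int numDocs) {
        this.dictionary = dictionary;
        this.docFrequencies = docFrequencies;
        this.numDocs = numDocs;
    }

    /** Assumed Lucene-style weighting: tf = sqrt(count), idf = 1 + ln(N / (df + 1)). */
    static double tfidf(int count, int df, int numDocs) {
        return Math.sqrt(count) * (1.0 + Math.log(numDocs / (double) (df + 1)));
    }

    /** Returns a sparse vector (feature index -> weight); out-of-dictionary terms are dropped. */
    Map<Integer, Double> vectorize(String text) {
        Map<Integer, Integer> counts = new HashMap<>();
        // Simplified tokenizer standing in for the analyzer used at training time (assumption).
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            Integer index = dictionary.get(token);
            if (index != null) {
                counts.merge(index, 1, Integer::sum);
            }
        }
        // Weight each term count with the document frequency seen during training.
        Map<Integer, Double> vector = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            int df = docFrequencies.getOrDefault(e.getKey(), 0);
            vector.put(e.getKey(), tfidf(e.getValue(), df, numDocs));
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = Map.of("spark", 0, "mahout", 1, "bayes", 2);
        Map<Integer, Integer> df = Map.of(0, 3, 1, 1, 2, 9);
        NewDocVectorizer vectorizer = new NewDocVectorizer(dict, df, 10);
        System.out.println(vectorizer.vectorize("Mahout on Spark: Spark DSL Bayes"));
    }
}
```

The key design constraint is that the dictionary, document frequencies, and corpus size are frozen at training time; recomputing them from the new document would put its features on a different scale from the model.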

This effort will need to be done in parallel for the MRLegacy and DSL implementations. Neither should be too much work.
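Step (2), scoring a vectorized document against a previously trained model, can be sketched as below. This is a hypothetical illustration of generic multinomial Naive Bayes scoring, not Mahout's API: `NaiveBayesScorer` and its fields are made-up names, and the sketch assumes the model is stored as summed per-label feature weights, uses additive smoothing with parameter `alpha`, and applies a uniform class prior (so no prior term is added). Each label's score is the sum over the document's features of `value * log((w_fl + alpha) / (w_l + alpha * numFeatures))`, and the label with the highest score wins.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of multinomial Naive Bayes scoring for a new document,
 * assuming a uniform class prior and additive smoothing.
 */
class NaiveBayesScorer {

    private final Map<String, Map<Integer, Double>> featureLabelWeights; // label -> (feature -> summed weight)
    private final int numFeatures; // dictionary size
    private final double alpha;    // smoothing parameter

    NaiveBayesScorer(Map<String, Map<Integer, Double>> featureLabelWeights, int numFeatures, double alpha) {
        this.featureLabelWeights = featureLabelWeights;
        this.numFeatures = numFeatures;
        this.alpha = alpha;
    }

    /** Sum of value * log((w_fl + alpha) / (w_l + alpha * numFeatures)) over the document's features. */
    double score(String label, Map<Integer, Double> doc) {
        Map<Integer, Double> weights = featureLabelWeights.get(label);
        double labelTotal = 0.0;
        for (double w : weights.values()) {
            labelTotal += w;
        }
        double denominator = labelTotal + alpha * numFeatures;
        double score = 0.0;
        for (Map.Entry<Integer, Double> e : doc.entrySet()) {
            double w = weights.getOrDefault(e.getKey(), 0.0);
            score += e.getValue() * Math.log((w + alpha) / denominator);
        }
        return score;
    }

    /** Returns the label with the highest score (the predicted label). */
    String classify(Map<Integer, Double> doc) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : featureLabelWeights.keySet()) {
            double s = score(label, doc);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Double>> model = new HashMap<>();
        model.put("sports", Map.of(0, 8.0, 1, 1.0, 2, 1.0));
        model.put("tech", Map.of(0, 1.0, 1, 6.0, 2, 3.0));
        NaiveBayesScorer scorer = new NaiveBayesScorer(model, 3, 1.0);
        System.out.println(scorer.classify(Map.of(1, 2.0, 2, 1.0))); // prints "tech"
    }
}
```

Because each new document is scored independently against a fixed model, the same per-document logic could be applied document by document in either the MapReduce or the DSL setting.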

Issue Links

relates to: MAHOUT-1493 Port Naive Bayes to the Spark DSL (Closed)

People

Assignee: Andrew Palumbo

Reporter: Andrew Palumbo

Votes: 0

Watchers: 6

Dates

Created: 27/May/14 21:30

Updated: 30/May/15 17:50

Resolved: 01/Apr/15 21:25