  1. Mahout
  2. MAHOUT-1564

Naive Bayes Classifier for New Text Documents

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 0.9
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:

      Description

      The MapReduce and DSL Naive Bayes implementations currently lack the ability to classify a new document (one outside the training/holdout corpus). This new feature will do the following:

      1. Vectorize a new text document using the dictionary and document frequencies from the training/holdout corpus.

      • Assume the original corpus was vectorized using `seq2sparse`; step (1) will reuse the same parameters.

      2. Score and label a new document using a previously trained model.
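
The two steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not Mahout's API: the function names `vectorize` and `classify`, the whitespace tokenizer, the plain `tf * log(N / df)` weighting, and the toy dictionary, document frequencies, and model are all assumptions made for the example. A real implementation would need to match the analyzer and weighting that `seq2sparse` applied to the training corpus.

```python
import math
from collections import Counter


def vectorize(text, dictionary, doc_freq, num_docs):
    """Turn a new document into a sparse TF-IDF vector {term_index: weight}.

    Terms missing from the training dictionary are dropped, because the
    trained model has no weights for them.
    """
    # Assumption: lowercase whitespace tokenization stands in for the analyzer.
    tokens = text.lower().split()
    counts = Counter(tok for tok in tokens if tok in dictionary)
    vector = {}
    for term, tf in counts.items():
        idx = dictionary[term]
        # Assumption: plain tf * log(N / df) weighting, using the
        # document frequencies saved from the training corpus.
        vector[idx] = tf * math.log(num_docs / doc_freq[idx])
    return vector


def classify(vector, model):
    """Score a vector against each label and return (best_label, scores).

    `model` maps label -> (log_prior, per-term log-likelihoods), i.e. a
    hypothetical multinomial Naive Bayes model trained earlier.
    """
    scores = {}
    for label, (log_prior, log_likelihoods) in model.items():
        scores[label] = log_prior + sum(
            weight * log_likelihoods[idx] for idx, weight in vector.items()
        )
    best = max(scores, key=scores.get)
    return best, scores


# Toy artifacts standing in for the training output (hypothetical values).
dictionary = {"goal": 0, "match": 1, "election": 2, "vote": 3}
doc_freq = {0: 2, 1: 2, 2: 2, 3: 2}
num_docs = 4
model = {
    "sports": (math.log(0.5), [math.log(p) for p in (0.4, 0.4, 0.1, 0.1)]),
    "politics": (math.log(0.5), [math.log(p) for p in (0.1, 0.1, 0.4, 0.4)]),
}

vector = vectorize("The match ended with a late goal", dictionary, doc_freq, num_docs)
label, scores = classify(vector, model)
```

The point of the sketch is that the new document never changes the dictionary or the document frequencies: both are read-only inputs carried over from training, so the resulting vector lives in the same feature space as the model.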

      This work needs to be done in parallel for the MRLegacy and DSL implementations. Neither should be too much work.

              People

              • Assignee: Andrew Palumbo
              • Reporter: Andrew Palumbo
              • Votes: 0
              • Watchers: 6