MAHOUT-1564

Naive Bayes Classifier for New Text Documents

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 0.9
    • Fix Version/s: 0.10.0
    • None

    Description

      The MapReduce and DSL Naive Bayes implementations currently lack the ability to classify a new document (one outside of the training/holdout corpus). This new feature will do the following:

      1. Vectorize a new text document using the dictionary and document frequencies from the training/holdout corpus

      • Assume the original corpus was vectorized using `seq2sparse`; step 1 will reuse all of the same parameters.

      2. Score and label a new document using a previously trained model.
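
      The two steps above can be sketched in a few lines. This is a minimal, library-independent illustration and not Mahout's API: the function names (`tfidf_vectorize`, `classify`), the toy dictionary, document frequencies, priors, and likelihoods are all hypothetical, and the TF-IDF weighting shown is an assumption. A real implementation would have to reproduce exactly the weighting and parameters that `seq2sparse` applied to the training corpus.

```python
import math
from collections import Counter

def tfidf_vectorize(tokens, dictionary, doc_freqs, num_docs):
    """Step 1: map tokens to a sparse {term_index: weight} vector.

    Uses the training dictionary and document frequencies only;
    terms that were never seen in training are dropped.
    The weighting (tf * (ln(N / df) + 1)) is an illustrative assumption.
    """
    counts = Counter(t for t in tokens if t in dictionary)
    vector = {}
    for term, tf in counts.items():
        idx = dictionary[term]
        idf = math.log(num_docs / doc_freqs[idx]) + 1.0
        vector[idx] = tf * idf
    return vector

def classify(vector, log_priors, log_likelihoods):
    """Step 2: score the vector against each label of a trained model.

    score(label) = log P(label) + sum_i weight_i * log P(term_i | label)
    Returns the best label and the full score table.
    """
    scores = {}
    for label, prior in log_priors.items():
        weights = log_likelihoods[label]
        scores[label] = prior + sum(w * weights[i] for i, w in vector.items())
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical artifacts that would come from the training run.
dictionary = {"goal": 0, "match": 1, "election": 2, "vote": 3}
doc_freqs = {0: 2, 1: 2, 2: 2, 3: 2}
num_docs = 4
log_priors = {"sports": math.log(0.5), "politics": math.log(0.5)}
log_likelihoods = {
    "sports": {0: math.log(0.4), 1: math.log(0.4),
               2: math.log(0.1), 3: math.log(0.1)},
    "politics": {0: math.log(0.1), 1: math.log(0.1),
                 2: math.log(0.4), 3: math.log(0.4)},
}

# A new document from outside the training/holdout corpus.
tokens = "the match ended with a late goal".split()
vector = tfidf_vectorize(tokens, dictionary, doc_freqs, num_docs)
label, scores = classify(vector, log_priors, log_likelihoods)
print(label, scores)
```

      Only the terms "match" and "goal" survive vectorization here, because the other tokens are absent from the training dictionary. That is the reason the new document must be vectorized against the original dictionary rather than a freshly built one.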

      This effort will need to be done in parallel for MRLegacy and DSL implementations. Neither should be too much work.

            People

              Andrew Palumbo