  1. Mahout
  2. MAHOUT-955

Bayes classification results are unstable after classifying non-existing features

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.7
    • None
    • Environment: JRE 7

    Description

      Bayes classification results are unstable, and change during runtime!

      Sample test:
      // MyClassifier is a simple custom wrapper around the classifier
      MyClassifier classifier = new MyClassifier(new BayesAlgorithm(), params);
      ClassifierResult category = classifier.classify("existing");
      double resultA = category.getScore();
      classifier.classify("nonexisting"); // classify a feature that is not in the model
      category = classifier.classify("existing");
      double resultB = category.getScore();
      Assert.assertEquals(resultA, resultB, 0.0); // fails: the score for "existing" has changed

      A test like the one above fails because non-existing tokens are added to InMemoryBayesDatastore->featureDictionary, so datastore.getWeight("sumWeight", "vocabCount") changes after an unknown feature is classified. In addition, featureDictionary fills up with unwanted strings, which consumes heap space.
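
The mechanism described above can be illustrated with a minimal, self-contained sketch. It is not Mahout code: FeatureDictionarySketch, lookupOrInsert, lookupReadOnly and smoothedScore are hypothetical names, and the Laplace-style score is only an assumed stand-in for any score that depends on the vocabulary size. It contrasts a lookup that registers unknown features as a side effect with a read-only lookup that leaves the dictionary untouched.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a feature dictionary; NOT Mahout's InMemoryBayesDatastore. */
final class FeatureDictionarySketch {
    private final Map<String, Integer> ids = new HashMap<>();

    /** Training-time registration: growing the vocabulary here is intended. */
    void train(String feature) {
        if (!ids.containsKey(feature)) {
            ids.put(feature, ids.size());
        }
    }

    /** Problematic lookup: an unknown feature is inserted as a side effect. */
    int lookupOrInsert(String feature) {
        Integer id = ids.get(feature);
        if (id == null) {
            id = ids.size();
            ids.put(feature, id); // the vocabulary grows during classification
        }
        return id;
    }

    /** Read-only lookup: an unknown feature yields -1 and the dictionary is unchanged. */
    int lookupReadOnly(String feature) {
        Integer id = ids.get(feature);
        return id == null ? -1 : id;
    }

    int vocabCount() {
        return ids.size();
    }

    /** Assumed Laplace-style score; any score that uses vocabCount behaves similarly. */
    static double smoothedScore(double featureWeight, double labelWeight, int vocabCount) {
        return Math.log((featureWeight + 1.0) / (labelWeight + vocabCount));
    }

    public static void main(String[] args) {
        FeatureDictionarySketch dict = new FeatureDictionarySketch();
        dict.train("existing");

        double resultA = smoothedScore(1.0, 1.0, dict.vocabCount());
        dict.lookupOrInsert("nonexisting"); // classification of an unknown feature
        double resultB = smoothedScore(1.0, 1.0, dict.vocabCount());

        System.out.println("scores equal after side-effecting lookup: " + (resultA == resultB));
    }
}
```

With the read-only lookup, vocabCount() stays constant between calls, so repeated classification of the same input yields the same score and the dictionary does not accumulate unknown strings.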

      More on this here
      http://www.lucidimagination.com/search/document/7dabe3efec8d136d/issues_with_memory_use_and_inconsistent_or_state_influenced_results_when_using_cbayesalgorit#8853165db260bf75

          People

            robinanil Robin Anil
            rzulf Michał B