  1. Lucene - Core
  2. LUCENE-4927

Prevent underflow in NB classifier likelihood calculation

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2
    • Fix Version/s: 4.7, 6.0
    • Component/s: modules/classification
    • Labels: None
    • Lucene Fields: New

    Description

      The current likelihood calculation multiplies probabilities (each between 0 and 1). For longish docs containing words that are infrequent for some class/category, the repeated double multiplications can underflow and return 0 even though that is not the correct value, so that class is wrongly assigned a probability of 0 as well.

      Using log-likelihood and/or BigDecimal would probably help.
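
      To illustrate the underflow and the log-likelihood idea, here is a minimal sketch. It is not the code of the Lucene classifier or of the committed fix; the class name LogLikelihoodSketch, both method names, and the sample probabilities are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the Lucene classification module.
final class LogLikelihoodSketch {

    // Naive approach: multiply the per-term probabilities.
    // For many small factors the double result underflows to 0.0.
    static double productLikelihood(double[] termProbabilities) {
        double likelihood = 1.0;
        for (double p : termProbabilities) {
            likelihood *= p;
        }
        return likelihood;
    }

    // Log-space approach: sum the logarithms instead of multiplying.
    // Assumes every probability is > 0 (e.g. after smoothing);
    // Math.log(0.0) would yield -Infinity.
    static double logLikelihood(double[] termProbabilities) {
        double sum = 0.0;
        for (double p : termProbabilities) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up example: a 500-term doc scored against two classes.
        double[] classA = new double[500];
        Arrays.fill(classA, 1e-4);
        double[] classB = new double[500];
        Arrays.fill(classB, 1e-3);

        // Both products underflow, so the classes cannot be told apart.
        System.out.println(productLikelihood(classA)); // prints 0.0
        System.out.println(productLikelihood(classB)); // prints 0.0

        // The log-likelihoods stay finite and can still be compared.
        System.out.println(logLikelihood(classB) > logLikelihood(classA)); // prints true
    }
}
```

      Since the logarithm is monotonic, comparing the summed logs ranks the classes the same way the exact products would, which is all that is needed to pick the most likely class.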

          People

            Assignee: Tommaso Teofili (teofili)
            Reporter: Tommaso Teofili (teofili)
            Votes: 0
            Watchers: 2
