  1. Lucene - Core
  2. LUCENE-4927

Prevent underflow in NB classifier likelihood calculation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2
    • Fix Version/s: 4.7, 6.0
    • Component/s: modules/classification
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The current likelihood calculation multiplies probabilities (values between 0 and 1). For longish docs containing words that are infrequent for some class/category, the repeated double multiplications can underflow to 0 even though that is not the correct value, so such a class is wrongly assigned a probability of 0 as well.

      Using log-likelihoods (summing logarithms instead of multiplying probabilities) and/or BigDecimal would probably help.
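
      The underflow described above can be reproduced outside Lucene. The following is a minimal, standalone sketch and not the classifier's actual code: the class name, the method names, and the per-word probabilities are all hypothetical, chosen only to illustrate the problem and the two suggested remedies. It relies on the identity log(p1 * p2 * ... * pn) = log(p1) + log(p2) + ... + log(pn).

```java
import java.math.BigDecimal;
import java.util.Arrays;

// Hypothetical illustration; none of these names come from the Lucene code base.
class LikelihoodUnderflowSketch {

    // Naive likelihood: multiply the per-word probabilities directly.
    static double productLikelihood(double[] wordProbabilities) {
        double likelihood = 1.0;
        for (double p : wordProbabilities) {
            likelihood *= p;
        }
        return likelihood;
    }

    // Log-likelihood: sum the logarithms instead of multiplying.
    // Assumes every probability is strictly positive (e.g. after smoothing).
    static double logLikelihood(double[] wordProbabilities) {
        double sum = 0.0;
        for (double p : wordProbabilities) {
            sum += Math.log(p);
        }
        return sum;
    }

    // BigDecimal alternative: exact product, at the cost of speed and memory.
    static BigDecimal bigDecimalLikelihood(double[] wordProbabilities) {
        BigDecimal likelihood = BigDecimal.ONE;
        for (double p : wordProbabilities) {
            likelihood = likelihood.multiply(BigDecimal.valueOf(p));
        }
        return likelihood;
    }

    public static void main(String[] args) {
        // A "longish" document: 400 words, each with a made-up probability.
        double[] classA = new double[400];
        double[] classB = new double[400];
        Arrays.fill(classA, 1e-3);
        Arrays.fill(classB, 1e-4);

        // Both products underflow to zero, so the classes cannot be ranked.
        System.out.println("product A = " + productLikelihood(classA));
        System.out.println("product B = " + productLikelihood(classB));

        // The log-likelihoods stay finite and still rank A above B.
        System.out.println("log A = " + logLikelihood(classA));
        System.out.println("log B = " + logLikelihood(classB));

        // The BigDecimal product is tiny but strictly positive.
        System.out.println("BigDecimal A > 0: " + (bigDecimalLikelihood(classA).signum() > 0));
    }
}
```

      Since the logarithm is monotonic, comparing log-likelihoods picks the same class as comparing the true likelihoods would, which makes it the cheaper of the two options when only the ranking of classes matters.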

            People

            • Assignee:
              Tommaso Teofili (teofili)
            • Reporter:
              Tommaso Teofili (teofili)