Lucene - Core / LUCENE-4927

Prevent underflow in NB classifier likelihood calculation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2
    • Fix Version/s: 4.7, Trunk
    • Component/s: modules/classification
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The current likelihood calculation multiplies probabilities, each of which lies between 0 and 1. For longish docs containing words that are infrequent for some class/category, the repeated double multiplications may underflow and return 0 even though that is not the correct value, so that class is also assigned a probability of 0.

      Using log-likelihoods and/or BigDecimal would probably help.
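      The underflow described above can be reproduced outside Lucene with a minimal sketch. The class name, method names, and per-term probabilities below are hypothetical and chosen only for illustration; this is not the classifier's actual code. It assumes every per-term probability is strictly positive (for example after smoothing), so that the logarithm is defined.

```java
import java.util.Arrays;

// Hypothetical illustration, not the Lucene classification module's code.
public class LogLikelihoodSketch {

    /** Naive likelihood: multiply the per-term probabilities directly. */
    static double productLikelihood(double[] termProbabilities) {
        double likelihood = 1.0;
        for (double p : termProbabilities) {
            likelihood *= p; // shrinks toward 0 and eventually underflows
        }
        return likelihood;
    }

    /** Log-space likelihood: sum the logs instead of multiplying. */
    static double logLikelihood(double[] termProbabilities) {
        double sum = 0.0;
        for (double p : termProbabilities) {
            sum += Math.log(p); // assumes p > 0, e.g. after smoothing
        }
        return sum;
    }

    /** Log-space score for a class: log prior plus log likelihood. */
    static double logScore(double prior, double[] termProbabilities) {
        return Math.log(prior) + logLikelihood(termProbabilities);
    }

    public static void main(String[] args) {
        // Made-up document of 400 terms with made-up per-term probabilities.
        double[] classA = new double[400];
        double[] classB = new double[400];
        Arrays.fill(classA, 1e-3);
        Arrays.fill(classB, 1e-4);

        // Both products underflow to 0.0, so the classes cannot be ranked.
        System.out.println(productLikelihood(classA));
        System.out.println(productLikelihood(classB));

        // In log space the scores stay finite and class A ranks higher.
        System.out.println(logScore(0.5, classA) > logScore(0.5, classB));
    }
}
```

      Because the logarithm is monotonic, picking the class with the highest log score selects the same class as the highest product would if the product were computed exactly. BigDecimal would also avoid the underflow, at the cost of slower arithmetic.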

        Activity

        Transition            Time In Source Status   Execution Times   Last Executer     Last Execution Date
        Open -> Resolved      227d 18h 12m            1                 Tommaso Teofili   25/Nov/13 08:10
        Resolved -> Closed    111d 4h 51m             1                 David Smiley      16/Mar/14 13:02
        David Smiley made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Tommaso Teofili made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Resolution: (none) -> Fixed [ 1 ]
        ASF subversion and git services added a comment -

        Commit 1545169 from Tommaso Teofili in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1545169 ]

        LUCENE-4927 - backported to 4x

        Tommaso Teofili made changes -
        Fix Version/s: (none) -> 4.7 [ 12325572 ]
        ASF subversion and git services added a comment -

        Commit 1544433 from Tommaso Teofili in branch 'dev/trunk'
        [ https://svn.apache.org/r1544433 ]

        LUCENE-4927 - switched to log prior/likelihood to avoid possible underflows

        Tommaso Teofili created issue -

          People

          • Assignee:
            Tommaso Teofili
            Reporter:
            Tommaso Teofili
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
