Lucene - Core / LUCENE-3934

Residual IDF calculation in the pruning package is wrong

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5, 3.6
    • Fix Version/s: 3.6
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      As discussed on the mailing list (http://markmail.org/message/cwnyfqmet3wophec), there appear to be two bugs in the residual IDF (RIDF) code: one in the formula and one in the way RIDF is calculated. The formula is missing a minus sign, and the calculation uses the local (in-document) term frequency instead of the total term frequency (the sum of all occurrences of the term across the corpus).
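
      For reference, a minimal sketch of the conventional residual IDF definition: the observed IDF, -log2(df/N), minus the IDF expected under a Poisson model, -log2(1 - e^(-cf/N)), where df is the document frequency, cf the total term frequency in the corpus, and N the number of documents. The class and method names below are hypothetical and chosen for illustration only; this is not the code from LUCENE-3934.patch, and the pruning package may structure the calculation differently.

```java
/**
 * Illustrative residual IDF (RIDF) calculation.
 * Hypothetical helper, not part of Lucene and not taken from the patch.
 */
public final class ResidualIdf {

    private ResidualIdf() {}

    /** Base-2 logarithm. */
    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * @param docFreq       number of documents containing the term (df)
     * @param totalTermFreq total occurrences of the term in the corpus (cf),
     *                      NOT the frequency within a single document
     * @param numDocs       number of documents in the corpus (N)
     * @return observed IDF minus the IDF expected under a Poisson model
     */
    public static double ridf(long docFreq, long totalTermFreq, long numDocs) {
        if (numDocs <= 0 || docFreq <= 0 || docFreq > numDocs || totalTermFreq < docFreq) {
            throw new IllegalArgumentException("inconsistent corpus statistics");
        }
        // Observed IDF: note the leading minus sign.
        double observedIdf = -log2((double) docFreq / numDocs);
        // Expected IDF: -log2(1 - e^(-cf/N)); -expm1(-x) equals 1 - e^(-x)
        // and stays accurate when cf/N is small.
        double lambda = (double) totalTermFreq / numDocs;
        double expectedIdf = -log2(-Math.expm1(-lambda));
        return observedIdf - expectedIdf;
    }
}
```

      With df and N held fixed, a term whose occurrences are concentrated in few documents (cf much larger than df) gets a higher RIDF than a term that occurs about once per matching document. That distinction disappears if an in-document frequency is passed in place of cf.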

      Attachments

        1. LUCENE-3934.patch
          5 kB
          Andrzej Bialecki

          People

            Assignee: Andrzej Bialecki
            Reporter: Andrzej Bialecki
