  1. Lucene - Core
  2. LUCENE-3290

add FieldInvertState.numUniqueTerms, Terms.sumDocFreq

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4, 4.0-ALPHA
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

    Description

      For scoring systems like lnu.ltc (http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to supply three statistics:

      • average term frequency (tf) within document d
      • number of unique terms within d
      • average number of unique terms per document across the field

      If we add FieldInvertState.numUniqueTerms, you can incorporate the first two into your norms/docvalues (once we cut over):
      the average tf within d is then length / numUniqueTerms.
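
      As a rough illustration of the per-document arithmetic, here is a minimal sketch in plain Java. It does not use Lucene's API: the class DocStatsSketch and its methods are hypothetical names, and a document's field is assumed to be an already-tokenized list of strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: computes the two per-document
// statistics from a list of tokens for one field of one document.
public class DocStatsSketch {

    /** Number of distinct terms among the tokens (the proposed numUniqueTerms). */
    public static int numUniqueTerms(List<String> tokens) {
        Set<String> unique = new HashSet<>(tokens);
        return unique.size();
    }

    /** Average tf within the document: length / numUniqueTerms. */
    public static double averageTf(List<String> tokens) {
        int unique = numUniqueTerms(tokens);
        if (unique == 0) {
            return 0.0; // empty field: avoid dividing by zero
        }
        return (double) tokens.size() / unique;
    }

    public static void main(String[] args) {
        // 6 tokens, 3 distinct terms
        List<String> tokens = Arrays.asList("a", "b", "a", "c", "a", "b");
        System.out.println(numUniqueTerms(tokens)); // prints 3
        System.out.println(averageTf(tokens));      // prints 2.0
    }
}
```

      Only two integers per document (length and the unique-term count) are needed, so both could be folded into whatever per-document value the scoring system stores.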

      To compute the average across the field, we can just write the sum of all terms' docFreqs into the terms dictionary header,
      and you can then divide this by maxDoc to get the average.
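
      The reason this works is that summing docFreq over all terms counts each (term, document) pair once, which equals the sum of per-document unique-term counts. The sketch below shows this on a toy corpus in plain Java. It is not Lucene code: SumDocFreqSketch and its methods are hypothetical, and maxDoc is assumed to be simply the number of toy documents.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: derives the field-level average
// number of unique terms per document from the sum of document frequencies.
public class SumDocFreqSketch {

    /** docFreq per term: the number of documents containing that term. */
    public static Map<String, Integer> docFreqs(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            // De-duplicate so each term counts at most once per document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Sum of docFreq over all terms in the field. */
    public static long sumDocFreq(List<List<String>> docs) {
        long sum = 0;
        for (int freq : docFreqs(docs).values()) {
            sum += freq;
        }
        return sum;
    }

    /** Average unique terms per document: sumDocFreq / maxDoc. */
    public static double averageUniqueTerms(List<List<String>> docs) {
        int maxDoc = docs.size(); // assumption: no deleted or field-less docs
        return maxDoc == 0 ? 0.0 : (double) sumDocFreq(docs) / maxDoc;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("a", "b", "a"),            // 2 unique terms
            Arrays.asList("b", "c"),                 // 2 unique terms
            Arrays.asList("a", "b", "c", "d", "e")); // 5 unique terms
        System.out.println(sumDocFreq(docs));         // prints 9
        System.out.println(averageUniqueTerms(docs)); // prints 3.0
    }
}
```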

      Attachments

        1. LUCENE-3290.patch
          28 kB
          Robert Muir
        2. LUCENE-3290.patch
          28 kB
          Robert Muir

          People

            Assignee: Robert Muir (rcmuir)
            Reporter: Robert Muir (rcmuir)