Lucene - Core / LUCENE-8501

An ability to define the sum method for custom term frequencies


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

    Description

      Custom term frequencies allow expert users to index and score in custom ways. However, DefaultIndexingChain imposes a limitation: the sum of the term frequencies in a field must not overflow an int, as this check shows:

      try {
          invertState.length = Math.addExact(invertState.length, invertState.termFreqAttribute.getTermFrequency());
      } catch (ArithmeticException ae) {
          throw new IllegalArgumentException("too many tokens for field \"" + field.name() + "\"");
      }
      

      This can become a problem if, for example, the frequency data is encoded in a different way, say because a specific scorer works with float frequencies.
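      To make the overflow concrete, the sketch below re-implements the quoted Math.addExact check outside Lucene and feeds it float frequencies stored as raw IEEE 754 bits. That encoding (Float.floatToIntBits) and the names FloatFreqOverflowDemo and sumLength are hypothetical, chosen only for illustration; the issue does not say how a scorer would encode floats. With 1.0f encoded as 1065353216, two tokens still fit in an int, but a third one overflows and triggers the "too many tokens" error.

```java
import java.util.List;

public class FloatFreqOverflowDemo {
    // Mirrors the quoted DefaultIndexingChain check: accumulate the field length
    // with Math.addExact and turn an int overflow into IllegalArgumentException.
    static int sumLength(String fieldName, List<Integer> termFreqs) {
        int length = 0;
        for (int freq : termFreqs) {
            try {
                length = Math.addExact(length, freq);
            } catch (ArithmeticException ae) {
                throw new IllegalArgumentException("too many tokens for field \"" + fieldName + "\"");
            }
        }
        return length;
    }

    public static void main(String[] args) {
        // Hypothetical encoding: a float frequency stored as its raw bits.
        int oneAsBits = Float.floatToIntBits(1.0f); // 0x3F800000 = 1065353216

        // Two tokens: 2130706432, still below Integer.MAX_VALUE.
        System.out.println(sumLength("body", List.of(oneAsBits, oneAsBits)));

        // Three tokens: the exact sum exceeds Integer.MAX_VALUE, so the check fails.
        try {
            sumLength("body", List.of(oneAsBits, oneAsBits, oneAsBits));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // too many tokens for field "body"
        }
    }
}
```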

      A sum method could be added to TermFrequencyAttribute, giving something like

      invertState.length = invertState.termFreqAttribute.addFrequency(invertState.length);
      

      so that users can define how frequencies are summed and avoid the overflow exceptions.
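      A minimal sketch of the proposal follows, written without Lucene so that it runs on its own. The interface SummingTermFrequency and its addFrequency method are hypothetical stand-ins for the method suggested above; the real TermFrequencyAttribute has no such method (the issue was resolved as Won't Fix). The default keeps today's Math.addExact semantics, while the hypothetical SaturatingTermFrequency clamps the running length at Integer.MAX_VALUE instead of throwing.

```java
public class AddFrequencyDemo {
    // Hypothetical stand-in for the proposed TermFrequencyAttribute#addFrequency.
    interface SummingTermFrequency {
        int getTermFrequency();

        // Default keeps the current behavior: exact int addition that throws
        // ArithmeticException on overflow.
        default int addFrequency(int length) {
            return Math.addExact(length, getTermFrequency());
        }
    }

    // Hypothetical expert implementation: saturate at Integer.MAX_VALUE
    // instead of failing, e.g. when frequencies hold encoded floats.
    static final class SaturatingTermFrequency implements SummingTermFrequency {
        private int termFrequency = 1;

        void setTermFrequency(int termFrequency) {
            this.termFrequency = termFrequency;
        }

        @Override
        public int getTermFrequency() {
            return termFrequency;
        }

        @Override
        public int addFrequency(int length) {
            // Add in long space so the intermediate sum cannot wrap around.
            long sum = (long) length + termFrequency;
            return (int) Math.min(sum, Integer.MAX_VALUE);
        }
    }

    public static void main(String[] args) {
        SaturatingTermFrequency attr = new SaturatingTermFrequency();
        attr.setTermFrequency(Float.floatToIntBits(1.0f));
        int length = 0;
        for (int token = 0; token < 3; token++) {
            // Plays the role of: invertState.length = termFreqAttribute.addFrequency(invertState.length);
            length = attr.addFrequency(length);
        }
        System.out.println(length); // 2147483647
    }
}
```

      Keeping the exact addition as the default leaves existing analyzers unaffected; only attributes that override the method change how the field length is accumulated. Because that length is what Lucene's similarities receive when computing norms, a custom sum method also changes length normalization, which an expert user would need to account for.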

          People

            Assignee: Unassigned
            Reporter: Olli Kuonanoja (ollik1)
            Votes: 0
            Watchers: 2
