  1. Lucene - Core
  2. LUCENE-8405

Remove TopDocs.maxScore

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      I would like to propose removing TopDocs.maxScore. Either you sort by score, in which case the maximum score is simply the score of the best hit, or you sort by one or more fields, in which case computing it is wasteful:

      • term frequencies and norms need to be read and decoded for every match
      • scores need to be computed on every match
      • early-termination optimizations are disabled

      It would be more efficient to collect hits twice: once with scores disabled to get the top hits, and once to get the best score. The second pass would run efficiently thanks to impacts and MAXSCORE, especially with a size of 1:

      TopDocs topHits = searcher.search(query, 1);
      float maxScore = topHits.scoreDocs.length == 0 ? Float.NaN : topHits.scoreDocs[0].score;
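
      To illustrate why a top-1 pass can be much cheaper than tracking the maximum score over every match, here is a toy model in plain Java. It is not Lucene code: the class MaxScoreSketch, the score() stand-in, and the fixed-size blocks with precomputed upper bounds are all assumptions made for illustration, loosely mirroring how impacts let a scorer skip ranges of documents that cannot beat the current best hit.

```java
import java.util.Arrays;

/** Toy model only; none of these names exist in Lucene. */
final class MaxScoreSketch {
    /** Counts calls to score() so the two strategies can be compared. */
    static int scoreCalls = 0;

    /** Hypothetical stand-in for an expensive per-document scoring function. */
    static float score(float[] scores, int doc) {
        scoreCalls++;
        return scores[doc];
    }

    /** Tracks the maximum by scoring every match, as a field-sorted search with maxScore would. */
    static float maxScoreExhaustive(float[] scores) {
        float max = Float.NaN; // NaN when there are no hits, as in the snippet above
        for (int doc = 0; doc < scores.length; doc++) {
            float s = score(scores, doc);
            if (Float.isNaN(max) || s > max) {
                max = s;
            }
        }
        return max;
    }

    /** Per-block score upper bounds, assumed to be precomputed (not counted as scoring work). */
    static float[] blockUpperBounds(float[] scores, int blockSize) {
        int numBlocks = (scores.length + blockSize - 1) / blockSize;
        float[] upper = new float[numBlocks];
        Arrays.fill(upper, Float.NEGATIVE_INFINITY);
        for (int doc = 0; doc < scores.length; doc++) {
            int block = doc / blockSize;
            upper[block] = Math.max(upper[block], scores[doc]);
        }
        return upper;
    }

    /** Top-1 pass: skips every block whose upper bound cannot beat the current best score. */
    static float maxScoreTop1(float[] scores, float[] upper, int blockSize) {
        float best = Float.NaN;
        for (int block = 0; block < upper.length; block++) {
            if (!Float.isNaN(best) && upper[block] <= best) {
                continue; // no document in this block can improve the result
            }
            int end = Math.min(scores.length, (block + 1) * blockSize);
            for (int doc = block * blockSize; doc < end; doc++) {
                float s = score(scores, doc);
                if (Float.isNaN(best) || s > best) {
                    best = s;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 0.9f, 0.1f, 0.2f, 0.3f, 0.1f, 0.4f, 0.8f};
        int blockSize = 2;
        float[] upper = blockUpperBounds(scores, blockSize);

        scoreCalls = 0;
        float exhaustive = maxScoreExhaustive(scores);
        System.out.println("exhaustive: max=" + exhaustive + ", score calls=" + scoreCalls);

        scoreCalls = 0;
        float top1 = maxScoreTop1(scores, upper, blockSize);
        System.out.println("top-1 pass: max=" + top1 + ", score calls=" + scoreCalls);
    }
}
```

      Both strategies return the same maximum, but on this input the top-1 pass scores only the first block because every later block's upper bound is below the best score already seen. How much is skipped in practice depends on how the scores are distributed across blocks.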
      

      The doDocScores option of TopFieldCollector has drawbacks as well but at least doesn't disable early-termination optimizations and doesn't require scores to be computed on every hit.

      As this would be a significant breaking change, I'm targeting 8.0.

        Attachments

        1. LUCENE-8405.patch
          122 kB
          Adrien Grand
        2. LUCENE-8405.patch
          121 kB
          Adrien Grand

              People

              • Assignee: Unassigned
              • Reporter: Adrien Grand (jpountz)
              • Votes: 0
              • Watchers: 4