  1. Lucene - Core
  2. LUCENE-10037

Explore a single scoring implementation in DrillSidewaysScorer

Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 9.0
    • Fix Version/s: None
    • Component/s: modules/facet
    • Labels: None
    • Lucene Fields: New

    Description

      DrillSidewaysScorer currently implements three separate strategies for bulk scoring documents: doQueryFirstScoring, doUnionScoring and doDrillDownAdvanceScoring. As far as I can tell, this code dates back to 2013, and two of the three approaches (doUnionScoring and doDrillDownAdvanceScoring) appear to emulate the BooleanScorer "window scoring" / "term-at-a-time" strategy. While this strategy is still useful in BooleanScorer in some cases, its primary benefit, from what I can tell, is avoiding re-heap operations in disjunction cases (as recently described by jpountz). I can't see any reason to prefer these two approaches in DrillSidewaysScorer anymore: we're doing pure conjunctions, so there is no re-heaping to worry about, and doQueryFirstScoring takes advantage of skipping by advancing postings, while the other two approaches iterate their postings entirely, relying only on nextDoc. Finally, we added an optimization (LUCENE-10030) that lazily evaluates the score and can only work for doQueryFirstScoring, whereas doUnionScoring and doDrillDownAdvanceScoring evaluate it eagerly.

      All this is to say we should try sending all scoring through doQueryFirstScoring and see how it benchmarks. I'm not sure whether we already have benchmarks set up for drill sideways, but I'd love to see whether we can optimize DrillSidewaysScorer while also reducing its code complexity!
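
To make the query-first idea concrete, here is a minimal, self-contained sketch. It is a toy model and not the actual DrillSidewaysScorer code: the Postings class, the Result holder, the scoreQueryFirst method and the sample doc IDs are all hypothetical, and postings are modeled as sorted int arrays. It assumes the usual drill-sideways semantics, where a document matching the base query and every dimension is a hit, and one failing exactly one dimension is a "near miss" for that dimension. The point it illustrates is that each dimension is moved with advance (skipping) rather than walked with nextDoc, and that the score is only needed for documents that actually get collected.

```java
import java.util.Arrays;

/** Illustrative toy model of query-first drill-sideways scoring; not Lucene code. */
class QueryFirstSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a postings iterator over sorted doc IDs. */
    static final class Postings {
        private final int[] docs;
        private int idx = -1;

        Postings(int... docs) { this.docs = docs; }

        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() {
            idx = Math.min(idx + 1, docs.length);
            return docID();
        }

        /** Skips to the first doc >= target located after the current position. */
        int advance(int target) {
            int from = Math.min(idx + 1, docs.length);
            int pos = Arrays.binarySearch(docs, from, docs.length, target);
            idx = pos >= 0 ? pos : -pos - 1;
            return docID();
        }
    }

    static final class Result {
        int hits;            // docs matching the base query and every dimension
        int[] nearMisses;    // per dimension: docs failing only that dimension
        int scoresComputed;  // how many docs would need a score
    }

    static Result scoreQueryFirst(Postings base, Postings[] dims) {
        Result r = new Result();
        r.nearMisses = new int[dims.length];
        for (int doc = base.nextDoc(); doc != NO_MORE_DOCS; doc = base.nextDoc()) {
            int failedDim = -1;
            boolean skip = false;
            for (int i = 0; i < dims.length; i++) {
                // Skip ahead in this dimension instead of iterating every posting.
                if (dims[i].docID() < doc) dims[i].advance(doc);
                if (dims[i].docID() != doc) {
                    if (failedDim != -1) { skip = true; break; } // second failure
                    failedDim = i;
                }
            }
            if (skip) continue; // fails two or more dimensions: no score needed
            r.scoresComputed++; // lazy: only collected docs are scored
            if (failedDim == -1) r.hits++; else r.nearMisses[failedDim]++;
        }
        return r;
    }

    /** Made-up sample data: five base-query docs and two dimensions. */
    static Result demo() {
        Postings base = new Postings(2, 5, 9, 14, 20);
        Postings[] dims = {
            new Postings(2, 3, 5, 7, 14),
            new Postings(1, 2, 9, 10, 11, 12, 14)
        };
        return scoreQueryFirst(base, dims);
    }

    public static void main(String[] args) {
        Result r = demo();
        System.out.println("hits=" + r.hits
            + " nearMisses=" + Arrays.toString(r.nearMisses)
            + " scoresComputed=" + r.scoresComputed);
    }
}
```

With the sample data, docs 2 and 14 are hits, doc 9 is a near miss for the first dimension, doc 5 is a near miss for the second, and doc 20 fails both dimensions and is never scored, so only four of the five base documents need a score. An eager strategy would have scored all five.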

          People

            Assignee: Unassigned
            Reporter: Greg Miller (gsmiller)
            Votes: 0
            Watchers: 1
