[LUCENE-10037] Explore a single scoring implementation in DrillSidewaysScorer - ASF JIRA

Details

Type: Task
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 9.0
Fix Version/s: None
Component/s: modules/facet
Labels: None

Lucene Fields: New

Description

DrillSidewaysScorer currently implements three separate strategies for bulk scoring documents: doQueryFirstScoring, doUnionScoring and doDrillDownAdvanceScoring. As far as I can tell, this code dates back to 2013, and two of the three approaches appear to emulate BooleanScorer's "window scoring" / "term-at-a-time" strategy. While this strategy is still useful in BooleanScorer in some cases, its primary benefit, from what I can tell, is avoiding re-heap operations in disjunctions (as recently described by jpountz). I can't see any reason to prefer these two approaches in DrillSidewaysScorer anymore: we're doing pure conjunctions (no re-heaping to worry about), and doQueryFirstScoring takes advantage of skipping by advancing postings, while the other two approaches iterate their postings entirely, relying only on nextDoc. Finally, we added an optimization (LUCENE-10030), applicable only to doQueryFirstScoring, that evaluates the score lazily, whereas doUnionScoring and doDrillDownAdvanceScoring evaluate it eagerly.

All this is to say we should try sending all scoring through doQueryFirstScoring and see how it benchmarks. I'm not sure whether we already have benchmarks set up for drill sideways, but I'd love to see whether we can optimize DrillSidewaysScorer while also reducing its code complexity!
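For readers unfamiliar with the query-first strategy, here is a simplified, self-contained Java sketch of how a conjunctive drill-sideways pass can be driven by the base query. Each base-query candidate is checked against every drill-down dimension by advancing that dimension's iterator to the candidate; a doc matching all dimensions is a hit, and a doc missing exactly one dimension is a "near miss" counted sideways for that dimension. The ArrayIterator, Result and queryFirst names are hypothetical and are not Lucene's DocIdSetIterator or DrillSidewaysScorer APIs; scoring only full hits is an assumption of this sketch, used as a stand-in for the lazy score evaluation mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

public class QueryFirstSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a postings iterator over sorted doc IDs (not Lucene's DocIdSetIterator). */
    static final class ArrayIterator {
        private final int[] docs;
        private int idx = -1;

        ArrayIterator(int... docs) { this.docs = docs; }

        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() {
            idx++;
            return docID();
        }

        /** Moves to the first doc >= target (target must be > docID()). Real postings could skip; this scans. */
        int advance(int target) {
            do {
                idx++;
            } while (idx < docs.length && docs[idx] < target);
            return docID();
        }
    }

    /** Full hits (with scores) plus, per dimension, docs that missed only that dimension. */
    static final class Result {
        final List<Integer> hits = new ArrayList<>();
        final List<Double> hitScores = new ArrayList<>();
        final List<List<Integer>> sideways = new ArrayList<>();
    }

    static Result queryFirst(ArrayIterator base, ArrayIterator[] dims, IntToDoubleFunction scorer) {
        Result r = new Result();
        for (int d = 0; d < dims.length; d++) {
            r.sideways.add(new ArrayList<>());
        }
        // The base query drives iteration; dimension iterators are only advanced to its candidates.
        for (int doc = base.nextDoc(); doc != NO_MORE_DOCS; doc = base.nextDoc()) {
            int missedDim = -1;
            boolean multipleMisses = false;
            for (int d = 0; d < dims.length; d++) {
                ArrayIterator it = dims[d];
                int dimDoc = it.docID() < doc ? it.advance(doc) : it.docID();
                if (dimDoc != doc) {
                    if (missedDim != -1) {
                        multipleMisses = true;
                        break; // 2+ misses: the doc contributes nothing, so stop checking dimensions
                    }
                    missedDim = d;
                }
            }
            if (multipleMisses) {
                continue;
            }
            if (missedDim == -1) {
                // Lazy scoring: the scorer runs only once the doc is known to be a full hit.
                r.hits.add(doc);
                r.hitScores.add(scorer.applyAsDouble(doc));
            } else {
                r.sideways.get(missedDim).add(doc); // near miss: counted for the missed dimension only
            }
        }
        return r;
    }

    public static void main(String[] args) {
        ArrayIterator base = new ArrayIterator(1, 2, 3, 5, 7, 9);
        ArrayIterator[] dims = {
            new ArrayIterator(1, 5, 9),
            new ArrayIterator(3, 5, 7, 9)
        };
        Result r = queryFirst(base, dims, doc -> doc * 0.5);
        System.out.println("hits=" + r.hits + " scores=" + r.hitScores + " sideways=" + r.sideways);
    }
}
```

Running main prints `hits=[5, 9] scores=[2.5, 4.5] sideways=[[3, 7], [1]]`. Doc 2 misses both dimensions and is skipped without ever being scored. Because the array-backed advance here scans linearly, the sketch illustrates the control flow rather than the skipping savings that real postings can deliver.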

People

Assignee: Unassigned

Reporter: Greg Miller

Votes: 0

Watchers: 1

Dates

Created: 27/Jul/21 23:54

Updated: 28/Aug/22 16:24