Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
After looking at the recent improvement jpountz made to WAND scoring in LUCENE-10634, which does additional work during match confirmation to avoid confirming a match whose score wouldn't be competitive, I wanted to see how performance would shift if we squashed the two-phase iteration completely and only returned true matches (that were also known to be competitive by score) in the "approximation" phase. I was a bit surprised to find that the luceneutil benchmarks (run with wikimediumall) improve significantly on some disjunction tasks (OrHighMed by 9.6% and OrHighHigh by 11.7%) and don't show significant regressions anywhere else.
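To make "squashing" concrete, here is a minimal, self-contained sketch of the idea. It does not use Lucene classes and is not the actual candidate diff: `DocIterator`, `overSorted`, and `squash` are hypothetical names invented for illustration. The only thing borrowed from Lucene is the convention that an exhausted iterator returns `Integer.MAX_VALUE`. The squashed iterator runs the confirmation step itself, so callers only ever see confirmed matches.

```java
import java.util.function.IntPredicate;

final class SquashedTwoPhaseSketch {
    // Sentinel for an exhausted iterator (same value Lucene uses for NO_MORE_DOCS).
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical, minimal stand-in for a doc-ID iterator; not a Lucene type. */
    interface DocIterator {
        int nextDoc();
    }

    /** Approximation phase: iterates over a sorted array of candidate doc IDs. */
    static DocIterator overSorted(int[] docs) {
        return new DocIterator() {
            private int i = 0;

            @Override
            public int nextDoc() {
                return i < docs.length ? docs[i++] : NO_MORE_DOCS;
            }
        };
    }

    /**
     * Squashes the two phases into one iterator: candidates from the approximation
     * are confirmed immediately, and only confirmed docs are returned.
     */
    static DocIterator squash(DocIterator approximation, IntPredicate matches) {
        return () -> {
            int doc;
            while ((doc = approximation.nextDoc()) != NO_MORE_DOCS) {
                if (matches.test(doc)) {
                    return doc; // confirmed match
                }
            }
            return NO_MORE_DOCS;
        };
    }

    public static void main(String[] args) {
        // Toy confirmation: pretend only multiples of 3 are true, competitive matches.
        DocIterator it = squash(overSorted(new int[] {1, 4, 6, 9, 12}), doc -> doc % 3 == 0);
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            System.out.println(doc);
        }
    }
}
```

The trade-off in this toy model is that the consumer loses the chance to reject a candidate with some other check before the confirmation runs.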
Note that I used LUCENE-10634 as a baseline, and built my candidate change on top of that. The diff can be seen here: DIFF
A simple conclusion here might be that we shouldn't do two-phase iteration in WANDScorer, but I'm pretty sure that's not right. I wonder if what's really going on is that we're underestimating the cost of confirming a match. Right now we just return the tail size as the cost. While the cost of confirming a match is proportional to the tail size, the actual work involved can be quite significant (having to advance tail iterators to new blocks and decompress them). I wonder if the WAND second phase is being run too early on approximate candidates, and if less expensive (and possibly even more restrictive) second phases could or should be running first.
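The ordering concern can be illustrated with a toy model. This is a sketch under stated assumptions, not Lucene code: `Check`, `confirmAll`, and `expensiveCalls` are hypothetical names, and the model simply assumes that second-phase checks are run in ascending order of their declared cost, stopping at the first failure. If the expensive check declares a cost that is too low, it runs on every candidate; with a higher declared cost, the cheaper and more restrictive check filters most candidates first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

final class MatchCostOrderingSketch {

    /** Hypothetical second-phase check: a declared cost estimate plus the real confirmation. */
    static final class Check {
        final float declaredCost;   // what the check reports about itself
        final IntPredicate confirm; // the actual confirmation work
        int calls = 0;              // how many times confirm was invoked

        Check(float declaredCost, IntPredicate confirm) {
            this.declaredCost = declaredCost;
            this.confirm = confirm;
        }

        boolean matches(int doc) {
            calls++;
            return confirm.test(doc);
        }
    }

    /** Runs checks in ascending declared cost on each candidate, stopping at the first failure. */
    static List<Integer> confirmAll(int[] candidates, List<Check> checks) {
        List<Check> ordered = new ArrayList<>(checks);
        ordered.sort(Comparator.comparingDouble((Check c) -> c.declaredCost));
        List<Integer> hits = new ArrayList<>();
        for (int doc : candidates) {
            boolean ok = true;
            for (Check c : ordered) {
                if (!c.matches(doc)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Counts how often the expensive check runs for a given declared cost (toy data). */
    static int expensiveCalls(float declaredCostOfExpensiveCheck) {
        int[] candidates = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
        // Stand-in for an expensive confirmation; its real cost is not modeled here.
        Check expensive = new Check(declaredCostOfExpensiveCheck, doc -> doc % 2 == 0);
        // Stand-in for a cheaper, more restrictive check with a fixed declared cost of 3.
        Check cheap = new Check(3f, doc -> doc % 6 == 0);
        confirmAll(candidates, List.of(expensive, cheap));
        return expensive.calls;
    }

    public static void main(String[] args) {
        System.out.println("declared cost 2:  " + expensiveCalls(2f));  // expensive check runs first
        System.out.println("declared cost 10: " + expensiveCalls(10f)); // cheap check runs first
    }
}
```

In this toy data set the expensive check is invoked 12 times when it declares a cost of 2, and only 2 times when it declares a cost of 10, while the set of confirmed docs is the same in both cases.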
I'm raising this here as more of a curiosity to see if it sparks ideas on how to move forward. Again, I'm not proposing we do away with two-phase iteration, but it seems we might be able to improve things. Maybe I'll explore changing the cost heuristic next. Also, maybe there's some different benchmarking that would be useful here that I may not be familiar with?
Benchmark results on wikimediumall:
Task                        QPS baseline   StdDev  QPS candidate   StdDev Pct diff                 p-value
HighTermTitleBDVSort               22.52  (18.9%)          21.66  (15.6%)    -3.8%  (-32% -  37%)    0.485
Prefix3                             9.38   (9.2%)           9.09  (10.6%)    -3.1%  (-20% -  18%)    0.326
HighTermMonthSort                  25.37  (16.0%)          24.87  (17.1%)    -2.0%  (-30% -  37%)    0.710
MedTermDayTaxoFacets                9.62   (4.2%)           9.51   (4.1%)    -1.2%  ( -9% -   7%)    0.368
TermDTSort                         74.69  (18.0%)          74.13  (18.2%)    -0.7%  (-31% -  43%)    0.897
HighTermDayOfYearSort              52.64  (16.1%)          52.32  (15.4%)    -0.6%  (-27% -  36%)    0.903
BrowseMonthTaxoFacets               8.64  (19.1%)           8.59  (19.8%)    -0.6%  (-33% -  47%)    0.926
BrowseDateSSDVFacets                0.86   (9.5%)           0.86  (13.1%)    -0.4%  (-20% -  24%)    0.914
PKLookup                          147.18   (3.9%)         146.66   (3.3%)    -0.3%  ( -7% -   7%)    0.759
BrowseDayOfYearSSDVFacets           3.47   (4.5%)           3.45   (4.8%)    -0.3%  ( -9% -   9%)    0.822
Wildcard                           36.36   (4.4%)          36.26   (5.2%)    -0.3%  ( -9% -   9%)    0.866
BrowseMonthSSDVFacets               4.15  (12.7%)           4.13  (12.8%)    -0.3%  (-22% -  28%)    0.950
AndHighMedDayTaxoFacets            15.21   (2.7%)          15.18   (2.9%)    -0.2%  ( -5% -   5%)    0.819
Fuzzy1                             68.33   (1.8%)          68.22   (2.0%)    -0.2%  ( -3% -   3%)    0.783
OrHighMedDayTaxoFacets              2.90   (4.1%)           2.89   (4.0%)    -0.1%  ( -7% -   8%)    0.930
MedPhrase                          52.81   (2.3%)          52.76   (1.8%)    -0.1%  ( -4% -   4%)    0.878
Respell                            36.80   (1.9%)          36.78   (1.9%)    -0.1%  ( -3% -   3%)    0.933
Fuzzy2                             63.06   (1.9%)          63.05   (2.1%)    -0.0%  ( -3% -   4%)    0.971
LowPhrase                          74.60   (1.9%)          74.61   (1.8%)     0.0%  ( -3% -   3%)    0.987
AndHighHighDayTaxoFacets            4.54   (2.3%)           4.55   (2.0%)     0.0%  ( -4% -   4%)    0.960
HighPhrase                        353.13   (2.6%)         353.28   (2.5%)     0.0%  ( -4% -   5%)    0.958
OrNotHighHigh                     761.72   (4.0%)         762.48   (3.6%)     0.1%  ( -7% -   8%)    0.935
OrHighNotLow                     1129.94   (4.1%)        1131.56   (3.6%)     0.1%  ( -7% -   8%)    0.906
LowTerm                          1315.90   (2.9%)        1318.61   (2.5%)     0.2%  ( -5% -   5%)    0.810
IntNRQ                            192.33   (2.8%)         192.93   (2.3%)     0.3%  ( -4% -   5%)    0.701
LowSpanNear                        23.60   (2.2%)          23.68   (1.6%)     0.3%  ( -3% -   4%)    0.592
OrNotHighMed                      867.21   (2.3%)         870.27   (2.8%)     0.4%  ( -4% -   5%)    0.664
BrowseRandomLabelSSDVFacets         2.53   (1.6%)           2.54   (1.9%)     0.4%  ( -3% -   3%)    0.494
AndHighMed                        105.33   (4.5%)         105.83   (4.6%)     0.5%  ( -8% -   9%)    0.739
HighTerm                         1030.35   (5.7%)        1035.54   (5.9%)     0.5%  (-10% -  12%)    0.783
MedSloppyPhrase                    41.07   (3.0%)          41.28   (2.9%)     0.5%  ( -5% -   6%)    0.581
AndHighLow                        287.51   (3.2%)         289.03   (4.3%)     0.5%  ( -6% -   8%)    0.657
OrHighNotMed                      910.71   (3.9%)         915.93   (4.1%)     0.6%  ( -7% -   8%)    0.651
AndHighHigh                        28.96   (5.0%)          29.15   (5.3%)     0.6%  ( -9% -  11%)    0.695
OrNotHighLow                      679.21   (2.7%)         683.68   (4.1%)     0.7%  ( -6% -   7%)    0.551
MedTerm                          1425.49   (4.8%)        1435.41   (5.1%)     0.7%  ( -8% -  11%)    0.657
MedSpanNear                         8.74   (3.0%)           8.80   (2.8%)     0.7%  ( -4% -   6%)    0.448
BrowseRandomLabelTaxoFacets         6.11  (14.4%)           6.16  (15.2%)     0.7%  (-25% -  35%)    0.875
OrHighNotHigh                     674.18   (4.1%)         679.40   (4.5%)     0.8%  ( -7% -   9%)    0.569
LowSloppyPhrase                     5.08   (3.3%)           5.12   (3.5%)     0.8%  ( -5% -   7%)    0.445
HighSpanNear                        2.22   (5.4%)           2.25   (4.2%)     1.3%  ( -7% -  11%)    0.398
HighSloppyPhrase                    5.27   (7.8%)           5.34   (9.0%)     1.3%  (-14% -  19%)    0.622
LowIntervalsOrdered                17.88   (4.8%)          18.21   (3.1%)     1.9%  ( -5% -  10%)    0.144
BrowseDateTaxoFacets                6.51  (14.4%)           6.65  (17.4%)     2.3%  (-25% -  39%)    0.652
BrowseDayOfYearTaxoFacets           6.52  (14.4%)           6.68  (17.7%)     2.5%  (-25% -  40%)    0.624
MedIntervalsOrdered                14.43   (7.8%)          14.80   (4.5%)     2.6%  ( -9% -  16%)    0.205
OrHighLow                         158.48   (3.2%)         162.94   (4.2%)     2.8%  ( -4% -  10%)    0.017
HighIntervalsOrdered                1.56   (9.4%)           1.60   (5.2%)     3.0%  (-10% -  19%)    0.215
OrHighMed                          65.32   (4.2%)          71.62   (4.1%)     9.6%  (  1% -  18%)    0.000
OrHighHigh                         14.04   (4.5%)          15.68   (3.9%)    11.7%  (  3% -  21%)    0.000