[LUCENE-10639] WANDScorer performs better without two-phase - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: core/search
Labels: None
Lucene Fields: New

Description

After looking at the recent improvement jpountz made to WAND scoring in LUCENE-10634, which does additional work during match confirmation so that a match whose score wouldn't be competitive is never confirmed, I wanted to see how performance would shift if we squashed the two-phase iteration completely and only returned true matches (that were also known to be competitive by score) in the "approximation" phase. I was a bit surprised to find that the luceneutil benchmarks (run with wikimediumall) improve significantly on some disjunction tasks and don't show significant regressions anywhere else.
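To make "squashing" concrete, here is a minimal, self-contained sketch of the two ways of consuming an iterator. It is a toy model, not Lucene code: `TwoPhase`, `nextApproximation`, `nextConfirmed` and the candidate arrays are hypothetical names made up for illustration, loosely mirroring the approximation/`matches()` split of a two-phase iterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class SquashedTwoPhase {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Toy two-phase iterator: a cheap approximation plus a costlier confirmation step. */
    static final class TwoPhase {
        private final int[] candidates;      // doc IDs produced by the approximation
        private final IntPredicate confirm;  // stands in for the expensive match check
        private int idx = -1;

        TwoPhase(int[] candidates, IntPredicate confirm) {
            this.candidates = candidates;
            this.confirm = confirm;
        }

        /** Phase 1: move to the next candidate without confirming it. */
        int nextApproximation() {
            idx++;
            return idx < candidates.length ? candidates[idx] : NO_MORE_DOCS;
        }

        /** Phase 2: confirm the current candidate. */
        boolean matches() {
            return confirm.test(candidates[idx]);
        }
    }

    /** Two-phase consumption: the caller decides if and when to confirm each candidate. */
    static List<Integer> collectTwoPhase(TwoPhase it) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = it.nextApproximation(); doc != NO_MORE_DOCS; doc = it.nextApproximation()) {
            if (it.matches()) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Squashed variant: confirmation happens inside iteration, so only true matches surface. */
    static int nextConfirmed(TwoPhase it) {
        int doc = it.nextApproximation();
        while (doc != NO_MORE_DOCS && !it.matches()) {
            doc = it.nextApproximation();
        }
        return doc;
    }

    static List<Integer> collectSquashed(TwoPhase it) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = nextConfirmed(it); doc != NO_MORE_DOCS; doc = nextConfirmed(it)) {
            hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] candidates = {1, 4, 7, 10};
        IntPredicate isEven = doc -> doc % 2 == 0; // arbitrary stand-in for "is a competitive match"
        System.out.println(collectTwoPhase(new TwoPhase(candidates, isEven)));
        System.out.println(collectSquashed(new TwoPhase(candidates, isEven)));
    }
}
```

Both variants surface the same documents when the iterator is consumed on its own. The difference only matters when another clause sits between the two phases: with two-phase iteration that clause can reject a candidate before the confirmation runs, while the squashed variant always pays for confirmation first.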

Note that I used LUCENE-10634 as a baseline, and built my candidate change on top of that. The diff can be seen here: DIFF

A simple conclusion here might be that we shouldn't do two-phase iteration in WANDScorer, but I'm pretty sure that's not right. I wonder if what's really going on is that we're underestimating the cost of confirming a match? Right now we just return the tail size as the cost. While the cost of confirming a match is proportional to the tail size, the actual work involved can be quite significant (having to advance tail iterators to new blocks and decompress them). I wonder if the WAND second phase is being run too early on approximate candidates, and if less expensive (and possibly even more restrictive?) second phases could/should be running first?
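The ordering concern can be illustrated with a small sketch. It assumes that second-phase checks are run cheapest-first according to their match-cost estimate. Everything in it is hypothetical: the `SecondPhase` class, the competing "phrase" check, its cost of 8, and the per-advance weight of 16 are invented numbers chosen only to show how a different heuristic could flip the order.

```java
import java.util.ArrayList;
import java.util.List;

public class MatchCostOrdering {

    /** Hypothetical stand-in for a second-phase check; only its cost estimate matters here. */
    static final class SecondPhase {
        final String name;
        final float matchCost;

        SecondPhase(String name, float matchCost) {
            this.name = name;
            this.matchCost = matchCost;
        }
    }

    /** Heuristic described in the issue: the cost is just the number of tail iterators. */
    static float tailSizeCost(int tailSize) {
        return tailSize;
    }

    /**
     * Hypothetical alternative: weight each tail iterator by an assumed cost of advancing it,
     * to account for moving to a new block and decompressing it.
     */
    static float weightedTailCost(int tailSize, float assumedAdvanceCost) {
        return tailSize * assumedAdvanceCost;
    }

    /** Returns the names of the checks in the order they would run (cheapest estimate first). */
    static List<String> executionOrder(List<SecondPhase> phases) {
        List<SecondPhase> sorted = new ArrayList<>(phases);
        sorted.sort((a, b) -> Float.compare(a.matchCost, b.matchCost));
        List<String> names = new ArrayList<>();
        for (SecondPhase phase : sorted) {
            names.add(phase.name);
        }
        return names;
    }

    public static void main(String[] args) {
        int tailSize = 2;       // made-up tail size
        float phraseCost = 8f;  // made-up cost of some other second-phase check

        // With the tail-size heuristic, WAND confirmation looks cheap and runs first.
        System.out.println(executionOrder(List.of(
            new SecondPhase("wand", tailSizeCost(tailSize)),
            new SecondPhase("phrase", phraseCost))));

        // With a weighted estimate, the same confirmation is scheduled after the other check.
        System.out.println(executionOrder(List.of(
            new SecondPhase("wand", weightedTailCost(tailSize, 16f)),
            new SecondPhase("phrase", phraseCost))));
    }
}
```

Under these made-up numbers the first ordering is wand then phrase, and the second is phrase then wand. Whether a weighted estimate helps in practice would have to be measured with luceneutil.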

I'm raising this here as more of a curiosity to see if it sparks ideas on how to move forward. Again, I'm not proposing we do away with two-phase iteration, but it seems we might be able to improve things. Maybe I'll explore changing the cost heuristic next. Also, maybe there's some different benchmarking that would be useful here that I may not be familiar with?

Benchmark results on wikimediumall:

                            Task    QPS baseline      StdDev   QPS candidate      StdDev                Pct diff p-value
            HighTermTitleBDVSort       22.52     (18.9%)       21.66     (15.6%)   -3.8% ( -32% -   37%) 0.485
                         Prefix3        9.38      (9.2%)        9.09     (10.6%)   -3.1% ( -20% -   18%) 0.326
               HighTermMonthSort       25.37     (16.0%)       24.87     (17.1%)   -2.0% ( -30% -   37%) 0.710
            MedTermDayTaxoFacets        9.62      (4.2%)        9.51      (4.1%)   -1.2% (  -9% -    7%) 0.368
                      TermDTSort       74.69     (18.0%)       74.13     (18.2%)   -0.7% ( -31% -   43%) 0.897
           HighTermDayOfYearSort       52.64     (16.1%)       52.32     (15.4%)   -0.6% ( -27% -   36%) 0.903
           BrowseMonthTaxoFacets        8.64     (19.1%)        8.59     (19.8%)   -0.6% ( -33% -   47%) 0.926
            BrowseDateSSDVFacets        0.86      (9.5%)        0.86     (13.1%)   -0.4% ( -20% -   24%) 0.914
                        PKLookup      147.18      (3.9%)      146.66      (3.3%)   -0.3% (  -7% -    7%) 0.759
       BrowseDayOfYearSSDVFacets        3.47      (4.5%)        3.45      (4.8%)   -0.3% (  -9% -    9%) 0.822
                        Wildcard       36.36      (4.4%)       36.26      (5.2%)   -0.3% (  -9% -    9%) 0.866
           BrowseMonthSSDVFacets        4.15     (12.7%)        4.13     (12.8%)   -0.3% ( -22% -   28%) 0.950
         AndHighMedDayTaxoFacets       15.21      (2.7%)       15.18      (2.9%)   -0.2% (  -5% -    5%) 0.819
                          Fuzzy1       68.33      (1.8%)       68.22      (2.0%)   -0.2% (  -3% -    3%) 0.783
          OrHighMedDayTaxoFacets        2.90      (4.1%)        2.89      (4.0%)   -0.1% (  -7% -    8%) 0.930
                       MedPhrase       52.81      (2.3%)       52.76      (1.8%)   -0.1% (  -4% -    4%) 0.878
                         Respell       36.80      (1.9%)       36.78      (1.9%)   -0.1% (  -3% -    3%) 0.933
                          Fuzzy2       63.06      (1.9%)       63.05      (2.1%)   -0.0% (  -3% -    4%) 0.971
                       LowPhrase       74.60      (1.9%)       74.61      (1.8%)    0.0% (  -3% -    3%) 0.987
        AndHighHighDayTaxoFacets        4.54      (2.3%)        4.55      (2.0%)    0.0% (  -4% -    4%) 0.960
                      HighPhrase      353.13      (2.6%)      353.28      (2.5%)    0.0% (  -4% -    5%) 0.958
                   OrNotHighHigh      761.72      (4.0%)      762.48      (3.6%)    0.1% (  -7% -    8%) 0.935
                    OrHighNotLow     1129.94      (4.1%)     1131.56      (3.6%)    0.1% (  -7% -    8%) 0.906
                         LowTerm     1315.90      (2.9%)     1318.61      (2.5%)    0.2% (  -5% -    5%) 0.810
                          IntNRQ      192.33      (2.8%)      192.93      (2.3%)    0.3% (  -4% -    5%) 0.701
                     LowSpanNear       23.60      (2.2%)       23.68      (1.6%)    0.3% (  -3% -    4%) 0.592
                    OrNotHighMed      867.21      (2.3%)      870.27      (2.8%)    0.4% (  -4% -    5%) 0.664
     BrowseRandomLabelSSDVFacets        2.53      (1.6%)        2.54      (1.9%)    0.4% (  -3% -    3%) 0.494
                      AndHighMed      105.33      (4.5%)      105.83      (4.6%)    0.5% (  -8% -    9%) 0.739
                        HighTerm     1030.35      (5.7%)     1035.54      (5.9%)    0.5% ( -10% -   12%) 0.783
                 MedSloppyPhrase       41.07      (3.0%)       41.28      (2.9%)    0.5% (  -5% -    6%) 0.581
                      AndHighLow      287.51      (3.2%)      289.03      (4.3%)    0.5% (  -6% -    8%) 0.657
                    OrHighNotMed      910.71      (3.9%)      915.93      (4.1%)    0.6% (  -7% -    8%) 0.651
                     AndHighHigh       28.96      (5.0%)       29.15      (5.3%)    0.6% (  -9% -   11%) 0.695
                    OrNotHighLow      679.21      (2.7%)      683.68      (4.1%)    0.7% (  -6% -    7%) 0.551
                         MedTerm     1425.49      (4.8%)     1435.41      (5.1%)    0.7% (  -8% -   11%) 0.657
                     MedSpanNear        8.74      (3.0%)        8.80      (2.8%)    0.7% (  -4% -    6%) 0.448
     BrowseRandomLabelTaxoFacets        6.11     (14.4%)        6.16     (15.2%)    0.7% ( -25% -   35%) 0.875
                   OrHighNotHigh      674.18      (4.1%)      679.40      (4.5%)    0.8% (  -7% -    9%) 0.569
                 LowSloppyPhrase        5.08      (3.3%)        5.12      (3.5%)    0.8% (  -5% -    7%) 0.445
                    HighSpanNear        2.22      (5.4%)        2.25      (4.2%)    1.3% (  -7% -   11%) 0.398
                HighSloppyPhrase        5.27      (7.8%)        5.34      (9.0%)    1.3% ( -14% -   19%) 0.622
             LowIntervalsOrdered       17.88      (4.8%)       18.21      (3.1%)    1.9% (  -5% -   10%) 0.144
            BrowseDateTaxoFacets        6.51     (14.4%)        6.65     (17.4%)    2.3% ( -25% -   39%) 0.652
       BrowseDayOfYearTaxoFacets        6.52     (14.4%)        6.68     (17.7%)    2.5% ( -25% -   40%) 0.624
             MedIntervalsOrdered       14.43      (7.8%)       14.80      (4.5%)    2.6% (  -9% -   16%) 0.205
                       OrHighLow      158.48      (3.2%)      162.94      (4.2%)    2.8% (  -4% -   10%) 0.017
            HighIntervalsOrdered        1.56      (9.4%)        1.60      (5.2%)    3.0% ( -10% -   19%) 0.215
                       OrHighMed       65.32      (4.2%)       71.62      (4.1%)    9.6% (   1% -   18%) 0.000
                      OrHighHigh       14.04      (4.5%)       15.68      (3.9%)   11.7% (   3% -   21%) 0.000
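
As a quick sanity check on the three disjunction rows, the sketch below recomputes the Pct diff column, assuming it is the relative change in mean QPS from baseline to candidate (the table's numbers are consistent with that reading, but the formula is an assumption here, not taken from luceneutil). The class and method names are made up for illustration.

```java
import java.util.Locale;

public class PctDiff {

    /** Relative change of the candidate's mean QPS over the baseline's, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // Mean QPS values copied from the table above.
        System.out.println(String.format(Locale.ROOT, "OrHighLow  %.1f%%", pctDiff(158.48, 162.94)));
        System.out.println(String.format(Locale.ROOT, "OrHighMed  %.1f%%", pctDiff(65.32, 71.62)));
        System.out.println(String.format(Locale.ROOT, "OrHighHigh %.1f%%", pctDiff(14.04, 15.68)));
    }
}
```

Rounded to one decimal place, these come out to 2.8%, 9.6% and 11.7%, matching the reported values.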
