  1. Lucene - Core
  2. LUCENE-10639

WANDScorer performs better without two-phase

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

    Description

      After looking at the recent improvement jpountz made to WAND scoring in LUCENE-10634, which does additional work during match confirmation so that it doesn't confirm a match whose score wouldn't be competitive, I wanted to see how performance would shift if we squashed the two-phase iteration completely and only returned true matches (that were also known to be competitive by score) in the "approximation" phase. I was a bit surprised to find that the luceneutil benchmarks (run with wikimediumall) improve significantly on some disjunction tasks and don't show significant regressions anywhere else.
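
To make the comparison concrete, here is a toy model of the two shapes. It is plain Java, not Lucene's API: `confirm`, `cheapFilter`, and the class name are hypothetical stand-ins, and the match rule (`doc % 3 == 0`) is made up. It only shows where the expensive confirmation sits relative to any other clause the consumer might check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class TwoPhaseSketch {
  // Counts how often the (hypothetical) expensive confirmation runs.
  static int confirmCalls = 0;

  // Stand-in for an expensive second phase, e.g. advancing tail clauses.
  static boolean confirm(int doc) {
    confirmCalls++;
    return doc % 3 == 0;
  }

  // Two-phase shape: the consumer sees raw candidates and may run a cheaper
  // filter before paying for confirmation.
  static List<Integer> twoPhase(int[] candidates, IntPredicate cheapFilter) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : candidates) {
      if (cheapFilter.test(doc) && confirm(doc)) {
        hits.add(doc);
      }
    }
    return hits;
  }

  // Squashed shape: every candidate is confirmed up front, so the consumer
  // only ever sees true matches and filters afterwards.
  static List<Integer> squashed(int[] candidates, IntPredicate cheapFilter) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : candidates) {
      if (confirm(doc) && cheapFilter.test(doc)) {
        hits.add(doc);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    int[] candidates = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    IntPredicate even = doc -> doc % 2 == 0;

    confirmCalls = 0;
    List<Integer> a = twoPhase(candidates, even);
    int twoPhaseCalls = confirmCalls;

    confirmCalls = 0;
    List<Integer> b = squashed(candidates, even);
    int squashedCalls = confirmCalls;

    System.out.println(a + " confirm calls: " + twoPhaseCalls);
    System.out.println(b + " confirm calls: " + squashedCalls);
  }
}
```

Both shapes return the same hits, but the squashed one confirms every candidate. When there is no other clause to check first (a filter that always passes, as in a top-level pure disjunction), the two shapes do the same number of confirmations, so deferring confirmation buys nothing in this model. That is one possible reading of the benchmark, not a conclusion drawn from the actual change.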

      Note that I used LUCENE-10634 as a baseline, and built my candidate change on top of that. The diff can be seen here: DIFF

      A simple conclusion here might be that we shouldn't do two-phase iteration in WANDScorer, but I'm pretty sure that's not right. I wonder if what's really going on is that we're underestimating the cost of confirming a match. Right now we just return the tail size as the cost. While the cost of confirming a match is proportional to the tail size, the actual work involved can be quite significant (having to advance tail iterators to new blocks and decompress them). I wonder if the WAND second phase is being run too early on approximate candidates, and if less expensive (and possibly even more restrictive) second phases could or should be running first.
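
The ordering concern can be sketched with made-up numbers. As far as I know, Lucene's conjunctions run their two-phase checks in ascending `matchCost()` order, so a check that declares a low cost runs first even if its real cost is high. The toy below is not Lucene code: `Check`, its fields, and every number in it are hypothetical, and it assumes the checks pass or fail independently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MatchCostSketch {
  // A hypothetical second-phase check: what it claims to cost, what it
  // really costs, and the fraction of candidates it lets through.
  static final class Check {
    final String name;
    final double declaredCost;
    final double actualCost;
    final double passRate;

    Check(String name, double declaredCost, double actualCost, double passRate) {
      this.name = name;
      this.declaredCost = declaredCost;
      this.actualCost = actualCost;
      this.passRate = passRate;
    }
  }

  // Expected actual work per candidate when checks run in the given order
  // and each check only runs if all earlier ones passed.
  static double expectedWork(List<Check> order) {
    double work = 0;
    double reach = 1.0; // probability a candidate reaches the current check
    for (Check c : order) {
      work += reach * c.actualCost;
      reach *= c.passRate;
    }
    return work;
  }

  static List<Check> sortedBy(List<Check> checks, Comparator<Check> cmp) {
    List<Check> copy = new ArrayList<>(checks);
    copy.sort(cmp);
    return copy;
  }

  public static void main(String[] args) {
    // Made-up numbers: the WAND-like check declares a cost of 2 (its tail
    // size) but really costs 50; the phrase-like check declares and costs 10.
    List<Check> checks = List.of(
        new Check("wandLike", 2, 50, 0.5),
        new Check("phraseLike", 10, 10, 0.25));

    List<Check> byDeclared = sortedBy(checks, Comparator.comparingDouble(c -> c.declaredCost));
    List<Check> byActual = sortedBy(checks, Comparator.comparingDouble(c -> c.actualCost));

    System.out.println("ordered by declared cost: " + expectedWork(byDeclared)); // 55.0
    System.out.println("ordered by actual cost:   " + expectedWork(byActual));   // 22.5
  }
}
```

With these numbers, trusting the declared cost runs the WAND-like check first and costs 55 units per candidate, against 22.5 when the cheaper check goes first. Sorting by actual cost alone is itself a simplification, since the best order also depends on how selective each check is.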

      I'm raising this here as more of a curiosity to see if it sparks ideas on how to move forward. Again, I'm not proposing we do away with two-phase iteration, but it seems we might be able to improve things. Maybe I'll explore changing the cost heuristic next. Also, maybe there's some different benchmarking that would be useful here that I may not be familiar with?

      Benchmark results on wikimediumall:

                              Task    QPS baseline      StdDev   QPS candidate      StdDev                Pct diff p-value
                  HighTermTitleBDVSort       22.52     (18.9%)       21.66     (15.6%)   -3.8% ( -32% -   37%) 0.485
                               Prefix3        9.38      (9.2%)        9.09     (10.6%)   -3.1% ( -20% -   18%) 0.326
                     HighTermMonthSort       25.37     (16.0%)       24.87     (17.1%)   -2.0% ( -30% -   37%) 0.710
                  MedTermDayTaxoFacets        9.62      (4.2%)        9.51      (4.1%)   -1.2% (  -9% -    7%) 0.368
                            TermDTSort       74.69     (18.0%)       74.13     (18.2%)   -0.7% ( -31% -   43%) 0.897
                 HighTermDayOfYearSort       52.64     (16.1%)       52.32     (15.4%)   -0.6% ( -27% -   36%) 0.903
                 BrowseMonthTaxoFacets        8.64     (19.1%)        8.59     (19.8%)   -0.6% ( -33% -   47%) 0.926
                  BrowseDateSSDVFacets        0.86      (9.5%)        0.86     (13.1%)   -0.4% ( -20% -   24%) 0.914
                              PKLookup      147.18      (3.9%)      146.66      (3.3%)   -0.3% (  -7% -    7%) 0.759
             BrowseDayOfYearSSDVFacets        3.47      (4.5%)        3.45      (4.8%)   -0.3% (  -9% -    9%) 0.822
                              Wildcard       36.36      (4.4%)       36.26      (5.2%)   -0.3% (  -9% -    9%) 0.866
                 BrowseMonthSSDVFacets        4.15     (12.7%)        4.13     (12.8%)   -0.3% ( -22% -   28%) 0.950
               AndHighMedDayTaxoFacets       15.21      (2.7%)       15.18      (2.9%)   -0.2% (  -5% -    5%) 0.819
                                Fuzzy1       68.33      (1.8%)       68.22      (2.0%)   -0.2% (  -3% -    3%) 0.783
                OrHighMedDayTaxoFacets        2.90      (4.1%)        2.89      (4.0%)   -0.1% (  -7% -    8%) 0.930
                             MedPhrase       52.81      (2.3%)       52.76      (1.8%)   -0.1% (  -4% -    4%) 0.878
                               Respell       36.80      (1.9%)       36.78      (1.9%)   -0.1% (  -3% -    3%) 0.933
                                Fuzzy2       63.06      (1.9%)       63.05      (2.1%)   -0.0% (  -3% -    4%) 0.971
                             LowPhrase       74.60      (1.9%)       74.61      (1.8%)    0.0% (  -3% -    3%) 0.987
              AndHighHighDayTaxoFacets        4.54      (2.3%)        4.55      (2.0%)    0.0% (  -4% -    4%) 0.960
                            HighPhrase      353.13      (2.6%)      353.28      (2.5%)    0.0% (  -4% -    5%) 0.958
                         OrNotHighHigh      761.72      (4.0%)      762.48      (3.6%)    0.1% (  -7% -    8%) 0.935
                          OrHighNotLow     1129.94      (4.1%)     1131.56      (3.6%)    0.1% (  -7% -    8%) 0.906
                               LowTerm     1315.90      (2.9%)     1318.61      (2.5%)    0.2% (  -5% -    5%) 0.810
                                IntNRQ      192.33      (2.8%)      192.93      (2.3%)    0.3% (  -4% -    5%) 0.701
                           LowSpanNear       23.60      (2.2%)       23.68      (1.6%)    0.3% (  -3% -    4%) 0.592
                          OrNotHighMed      867.21      (2.3%)      870.27      (2.8%)    0.4% (  -4% -    5%) 0.664
           BrowseRandomLabelSSDVFacets        2.53      (1.6%)        2.54      (1.9%)    0.4% (  -3% -    3%) 0.494
                            AndHighMed      105.33      (4.5%)      105.83      (4.6%)    0.5% (  -8% -    9%) 0.739
                              HighTerm     1030.35      (5.7%)     1035.54      (5.9%)    0.5% ( -10% -   12%) 0.783
                       MedSloppyPhrase       41.07      (3.0%)       41.28      (2.9%)    0.5% (  -5% -    6%) 0.581
                            AndHighLow      287.51      (3.2%)      289.03      (4.3%)    0.5% (  -6% -    8%) 0.657
                          OrHighNotMed      910.71      (3.9%)      915.93      (4.1%)    0.6% (  -7% -    8%) 0.651
                           AndHighHigh       28.96      (5.0%)       29.15      (5.3%)    0.6% (  -9% -   11%) 0.695
                          OrNotHighLow      679.21      (2.7%)      683.68      (4.1%)    0.7% (  -6% -    7%) 0.551
                               MedTerm     1425.49      (4.8%)     1435.41      (5.1%)    0.7% (  -8% -   11%) 0.657
                           MedSpanNear        8.74      (3.0%)        8.80      (2.8%)    0.7% (  -4% -    6%) 0.448
           BrowseRandomLabelTaxoFacets        6.11     (14.4%)        6.16     (15.2%)    0.7% ( -25% -   35%) 0.875
                         OrHighNotHigh      674.18      (4.1%)      679.40      (4.5%)    0.8% (  -7% -    9%) 0.569
                       LowSloppyPhrase        5.08      (3.3%)        5.12      (3.5%)    0.8% (  -5% -    7%) 0.445
                          HighSpanNear        2.22      (5.4%)        2.25      (4.2%)    1.3% (  -7% -   11%) 0.398
                      HighSloppyPhrase        5.27      (7.8%)        5.34      (9.0%)    1.3% ( -14% -   19%) 0.622
                   LowIntervalsOrdered       17.88      (4.8%)       18.21      (3.1%)    1.9% (  -5% -   10%) 0.144
                  BrowseDateTaxoFacets        6.51     (14.4%)        6.65     (17.4%)    2.3% ( -25% -   39%) 0.652
             BrowseDayOfYearTaxoFacets        6.52     (14.4%)        6.68     (17.7%)    2.5% ( -25% -   40%) 0.624
                   MedIntervalsOrdered       14.43      (7.8%)       14.80      (4.5%)    2.6% (  -9% -   16%) 0.205
                             OrHighLow      158.48      (3.2%)      162.94      (4.2%)    2.8% (  -4% -   10%) 0.017
                  HighIntervalsOrdered        1.56      (9.4%)        1.60      (5.2%)    3.0% ( -10% -   19%) 0.215
                             OrHighMed       65.32      (4.2%)       71.62      (4.1%)    9.6% (   1% -   18%) 0.000
                            OrHighHigh       14.04      (4.5%)       15.68      (3.9%)   11.7% (   3% -   21%) 0.000
      

          People

            Assignee: Unassigned
            Reporter: Greg Miller (gsmiller)
            Votes: 0
            Watchers: 3
