Lucene - Core / LUCENE-10634

Speed up WANDScorer by computing scores before advancing tail scorers

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Lucene Fields: New

    Description

      While looking at performance numbers on LUCENE-10480, I noticed that it is often faster to compute a score first, in order to get a finer-grained estimate of the best score that the current document can possibly get, before advancing a tail scorer.

      Making this change to WANDScorer yielded a small but reproducible speedup:

                                  Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                                IntNRQ      186.50     (11.8%)      175.34     (19.1%)   -6.0% ( -33% -   28%) 0.234
                  HighTermTitleBDVSort      167.27     (20.6%)      161.85     (17.2%)   -3.2% ( -34% -   43%) 0.591
                       MedSloppyPhrase      194.77      (5.5%)      190.45      (7.8%)   -2.2% ( -14% -   11%) 0.299
                 HighTermDayOfYearSort      229.61      (7.7%)      225.74      (7.1%)   -1.7% ( -15% -   14%) 0.471
                       LowSloppyPhrase       20.22      (4.3%)       19.95      (4.8%)   -1.3% ( -10% -    8%) 0.366
                            TermDTSort      319.62      (7.7%)      316.78      (7.5%)   -0.9% ( -14% -   15%) 0.712
                          OrHighNotLow     1856.44      (5.6%)     1842.88      (5.7%)   -0.7% ( -11% -   11%) 0.682
                      AndMedOrHighHigh       73.87      (3.8%)       73.51      (3.6%)   -0.5% (  -7% -    7%) 0.677
                         OrHighNotHigh     2000.56      (5.6%)     1991.65      (6.9%)   -0.4% ( -12% -   12%) 0.823
                             LowPhrase      106.90      (2.4%)      106.61      (2.9%)   -0.3% (  -5% -    5%) 0.750
                            AndHighLow     1661.80      (3.5%)     1658.56      (3.7%)   -0.2% (  -7% -    7%) 0.865
                                Fuzzy2      110.64      (1.8%)      110.43      (1.9%)   -0.2% (  -3% -    3%) 0.752
                     HighTermMonthSort       73.74     (17.5%)       73.68     (20.8%)   -0.1% ( -32% -   46%) 0.989
                              PKLookup      242.86      (1.8%)      242.75      (1.8%)   -0.0% (  -3% -    3%) 0.934
                          OrHighNotMed     1454.98      (5.3%)     1456.26      (5.8%)    0.1% ( -10% -   11%) 0.960
                            HighPhrase      523.22      (2.9%)      524.01      (2.6%)    0.2% (  -5% -    5%) 0.862
                             MedPhrase      140.65      (2.7%)      140.87      (2.9%)    0.2% (  -5% -    5%) 0.862
                      HighSloppyPhrase        8.74      (4.6%)        8.75      (5.5%)    0.2% (  -9% -   10%) 0.914
                           LowSpanNear       28.05      (3.6%)       28.14      (3.0%)    0.3% (  -6% -    7%) 0.777
                           MedSpanNear        7.59      (3.5%)        7.61      (3.4%)    0.3% (  -6% -    7%) 0.778
                               Respell       67.62      (1.9%)       67.82      (1.8%)    0.3% (  -3% -    4%) 0.595
                 OrAndHigMedAndHighMed      127.87      (3.1%)      128.27      (4.0%)    0.3% (  -6% -    7%) 0.780
                          OrNotHighLow     1513.24      (2.1%)     1520.33      (2.6%)    0.5% (  -4% -    5%) 0.528
                OrHighPhraseHighPhrase       25.26      (3.0%)       25.38      (3.0%)    0.5% (  -5% -    6%) 0.616
                          OrNotHighMed     1544.04      (4.5%)     1552.26      (4.2%)    0.5% (  -7% -    9%) 0.697
                           AndHighHigh       92.24      (4.8%)       92.79      (6.6%)    0.6% ( -10% -   12%) 0.744
                            AndHighMed      420.42      (3.1%)      423.19      (5.2%)    0.7% (  -7% -    9%) 0.624
                                Fuzzy1      117.42      (1.9%)      118.19      (2.2%)    0.7% (  -3% -    4%) 0.307
                               MedTerm     2209.36      (4.6%)     2224.54      (5.3%)    0.7% (  -8% -   11%) 0.661
                   MedIntervalsOrdered      124.18      (8.1%)      125.12      (8.0%)    0.8% ( -14% -   18%) 0.767
                         OrNotHighHigh     1239.43      (4.6%)     1249.63      (4.8%)    0.8% (  -8% -   10%) 0.580
                       AndHighOrMedMed       95.02      (4.3%)       95.82      (3.8%)    0.8% (  -6% -    9%) 0.515
                              Wildcard      315.22     (23.3%)      317.98     (22.5%)    0.9% ( -36% -   60%) 0.904
                               LowTerm     2775.81      (4.0%)     2808.32      (5.2%)    1.2% (  -7% -   10%) 0.425
                  HighIntervalsOrdered       14.24      (8.0%)       14.41      (8.4%)    1.2% ( -14% -   19%) 0.646
                   LowIntervalsOrdered      120.62      (5.8%)      122.09      (6.6%)    1.2% ( -10% -   14%) 0.534
                          HighSpanNear       39.04      (6.7%)       39.71      (4.3%)    1.7% (  -8% -   13%) 0.332
                               Prefix3       80.25      (5.1%)       81.70      (3.3%)    1.8% (  -6% -   10%) 0.187
                              HighTerm     3635.73      (6.0%)     3720.39      (6.5%)    2.3% (  -9% -   15%) 0.240
                             OrHighLow      860.22      (3.7%)      882.88      (3.4%)    2.6% (  -4% -   10%) 0.019
                             OrHighMed       91.61      (3.9%)       94.40      (4.1%)    3.1% (  -4% -   11%) 0.016
                            OrHighHigh       55.17      (3.7%)       57.09      (4.1%)    3.5% (  -4% -   11%) 0.005
                          OrHighMedMed      172.38      (5.0%)      178.92      (6.0%)    3.8% (  -6% -   15%) 0.029
                         OrHighHighMed       68.63      (4.5%)       72.66      (5.3%)    5.9% (  -3% -   16%) 0.000
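
To illustrate the idea, here is a minimal sketch of the two ways to bound a document's best possible score. It is not the actual WANDScorer code: the class `WandBoundSketch`, its methods, and the numbers are hypothetical. The sketch assumes that "lead" clauses are already positioned on the candidate document (so their real score can be computed) while "tail" clauses have not been advanced yet (so only their maximum score is known).

```java
class WandBoundSketch {

    /** Coarse bound: every clause is assumed to reach its maximum score. */
    static double coarseBound(double[] leadMaxScores, double[] tailMaxScores) {
        double sum = 0;
        for (double s : leadMaxScores) sum += s;
        for (double s : tailMaxScores) sum += s;
        return sum;
    }

    /**
     * Finer bound: lead clauses contribute the score they actually produce
     * on the candidate document; tail clauses still contribute their maximum.
     */
    static double refinedBound(double[] leadScores, double[] tailMaxScores) {
        double sum = 0;
        for (double s : leadScores) sum += s;
        for (double s : tailMaxScores) sum += s;
        return sum;
    }

    /** A tail scorer only needs advancing if the document could still be competitive. */
    static boolean mustAdvanceTail(double bound, double minCompetitiveScore) {
        return bound >= minCompetitiveScore;
    }

    public static void main(String[] args) {
        double[] leadMaxScores = {3.0}; // upper bound of the lead clause
        double[] leadScores = {1.0};    // its actual score on this document
        double[] tailMaxScores = {2.0}; // tail clause, not advanced yet
        double minCompetitiveScore = 4.0;

        // Coarse bound is 5.0, so a tail scorer would have to be advanced.
        System.out.println(mustAdvanceTail(
            coarseBound(leadMaxScores, tailMaxScores), minCompetitiveScore)); // true
        // Refined bound is 3.0, so the document can be skipped right away.
        System.out.println(mustAdvanceTail(
            refinedBound(leadScores, tailMaxScores), minCompetitiveScore)); // false
    }
}
```

The trade-off is that scoring the lead clauses costs something up front, which pays off only when it avoids enough tail-scorer advances.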
      

            People

              Assignee: Unassigned
              Reporter: Adrien Grand (jpountz)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m