  Lucene - Core / LUCENE-8432

Stop calling comparator even if early termination is not possible

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 7.3
    • Fix Version/s: 7.5, 8.0
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When trackMaxScore or trackTotalHits is set, TopFieldCollector keeps calling comparator.compareBottom even when the result is already known in advance from the document order.

      The comparator call is not cheap, because it can involve a doc values (DV) read from disk, and all further calls can be avoided once the first non-competitive document in the segment is reached.
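
      The idea above can be sketched without any Lucene dependency. This is a minimal illustration, not the attached patch: the class SkipComparatorSketch, its fields, and the skipAfterFirstMiss flag are hypothetical names, and a plain long array stands in for a segment whose documents are stored in the same order as the search sort. Every document is still counted (as trackTotalHits requires), but the comparator is no longer consulted after the first non-competitive document of the segment.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical model of a top-N collector over segments sorted by the search sort.
class SkipComparatorSketch {
    final int numHits;
    final boolean skipAfterFirstMiss; // assumption: toggles the optimization
    // Max-heap: peek() returns the worst (largest) value among the current top N.
    final PriorityQueue<Long> queue = new PriorityQueue<>(Comparator.reverseOrder());
    int totalHits = 0;
    int comparatorCalls = 0; // counts the "expensive" calls

    SkipComparatorSketch(int numHits, boolean skipAfterFirstMiss) {
        this.numHits = numHits;
        this.skipAfterFirstMiss = skipAfterFirstMiss;
    }

    // Stand-in for comparator.compareBottom: positive means the doc beats the bottom.
    int compareBottom(long docValue) {
        comparatorCalls++;
        return Long.compare(queue.peek(), docValue);
    }

    // valuesInIndexOrder must be ascending, mirroring an index sort that
    // matches the search sort.
    void collectSegment(long[] valuesInIndexOrder) {
        boolean restNonCompetitive = false; // reset for every segment
        for (long value : valuesInIndexOrder) {
            totalHits++; // hit counting still has to see every document
            if (restNonCompetitive) {
                continue; // result known from document order, no comparator call
            }
            if (queue.size() < numHits) {
                queue.add(value); // queue not full yet, always competitive
                continue;
            }
            if (compareBottom(value) <= 0) {
                // Not competitive; in a sorted segment no later doc can be either.
                if (skipAfterFirstMiss) {
                    restNonCompetitive = true;
                }
                continue;
            }
            queue.poll(); // evict the current bottom
            queue.add(value);
        }
    }
}
```

      With numHits = 3 and the segments {1..8} and {0, 2, 9}, both variants count the same hits and keep the same top values, while the skipping variant makes fewer comparator calls.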

      A patch is attached, together with a luceneutil report on wikimedium10m sorted by DayOfYear:

                          Task    QPS base      StdDev   QPS patch      StdDev                Pct diff
             HighTermMonthSort      226.04      (6.3%)      215.33      (4.3%)   -4.7% ( -14% -    6%)
                       LowTerm      933.27      (5.5%)      924.62      (4.2%)   -0.9% ( -10% -    9%)
                  OrNotHighLow      945.68      (5.7%)      939.12      (4.5%)   -0.7% ( -10% -   10%)
                   MedSpanNear       28.76      (1.4%)       28.61      (1.5%)   -0.5% (  -3% -    2%)
      BrowseDayOfYearSSDVFacets       16.36      (5.0%)       16.29      (4.5%)   -0.4% (  -9% -    9%)
                    AndHighMed      112.30      (2.9%)      111.96      (1.6%)   -0.3% (  -4% -    4%)
                   LowSpanNear       12.42      (1.5%)       12.38      (1.6%)   -0.3% (  -3% -    2%)
              HighSloppyPhrase       18.66      (3.9%)       18.62      (4.0%)   -0.2% (  -7% -    7%)
                     MedPhrase      219.40      (2.7%)      219.06      (2.7%)   -0.2% (  -5% -    5%)
                  OrNotHighMed      222.88      (3.2%)      222.63      (3.4%)   -0.1% (  -6% -    6%)
                    AndHighLow      521.59      (3.5%)      521.02      (4.5%)   -0.1% (  -7% -    8%)
               MedSloppyPhrase       16.71      (4.7%)       16.70      (4.7%)   -0.0% (  -8% -    9%)
                     LowPhrase       15.58      (2.5%)       15.59      (2.9%)    0.0% (  -5% -    5%)
                       Respell       92.05      (2.4%)       92.19      (3.0%)    0.2% (  -5% -    5%)
                  HighSpanNear       17.03      (2.2%)       17.06      (2.1%)    0.2% (  -4% -    4%)
                    HighPhrase       37.85      (5.8%)       37.92      (5.9%)    0.2% ( -10% -   12%)
                  OrHighNotLow      118.25      (2.9%)      118.47      (3.5%)    0.2% (  -6% -    6%)
         BrowseMonthTaxoFacets        2.94      (0.4%)        2.94      (0.8%)    0.2% (   0% -    1%)
          BrowseDateTaxoFacets        2.75      (0.3%)        2.75      (1.6%)    0.3% (  -1% -    2%)
               LowSloppyPhrase      105.28      (2.3%)      105.60      (2.5%)    0.3% (  -4% -    5%)
                       Prefix3      122.07      (6.8%)      122.55      (6.5%)    0.4% ( -12% -   14%)
                 OrNotHighHigh       55.07      (3.8%)       55.29      (4.5%)    0.4% (  -7% -    8%)
         BrowseMonthSSDVFacets       20.88      (7.2%)       20.99      (7.5%)    0.5% ( -13% -   16%)
                 OrHighNotHigh       58.40      (4.2%)       58.72      (4.8%)    0.6% (  -8% -    9%)
                      Wildcard       79.87      (3.7%)       80.31      (4.0%)    0.6% (  -6% -    8%)
                     OrHighMed       13.25      (4.3%)       13.34      (4.9%)    0.6% (  -8% -   10%)
      BrowseDayOfYearTaxoFacets        2.73      (0.6%)        2.75      (1.6%)    0.7% (  -1% -    2%)
                    OrHighHigh       22.03      (4.1%)       22.19      (4.9%)    0.7% (  -8% -   10%)
                   AndHighHigh       23.46      (2.1%)       23.63      (1.9%)    0.7% (  -3% -    4%)
                      PKLookup      145.59      (4.2%)      146.66      (4.3%)    0.7% (  -7% -    9%)
                       MedTerm      171.13      (5.0%)      172.43      (5.1%)    0.8% (  -8% -   11%)
                     OrHighLow      119.22      (2.8%)      120.23      (3.1%)    0.8% (  -4% -    6%)
                  OrHighNotMed       87.06      (3.7%)       87.80      (4.1%)    0.8% (  -6% -    8%)
                        IntNRQ       26.44     (12.8%)       26.68     (11.5%)    0.9% ( -20% -   28%)
                      HighTerm      107.64      (6.1%)      108.88      (5.6%)    1.2% (  -9% -   13%)
                        Fuzzy2       69.69     (10.7%)       71.64      (7.4%)    2.8% ( -13% -   23%)
                        Fuzzy1       53.95      (6.5%)       55.79      (6.2%)    3.4% (  -8% -   17%)
         HighTermDayOfYearSort       19.71      (4.7%)       21.51      (7.1%)    9.1% (  -2% -   21%)
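
      As a reading aid for the table, the "Pct diff" figure can be reproduced from the two QPS columns as a plain relative change. The helper below is a hypothetical sketch, not luceneutil code, and it does not attempt to reproduce the parenthesized range.

```java
// Hypothetical helper: relative change of patched QPS versus baseline QPS.
class PctDiffSketch {
    static double pctDiff(double qpsBaseline, double qpsPatch) {
        return 100.0 * (qpsPatch - qpsBaseline) / qpsBaseline;
    }

    public static void main(String[] args) {
        // Values copied from the HighTermMonthSort and HighTermDayOfYearSort rows.
        System.out.println("HighTermMonthSort: " + pctDiff(226.04, 215.33));
        System.out.println("HighTermDayOfYearSort: " + pctDiff(19.71, 21.51));
    }
}
```

      Note that the range reported for HighTermMonthSort (-14% to 6%) includes zero, so the -4.7% mean on its own does not establish a regression.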

      Unfortunately, luceneutil shows a regression for sorts that do not match the index sort (HighTermMonthSort). I can't reproduce the regression in any real case, but I'm afraid my benchmarks aren't quite accurate.

        Attachments

        1. LUCENE-8432.patch
          5 kB
          Nikolay Khitrin
        2. LUCENE-8432.patch
          4 kB
          Adrien Grand

              People

              • Assignee: Unassigned
              • Reporter: Nikolay Khitrin (khitrin)
              • Votes: 0
              • Watchers: 5