[LUCENE-8432] Stop calling comparator even if early termination is not possible - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 7.3
Fix Version/s: 7.5, 8.0
Component/s: core/search
Labels:
None

Lucene Fields:

New

Description

When trackMaxScore or trackTotalHits is set, TopFieldCollector keeps calling comparator.compareBottom even when the result is already known in advance from the document order.

The comparator call is not cheap, because it can involve reading doc values (DV) from disk. All such calls can be avoided once the first non-competitive document in a segment is reached.
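
To illustrate the idea, here is a minimal, self-contained sketch. It is not the attached patch and does not use Lucene classes: EarlySkipSketch, collect, and skipAfterFirstMiss are hypothetical names, and plain int values stand in for the sort field. It assumes every segment is already ordered by the search sort, so once one document loses against the bottom of the queue, every later document in that segment loses too. Collection cannot terminate early because hits still have to be counted, but the comparison can be skipped.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a top-N collector; this is not Lucene's TopFieldCollector. */
final class EarlySkipSketch {

    /** Outcome of one collection run. */
    static final class Result {
        final int totalHits;       // every matching document is still counted
        final int comparatorCalls; // how often a document was compared to the queue bottom
        final int[] topValues;     // collected top-N values, ascending

        Result(int totalHits, int comparatorCalls, int[] topValues) {
            this.totalHits = totalHits;
            this.comparatorCalls = comparatorCalls;
            this.topValues = topValues;
        }
    }

    /**
     * Collects the topN smallest values. Each inner array models one segment
     * whose documents are assumed to be stored in ascending sort order.
     */
    static Result collect(int[][] segments, int topN, boolean skipAfterFirstMiss) {
        if (topN <= 0) {
            throw new IllegalArgumentException("topN must be positive");
        }
        // Max-heap: peek() returns the weakest ("bottom") entry currently kept.
        PriorityQueue<Integer> queue = new PriorityQueue<>(Comparator.reverseOrder());
        int totalHits = 0;
        int comparatorCalls = 0;

        for (int[] segment : segments) {
            boolean restIsNonCompetitive = false; // reset for every segment
            for (int value : segment) {
                totalHits++; // counting hits is why collection cannot simply stop
                if (queue.size() < topN) {
                    queue.add(value); // queue not full yet, nothing to compare against
                    continue;
                }
                if (skipAfterFirstMiss && restIsNonCompetitive) {
                    continue; // outcome known from document order, skip the comparison
                }
                comparatorCalls++;
                if (value >= queue.peek()) {
                    // Non-competitive (ties lose to earlier documents); because the
                    // segment is sorted, all later documents are non-competitive too.
                    restIsNonCompetitive = true;
                } else {
                    queue.poll();
                    queue.add(value);
                }
            }
        }
        int[] top = queue.stream().mapToInt(Integer::intValue).sorted().toArray();
        return new Result(totalHits, comparatorCalls, top);
    }

    public static void main(String[] args) {
        int[][] segments = {{1, 3, 5, 7, 9}, {2, 4, 6, 8, 10}};
        Result baseline = collect(segments, 3, false);
        Result skipping = collect(segments, 3, true);
        System.out.println("baseline comparator calls: " + baseline.comparatorCalls);
        System.out.println("skipping comparator calls: " + skipping.comparatorCalls);
        System.out.println("total hits (both runs): " + skipping.totalHits);
    }
}
```

With this toy input both runs count 10 hits and keep the same top 3 values, while the number of comparisons drops from 7 to 3. In a real index the saving depends on how early each segment reaches its first non-competitive document.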

A patch is attached, along with a luceneutil report on wikimedium10m sorted by DayOfYear:

                    Task QPS baseline     StdDev   QPS patch      StdDev                Pct diff
       HighTermMonthSort      226.04      (6.3%)      215.33      (4.3%)   -4.7% ( -14% -    6%)
                 LowTerm      933.27      (5.5%)      924.62      (4.2%)   -0.9% ( -10% -    9%)
            OrNotHighLow      945.68      (5.7%)      939.12      (4.5%)   -0.7% ( -10% -   10%)
             MedSpanNear       28.76      (1.4%)       28.61      (1.5%)   -0.5% (  -3% -    2%)
BrowseDayOfYearSSDVFacets       16.36      (5.0%)       16.29      (4.5%)   -0.4% (  -9% -    9%)
              AndHighMed      112.30      (2.9%)      111.96      (1.6%)   -0.3% (  -4% -    4%)
             LowSpanNear       12.42      (1.5%)       12.38      (1.6%)   -0.3% (  -3% -    2%)
        HighSloppyPhrase       18.66      (3.9%)       18.62      (4.0%)   -0.2% (  -7% -    7%)
               MedPhrase      219.40      (2.7%)      219.06      (2.7%)   -0.2% (  -5% -    5%)
            OrNotHighMed      222.88      (3.2%)      222.63      (3.4%)   -0.1% (  -6% -    6%)
              AndHighLow      521.59      (3.5%)      521.02      (4.5%)   -0.1% (  -7% -    8%)
         MedSloppyPhrase       16.71      (4.7%)       16.70      (4.7%)   -0.0% (  -8% -    9%)
               LowPhrase       15.58      (2.5%)       15.59      (2.9%)    0.0% (  -5% -    5%)
                 Respell       92.05      (2.4%)       92.19      (3.0%)    0.2% (  -5% -    5%)
            HighSpanNear       17.03      (2.2%)       17.06      (2.1%)    0.2% (  -4% -    4%)
              HighPhrase       37.85      (5.8%)       37.92      (5.9%)    0.2% ( -10% -   12%)
            OrHighNotLow      118.25      (2.9%)      118.47      (3.5%)    0.2% (  -6% -    6%)
   BrowseMonthTaxoFacets        2.94      (0.4%)        2.94      (0.8%)    0.2% (   0% -    1%)
    BrowseDateTaxoFacets        2.75      (0.3%)        2.75      (1.6%)    0.3% (  -1% -    2%)
         LowSloppyPhrase      105.28      (2.3%)      105.60      (2.5%)    0.3% (  -4% -    5%)
                 Prefix3      122.07      (6.8%)      122.55      (6.5%)    0.4% ( -12% -   14%)
           OrNotHighHigh       55.07      (3.8%)       55.29      (4.5%)    0.4% (  -7% -    8%)
   BrowseMonthSSDVFacets       20.88      (7.2%)       20.99      (7.5%)    0.5% ( -13% -   16%)
           OrHighNotHigh       58.40      (4.2%)       58.72      (4.8%)    0.6% (  -8% -    9%)
                Wildcard       79.87      (3.7%)       80.31      (4.0%)    0.6% (  -6% -    8%)
               OrHighMed       13.25      (4.3%)       13.34      (4.9%)    0.6% (  -8% -   10%)
BrowseDayOfYearTaxoFacets        2.73      (0.6%)        2.75      (1.6%)    0.7% (  -1% -    2%)
              OrHighHigh       22.03      (4.1%)       22.19      (4.9%)    0.7% (  -8% -   10%)
             AndHighHigh       23.46      (2.1%)       23.63      (1.9%)    0.7% (  -3% -    4%)
                PKLookup      145.59      (4.2%)      146.66      (4.3%)    0.7% (  -7% -    9%)
                 MedTerm      171.13      (5.0%)      172.43      (5.1%)    0.8% (  -8% -   11%)
               OrHighLow      119.22      (2.8%)      120.23      (3.1%)    0.8% (  -4% -    6%)
            OrHighNotMed       87.06      (3.7%)       87.80      (4.1%)    0.8% (  -6% -    8%)
                  IntNRQ       26.44     (12.8%)       26.68     (11.5%)    0.9% ( -20% -   28%)
                HighTerm      107.64      (6.1%)      108.88      (5.6%)    1.2% (  -9% -   13%)
                  Fuzzy2       69.69     (10.7%)       71.64      (7.4%)    2.8% ( -13% -   23%)
                  Fuzzy1       53.95      (6.5%)       55.79      (6.2%)    3.4% (  -8% -   17%)
   HighTermDayOfYearSort       19.71      (4.7%)       21.51      (7.1%)    9.1% (  -2% -   21%)

Unfortunately, luceneutil shows a regression when the search sort does not match the index sort (HighTermMonthSort). I can't reproduce the regression in any real case, but I'm afraid my benchmarks aren't quite accurate.

Attachments

LUCENE-8432.patch    24/Aug/18 12:14    4 kB    Adrien Grand
LUCENE-8432.patch    27/Jul/18 11:10    5 kB    Nikolay Khitrin

Issue Links

is related to

LUCENE-8434 Use shared instance of CollectionTerminatedException

Resolved
