Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: master (7.0), 6.2
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I am digging into some performance regressions between 4.x and 5.x. They seem to be due to 5.x always running conjunctions with ConjunctionDISI, whereas 4.x had FilteredQuery, which was optimized for two cases: when there are only two clauses, and when one of the clauses supports random access. I'd like to explore the former in this issue.

      1. LUCENE-7330.patch
        19 kB
        Adrien Grand

        Activity

        jpountz Adrien Grand added a comment -

        Here is a patch. It speeds up conjunctions thanks to 2 changes:

        First, it removes the 'if (doc == NO_MORE_DOCS) return NO_MORE_DOCS;' check at the top of doNext(). This check was needed because TwoPhaseConjunctionDISI extended ConjunctionDISI, and it is illegal to call TwoPhaseIterator.matches() on NO_MORE_DOCS. I had to refactor how the two-phase iterator is exposed a bit, but I don't think it makes things more complicated.
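To see why the guard is unnecessary once no two-phase matches() call happens inside the loop, here is a minimal leapfrog conjunction in Java. It is a sketch under stated assumptions, not the code from the patch: `LeapfrogConjunction`, `ArrayIterator` and `collectAll()` are hypothetical names, and the iterators are in-memory sorted arrays rather than Lucene postings. It relies on NO_MORE_DOCS being Integer.MAX_VALUE (as in Lucene's DocIdSetIterator), so advancing any iterator to the sentinel returns the sentinel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch only: mirrors the general shape of a doNext() loop, not Lucene's ConjunctionDISI. */
public final class LeapfrogConjunction {
  // Same value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for a postings iterator over sorted doc ids. */
  static final class ArrayIterator {
    private final int[] docs;
    private int index = -1;
    private int doc = -1;

    ArrayIterator(int... docs) { this.docs = docs; }

    int docID() { return doc; }

    long cost() { return docs.length; }

    int nextDoc() {
      index++;
      doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
      return doc;
    }

    /** Returns the first doc >= target; a linear scan keeps the sketch short. */
    int advance(int target) {
      while (doc < target) {
        nextDoc();
      }
      return doc;
    }
  }

  private final ArrayIterator lead;
  private final ArrayIterator[] others;

  LeapfrogConjunction(ArrayIterator... iterators) {
    ArrayIterator[] sorted = iterators.clone();
    // The least costly iterator leads; Arrays.sort on objects is stable.
    Arrays.sort(sorted, Comparator.comparingLong(ArrayIterator::cost));
    lead = sorted[0];
    others = Arrays.copyOfRange(sorted, 1, sorted.length);
  }

  int nextDoc() {
    return doNext(lead.nextDoc());
  }

  // No "if (doc == NO_MORE_DOCS) return NO_MORE_DOCS;" guard: once the lead is
  // exhausted, every other.advance(NO_MORE_DOCS) also returns NO_MORE_DOCS,
  // no iterator lands beyond doc, and the loop returns the sentinel by itself.
  private int doNext(int doc) {
    advanceHead:
    for (;;) {
      for (ArrayIterator other : others) {
        if (other.docID() < doc) {
          int next = other.advance(doc);
          if (next > doc) {
            // 'other' skipped past doc: move the lead there and start over.
            doc = lead.advance(next);
            continue advanceHead;
          }
        }
      }
      return doc; // all iterators agree, possibly on NO_MORE_DOCS
    }
  }

  List<Integer> collectAll() {
    List<Integer> hits = new ArrayList<>();
    for (int doc = nextDoc(); doc != NO_MORE_DOCS; doc = nextDoc()) {
      hits.add(doc);
    }
    return hits;
  }

  public static void main(String[] args) {
    LeapfrogConjunction conj = new LeapfrogConjunction(
        new ArrayIterator(1, 3, 5, 7, 9, 11),
        new ArrayIterator(3, 4, 9, 11, 20),
        new ArrayIterator(0, 3, 9, 10, 11));
    System.out.println(conj.collectAll()); // [3, 9, 11]
  }
}
```

Because advance(target) always returns a doc >= target, nothing can report a doc beyond NO_MORE_DOCS, so the `next > doc` branch is never taken once doc is the sentinel.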

        Second, it special-cases the second least costly iterator so that we do not have to check whether it is already on the same document as the 'lead'. In the current implementation of doNext, every call to 'other.advance()' has to be guarded by 'if (other.docID() < doc)', but we can skip this check for the second least costly iterator without changing the order in which iterators are invoked.
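The second change can be sketched the same way: keep the least costly iterator as `lead1`, the second least costly as `lead2`, and advance `lead2` unconditionally. Again this is a hypothetical illustration (the class, field names and in-memory `ArrayIterator` are assumptions), not the committed code. The reason `lead2` needs no guard is an invariant: `lead1` is only ever moved to a document strictly beyond where `lead2` currently sits, so `lead2.docID() < doc` holds at the top of every loop iteration, whereas the remaining iterators may already be on `doc` after a restart and still need the check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch of the "two leads" idea on hypothetical iterators; not the committed code. */
public final class TwoLeadConjunction {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same value as in Lucene

  /** Hypothetical iterator over a sorted int[] of doc ids. */
  static final class ArrayIterator {
    private final int[] docs;
    private int index = -1;
    private int doc = -1;

    ArrayIterator(int... docs) { this.docs = docs; }
    int docID() { return doc; }
    long cost() { return docs.length; }

    int nextDoc() {
      index++;
      doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
      return doc;
    }

    int advance(int target) { // first doc >= target, linear scan
      while (doc < target) {
        nextDoc();
      }
      return doc;
    }
  }

  private final ArrayIterator lead1; // least costly
  private final ArrayIterator lead2; // second least costly
  private final ArrayIterator[] others;

  TwoLeadConjunction(ArrayIterator... iterators) {
    if (iterators.length < 2) {
      throw new IllegalArgumentException("need at least two iterators");
    }
    ArrayIterator[] sorted = iterators.clone();
    Arrays.sort(sorted, Comparator.comparingLong(ArrayIterator::cost));
    lead1 = sorted[0];
    lead2 = sorted[1];
    others = Arrays.copyOfRange(sorted, 2, sorted.length);
  }

  int nextDoc() {
    return doNext(lead1.nextDoc());
  }

  private int doNext(int doc) {
    advanceHead:
    for (;;) {
      // Invariant: lead2.docID() < doc here, so lead2 is advanced without a guard.
      int next2 = lead2.advance(doc);
      if (next2 != doc) {
        doc = lead1.advance(next2);
        if (doc != next2) {
          continue advanceHead;
        }
      }
      // Other iterators may already sit on doc after a restart: keep the guard.
      for (ArrayIterator other : others) {
        if (other.docID() < doc) {
          int next = other.advance(doc);
          if (next > doc) {
            doc = lead1.advance(next);
            continue advanceHead;
          }
        }
      }
      return doc;
    }
  }

  List<Integer> collectAll() {
    List<Integer> hits = new ArrayList<>();
    for (int doc = nextDoc(); doc != NO_MORE_DOCS; doc = nextDoc()) {
      hits.add(doc);
    }
    return hits;
  }

  public static void main(String[] args) {
    TwoLeadConjunction conj = new TwoLeadConjunction(
        new ArrayIterator(1, 3, 5, 7, 9, 11),
        new ArrayIterator(3, 4, 9, 11, 20),
        new ArrayIterator(0, 3, 9, 10, 11));
    System.out.println(conj.collectAll()); // [3, 9, 11]
  }
}
```

The iterators are still consulted in cost order (lead1, lead2, then the others), which is why dropping the check does not change the order in which iterators are invoked.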

        luceneutil reports the following numbers on wikimedium10m. There seems to be a noticeable gain for conjunction-based queries (And*, *SpanNear and *Phrase):

                            Task    QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                    OrHighNotLow      128.17      (9.1%)      126.04      (8.6%)   -1.7% ( -17% -   17%)
                      OrHighHigh       14.75      (6.5%)       14.54      (5.8%)   -1.4% ( -12% -   11%)
                       OrHighMed       66.53      (6.2%)       65.65      (5.8%)   -1.3% ( -12% -   11%)
                       OrHighLow       85.42      (7.3%)       84.51      (6.7%)   -1.1% ( -14% -   13%)
                          Fuzzy1       68.08     (10.9%)       67.37     (10.2%)   -1.0% ( -19% -   22%)
                    OrHighNotMed      133.66      (8.5%)      132.33      (7.5%)   -1.0% ( -15% -   16%)
                   OrHighNotHigh       64.83      (4.6%)       64.36      (4.3%)   -0.7% (  -9% -    8%)
                    OrNotHighLow     1150.80      (3.1%)     1144.91      (3.4%)   -0.5% (  -6% -    6%)
                          Fuzzy2       61.60     (22.2%)       61.31     (14.0%)   -0.5% ( -30% -   46%)
                   OrNotHighHigh       22.30      (2.7%)       22.23      (2.6%)   -0.3% (  -5% -    5%)
                    OrNotHighMed      155.90      (2.4%)      155.74      (2.7%)   -0.1% (  -5% -    5%)
                         Respell       94.52      (1.9%)       94.69      (1.9%)    0.2% (  -3% -    4%)
                        Wildcard       66.04      (4.6%)       66.50      (4.4%)    0.7% (  -7% -   10%)
                         Prefix3      104.62      (4.7%)      105.54      (4.3%)    0.9% (  -7% -   10%)
                        HighTerm       98.37      (5.3%)       99.65      (4.5%)    1.3% (  -8% -   11%)
                      AndHighLow      612.09      (3.0%)      620.90      (2.6%)    1.4% (  -3% -    7%)
                         MedTerm      237.97      (4.9%)      241.93      (4.4%)    1.7% (  -7% -   11%)
                          IntNRQ       18.72      (9.4%)       19.05      (7.7%)    1.7% ( -13% -   20%)
                 LowSloppyPhrase      108.80      (1.7%)      111.16      (2.2%)    2.2% (  -1% -    6%)
                       MedPhrase      100.85      (2.2%)      103.08      (2.1%)    2.2% (  -2% -    6%)
                     MedSpanNear       71.08      (2.2%)       73.09      (2.2%)    2.8% (  -1% -    7%)
                         LowTerm      623.38      (9.5%)      641.55      (7.7%)    2.9% ( -12% -   22%)
                      HighPhrase       35.36      (3.2%)       36.42      (3.0%)    3.0% (  -3% -    9%)
                     LowSpanNear       92.47      (2.9%)       95.41      (2.8%)    3.2% (  -2% -    9%)
                HighSloppyPhrase       31.99      (4.9%)       33.09      (4.8%)    3.5% (  -5% -   13%)
                      AndHighMed      223.42      (1.6%)      231.21      (1.9%)    3.5% (   0% -    7%)
                 MedSloppyPhrase       43.07      (2.5%)       45.13      (2.2%)    4.8% (   0% -    9%)
                    HighSpanNear       28.57      (2.9%)       29.95      (3.6%)    4.8% (  -1% -   11%)
                     AndHighHigh       74.55      (1.0%)       78.39      (1.6%)    5.2% (   2% -    7%)
                       LowPhrase       19.97      (2.5%)       21.04      (2.9%)    5.4% (   0% -   10%)
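As a sanity check on how to read the Pct diff column, and assuming it is simply the relative change between the two mean QPS values (luceneutil may aggregate runs differently), the AndHighHigh row works out as:

```latex
\frac{\mathrm{QPS}_{\mathrm{patch}} - \mathrm{QPS}_{\mathrm{baseline}}}{\mathrm{QPS}_{\mathrm{baseline}}}
  = \frac{78.39 - 74.55}{74.55}
  = \frac{3.84}{74.55}
  \approx 0.0515
  \approx 5.2\%
```

AndHighHigh is also the only row whose bracketed range (2% to 7%) stays strictly above zero, while the ranges of all Or* rows straddle zero.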
        
        jira-bot ASF subversion and git services added a comment -

        Commit 4a02813e2eec9ba5093b0e8f285e14b68b07051b in lucene-solr's branch refs/heads/branch_6x from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4a02813 ]

        LUCENE-7330: Speed up conjunctions.

        jira-bot ASF subversion and git services added a comment -

        Commit 72914198e60dcaa2008f6945e53e36e1c0053078 in lucene-solr's branch refs/heads/master from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7291419 ]

        LUCENE-7330: Speed up conjunctions.

        jpountz Adrien Grand added a comment -

        Nightly benchmarks seem to confirm the speedup is real: http://people.apache.org/~mikemccand/lucenebench/AndHighHigh.html

        mikemccand Michael McCandless added a comment -

        Bulk close resolved issues after 6.2.0 release.


          People

          • Assignee:
            jpountz Adrien Grand
            Reporter:
            jpountz Adrien Grand
          • Votes:
            1
            Watchers:
            6
