Lucene - Core
LUCENE-6756

Give MatchAllDocsQuery a dedicated BulkScorer

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      MatchAllDocsQuery currently uses the default BulkScorer, which creates a Scorer and iterates over matching doc IDs up to NO_MORE_DOCS. I built a dedicated BulkScorer that removes these abstraction layers; with simple collectors it improved throughput by a factor of roughly 2x.
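To illustrate the difference between the two paths, here is a minimal, self-contained sketch. It does not use Lucene's actual classes: `DocIterator`, `scoreGeneric` and `scoreDedicated` are hypothetical stand-ins, the collector is a plain `IntConsumer`, and deleted documents and scores are ignored. It is not the attached patch, only the general idea of replacing an iterator-driven loop with a direct counting loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class MatchAllBulkSketch {
    // Sentinel marking iterator exhaustion, mirroring the NO_MORE_DOCS idea.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a scorer's doc ID iterator (not Lucene's class). */
    interface DocIterator {
        int nextDoc();
    }

    /** Generic path: build an iterator, then pull doc IDs until NO_MORE_DOCS. */
    static void scoreGeneric(final int maxDoc, IntConsumer collector) {
        DocIterator it = new DocIterator() {
            private int next = 0;

            @Override
            public int nextDoc() {
                return next < maxDoc ? next++ : NO_MORE_DOCS;
            }
        };
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            collector.accept(doc);
        }
    }

    /** Dedicated path: every doc matches, so a plain counting loop suffices. */
    static void scoreDedicated(int maxDoc, IntConsumer collector) {
        for (int doc = 0; doc < maxDoc; doc++) {
            collector.accept(doc);
        }
    }

    public static void main(String[] args) {
        List<Integer> generic = new ArrayList<>();
        List<Integer> dedicated = new ArrayList<>();
        scoreGeneric(5, generic::add);
        scoreDedicated(5, dedicated::add);
        // Both paths should visit the same doc IDs in the same order.
        System.out.println(generic.equals(dedicated));
    }
}
```

Both methods hand the same doc IDs to the collector; the dedicated one simply avoids the per-document call through the iterator abstraction.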

      1. LUCENE-6756.patch
        1 kB
        Adrien Grand
      2. MABench.java
        2 kB
        Adrien Grand

        Activity

        Adrien Grand added a comment -

        Here are a patch and the simplistic/non-realistic/terrible benchmark I used.

        Robert Muir added a comment -

        Can you check that the specialization does not make hotspot crazy? AFAIK it's already crazy around this stuff...

        Adrien Grand added a comment -

        I added a MatchAll task to wikimedium1m and hotspot looks happy:

                          Task  QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                          Fuzzy2      103.55     (32.6%)       95.61     (35.5%)   -7.7% ( -57% -   89%)
                          Fuzzy1      139.81     (13.1%)      132.03     (17.0%)   -5.6% ( -31% -   28%)
                         Prefix3      374.46      (8.7%)      368.62      (7.4%)   -1.6% ( -16% -   15%)
                       OrHighLow      322.32      (7.0%)      320.66      (5.9%)   -0.5% ( -12% -   13%)
                       OrHighMed      257.31      (8.7%)      256.59      (4.7%)   -0.3% ( -12% -   14%)
                      OrHighHigh      202.24      (8.1%)      201.80      (6.2%)   -0.2% ( -13% -   15%)
                      HighPhrase      155.66      (4.3%)      155.48      (5.2%)   -0.1% (  -9% -    9%)
                     LowSpanNear      200.83      (5.5%)      200.68      (4.5%)   -0.1% (  -9% -   10%)
                      AndHighLow     1806.85      (5.2%)     1806.05      (8.9%)   -0.0% ( -13% -   14%)
                        HighTerm      573.21      (7.8%)      573.11      (6.6%)   -0.0% ( -13% -   15%)
                 LowSloppyPhrase      132.99      (4.6%)      132.98      (5.7%)   -0.0% (  -9% -   10%)
                     AndHighHigh      401.82      (4.2%)      402.76      (4.3%)    0.2% (  -7% -    9%)
                HighSloppyPhrase      271.61      (5.7%)      273.46      (7.3%)    0.7% ( -11% -   14%)
                    HighSpanNear      107.11      (6.2%)      107.85      (5.2%)    0.7% ( -10% -   12%)
                       MedPhrase      186.57      (4.5%)      187.88      (4.9%)    0.7% (  -8% -   10%)
                       LowPhrase      402.46      (4.4%)      406.53      (3.5%)    1.0% (  -6% -    9%)
                 MedSloppyPhrase      233.49      (5.0%)      236.66      (3.4%)    1.4% (  -6% -   10%)
                         MedTerm     1278.37      (8.9%)     1302.62      (6.4%)    1.9% ( -12% -   18%)
                        Wildcard      339.31      (8.8%)      346.33      (6.5%)    2.1% ( -12% -   19%)
                         Respell      152.28      (9.2%)      155.51      (8.8%)    2.1% ( -14% -   22%)
                      AndHighMed      396.54      (8.1%)      407.13      (3.7%)    2.7% (  -8% -   15%)
                     MedSpanNear      565.97      (6.9%)      581.61      (5.3%)    2.8% (  -8% -   16%)
                         LowTerm     3143.46     (14.2%)     3244.12      (8.8%)    3.2% ( -17% -   30%)
                          IntNRQ       90.11     (11.4%)       93.16      (8.0%)    3.4% ( -14% -   25%)
                        MatchAll      117.18      (3.7%)      211.95     (30.9%)   80.9% (  44% -  119%)
        

        The fuzzy queries are a bit off but I see a lot of variance with these queries anyway, even without the change.
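For readers checking the table, the Pct diff column is consistent with the relative change of mean patch QPS over mean baseline QPS. The sketch below assumes that definition (the class and method names are made up for illustration) and recomputes two rows from the QPS values above. For MatchAll this works out to about 80.9%, i.e. roughly a 1.8x speedup.

```java
import java.util.Locale;

public class PctDiff {
    /** Relative change of patch QPS versus baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double patchQps) {
        return 100.0 * (patchQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // QPS values copied from the MatchAll and Fuzzy2 rows of the table.
        System.out.println(String.format(Locale.ROOT, "MatchAll: %.1f%%", pctDiff(117.18, 211.95)));
        System.out.println(String.format(Locale.ROOT, "Fuzzy2:   %.1f%%", pctDiff(103.55, 95.61)));
    }
}
```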

        Robert Muir added a comment -

        +1

        ASF subversion and git services added a comment -

        Commit 1700437 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1700437 ]

        LUCENE-6756: MatchAllDocsQuery now has a dedicated BulkScorer.

        ASF subversion and git services added a comment -

        Commit 1700449 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1700449 ]

        LUCENE-6756: MatchAllDocsQuery now has a dedicated BulkScorer.


          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0
            Watchers:
            3