Lucene - Core
  1. Lucene - Core
  2. LUCENE-6275

SloppyPhraseScorer should use ConjunctionDISI

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently, this guy has his own little built-in algorithm, which doesn't seem optimal to me. It might be better if it reused ConjunctionDISI like ExactPhraseScorer does.

      1. LUCENE-6275.patch
        10 kB
        Adrien Grand

        Activity

        Hide
        Adrien Grand added a comment -

        Here is a patch which yields a modest speedup with sloppy phrase queries:

                            TaskQPS baseline      StdDev   QPS patch      StdDev                Pct diff
                     AndHighHigh       51.02      (2.3%)       50.50      (1.5%)   -1.0% (  -4% -    2%)
                      AndHighMed       84.88      (2.2%)       84.34      (1.8%)   -0.6% (  -4% -    3%)
                        PKLookup      268.94      (2.9%)      268.67      (2.9%)   -0.1% (  -5% -    5%)
                          IntNRQ       16.03      (6.3%)       16.05      (4.8%)    0.2% ( -10% -   11%)
                    OrNotHighMed      171.70      (3.2%)      172.39      (2.2%)    0.4% (  -4% -    6%)
                         LowTerm      855.24      (6.1%)      859.47      (4.0%)    0.5% (  -9% -   11%)
                    OrHighNotLow       35.75      (2.1%)       35.94      (1.4%)    0.5% (  -2% -    4%)
                       OrHighLow       17.66      (9.7%)       17.75      (8.8%)    0.5% ( -16% -   21%)
                         Respell       53.38      (6.0%)       53.69      (4.6%)    0.6% (  -9% -   11%)
                         Prefix3       31.75      (6.3%)       31.94      (5.3%)    0.6% ( -10% -   12%)
                   OrHighNotHigh       50.56      (2.2%)       50.88      (1.2%)    0.6% (  -2% -    4%)
                      OrHighHigh       39.59     (10.5%)       39.86      (8.6%)    0.7% ( -16% -   22%)
                     MedSpanNear       20.07      (2.4%)       20.21      (1.6%)    0.7% (  -3% -    4%)
                       OrHighMed       54.66      (9.3%)       55.04      (8.4%)    0.7% ( -15% -   20%)
                    OrHighNotMed       63.77      (2.3%)       64.21      (1.3%)    0.7% (  -2% -    4%)
                       LowPhrase       34.61      (3.2%)       34.86      (1.6%)    0.7% (  -4% -    5%)
                      HighPhrase       21.22      (2.5%)       21.37      (1.6%)    0.7% (  -3% -    4%)
                        HighTerm      116.09      (3.8%)      117.07      (2.4%)    0.9% (  -5% -    7%)
                         MedTerm      302.55      (3.6%)      305.14      (2.5%)    0.9% (  -5% -    7%)
                        Wildcard       84.58      (4.3%)       85.50      (3.0%)    1.1% (  -5% -    8%)
                    HighSpanNear       11.34      (3.3%)       11.47      (1.7%)    1.1% (  -3% -    6%)
                    OrNotHighLow      574.75      (5.3%)      581.11      (4.2%)    1.1% (  -7% -   11%)
                     LowSpanNear       17.90      (3.9%)       18.11      (1.8%)    1.1% (  -4% -    7%)
                      AndHighLow      746.65      (4.0%)      755.38      (3.3%)    1.2% (  -5% -    8%)
                   OrNotHighHigh       47.14      (4.1%)       47.76      (1.3%)    1.3% (  -3% -    7%)
                          Fuzzy1       85.35     (14.0%)       87.01      (6.9%)    1.9% ( -16% -   26%)
                       MedPhrase       90.08      (6.1%)       92.08      (3.8%)    2.2% (  -7% -   12%)
                 LowSloppyPhrase       75.57      (4.6%)       78.69      (4.5%)    4.1% (  -4% -   13%)
                HighSloppyPhrase       12.02      (4.1%)       12.63      (3.7%)    5.1% (  -2% -   13%)
                          Fuzzy2       55.98     (16.2%)       59.21     (16.2%)    5.8% ( -22% -   45%)
                 MedSloppyPhrase       35.37      (4.1%)       38.01      (2.8%)    7.5% (   0% -   14%)
        
        Show
        Adrien Grand added a comment - Here is a patch which yields a modest speedup with sloppy phrase queries: TaskQPS baseline StdDev QPS patch StdDev Pct diff AndHighHigh 51.02 (2.3%) 50.50 (1.5%) -1.0% ( -4% - 2%) AndHighMed 84.88 (2.2%) 84.34 (1.8%) -0.6% ( -4% - 3%) PKLookup 268.94 (2.9%) 268.67 (2.9%) -0.1% ( -5% - 5%) IntNRQ 16.03 (6.3%) 16.05 (4.8%) 0.2% ( -10% - 11%) OrNotHighMed 171.70 (3.2%) 172.39 (2.2%) 0.4% ( -4% - 6%) LowTerm 855.24 (6.1%) 859.47 (4.0%) 0.5% ( -9% - 11%) OrHighNotLow 35.75 (2.1%) 35.94 (1.4%) 0.5% ( -2% - 4%) OrHighLow 17.66 (9.7%) 17.75 (8.8%) 0.5% ( -16% - 21%) Respell 53.38 (6.0%) 53.69 (4.6%) 0.6% ( -9% - 11%) Prefix3 31.75 (6.3%) 31.94 (5.3%) 0.6% ( -10% - 12%) OrHighNotHigh 50.56 (2.2%) 50.88 (1.2%) 0.6% ( -2% - 4%) OrHighHigh 39.59 (10.5%) 39.86 (8.6%) 0.7% ( -16% - 22%) MedSpanNear 20.07 (2.4%) 20.21 (1.6%) 0.7% ( -3% - 4%) OrHighMed 54.66 (9.3%) 55.04 (8.4%) 0.7% ( -15% - 20%) OrHighNotMed 63.77 (2.3%) 64.21 (1.3%) 0.7% ( -2% - 4%) LowPhrase 34.61 (3.2%) 34.86 (1.6%) 0.7% ( -4% - 5%) HighPhrase 21.22 (2.5%) 21.37 (1.6%) 0.7% ( -3% - 4%) HighTerm 116.09 (3.8%) 117.07 (2.4%) 0.9% ( -5% - 7%) MedTerm 302.55 (3.6%) 305.14 (2.5%) 0.9% ( -5% - 7%) Wildcard 84.58 (4.3%) 85.50 (3.0%) 1.1% ( -5% - 8%) HighSpanNear 11.34 (3.3%) 11.47 (1.7%) 1.1% ( -3% - 6%) OrNotHighLow 574.75 (5.3%) 581.11 (4.2%) 1.1% ( -7% - 11%) LowSpanNear 17.90 (3.9%) 18.11 (1.8%) 1.1% ( -4% - 7%) AndHighLow 746.65 (4.0%) 755.38 (3.3%) 1.2% ( -5% - 8%) OrNotHighHigh 47.14 (4.1%) 47.76 (1.3%) 1.3% ( -3% - 7%) Fuzzy1 85.35 (14.0%) 87.01 (6.9%) 1.9% ( -16% - 26%) MedPhrase 90.08 (6.1%) 92.08 (3.8%) 2.2% ( -7% - 12%) LowSloppyPhrase 75.57 (4.6%) 78.69 (4.5%) 4.1% ( -4% - 13%) HighSloppyPhrase 12.02 (4.1%) 12.63 (3.7%) 5.1% ( -2% - 13%) Fuzzy2 55.98 (16.2%) 59.21 (16.2%) 5.8% ( -22% - 45%) MedSloppyPhrase 35.37 (4.1%) 38.01 (2.8%) 7.5% ( 0% - 14%)
        Hide
        Robert Muir added a comment -

        +1 this is fantastic.

        Show
        Robert Muir added a comment - +1 this is fantastic.
        Hide
        ASF subversion and git services added a comment -

        Commit 1661727 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1661727 ]

        LUCENE-6275: Use ConjunctionDISI in SloppyPhraseScorer.

        Show
        ASF subversion and git services added a comment - Commit 1661727 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1661727 ] LUCENE-6275 : Use ConjunctionDISI in SloppyPhraseScorer.
        Hide
        ASF subversion and git services added a comment -

        Commit 1661728 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1661728 ]

        LUCENE-6275: Use ConjunctionDISI in SloppyPhraseScorer.

        Show
        ASF subversion and git services added a comment - Commit 1661728 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1661728 ] LUCENE-6275 : Use ConjunctionDISI in SloppyPhraseScorer.
        Hide
        Timothy Potter added a comment -

        Bulk close after 5.1 release

        Show
        Timothy Potter added a comment - Bulk close after 5.1 release

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development