Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Out-of-order currently adds complexity that I would like to remove. Here is a selection of issues that come from out-of-order scoring.

      • lots of specializations with collectors: we have two versions of every top score/field collector depending on whether it should support out-of-order collection or not
      • it feels like it should be an implementation detail of our bulk scorers but it also makes our APIs more complicated, eg. LeafCollector.acceptsDocsOutOfOrder
      • if you create a TopFieldCollector, how do you know if you should pass docsScoredInOrder=true or false? To make the decision, you actually need to know whether your query supports out-of-order scoring while the API is on Weight.

      I initially wanted to keep it and improve the decision process in LUCENE-6172 but I'm not sure it's the right approach as it would require to make the API even more complicated... hence the suggestion to remove out-of-order scoring completely.

      1. bool_or.tasks
        3 kB
        Adrien Grand
      2. LUCENE-6179.patch
        178 kB
        Adrien Grand
      3. LUCENE-6179.patch
        173 kB
        Adrien Grand

        Issue Links

          Activity

          Hide
          Mikhail Khludnev added a comment -

          wow,wow,wow! how does it work for BS1?

          Show
          Mikhail Khludnev added a comment - wow,wow,wow! how does it work for BS1?
          Hide
          Michael McCandless added a comment -

          +1

          Show
          Michael McCandless added a comment - +1
          Hide
          Adrien Grand added a comment -

          Here is a patch. It removes out-of-order scoring support and makes BooleanScorer (aka BS1) score documents in order. The new version is slower than current trunk

                              TaskQPS baseline      StdDev   QPS patch      StdDev                Pct diff
                          OrLowMed       93.19      (5.9%)       78.33      (4.8%)  -16.0% ( -25% -   -5%)
                          OrLowLow      282.29      (5.0%)      237.59      (4.1%)  -15.8% ( -23% -   -7%)
                       OrLowLowLow      203.94      (7.6%)      172.27      (4.8%)  -15.5% ( -25% -   -3%)
                          OrMedMed       57.48      (6.4%)       49.40      (5.3%)  -14.1% ( -24% -   -2%)
                         OrLowHigh       15.42      (6.4%)       13.63      (5.4%)  -11.6% ( -21% -    0%)
                       OrMedMedMed       37.51      (6.3%)       33.71      (5.6%)  -10.1% ( -20% -    1%)
                         OrMedHigh       18.73      (6.0%)       16.89      (5.4%)   -9.8% ( -20% -    1%)
                        OrHighHigh       10.94      (5.8%)       10.19      (5.7%)   -6.8% ( -17% -    5%)
                          PKLookup      259.62      (0.9%)      258.13      (1.0%)   -0.6% (  -2% -    1%)
                    OrHighHighHigh        7.85      (5.4%)        7.86      (6.6%)    0.1% ( -11% -   12%)
                           LowTerm      687.79      (3.5%)      694.17      (3.2%)    0.9% (  -5% -    7%)
                          HighTerm       22.81      (5.6%)       23.11      (1.3%)    1.3% (  -5% -    8%)
                           MedTerm      154.48      (5.2%)      162.79      (2.6%)    5.4% (  -2% -   13%)
          

          but still much faster than the Scorer-based BulkScorer:

                              TaskQPS baseline      StdDev   QPS patch      StdDev                Pct diff
                           LowTerm      671.14      (5.0%)      665.62      (3.2%)   -0.8% (  -8% -    7%)
                          OrLowLow      232.81      (6.2%)      231.60      (3.8%)   -0.5% (  -9% -   10%)
                          PKLookup      254.28      (1.2%)      253.31      (0.9%)   -0.4% (  -2% -    1%)
                          HighTerm       19.47      (3.0%)       20.31      (1.1%)    4.3% (   0% -    8%)
                           MedTerm      149.88      (4.7%)      162.49      (1.8%)    8.4% (   1% -   15%)
                       OrLowLowLow      151.55      (5.2%)      165.27      (5.9%)    9.1% (  -2% -   21%)
                          OrLowMed       67.87      (6.2%)       80.68      (5.6%)   18.9% (   6% -   32%)
                         OrLowHigh        7.97      (5.8%)       10.36      (7.7%)   29.9% (  15% -   46%)
                          OrMedMed       35.97      (5.1%)       48.45      (7.5%)   34.7% (  20% -   49%)
                         OrMedHigh       12.44      (5.6%)       17.01      (8.0%)   36.7% (  21% -   53%)
                        OrHighHigh        5.38      (5.2%)        8.16     (10.2%)   51.7% (  34% -   70%)
                       OrMedMedMed       21.63      (4.8%)       34.07      (8.8%)   57.5% (  41% -   74%)
                    OrHighHighHigh        3.96      (4.5%)        6.78     (12.0%)   71.2% (  52% -   91%)
          

          The new BooleanScorer uses a bit set in order to sort documents that matched within the window. As Robert pointed out, I don't have CTZ/NTZ support on my JVM, so this run of the benchmark is actually a worst-case. It should perform better on a newer machine:

          $ java -XX:+PrintFlagsFinal -version | grep Instruction
               intx FenceInstruction                          = 0                                   {ARCH product}
               bool UseBMI1Instructions                       = false                               {ARCH product}
               bool UseCountLeadingZerosInstruction           = false                               {ARCH product}
               bool UseCountTrailingZerosInstruction          = false                               {ARCH product}
               bool UsePopCountInstruction                    = true                                {product}
          java version "1.8.0_25"
          Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
          Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
          

          Some final notes about the patch:

          • it helps remove significant code: 97 files changed, 411 insertions, 2131 deletions
          • since Solr always forces in-order scoring, it would actually make Solr faster since it would now use BS1
          • some queries that were lazy to implement scoresDocsOutOfOrder correctly might be faster because we won't pick a collector that supports out-of-order scoring while documents are actually collected in order (see eg. FilteredQuery)

          If someone is interested in reproducing the benchmark, I uploaded the tasks file that I used.

          Show
          Adrien Grand added a comment - Here is a patch. It removes out-of-order scoring support and makes BooleanScorer (aka BS1) score documents in order. The new version is slower than current trunk TaskQPS baseline StdDev QPS patch StdDev Pct diff OrLowMed 93.19 (5.9%) 78.33 (4.8%) -16.0% ( -25% - -5%) OrLowLow 282.29 (5.0%) 237.59 (4.1%) -15.8% ( -23% - -7%) OrLowLowLow 203.94 (7.6%) 172.27 (4.8%) -15.5% ( -25% - -3%) OrMedMed 57.48 (6.4%) 49.40 (5.3%) -14.1% ( -24% - -2%) OrLowHigh 15.42 (6.4%) 13.63 (5.4%) -11.6% ( -21% - 0%) OrMedMedMed 37.51 (6.3%) 33.71 (5.6%) -10.1% ( -20% - 1%) OrMedHigh 18.73 (6.0%) 16.89 (5.4%) -9.8% ( -20% - 1%) OrHighHigh 10.94 (5.8%) 10.19 (5.7%) -6.8% ( -17% - 5%) PKLookup 259.62 (0.9%) 258.13 (1.0%) -0.6% ( -2% - 1%) OrHighHighHigh 7.85 (5.4%) 7.86 (6.6%) 0.1% ( -11% - 12%) LowTerm 687.79 (3.5%) 694.17 (3.2%) 0.9% ( -5% - 7%) HighTerm 22.81 (5.6%) 23.11 (1.3%) 1.3% ( -5% - 8%) MedTerm 154.48 (5.2%) 162.79 (2.6%) 5.4% ( -2% - 13%) but still much faster than the Scorer-based BulkScorer: TaskQPS baseline StdDev QPS patch StdDev Pct diff LowTerm 671.14 (5.0%) 665.62 (3.2%) -0.8% ( -8% - 7%) OrLowLow 232.81 (6.2%) 231.60 (3.8%) -0.5% ( -9% - 10%) PKLookup 254.28 (1.2%) 253.31 (0.9%) -0.4% ( -2% - 1%) HighTerm 19.47 (3.0%) 20.31 (1.1%) 4.3% ( 0% - 8%) MedTerm 149.88 (4.7%) 162.49 (1.8%) 8.4% ( 1% - 15%) OrLowLowLow 151.55 (5.2%) 165.27 (5.9%) 9.1% ( -2% - 21%) OrLowMed 67.87 (6.2%) 80.68 (5.6%) 18.9% ( 6% - 32%) OrLowHigh 7.97 (5.8%) 10.36 (7.7%) 29.9% ( 15% - 46%) OrMedMed 35.97 (5.1%) 48.45 (7.5%) 34.7% ( 20% - 49%) OrMedHigh 12.44 (5.6%) 17.01 (8.0%) 36.7% ( 21% - 53%) OrHighHigh 5.38 (5.2%) 8.16 (10.2%) 51.7% ( 34% - 70%) OrMedMedMed 21.63 (4.8%) 34.07 (8.8%) 57.5% ( 41% - 74%) OrHighHighHigh 3.96 (4.5%) 6.78 (12.0%) 71.2% ( 52% - 91%) The new BooleanScorer uses a bit set in order to sort documents that matched within the window. As Robert pointed out, I don't have CTZ/NTZ support on my JVM, so this run of the benchmark is actually a worst-case. It should perform better on a newer machine: $ java -XX:+PrintFlagsFinal -version | grep Instruction intx FenceInstruction = 0 {ARCH product} bool UseBMI1Instructions = false {ARCH product} bool UseCountLeadingZerosInstruction = false {ARCH product} bool UseCountTrailingZerosInstruction = false {ARCH product} bool UsePopCountInstruction = true {product} java version "1.8.0_25" Java(TM) SE Runtime Environment (build 1.8.0_25-b17) Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode) Some final notes about the patch: it helps remove significant code: 97 files changed, 411 insertions , 2131 deletions since Solr always forces in-order scoring, it would actually make Solr faster since it would now use BS1 some queries that were lazy to implement scoresDocsOutOfOrder correctly might be faster because we won't pick a collector that supports out-of-order scoring while documents are actually collected in order (see eg. FilteredQuery) If someone is interested in reproducing the benchmark, I uploaded the tasks file that I used.
          Hide
          Robert Muir added a comment -

          +1

          Show
          Robert Muir added a comment - +1
          Hide
          Adrien Grand added a comment -

          > how does it work for BS1?

          It uses a bit set for sorting. Since the scorer scores windows of 2048 documents (2^11, just like before), every document in a window can be given an id between 0 and 2048 (docId & 2047), and this id is used as an index in the bit set. When all sub-scorers have scored the window, we iterate over the one bits in order and feed the collector.

          This bit set only has 32 longs (2048 / 64), so iterating over it is rather fast even if it's sparse. There also is a hasMatches boolean which is used to quickly escape the loop in case no sub scorers matched.

          Show
          Adrien Grand added a comment - > how does it work for BS1? It uses a bit set for sorting. Since the scorer scores windows of 2048 documents (2^11, just like before), every document in a window can be given an id between 0 and 2048 ( docId & 2047 ), and this id is used as an index in the bit set. When all sub-scorers have scored the window, we iterate over the one bits in order and feed the collector. This bit set only has 32 longs (2048 / 64), so iterating over it is rather fast even if it's sparse. There also is a hasMatches boolean which is used to quickly escape the loop in case no sub scorers matched.
          Hide
          Adrien Grand added a comment -

          For the record, I also tried another approach by merge-sorting the matches from each scorer but it did not work as well as the bitset approach (even when there are only 2 sub-scorers).

          Show
          Adrien Grand added a comment - For the record, I also tried another approach by merge-sorting the matches from each scorer but it did not work as well as the bitset approach (even when there are only 2 sub-scorers).
          Hide
          Adrien Grand added a comment -

          One problem with the patch is that collectors cannot opt out from this specialized bulk scorer anymore by returning false in acceptsDocsOutOfOrder.

          This is an issue for ToParentBlockJoinCollector since it calls Scorer.getChildren in LeafCollector.setScorer. Yet this throws an UnsupportedOperationException with a FakeScorer... Any suggestion how to fix this issue?

          Show
          Adrien Grand added a comment - One problem with the patch is that collectors cannot opt out from this specialized bulk scorer anymore by returning false in acceptsDocsOutOfOrder. This is an issue for ToParentBlockJoinCollector since it calls Scorer.getChildren in LeafCollector.setScorer. Yet this throws an UnsupportedOperationException with a FakeScorer... Any suggestion how to fix this issue?
          Hide
          Adrien Grand added a comment -

          New patch. I fixed the issue with ToParentBlockJoinCollector by providing a ToParentBlockJoinIndexSearcher in the join module that uses the Scorer to score instead of the BulkScorer.

          I think it's ready.

          Show
          Adrien Grand added a comment - New patch. I fixed the issue with ToParentBlockJoinCollector by providing a ToParentBlockJoinIndexSearcher in the join module that uses the Scorer to score instead of the BulkScorer. I think it's ready.
          Hide
          Adrien Grand added a comment -

          FYI I'm considering pushing it to the 5.0 branch when committing as I think it's a nice cleanup and we already broke the collector API in 5.0. Let me know if you have objections and I'll just delay to 5.1.

          Show
          Adrien Grand added a comment - FYI I'm considering pushing it to the 5.0 branch when committing as I think it's a nice cleanup and we already broke the collector API in 5.0. Let me know if you have objections and I'll just delay to 5.1.
          Hide
          Michael McCandless added a comment -

          +1 to patch and +1 for 5.0.

          Show
          Michael McCandless added a comment - +1 to patch and +1 for 5.0.
          Hide
          ASF subversion and git services added a comment -

          Commit 1652013 from Adrien Grand in branch 'dev/trunk'
          [ https://svn.apache.org/r1652013 ]

          LUCENE-6179: Disallow out-of-order scoring.

          Show
          ASF subversion and git services added a comment - Commit 1652013 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1652013 ] LUCENE-6179 : Disallow out-of-order scoring.
          Hide
          ASF subversion and git services added a comment -

          Commit 1652015 from Adrien Grand in branch 'dev/trunk'
          [ https://svn.apache.org/r1652015 ]

          LUCENE-6179: Leftovers.

          Show
          ASF subversion and git services added a comment - Commit 1652015 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1652015 ] LUCENE-6179 : Leftovers.
          Hide
          ASF subversion and git services added a comment -

          Commit 1652027 from Adrien Grand in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1652027 ]

          LUCENE-6179: Disallow out-of-order scoring.

          Show
          ASF subversion and git services added a comment - Commit 1652027 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1652027 ] LUCENE-6179 : Disallow out-of-order scoring.
          Hide
          ASF subversion and git services added a comment -

          Commit 1652034 from Adrien Grand in branch 'dev/branches/lucene_solr_5_0'
          [ https://svn.apache.org/r1652034 ]

          LUCENE-6179: Disallow out-of-order scoring.

          Show
          ASF subversion and git services added a comment - Commit 1652034 from Adrien Grand in branch 'dev/branches/lucene_solr_5_0' [ https://svn.apache.org/r1652034 ] LUCENE-6179 : Disallow out-of-order scoring.
          Hide
          Anshum Gupta added a comment -

          Bulk close after 5.0 release.

          Show
          Anshum Gupta added a comment - Bulk close after 5.0 release.
          Hide
          ASF subversion and git services added a comment -

          Commit 7f3d86524d0fc5cdf5a517eb266b68b49db81be0 in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7f3d865 ]

          SOLR-9436: remove no longer used acceptsDocsOutOfOrder methods (LUCENE-6179 removed out-of-order scoring)

          Show
          ASF subversion and git services added a comment - Commit 7f3d86524d0fc5cdf5a517eb266b68b49db81be0 in lucene-solr's branch refs/heads/master from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7f3d865 ] SOLR-9436 : remove no longer used acceptsDocsOutOfOrder methods ( LUCENE-6179 removed out-of-order scoring)

            People

            • Assignee:
              Adrien Grand
              Reporter:
              Adrien Grand
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development