Solr
  1. Solr
  2. SOLR-7128

Two phase distributed search is fetching extra fields in GET_TOP_IDS phase

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 4.10.2, 4.10.3
    • Fix Version/s: 4.10.4, 5.1, 6.0
    • Component/s: search
    • Labels:
      None

      Description

      Pablo Queixalos reported this to me privately so I am creating this issue on his behalf.

      We found an issue in versions 4.10.+ (4.10.2 and 4.10.3 for sure).

      When processing a two phase distributed query with an explicit fl parameter, the two phases are well processed, but the GET_TOP_IDS retrieves the matching documents fields, even if a GET_FIELDS shard request is getting executed just after.

      /solr/someCollectionCore?collection=someOtherCollection&q=:&debug=true&fl=id,title
      => id is retrieved during GET_TOP_IDS phase that's ok:: it's our uniqueKeyField
      => title is also retrieved during GET_TOP_IDS phase, that's not ok.

      I'm able to reproduce this. This is pretty bad performance bug that was introduced in SOLR-5768 or it's subsequent related issues. I plan to fix this bug and add substantial tests to assert such things.

      1. SOLR-7128.patch
        19 kB
        Shalin Shekhar Mangar
      2. SOLR-7128.patch
        19 kB
        Shalin Shekhar Mangar
      3. SOLR-7128.patch
        18 kB
        Shalin Shekhar Mangar
      4. SOLR-7128.patch
        17 kB
        Shalin Shekhar Mangar
      5. SOLR-7128-addendum.patch
        7 kB
        Shalin Shekhar Mangar

        Issue Links

          Activity

          Hide
          Shalin Shekhar Mangar added a comment -

          This was introduced in SOLR-6796.

          Here's a test which exposes this issue.

          1. I created a TrackingShardHandlerFactory which can record shard requests sent from any node. There are a few helper methods to get requests by shard and by purpose.
          2. The new test TestTwoPhaseDistributedQuery uses this factory to assert the fields which are being requested in GET_TOP_IDS phase.

          I will likely move the TrackingShardHandlerFactory into its own issue because it is helpful for other distributed tests as well. I also need to decouple it from the MiniSolrCloudCluster abstraction. I found a problem in MiniSolrCloudCluster too and opened SOLR-7146

          I'm working on a fix.

          Show
          Shalin Shekhar Mangar added a comment - This was introduced in SOLR-6796 . Here's a test which exposes this issue. I created a TrackingShardHandlerFactory which can record shard requests sent from any node. There are a few helper methods to get requests by shard and by purpose. The new test TestTwoPhaseDistributedQuery uses this factory to assert the fields which are being requested in GET_TOP_IDS phase. I will likely move the TrackingShardHandlerFactory into its own issue because it is helpful for other distributed tests as well. I also need to decouple it from the MiniSolrCloudCluster abstraction. I found a problem in MiniSolrCloudCluster too and opened SOLR-7146 I'm working on a fix.
          Hide
          Hoss Man added a comment -

          I had a lot of feedback/ideas about the TrackingShardHandlerFactory, but not a lot of feedback/comments/ideas about the root issue here, so i went ahead and filed SOLR-7147 for you.

          Show
          Hoss Man added a comment - I had a lot of feedback/ideas about the TrackingShardHandlerFactory, but not a lot of feedback/comments/ideas about the root issue here, so i went ahead and filed SOLR-7147 for you.
          Hide
          Shalin Shekhar Mangar added a comment - - edited

          This patch fixes the bug and modifies the DistributedQueryComponentOptimizationTest to use the TrackingShardHandlerFactory introduced in SOLR-7147. I removed the TestTwoPhaseDistributedQuery test that I had introduced earlier.

          This test now asserts that every distrib.singlePass query:

          1. Makes exactly 'numSlices' number of shard requests
          2. Makes no GET_FIELDS requests
          3. Must request the unique key field from shards
          4. Must request the score if 'fl' has score or sort by score is requested
          5. Requests all fields that are present in 'fl' param

          It also asserts that every regular two phase distribtued search:

          1. Makes at most 2 * 'numSlices' number of shard requests
          2. Must request the unique key field from shards
          3. Must request the score if 'fl' has score or sort by score is requested
          4. Requests no fields other than id and score in GET_TOP_IDS request
          5. Requests exactly the fields that are present in 'fl' param in GET_FIELDS request and no others

          and also asserts that:

          1. Each query which requests id or score or both behaves exactly like a single pass query
          Show
          Shalin Shekhar Mangar added a comment - - edited This patch fixes the bug and modifies the DistributedQueryComponentOptimizationTest to use the TrackingShardHandlerFactory introduced in SOLR-7147 . I removed the TestTwoPhaseDistributedQuery test that I had introduced earlier. This test now asserts that every distrib.singlePass query: Makes exactly 'numSlices' number of shard requests Makes no GET_FIELDS requests Must request the unique key field from shards Must request the score if 'fl' has score or sort by score is requested Requests all fields that are present in 'fl' param It also asserts that every regular two phase distribtued search: Makes at most 2 * 'numSlices' number of shard requests Must request the unique key field from shards Must request the score if 'fl' has score or sort by score is requested Requests no fields other than id and score in GET_TOP_IDS request Requests exactly the fields that are present in 'fl' param in GET_FIELDS request and no others and also asserts that: Each query which requests id or score or both behaves exactly like a single pass query
          Hide
          Shalin Shekhar Mangar added a comment -

          Updated patch. I moved my earlier Jira comment about the kind of assertions being made in this test to the Javadocs of the test method.

          All tests pass. ant precommit passes. I'll commit this shortly.

          Show
          Shalin Shekhar Mangar added a comment - Updated patch. I moved my earlier Jira comment about the kind of assertions being made in this test to the Javadocs of the test method. All tests pass. ant precommit passes. I'll commit this shortly.
          Hide
          Erick Erickson added a comment -

          Shalin:

          I was looking at the fact that two-phase distributed search also decompresses the doc to get the ID to pass back to the aggregator on the first pass, on the surface I don't see why all the fields that need to be passed back can't be taken from indexed terms, i.e. the ID and sort criteria for each doc better be indexed terms. see: https://issues.apache.org/jira/browse/SOLR-6888, and one comment was that https://issues.apache.org/jira/browse/SOLR-6810 would take care of this.

          Just wondering how this issue relates to those two for my background.

          Thanks!

          Show
          Erick Erickson added a comment - Shalin: I was looking at the fact that two-phase distributed search also decompresses the doc to get the ID to pass back to the aggregator on the first pass, on the surface I don't see why all the fields that need to be passed back can't be taken from indexed terms, i.e. the ID and sort criteria for each doc better be indexed terms. see: https://issues.apache.org/jira/browse/SOLR-6888 , and one comment was that https://issues.apache.org/jira/browse/SOLR-6810 would take care of this. Just wondering how this issue relates to those two for my background. Thanks!
          Hide
          Shalin Shekhar Mangar added a comment -

          I reverted some of my changes to QueryComponent because I realised the duplicate field names aren't really a problem because SolrReturnFields maintains a Set of them. My logic not to add duplicate 'id' fields was just making it more complicated for no reason. However, I feel that it could be simplified more but perhaps some other time.

          Show
          Shalin Shekhar Mangar added a comment - I reverted some of my changes to QueryComponent because I realised the duplicate field names aren't really a problem because SolrReturnFields maintains a Set of them. My logic not to add duplicate 'id' fields was just making it more complicated for no reason. However, I feel that it could be simplified more but perhaps some other time.
          Hide
          Shalin Shekhar Mangar added a comment - - edited

          Erick, this is a different bug. The bug is that the complete list of fields from the 'fl' param were being requested in the first phase of distributed search. Really only id and/or score is required in this first phase with the current distributed search algorithm.

          How those fields are actually retrieved by a shard is the subject of SOLR-6888 and whether id is really required to be fetched at all is the subject of SOLR-6810.

          Show
          Shalin Shekhar Mangar added a comment - - edited Erick, this is a different bug. The bug is that the complete list of fields from the 'fl' param were being requested in the first phase of distributed search. Really only id and/or score is required in this first phase with the current distributed search algorithm. How those fields are actually retrieved by a shard is the subject of SOLR-6888 and whether id is really required to be fetched at all is the subject of SOLR-6810 .
          Hide
          ASF subversion and git services added a comment -

          Commit 1662366 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1662366 ]

          SOLR-7128: Two phase distributed search is fetching extra fields in GET_TOP_IDS phase

          Show
          ASF subversion and git services added a comment - Commit 1662366 from shalin@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1662366 ] SOLR-7128 : Two phase distributed search is fetching extra fields in GET_TOP_IDS phase
          Hide
          ASF subversion and git services added a comment -

          Commit 1662367 from shalin@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1662367 ]

          SOLR-7128: Two phase distributed search is fetching extra fields in GET_TOP_IDS phase

          Show
          ASF subversion and git services added a comment - Commit 1662367 from shalin@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1662367 ] SOLR-7128 : Two phase distributed search is fetching extra fields in GET_TOP_IDS phase
          Hide
          Michael McCandless added a comment -

          Shalin Shekhar Mangar looks like this is done? Is the fix low-risk? Since I haven't managed to cut a 4.10.4 yet (various fun Lucene issues...) maybe it makes sense to backport for 4.10.4?

          Show
          Michael McCandless added a comment - Shalin Shekhar Mangar looks like this is done? Is the fix low-risk? Since I haven't managed to cut a 4.10.4 yet (various fun Lucene issues...) maybe it makes sense to backport for 4.10.4?
          Hide
          Shalin Shekhar Mangar added a comment -

          Hoss prodded me privately about the duplicate field names being requested in shard requests (thanks Hoss!) so I refactored the field modifying logic so that duplicates aren't possible. I also added more tests with no 'fl', fl=* and fl=*,score in both single pass and regular search.

          Show
          Shalin Shekhar Mangar added a comment - Hoss prodded me privately about the duplicate field names being requested in shard requests (thanks Hoss!) so I refactored the field modifying logic so that duplicates aren't possible. I also added more tests with no 'fl', fl=* and fl=*,score in both single pass and regular search.
          Hide
          ASF subversion and git services added a comment -

          Commit 1662729 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1662729 ]

          SOLR-7128: Make sure fields aren't duplicated in shard requests

          Show
          ASF subversion and git services added a comment - Commit 1662729 from shalin@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1662729 ] SOLR-7128 : Make sure fields aren't duplicated in shard requests
          Hide
          ASF subversion and git services added a comment -

          Commit 1662730 from shalin@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1662730 ]

          SOLR-7128: Make sure fields aren't duplicated in shard requests

          Show
          ASF subversion and git services added a comment - Commit 1662730 from shalin@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1662730 ] SOLR-7128 : Make sure fields aren't duplicated in shard requests
          Hide
          ASF subversion and git services added a comment -

          Commit 1662802 from shalin@apache.org in branch 'dev/branches/lucene_solr_4_10'
          [ https://svn.apache.org/r1662802 ]

          SOLR-7128: Two phase distributed search is fetching extra fields in GET_TOP_IDS phase

          Show
          ASF subversion and git services added a comment - Commit 1662802 from shalin@apache.org in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1662802 ] SOLR-7128 : Two phase distributed search is fetching extra fields in GET_TOP_IDS phase
          Hide
          Shalin Shekhar Mangar added a comment -

          Since I haven't managed to cut a 4.10.4 yet (various fun Lucene issues...) maybe it makes sense to backport for 4.10.4?

          Thanks for waiting Michael McCandless! The test actually took some time to back-port because of so many changes in Solr's test suite between 4.x and 5.0.

          Note to anyone who reviews my 4.10 commit – I had to change the solr-trackingshardhandler.xml to use the old style solr.xml otherwise it wouldn't load collection1. Since there was no time to find/fix any bugs, I chose the easy way and changed the format of solr.xml itself. I verified that the TestTrackingShardHandler factory still tests the right things and passes.

          Show
          Shalin Shekhar Mangar added a comment - Since I haven't managed to cut a 4.10.4 yet (various fun Lucene issues...) maybe it makes sense to backport for 4.10.4? Thanks for waiting Michael McCandless ! The test actually took some time to back-port because of so many changes in Solr's test suite between 4.x and 5.0. Note to anyone who reviews my 4.10 commit – I had to change the solr-trackingshardhandler.xml to use the old style solr.xml otherwise it wouldn't load collection1. Since there was no time to find/fix any bugs, I chose the easy way and changed the format of solr.xml itself. I verified that the TestTrackingShardHandler factory still tests the right things and passes.
          Hide
          Michael McCandless added a comment -

          Bulk close for 4.10.4 release

          Show
          Michael McCandless added a comment - Bulk close for 4.10.4 release

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Shalin Shekhar Mangar
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development