Solr
  1. Solr
  2. SOLR-8534

Add generic support for Collection APIs to be async

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently only a handful of Collection API calls support the async parameter. I propose to extended support for async to most APIs.

      The Overseer has a generic support for calls to be async. So we should leverage that and make all commands implemented within the OverseerCollectionMessageHandler to support async

      1. SOLR-8534_solrJ.patch
        18 kB
        Varun Thacker
      2. SOLR-8534_solrJ.patch
        19 kB
        Varun Thacker
      3. SOLR-8534.patch
        51 kB
        Varun Thacker
      4. SOLR-8534.patch
        52 kB
        Varun Thacker
      5. SOLR-8534.patch
        49 kB
        Varun Thacker
      6. SOLR-8534.patch
        32 kB
        Varun Thacker

        Issue Links

          Activity

          Hide
          Varun Thacker added a comment -

          First cut patch which adds support for async calls to the mentioned commands.

          Additionally it makes a few modifications to CollectionAdminRequest which came up while adding async support in SolrJ . I'll look at some of the changes made here more closely for back compat reasons .

          Show
          Varun Thacker added a comment - First cut patch which adds support for async calls to the mentioned commands. Additionally it makes a few modifications to CollectionAdminRequest which came up while adding async support in SolrJ . I'll look at some of the changes made here more closely for back compat reasons .
          Hide
          Noble Paul added a comment -

          can u make a sub task for the unrelated changes

          Show
          Noble Paul added a comment - can u make a sub task for the unrelated changes
          Hide
          Varun Thacker added a comment -

          Solr code Changes:

          • CollectionsHandler takes the async param generically. Earlier only few commands took it. The idea is that the OverseerTaskProcessor supports async actions generically. Why limit it only to a few commands.
          • Merged collectShardResponses and processResponses as they were doing the same thing. collectShardResponses had an abortOnError which logic which I moved to processResponses. I then merged processResponsed and completeAsyncAction into one method . We use processResponses to collect shard responses and should deal with async as well. It would be less error prone that way I feel.
          • DeleteCollection, CreateShard, DeleteShard implements async
          • DeleteReplica now reuses the sendShardRequest method instead of doing the same thing there. It also implements async now
          • The first call from SplitShard to collectShardResponses is not needed as deleteShard already calls processResponses which processes the shard responses. So I guess this call will always have an empty shard response so it has no effect. The one differences is that deleteShard calls processResponses with abortOnError=true while here the comment says it wants abortOnError=false.
          • sliceCmd calls sendShardRequest internally instead of doing the same thing there

          Question:
          DeleteReplica#collectShardResponses aborts on error because it calls collectShardResponse with abortOnError=false. Is there a reason for delete replica to not abort of the core underneath? Say it fails to unload, we still go ahead and remove the entry from cluster state. So the user can't even retry. Maybe it should abort?

          Filed SOLR-8553 and remove the nocommit from the previous patch in reload collection

          SolrJ Changes:

          • SolrJ Collection admin requests takes the async param generically. Thus it's added to CollectionSpecificAdminRequest and CollectionShardSpecificAdminRequest . Earlier only certain commands accepted async
          • A lot of instance variables in the CollectionSpecificAdminequest sub-classes were private or protected. Since most of them were protected I converted the private variables to protected. Maybe they should all be private instead?
          • CollectionShardAdminRequest#getCommonParams has been deprecated in favour of CollectionShardAdminRequest#getParams to make the API consistent.

          Question:

          • A lot of getParams which have been overridden are inconsistant . Some of them do null checks before adding while others don't . Should we make it uniform here by doing null checks everywhere? I guess some of them don't have null checks because they are mandatory? In that case shouldn't we throw an exception if they aren't present?

          Here are the 6 collection api calls which won't support async currently because it's executed in the collections handler and not the overseer.

          • LIST_OP
          • CLUSTERSTATUS_OP
          • REBALANCELEADERS_OP
          • SYNCSHARD_OP
          • FORCELEADER_OP
          • CLUSTERPROP_OP

          I filed SOLR-8554 to move REBALANCELEADERS_OP and FORCELEADER_OP to the overseer for the reasons mentioned on the Jira.

          So that leves us with 4 operations not supporting collections api with this patch:

          • LIST_OP
          • CLUSTERSTATUS_OP
          • SYNCSHARD_OP
          • CLUSTERPROP_OP

          Currently I've run CollectionsAPIAsyncDistributedZkTest several times and it's passed. Haven't run all the tests yet. I'll also add some more async tests

          Show
          Varun Thacker added a comment - Solr code Changes: CollectionsHandler takes the async param generically. Earlier only few commands took it. The idea is that the OverseerTaskProcessor supports async actions generically. Why limit it only to a few commands. Merged collectShardResponses and processResponses as they were doing the same thing. collectShardResponses had an abortOnError which logic which I moved to processResponses. I then merged processResponsed and completeAsyncAction into one method . We use processResponses to collect shard responses and should deal with async as well. It would be less error prone that way I feel. DeleteCollection, CreateShard, DeleteShard implements async DeleteReplica now reuses the sendShardRequest method instead of doing the same thing there. It also implements async now The first call from SplitShard to collectShardResponses is not needed as deleteShard already calls processResponses which processes the shard responses. So I guess this call will always have an empty shard response so it has no effect. The one differences is that deleteShard calls processResponses with abortOnError=true while here the comment says it wants abortOnError=false. sliceCmd calls sendShardRequest internally instead of doing the same thing there Question: DeleteReplica#collectShardResponses aborts on error because it calls collectShardResponse with abortOnError=false. Is there a reason for delete replica to not abort of the core underneath? Say it fails to unload, we still go ahead and remove the entry from cluster state. So the user can't even retry. Maybe it should abort? Filed SOLR-8553 and remove the nocommit from the previous patch in reload collection SolrJ Changes: SolrJ Collection admin requests takes the async param generically. Thus it's added to CollectionSpecificAdminRequest and CollectionShardSpecificAdminRequest . Earlier only certain commands accepted async A lot of instance variables in the CollectionSpecificAdminequest sub-classes were private or protected. Since most of them were protected I converted the private variables to protected. Maybe they should all be private instead? CollectionShardAdminRequest#getCommonParams has been deprecated in favour of CollectionShardAdminRequest#getParams to make the API consistent. Question: A lot of getParams which have been overridden are inconsistant . Some of them do null checks before adding while others don't . Should we make it uniform here by doing null checks everywhere? I guess some of them don't have null checks because they are mandatory? In that case shouldn't we throw an exception if they aren't present? Here are the 6 collection api calls which won't support async currently because it's executed in the collections handler and not the overseer. LIST_OP CLUSTERSTATUS_OP REBALANCELEADERS_OP SYNCSHARD_OP FORCELEADER_OP CLUSTERPROP_OP I filed SOLR-8554 to move REBALANCELEADERS_OP and FORCELEADER_OP to the overseer for the reasons mentioned on the Jira. So that leves us with 4 operations not supporting collections api with this patch: LIST_OP CLUSTERSTATUS_OP SYNCSHARD_OP CLUSTERPROP_OP Currently I've run CollectionsAPIAsyncDistributedZkTest several times and it's passed. Haven't run all the tests yet. I'll also add some more async tests
          Hide
          Varun Thacker added a comment -

          Patch which adds some more tests as well as moves the setting of async to the CollectionAdminRequest class in SolrJ so that all APIs can make use of it.

          Show
          Varun Thacker added a comment - Patch which adds some more tests as well as moves the setting of async to the CollectionAdminRequest class in SolrJ so that all APIs can make use of it.
          Hide
          Noble Paul added a comment -

          The patch looks fine +1

          Show
          Noble Paul added a comment - The patch looks fine +1
          Hide
          Varun Thacker added a comment -

          Thanks Noble for the review! Tests seem to pass. I'll run it one more time and then commit it.

          After this there are two tasks that we need to support Async in all API calls -

          • Work on SOLR-8554 . This should be fairly straightforward and I'll get this done after this patch is committed.
          • Plan on how to support async to the operations that are currently being handled by the collections handler
          Show
          Varun Thacker added a comment - Thanks Noble for the review! Tests seem to pass. I'll run it one more time and then commit it. After this there are two tasks that we need to support Async in all API calls - Work on SOLR-8554 . This should be fairly straightforward and I'll get this done after this patch is committed. Plan on how to support async to the operations that are currently being handled by the collections handler
          Hide
          Varun Thacker added a comment -

          Updated patch.

          Made some minor change

          • Remove unused variable
          +    } catch (SolrException e) {
          +      //expected
          +      int x =10;
          +    }
          
          • Reverted wrong change in previous patches for OVERSEERSTATUS_OP which returns a Collections.EMPTY_MAP
               OVERSEERSTATUS_OP(OVERSEERSTATUS) {
                 @Override
                 Map<String, Object> call(SolrQueryRequest req, SolrQueryResponse rsp, CollectionsHandler h) throws Exception {
          -        return new LinkedHashMap<>();
          +        return Collections.EMPTY_MAP;
                 }
          
          • Removed final from the setAsyncId signature. Certain operations might need to throw unsupported operation

          All tests pass. I'll commit this shortly

          Show
          Varun Thacker added a comment - Updated patch. Made some minor change Remove unused variable + } catch (SolrException e) { + //expected + int x =10; + } Reverted wrong change in previous patches for OVERSEERSTATUS_OP which returns a Collections.EMPTY_MAP OVERSEERSTATUS_OP(OVERSEERSTATUS) { @Override Map< String , Object > call(SolrQueryRequest req, SolrQueryResponse rsp, CollectionsHandler h) throws Exception { - return new LinkedHashMap<>(); + return Collections.EMPTY_MAP; } Removed final from the setAsyncId signature. Certain operations might need to throw unsupported operation All tests pass. I'll commit this shortly
          Hide
          ASF subversion and git services added a comment -

          Commit 1725474 from Varun Thacker in branch 'dev/trunk'
          [ https://svn.apache.org/r1725474 ]

          SOLR-8534: Add generic support for collection APIs to be async

          Show
          ASF subversion and git services added a comment - Commit 1725474 from Varun Thacker in branch 'dev/trunk' [ https://svn.apache.org/r1725474 ] SOLR-8534 : Add generic support for collection APIs to be async
          Hide
          ASF subversion and git services added a comment -

          Commit 1726067 from Varun Thacker in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1726067 ]

          SOLR-8534: Add generic support for collection APIs to be async (merged trunk r1725474)

          Show
          ASF subversion and git services added a comment - Commit 1726067 from Varun Thacker in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1726067 ] SOLR-8534 : Add generic support for collection APIs to be async (merged trunk r1725474)
          Hide
          Varun Thacker added a comment -

          Committed to trunk and branch_5x

          Show
          Varun Thacker added a comment - Committed to trunk and branch_5x
          Hide
          Anshum Gupta added a comment -

          this has added async support to 'requeststatus' call, which isn't correct.

          Show
          Anshum Gupta added a comment - this has added async support to 'requeststatus' call, which isn't correct.
          Hide
          Varun Thacker added a comment -

          I don't think so. REQUESTSTATUS_OP returns a null and the check is -

                Map<String, Object> props = operation.call(req, rsp, this);
                String asyncId = req.getParams().get(ASYNC);
                if (props != null) {
                  if (asyncId != null) {
                    props.put(ASYNC, asyncId);
                  }
          

          Which means all calls which are currently being handled by the collections handler directly will return null and the async param won't get applied

          As a future enhancement we need to allow async calls to actions handled by the collections handler as well

          Show
          Varun Thacker added a comment - I don't think so. REQUESTSTATUS_OP returns a null and the check is - Map< String , Object > props = operation.call(req, rsp, this ); String asyncId = req.getParams().get(ASYNC); if (props != null ) { if (asyncId != null ) { props.put(ASYNC, asyncId); } Which means all calls which are currently being handled by the collections handler directly will return null and the async param won't get applied As a future enhancement we need to allow async calls to actions handled by the collections handler as well
          Hide
          Anshum Gupta added a comment -

          Sorry for not clarifying but I'm actually talking about the SolrJ code. I didn't dig deeper but I could do this:

          RequestStatus req = new RequestStatus();
          req.setAsyncId("myid").process(client);
          
          Show
          Anshum Gupta added a comment - Sorry for not clarifying but I'm actually talking about the SolrJ code. I didn't dig deeper but I could do this: RequestStatus req = new RequestStatus(); req.setAsyncId( "myid" ).process(client);
          Hide
          Anshum Gupta added a comment -

          Also, this would be a problem with everything that doesn't go to the Overseer. We should introduce a new level in the hierarchy there to selectively support async id.

          Show
          Anshum Gupta added a comment - Also, this would be a problem with everything that doesn't go to the Overseer. We should introduce a new level in the hierarchy there to selectively support async id.
          Hide
          Varun Thacker added a comment -

          Need to fix SolrJ to not accept async params for commands that don't support async

          Show
          Varun Thacker added a comment - Need to fix SolrJ to not accept async params for commands that don't support async
          Hide
          Varun Thacker added a comment -

          Patch which fixes the SolrJ API's . So Anshum Gupta can you please review it.

          I don't like the extra spaces being added by the patch by Intellij seems to keep adding it back. I'll see what I can do

          Show
          Varun Thacker added a comment - Patch which fixes the SolrJ API's . So Anshum Gupta can you please review it. I don't like the extra spaces being added by the patch by Intellij seems to keep adding it back. I'll see what I can do
          Hide
          Anshum Gupta added a comment -

          Thanks Varun Thacker ! I'll take a look in a couple of hours.

          Show
          Anshum Gupta added a comment - Thanks Varun Thacker ! I'll take a look in a couple of hours.
          Hide
          Anshum Gupta added a comment -

          Instead of duplicating async code for async why not add another class (AsyncCollectionAdminRequest? ) so that the following classes instead directly extend the async behavior:

          • CollectionAdminRoleRequest
          • CreateAlias
          • DeleteAlias
          • OverseerStatus

          The rest looks fine to me.

          Show
          Anshum Gupta added a comment - Instead of duplicating async code for async why not add another class (AsyncCollectionAdminRequest? ) so that the following classes instead directly extend the async behavior: CollectionAdminRoleRequest CreateAlias DeleteAlias OverseerStatus The rest looks fine to me.
          Hide
          Varun Thacker added a comment -

          Patch which folds in Anshum's changes. I'll commit this soon

          Show
          Varun Thacker added a comment - Patch which folds in Anshum's changes. I'll commit this soon
          Hide
          ASF subversion and git services added a comment -

          Commit 9985a0966ba33f78b0889b00cd81cd6c5a858111 in lucene-solr's branch refs/heads/master from Varun Thacker
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9985a09 ]

          SOLR-8534: Fix SolrJ APIs to add async support

          Show
          ASF subversion and git services added a comment - Commit 9985a0966ba33f78b0889b00cd81cd6c5a858111 in lucene-solr's branch refs/heads/master from Varun Thacker [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9985a09 ] SOLR-8534 : Fix SolrJ APIs to add async support
          Hide
          ASF subversion and git services added a comment -

          Commit 14bd3c4d3859719f8c0d5d0edebbf17f36531b72 in lucene-solr's branch refs/heads/branch_5x from Varun Thacker
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=14bd3c4 ]

          SOLR-8534: Fix SolrJ APIs to add async support

          Show
          ASF subversion and git services added a comment - Commit 14bd3c4d3859719f8c0d5d0edebbf17f36531b72 in lucene-solr's branch refs/heads/branch_5x from Varun Thacker [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=14bd3c4 ] SOLR-8534 : Fix SolrJ APIs to add async support
          Hide
          ASF subversion and git services added a comment -

          Commit 13313415a5d975f77a5360239d82a344b9a7aa20 in lucene-solr's branch refs/heads/branch_5_5 from Varun Thacker
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1331341 ]

          SOLR-8534: Fix SolrJ APIs to add async support

          Show
          ASF subversion and git services added a comment - Commit 13313415a5d975f77a5360239d82a344b9a7aa20 in lucene-solr's branch refs/heads/branch_5_5 from Varun Thacker [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1331341 ] SOLR-8534 : Fix SolrJ APIs to add async support

            People

            • Assignee:
              Varun Thacker
              Reporter:
              Varun Thacker
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development