Solr code changes:
- CollectionsHandler takes the async param generically. Earlier only a few commands took it. The idea is that the OverseerTaskProcessor supports async actions generically, so why limit it to only a few commands?
- Merged collectShardResponses and processResponses as they were doing the same thing. collectShardResponses had abortOnError logic, which I moved to processResponses. I then merged processResponses and completeAsyncAction into one method. We use processResponses to collect shard responses, so it should deal with async as well. I feel it would be less error prone that way.
- DeleteCollection, CreateShard, and DeleteShard now implement async.
- DeleteReplica now reuses the sendShardRequest method instead of doing the same thing there. It also implements async now.
- The first call from SplitShard to collectShardResponses is not needed, as deleteShard already calls processResponses, which processes the shard responses. So I guess this call will always see an empty shard response and has no effect. The one difference is that deleteShard calls processResponses with abortOnError=true, while here the comment says it wants abortOnError=false.
- sliceCmd calls sendShardRequest internally instead of doing the same thing there
DeleteReplica does not abort on error because it calls collectShardResponses with abortOnError=false. Is there a reason for delete replica not to abort when the operation on the underlying core fails? Say the core fails to unload: we still go ahead and remove the entry from cluster state, so the user can't even retry. Maybe it should abort?
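To make the abortOnError question concrete, here is a minimal sketch of a single response-processing method that handles both the abort decision and async tracking. It is a simplified model under stated assumptions: ShardReply, Outcome, and this processResponses signature are hypothetical stand-ins and not Solr's actual classes or method signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: these are NOT Solr's real classes or signatures.
final class ProcessResponsesSketch {

    /** Stand-in for one shard's reply; error == null means the request succeeded. */
    static final class ShardReply {
        final String node;
        final String error;

        ShardReply(String node, String error) {
            this.node = node;
            this.error = error;
        }
    }

    /** What the caller gets back when processing does not abort. */
    static final class Outcome {
        final List<String> failures = new ArrayList<>();
        final List<String> asyncNodes = new ArrayList<>();
    }

    /**
     * Walks the replies once. A failure either aborts immediately (abortOnError=true)
     * or is recorded so the caller can decide what to do (abortOnError=false).
     * When asyncId is non-null, successful nodes are noted for async completion tracking.
     */
    static Outcome processResponses(List<ShardReply> replies, boolean abortOnError, String asyncId) {
        Outcome outcome = new Outcome();
        for (ShardReply reply : replies) {
            if (reply.error != null) {
                String message = reply.node + ": " + reply.error;
                if (abortOnError) {
                    throw new IllegalStateException("Aborting, shard request failed on " + message);
                }
                outcome.failures.add(message);
            } else if (asyncId != null) {
                outcome.asyncNodes.add(reply.node);
            }
        }
        return outcome;
    }
}
```

In this model, a caller that passes abortOnError=false and never inspects the recorded failures would carry on to its next step (for delete replica, removing the cluster state entry) even though a core failed to unload.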
Filed SOLR-8553 and removed the nocommit from the previous patch in reload collection.
- SolrJ Collection admin requests take the async param generically. It is therefore added to CollectionSpecificAdminRequest and CollectionShardSpecificAdminRequest. Earlier only certain commands accepted async.
- A lot of instance variables in the CollectionSpecificAdminRequest sub-classes were private or protected. Since most of them were protected, I converted the private variables to protected. Maybe they should all be private instead?
- CollectionShardAdminRequest#getCommonParams has been deprecated in favour of CollectionShardAdminRequest#getParams to make the API consistent.
- Many of the overridden getParams methods are inconsistent. Some of them do null checks before adding a param while others don't. Should we make this uniform by doing null checks everywhere? I guess some of them don't have null checks because the params are mandatory? In that case, shouldn't we throw an exception if they aren't present?
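One way to make getParams uniform is to treat the two cases explicitly: mandatory params throw if missing, optional params are added only when set. The sketch below illustrates that idea only; GetParamsSketch, requireParam, and putIfPresent are hypothetical names, and the parameter set shown is illustrative rather than SolrJ's actual request classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not SolrJ's real request classes; the parameter set is illustrative.
final class GetParamsSketch {

    /** Mandatory parameters fail fast instead of being silently skipped or sent as null. */
    static String requireParam(String name, String value) {
        if (value == null) {
            throw new IllegalArgumentException("Missing mandatory parameter: " + name);
        }
        return value;
    }

    /** Optional parameters are added only when they have been set. */
    static void putIfPresent(Map<String, String> params, String name, String value) {
        if (value != null) {
            params.put(name, value);
        }
    }

    /** Builds the params for a request with one mandatory and two optional values. */
    static Map<String, String> getParams(String collection, String shard, String asyncId) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("collection", requireParam("collection", collection));
        putIfPresent(params, "shard", shard);
        putIfPresent(params, "async", asyncId);
        return params;
    }
}
```

With this split, every override follows the same two rules, and a missing mandatory value surfaces on the client before the request is sent.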
Here are the 6 collection API calls which won't support async currently because they are executed in the collections handler and not the overseer.
I filed SOLR-8554 to move REBALANCELEADERS_OP and FORCELEADER_OP to the overseer for the reasons mentioned on the Jira.
So that leaves us with 4 operations in the collections API not supporting async with this patch:
Currently I've run CollectionsAPIAsyncDistributedZkTest several times and it has passed. I haven't run all the tests yet. I'll also add some more async tests.