Building on the distributed cluster state update changes (SOLR-14928), this ticket will distribute the Collection API so that commands can execute on any node (i.e. the node handling the request through CollectionsHandler) without having to go through a Zookeeper queue and the Overseer.
This is the second step (the first was SOLR-14928), after which the Overseer could be removed. The code keeps the existing execution options, so completing this ticket by no means implies that the Overseer is gone, only that it could be removed in a future release.
This ticket depends on the distributed cluster state changes because the Overseer locking that prevents Collection API commands on the same collection (or same shard) from executing concurrently will be replaced by optimistic locking of the collection state.json znodes (or of other znodes that will eventually replace or augment state.json).
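To make the optimistic locking idea concrete, here is a minimal sketch of a read, modify, conditional-write loop. It is not the code of this ticket: VersionedNode, Snapshot, writeIfVersion and updateWithRetry are hypothetical names, and the in-memory node only stands in for a state.json znode. The conditional write mimics ZooKeeper's versioned setData, which rejects an update when the expected version no longer matches the znode's version.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory stand-in for a versioned znode such as a collection's state.json. */
class VersionedNode {
    /** Immutable snapshot of the node content together with its version. */
    static final class Snapshot {
        final String data;
        final int version;

        Snapshot(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot> current;

    VersionedNode(String initialData) {
        current = new AtomicReference<>(new Snapshot(initialData, 0));
    }

    Snapshot read() {
        return current.get();
    }

    /** Conditional write: succeeds only if the node is still at expectedVersion. */
    boolean writeIfVersion(String newData, int expectedVersion) {
        Snapshot seen = current.get();
        if (seen.version != expectedVersion) {
            return false; // someone else updated the node since it was read
        }
        // The CAS also fails if another writer slipped in after the check above.
        return current.compareAndSet(seen, new Snapshot(newData, expectedVersion + 1));
    }

    /**
     * Optimistic update loop: read, compute the new content, attempt a
     * conditional write, and retry from a fresh read on conflict.
     * Returns the number of attempts that were needed.
     */
    static int updateWithRetry(VersionedNode node, UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Snapshot before = node.read();
            String after = change.apply(before.data);
            if (node.writeIfVersion(after, before.version)) {
                return attempt;
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```

With this pattern no node has to hold a lock while it computes the new state. Two commands touching the same collection both proceed, and the one whose write arrives second is rejected and has to re-read and retry (or fail).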
The goal of this ticket is threefold:
- Simplify the code (running synchronously and not going through the Zookeeper queues and the Overseer dequeue logic is much simpler),
- Improve performance for most/all use cases (this is a secondary goal; it is enough that performance is not degraded), and
- Allow a future change (in a separate, future Jira) to the way cluster state is cached on the nodes of the cluster (keep less information, be less dependent on Zookeeper watches, do not care about collections not present on the node). This future work will aim to significantly increase the scale (number of collections) supported by SolrCloud.