So, if I am understanding this correctly: we still have live_nodes, but we'd remove the information under collections in favor of a single node that stores the cloud state in some XML format, correct?
I don't quite think so. The problem with this approach is threefold.
A) It is very useful to have each ZK structure contain information that flows generally in one direction so that the updates to the information are relatively easy to understand.
B) It is important to have at least a directory containing ephemeral files to describe which nodes are currently live.
C) It is nice to separate information bound for different destinations so that the number of notifications is minimized.
With these considerations in mind, I think it is better to have three basic structures:
1) A directory of per-node assignments to be written by the overseer but watched and read by each node separately. By (A), this should be written only by the overseer and read only by the nodes. By (C), the assignment for each node should be in a separate file in ZK.
2) A directory of per-node status files to be created and updated by the nodes, but watched and read by the overseer. By (A), the nodes write and the overseer reads. By (B) and (C), there is an ephemeral file per node. The files should be named with a key that can be used to find the corresponding status clause in the cluster status file. The contents of each file should probably be the status clause itself if we expect the overseer to update the cluster status file, or can be empty if the nodes update the cluster status file directly.
3) A composite cluster status file created and maintained by the overseer, but possibly also updated by nodes. The overseer will have to update this file when nodes disappear, but it would avoid data duplication to have the nodes write their status directly to the cluster status file, in slight violation of (A). We can pretend that only the nodes update the file and that the clients read and watch it, with the overseer stepping in only to do the updates that crashing nodes can no longer do themselves. Neither (B) nor (C) applies, since this information is bound for all clients.
Note that no locks are necessary. Updates to (1) are only by the overseer. Updates to (2) are only by the nodes and no two nodes will touch the same files. Updates to (3) are either by the nodes directly with a versioned update or by the overseer when it sees changes to (2). The former strategy is probably better.
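In ZK terms, the three structures might be laid out something like this (the path names here are placeholders for illustration, not a concrete naming proposal):

```
/node_assignments/<node_name>   (1) written only by the overseer; each node watches its own file
/node_status/<node_name>        (2) ephemeral, created by the node; the overseer watches the directory
/cluster_state                  (3) composite status file; clients read and watch it
```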
The process in ZkController.register would then be: the new Solr instance gets the current state, adds its information to the file, and attempts a conditional update against that particular version; if the update fails (presumably because another Solr instance has modified the node, resulting in a version bump), we'd simply repeat this.
But the new nodes would also need to create their corresponding ephemeral file.
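The retry loop can be sketched as follows. This is a minimal Python simulation of ZooKeeper's conditional (versioned) setData semantics, not the actual Java ZkController code; FakeZkNode and the merge step are made up for illustration:

```python
class BadVersion(Exception):
    pass

class FakeZkNode:
    """Tiny in-memory stand-in for a ZooKeeper znode with conditional updates."""
    def __init__(self, data=""):
        self.data, self.version = data, 0

    def read(self):
        return self.data, self.version

    def set_data(self, data, expected_version):
        # ZooKeeper rejects a write whose expected version is stale.
        if expected_version != self.version:
            raise BadVersion()
        self.data, self.version = data, self.version + 1

def register(cluster_state_node, node_name):
    """Optimistic-concurrency loop: read, modify, conditional write, retry."""
    while True:
        data, version = cluster_state_node.read()
        # Hypothetical merge step; the real contents would be the XML format
        # being discussed here.
        new_data = data + node_name + ";"
        try:
            cluster_state_node.set_data(new_data, version)
            return
        except BadVersion:
            continue  # another instance won the race; re-read and retry

node = FakeZkNode()
register(node, "node1")
register(node, "node2")
print(node.data)  # node1;node2;
```

No lock is needed because the versioned write fails cleanly when another instance got there first, and the loser just retries against the fresh state.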
Would something like this work for the XML format? We could then iterate through it and update the local CloudState whenever a change has happened.
<str name="url">http://...</str>
<str name="node_name">node</str>
I think that this is good in principle, but there are a few nits that I would raise.
First, common usage outside Solr equates slice and shard. As such, to avoid confusion by outsiders, I would recommend that shard here be replaced by the term "replica" since that seems to be understood inside the Solr community to have the same meaning as outside.
Secondly, is the "props" element necessary? It seems excessive to me.
Thirdly, shouldn't there be a list of IP addresses that might be used to reach the node? I think it is good practice to allow multiple addresses for the common case where a single address might not work. This happens, for instance, with EC2, where each node has internal and external addresses that have different costs and accessibility. In other cases, there might be multiple paths to a node, and it might be nice to use all of them for failure tolerance. Listing the node more than once with different IP addresses is not acceptable, because that makes it look like more than one node, which can cause problems with load balancing.
Fourthly, I think it should be clarified that the node_name should be used as the key in the live_nodes directory for the ephemeral file corresponding to the node.
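Taken together, the address and node_name nits might lead to something like the following; all element names and addresses here are purely illustrative, not a settled format:

```xml
<node>
  <str name="node_name">node1</str> <!-- also the key of the ephemeral file under live_nodes -->
  <arr name="urls">
    <str>http://internal.example:8983/solr</str>
    <str>http://external.example:8983/solr</str>
  </arr>
</node>
```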
If this seems reasonable I will attempt to make this change.
Pretty close to reasonable in my book.