Hi, Patrick from ZooKeeper here. Great start on this (jira and wiki)! Wanted to point out a few things (a bit of a brain dump but...). Please keep in mind that I know very little of the inner workings of solr itself, so pardon any dumb questions.
As I mentioned previously please take advantage of the work we've been doing/documenting with the hbase team:
In particular they have been learning a hard lesson wrt how ZooKeeper sessions work. When you establish a session you specify the "timeout" parameter (I see you have the default listed as 10 sec, which is great). Be aware that this controls two things: 1) the client timeout, 2) the session timeout. The client heartbeats the server to keep the session alive: every 1/3 of the timeout the client sends a heartbeat. If the client does not hear back within an additional 1/3 of the timeout it will consider the server unavailable and attempt to connect to another server in the cluster (based on the server list you provided during session creation). If the server does not hear from the client within the timeout period it will consider the client unavailable and clean up the session. This includes deleting any ephemeral nodes owned by the expired session. This has been an issue for the hbase team - in particular, the JVM GC can pause the ZK client VM (hbase region server) for >> 1 minute (in some cases we saw 4 minutes). If this is an issue for you (does solr ever see pauses like this?) we may need to discuss.
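To make the arithmetic concrete, here is a rough sketch of the timing described above. It is a simplified model only, not the actual client code: the class and method names are made up for illustration, and the real client negotiates the timeout with the server, so treat the numbers as approximate.

```java
// Simplified model of the session timing described above.
// SessionTiming and its methods are hypothetical names, not part of the ZooKeeper API.
class SessionTiming {
    // The client sends a heartbeat every 1/3 of the session timeout.
    public static long heartbeatIntervalMs(long sessionTimeoutMs) {
        return sessionTimeoutMs / 3;
    }

    // Heartbeat at 1/3, plus an additional 1/3 waiting for the reply:
    // after roughly 2/3 of the timeout the client gives up on this server
    // and tries another one from its server list.
    public static long clientGivesUpAfterMs(long sessionTimeoutMs) {
        return 2 * sessionTimeoutMs / 3;
    }

    // A pause (e.g. a long GC) at least as long as the session timeout means the
    // server hears nothing for the whole period and expires the session.
    // Somewhat shorter pauses can also expire it, depending on when the last
    // heartbeat arrived; this check only covers the certain case.
    public static boolean pauseExpiresSession(long pauseMs, long sessionTimeoutMs) {
        return pauseMs >= sessionTimeoutMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 10_000; // the 10 sec default mentioned above
        System.out.println("heartbeat every ms: " + heartbeatIntervalMs(timeoutMs));
        System.out.println("client gives up on server after ms: " + clientGivesUpAfterMs(timeoutMs));
        System.out.println("1 minute GC pause expires session: "
                + pauseExpiresSession(60_000, timeoutMs));
    }
}
```

With a 10 sec timeout, a GC pause of a minute or more is far past the point where the server drops the session and its ephemeral nodes, which is exactly what bit the hbase folks.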
I think you are correct in having 2 distinct configurations: 1) ZK cluster (ensemble or standalone) configuration, 2) your client configuration.
I see this in your example:
which is great for "solr quickstart with zk" - basically this would be a "standalone" zk installation, vs something like
which a user might run for a production system (tolerating a single failure, ie 1 ZK server can go down and the cluster will still be available)
You should think now about how users will interact with the system once ZK is introduced. In particular troubleshooting. This is an issue that has been vexing hbase as well - how to educate and support users. How to provide enough information, but not too much (ie "go learn zk") to troubleshoot basic problems such as mis-configuration.
Will "ZooKeeperAwareShardHandler" set watches? Some of the text on the wiki page implies watches to monitor state in zk, it would be good to call this out explicitly.
I saw mention of "ZKClient", does this just mean the "official" ZooKeeper client/class we ship with the release, your own wrapper, or something else?
I also saw this comment "(this is dependent on ZooKeeper supporting getFirstChild() which it currently does not)".
We have no plans to add this in the near future afaik (there is a jira for something similar but I'm not aware of anyone working on it recently) – however typically this can be done through the use of the sequential flag.
1) Create your znodes with the sequential flag
2) to "getFirstChild()" just call "getChildren()" and sort on the sequence number
Will this work for you? (This is the simplest approach I could think of; there are other options we could discuss if this doesn't work.)
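A minimal sketch of step 2, assuming every child was created with the sequential flag (e.g. `CreateMode.PERSISTENT_SEQUENTIAL` or `EPHEMERAL_SEQUENTIAL`), so each name ends in the 10-digit zero-padded counter ZooKeeper appends. The `FirstChild` class and the "node-" prefix are made up for illustration; in a real client the list would come from `ZooKeeper.getChildren(path, false)`, which returns children in no particular order.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of the ZooKeeper API.
class FirstChild {
    // ZooKeeper appends a 10-digit, zero-padded counter to sequential znode names.
    static final int SEQ_DIGITS = 10;

    // Extract the sequence number from the end of a child name.
    // Assumes the name was created with the sequential flag.
    public static long sequenceOf(String childName) {
        return Long.parseLong(childName.substring(childName.length() - SEQ_DIGITS));
    }

    // Emulate "getFirstChild()": pick the child with the lowest sequence number.
    // Returns empty if the parent has no children.
    public static Optional<String> firstChild(List<String> children) {
        return children.stream().min(Comparator.comparingLong(FirstChild::sequenceOf));
    }

    public static void main(String[] args) {
        // Stand-in for the result of getChildren(); order is deliberately shuffled.
        List<String> children =
                List.of("node-0000000012", "node-0000000003", "node-0000000007");
        System.out.println(firstChild(children).orElse("<none>")); // prints node-0000000003
    }
}
```

Sorting on the numeric suffix rather than the whole name means this still works if different clients use different name prefixes under the same parent.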
What does this refer to? "The trick here is how to keep all the masters in a group in sync" Is this something that ZK itself could help to mediate?