Hi, has any work started on this issue? We have a deployment with a very large clusterstate.json (most of our collections are there). New collections added since our last upgrade have their own split state.json, but we still have an enormous number of collections using the shared file. We suspect that the heavy contention on clusterstate.json is affecting the stability of our cluster, so we'd like to split it apart to see if things improve.
A few questions:
1) Do you think it would be safe to do this manually on a running cluster? I've only spent a few hours looking at the overseer code, but I got the impression that I might just be able to populate all the state.json nodes manually, followed by emptying clusterstate.json. That last step should tickle all the running servers, forcing a reload which will get all servers into the right separated state. At least, that's my theory. Does that sound right to you?
2) Suppose I wanted to write a patch for this issue to help solve it for everyone. Is that a reasonable thing to attempt for someone with a lot of ZK knowledge but pretty new to Solr? Or are there a lot of subtleties?
3) Can you opine on the specifics of having an API to move the state out vs. a forced migration? From what I read on SOLR-5473, it sounds like eventually we'd just want to force everyone into split state. Is it too "soon" to do that?
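To make the manual procedure from question 1 concrete, here is a rough sketch of the write plan I have in mind. It only computes the writes and never touches ZooKeeper. The function name plan_split is made up for illustration, and I'm assuming that each split node lives at /collections/<name>/state.json and holds a one-entry map keyed by the collection name, which is what I see on our newer collections.

```python
import json


def plan_split(clusterstate_bytes):
    """Return (zk_path, payload) writes in the order they would be applied.

    Hypothetical helper: it only builds the plan, it performs no ZK calls.
    """
    # An empty shared file is treated the same as an empty JSON object.
    shared = json.loads(clusterstate_bytes.decode("utf-8") or "{}")

    writes = []
    for name in sorted(shared):
        # Assumption: a split node holds {collection_name: state}.
        path = "/collections/%s/state.json" % name
        payload = json.dumps({name: shared[name]}, sort_keys=True).encode("utf-8")
        writes.append((path, payload))

    # Last step: empty the shared file once every split node is populated.
    writes.append(("/clusterstate.json", b"{}"))
    return writes


if __name__ == "__main__":
    sample = json.dumps({"c1": {"shards": {}}, "c2": {"shards": {}}}).encode("utf-8")
    for path, payload in plan_split(sample):
        print(path, len(payload))
```

Applying a plan like this to a live cluster would presumably need version-checked writes so that a concurrent overseer update to clusterstate.json is not silently lost. That is one of the things I'm unsure about.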
(Unrelated to this specific issue, I'm actually a committer on Apache Curator, and I have a general interest in understanding and possibly helping improve overseer's ZK interactions. Are there any docs outside of the code itself you might recommend for me to read?)