ZkStateWriter is basically a write cache. It should be much simpler than it is. A few things that bug me in no particular order:
1) Tracking lastStateFormat / lastCollectionName and in general having a maybeFlushBefore / maybeFlushAfter makes no real sense to me. If ZkStateWriter were capable of operating as a perfect write cache, the content of what's being written should never force a flush. It should be able to just always keep queuing operations until the desired time delay is hit, or it's flushed from the outside.
2) ZkStateWriter's ClusterState liveNodes should probably be a view on ZkStateReader's ClusterState liveNodes.
3) ZkWriteCallback - the one place this is used is the Overseer stateUpdateQueue handling. I think the way that loop works with ZkStateWriter could be done a little better. Ideally, I would want to peek up to N children at a time from that queue, send them all through ZkStateWriter in succession, flush, then remove those N items from the stateUpdateQueue. If the flush failed for some reason, it could return a count of items committed so we could remove that many items from the stateUpdateQueue. It seems a little nuts to have a second workQueue in operation the way it is today. I get that in some situations we'd end up doing more net cluster state writes, but I think we'd still do fewer net writes to ZK overall, since we do so much queue management today.
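
To make the loop in 3) concrete, here is a rough sketch of the peek / enqueue / flush / remove cycle. It is only an illustration and not the actual Solr code: WriteCache, drainBatch and the failAfter knob are hypothetical names made up for this example, the in-memory Deque stands in for the ZK-backed stateUpdateQueue, and the "committed" list stands in for what actually lands in ZK.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class BatchDrainSketch {

  /** Hypothetical stand-in for ZkStateWriter: it only queues, and flushes when told to. */
  static class WriteCache {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>(); // stands in for ZK
    private final int failAfter; // simulate a failure once this many ops are committed; negative = never fail

    WriteCache(int failAfter) {
      this.failAfter = failAfter;
    }

    /** Queue an operation; the content of the operation never forces a flush. */
    void enqueue(String op) {
      pending.add(op);
    }

    /** Commit pending ops in order and return how many were committed before any failure. */
    int flush() {
      int count = 0;
      for (String op : pending) {
        if (failAfter >= 0 && committed.size() >= failAfter) {
          break; // simulated flush failure part-way through the batch
        }
        committed.add(op);
        count++;
      }
      // Uncommitted ops are dropped from the cache; they are still in the
      // source queue and will be peeked again on the next pass.
      pending.clear();
      return count;
    }

    List<String> committed() {
      return committed;
    }
  }

  /** Peek up to n items, push them through the cache, flush, then remove only what was committed. */
  static int drainBatch(Deque<String> stateUpdateQueue, WriteCache writer, int n) {
    Iterator<String> it = stateUpdateQueue.iterator(); // iterating peeks without removing
    int peeked = 0;
    while (peeked < n && it.hasNext()) {
      writer.enqueue(it.next());
      peeked++;
    }
    int committedCount = writer.flush();
    for (int i = 0; i < committedCount; i++) {
      stateUpdateQueue.pollFirst();
    }
    return committedCount;
  }

  public static void main(String[] args) {
    Deque<String> queue = new ArrayDeque<>(Arrays.asList("a", "b", "c", "d", "e"));
    WriteCache writer = new WriteCache(-1);
    int done = drainBatch(queue, writer, 3);
    System.out.println(done + " committed, " + queue.size() + " left"); // prints "3 committed, 2 left"
  }
}
```

The point of the sketch is that items leave the stateUpdateQueue only after the flush reports them committed, so a partial failure just leaves the tail of the batch in place for the next pass and no second workQueue is needed.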