1) bq. Actually Overseer already has code that catches all exceptions and refreshes the cluster state.
In that case, I would suggest an explicit catch block there for BadVersionException, and log something milder than error? The current KeeperException catch block always logs an error.
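To make the suggestion concrete, here is a minimal sketch of the kind of distinction I mean. The class names below are illustrative stand-ins (the nested `BadVersionException` mimics ZooKeeper's `KeeperException.BadVersionException`); this is not the actual Overseer code, just the shape of the catch blocks:

```java
// Illustrative stand-ins for ZooKeeper's exception hierarchy -- not the real classes.
class KeeperException extends Exception {
    static class BadVersionException extends KeeperException {}
}

public class OverseerCatchSketch {
    // Returns the log line that would be emitted, so the level distinction is visible.
    static String handle(KeeperException e) {
        if (e instanceof KeeperException.BadVersionException) {
            // Expected when an external process updated the state concurrently:
            // refresh cluster state and retry; WARN (or even INFO) is enough.
            return "WARN: cluster state changed externally, refreshing and retrying";
        }
        // Anything else is genuinely unexpected and deserves an ERROR.
        return "ERROR: unexpected ZooKeeper exception";
    }

    public static void main(String[] args) {
        System.out.println(handle(new KeeperException.BadVersionException()));
        System.out.println(handle(new KeeperException()));
    }
}
```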
2) Okay, I totally buy that argument! But see note below.
3) I'm merely saying that the refactoring that would be necessary to make ZkStateWriter more easily testable might negatively impact the flow of the existing code. I'm not sure, you'd have to try it.
Having stewed on this a bit, I think there's somewhat of a fundamental disconnect in how ZkStateWriter's batching interacts with the Overseer work queue.
Work items get removed from the queue before the corresponding change is actually written out to the cluster state. You can see in the original design that there's this neat peek->poll dance in the Overseer loop that attempts to enforce guaranteed state updates by not discarding the work item until after the state is written out. But the batching implementation gets rid of this guarantee, which is why I think we're now in a state where Overseer updates and external updates can even conflict with each other.
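For reference, the original peek->poll guarantee looks roughly like this. It's a self-contained toy (the names and the simulated one-time write failure are mine, not Solr's), but it shows why a failed write never loses a work item: the item is only polled off the queue after the write succeeds.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the original peek -> write -> poll flow. A work item is
// removed from the queue only after its state update is written, so a failed
// write leaves the item at the head of the queue to be retried.
public class PeekPollSketch {
    static List<String> written = new ArrayList<>();
    static boolean failOnce = true; // simulate one transient write failure

    static void writeState(String item) throws Exception {
        if (failOnce) { failOnce = false; throw new Exception("bad version"); }
        written.add(item);
    }

    public static void main(String[] args) {
        ArrayDeque<String> queue = new ArrayDeque<>(List.of("a", "b"));
        while (!queue.isEmpty()) {
            String item = queue.peek();  // look, but do not remove
            try {
                writeState(item);        // write the cluster state update
                queue.poll();            // only now discard the work item
            } catch (Exception e) {
                // retry: the item is still at the head of the queue
            }
        }
        System.out.println(written); // nothing lost despite the failed write
    }
}
```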
Imagine (as a thought experiment) we got rid of batching. If we did that, no external change could "lose" a work item, because we'd be committing one item at a time, so the retry operation on a bad version exception would always re-grab the most recent, unapplied work item. Now obviously, we don't want to get rid of batching, because efficiency. But I really do think we're batching in the wrong place. I think batching actually needs to happen in Overseer, because it has to be tied to discarding work items. Ideally, we'd peek N work items from the head of the queue, set up all the pending updates / cluster state mods in ZkStateWriter, then try to commit everything. If it succeeds, great, remove all the processed items from the queue. If it fails, then re-read cluster state and retry all the items again.
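That peek-N / commit / poll-N loop can be sketched like so. Everything here is a stand-in: `VersionedState` fakes a znode's compare-and-set version semantics, and the class and method names are mine, not a proposal for actual Solr API names. The point is just that items only leave the queue after the batch commit succeeds:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the proposed Overseer loop: batch up to N work items,
// commit them with an optimistic version check, and discard them from the
// queue only on success. Illustrative only -- not the actual Solr code.
public class BatchedOverseerSketch {
    // Stand-in for ZK-backed state: a value plus a version, mimicking a
    // znode's compare-and-set semantics.
    static class VersionedState {
        List<String> applied = new ArrayList<>();
        int version = 0;

        // Commit succeeds only if the caller's expected version still matches.
        synchronized boolean commit(int expectedVersion, List<String> batch) {
            if (expectedVersion != version) return false; // BadVersion analogue
            applied.addAll(batch);
            version++;
            return true;
        }
    }

    static void runLoop(ArrayDeque<String> workQueue, VersionedState state, int batchSize) {
        while (!workQueue.isEmpty()) {
            // Peek up to N items without removing them from the queue.
            List<String> batch = new ArrayList<>();
            for (String item : workQueue) {
                batch.add(item);
                if (batch.size() == batchSize) break;
            }
            int expected = state.version; // "read cluster state"
            if (state.commit(expected, batch)) {
                // Success: only now is it safe to discard the work items.
                for (int i = 0; i < batch.size(); i++) workQueue.poll();
            }
            // On failure we'd re-read state and retry the same items;
            // crucially, nothing has been removed from the queue yet.
        }
    }

    public static void main(String[] args) {
        ArrayDeque<String> queue = new ArrayDeque<>(List.of("a", "b", "c", "d", "e"));
        VersionedState state = new VersionedState();
        runLoop(queue, state, 2);
        System.out.println(state.applied); // all five items applied, in order
    }
}
```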
Make any sense?
(Now, all that said, that work may be more effort than it's worth, and maybe we should just focus on not having to use the queue to make cluster state updates in the format v2 world.)