Here's a patch which uses ZooKeeper 'multi' transactions to make sure that the leader-initiated recovery (LIR) state can be set only while the requesting leader node is still alive. This ensures that, regardless of how long the network partition lasts (long GC, whatever), the node setting the LIR state must still be the leader; otherwise the LIR state cannot be set.
Initially I tried to use the shard leader path as the 'exists' check in the 'multi' command, but this doesn't work: the leader path is always created fresh, so its version is always 0 and the check succeeds regardless of who the current leader is. This is why we must use the leader's sequence path from the election instead. Election nodes are ephemeral and sequential, so each one is unique to the node that created it and disappears once that node's session expires.
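To make the difference concrete, here is a toy in-memory model of versioned znodes and an all-or-nothing multi. It is a sketch, not the real ZooKeeper client API: `put`, `read` and all paths are hypothetical, and `put` simply creates or overwrites a node. It replays the scenario above: the old leader's session expires, a new leader re-creates the leader path (back at version 0), and the stale leader then tries to write LIR state guarded by either path.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of ZooKeeper-style versioned nodes and an atomic multi().
// NOT the real org.apache.zookeeper API; every path used below is hypothetical.
public class LirMultiSketch {

    static final class Node {
        byte[] data;
        int version; // a freshly created node always starts at version 0
        Node(byte[] data) { this.data = data; }
    }

    static final class Op {
        final String path; final int version; final byte[] data; final boolean isCheck;
        private Op(String path, int version, byte[] data, boolean isCheck) {
            this.path = path; this.version = version; this.data = data; this.isCheck = isCheck;
        }
        // version -1 means "any version", i.e. a pure existence check
        static Op check(String path, int version) { return new Op(path, version, null, true); }
        // Simplification: a write creates the node if it is missing
        static Op put(String path, byte[] data) { return new Op(path, -1, data, false); }
    }

    final Map<String, Node> nodes = new HashMap<>();

    void create(String path, byte[] data) { nodes.put(path, new Node(data)); }
    void delete(String path) { nodes.remove(path); }

    String read(String path) {
        Node n = nodes.get(path);
        return n == null ? null : new String(n.data, StandardCharsets.UTF_8);
    }

    // All checks are evaluated before any write, so a failed check leaves the store untouched.
    boolean multi(List<Op> ops) {
        for (Op op : ops) {
            if (!op.isCheck) continue;
            Node n = nodes.get(op.path);
            if (n == null || (op.version != -1 && n.version != op.version)) return false;
        }
        for (Op op : ops) {
            if (op.isCheck) continue;
            Node n = nodes.get(op.path);
            if (n == null) { nodes.put(op.path, new Node(op.data)); }
            else { n.data = op.data; n.version++; }
        }
        return true;
    }

    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        LirMultiSketch zk = new LirMultiSketch();
        String leaderPath = "/collections/c1/leaders/shard1";
        String oldSeq = "/collections/c1/leader_elect/shard1/election/n_0000000000";
        String newSeq = "/collections/c1/leader_elect/shard1/election/n_0000000001";
        String lirPath = "/collections/c1/leader_initiated_recovery/shard1/core_node2";

        // Old leader wins the election and registers itself.
        zk.create(oldSeq, bytes("node1"));
        zk.create(leaderPath, bytes("node1"));

        // Partition: the old leader's session expires, its ephemeral nodes vanish,
        // and a new leader is elected and re-creates the leader path.
        zk.delete(oldSeq);
        zk.delete(leaderPath);
        zk.create(newSeq, bytes("node2"));
        zk.create(leaderPath, bytes("node2"));

        // Stale leader guarded by its own election node: the node is gone, nothing is written.
        boolean viaSeq = zk.multi(List.of(Op.check(oldSeq, -1), Op.put(lirPath, bytes("down"))));
        System.out.println("guard on old election node: " + viaSeq + ", LIR = " + zk.read(lirPath));
        // prints "guard on old election node: false, LIR = null"

        // Stale leader guarded by the leader path: it is at version 0 again, so the write slips through.
        boolean viaLeader = zk.multi(List.of(Op.check(leaderPath, 0), Op.put(lirPath, bytes("down"))));
        System.out.println("guard on leader path: " + viaLeader + ", LIR = " + zk.read(lirPath));
        // prints "guard on leader path: true, LIR = down"
    }
}
```

In the real client the corresponding building blocks are `Op.check`, `Op.setData` and `ZooKeeper.multi`, where a version of -1 likewise matches any version. The model only illustrates why the choice of guard path matters; it does not reproduce the patch itself.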
This is just a first cut of this approach. I intend to refactor some of these LIR methods – they have become too big. I will also write a test which exercises these new transactional semantics and reproduces the failure.
Edit: I also removed the replicaUrl parameter from ZkController.ensureReplicaInLeaderInitiatedRecovery, because replicaProps is already passed as a parameter and the replica URL can be derived from it.
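As a rough sketch of what deriving the URL from the props can look like, assume the replica props behave like a string map carrying Solr's `base_url` and `core` cluster-state properties. The helper name `replicaCoreUrl` is hypothetical, and Solr's own code may build the URL differently (for example via `ZkCoreNodeProps`).

```java
import java.util.Map;

// Hypothetical helper: builds a replica's core URL from its props instead of
// passing the URL around as a separate parameter.
public class ReplicaUrlSketch {

    static String replicaCoreUrl(Map<String, String> replicaProps) {
        String baseUrl = replicaProps.get("base_url");
        String core = replicaProps.get("core");
        if (baseUrl == null || core == null) {
            throw new IllegalArgumentException("replica props need base_url and core");
        }
        StringBuilder sb = new StringBuilder(baseUrl);
        if (!baseUrl.endsWith("/")) sb.append('/'); // avoid a double slash
        sb.append(core);
        if (sb.charAt(sb.length() - 1) != '/') sb.append('/'); // core URLs end with '/'
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props =
            Map.of("base_url", "http://127.0.0.1:8983/solr", "core", "c1_shard1_replica2");
        System.out.println(replicaCoreUrl(props));
        // prints "http://127.0.0.1:8983/solr/c1_shard1_replica2/"
    }
}
```

Keeping a single source of truth (the props) removes the chance of a caller passing a URL that disagrees with the props it also passes.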