Flavio Junqueira, thanks for the clarification. Suppose broker A shrinks the partition's ISR in ZK before step 2, then broker A's ZK session expires, and then broker A sends the shrunk ISR to broker C. Two things can happen. (1) C has already received requests from the new controller B. In this case, A's request will be rejected. However, since the new controller B is elected after broker A shrinks the ISR in ZK, and the new controller reads the latest ISR from ZK on initialization, B will send the latest ISR to broker C. (2) C hasn't received any request from the new controller B. In this case, A's request will be accepted. The new controller B will later send the same ISR to broker C, but that's fine. So, in either case, we are covered.
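The two orderings can be checked with a toy model. This sketch assumes that each request carries the sending controller's epoch and that a broker rejects anything older than the highest epoch it has seen; that is how "A's request will be rejected" is modeled here. The names `IsrUpdate`, `controller_epoch`, and `seen_epoch` are hypothetical and are not taken from Kafka's code.

```python
from dataclasses import dataclass, field

@dataclass
class IsrUpdate:
    # Hypothetical request shape: the sending controller's epoch plus the ISR it carries.
    controller_epoch: int
    isr: list

@dataclass
class Broker:
    # Highest controller epoch this broker has accepted so far (hypothetical field).
    seen_epoch: int = 0
    isr: list = field(default_factory=list)

    def handle(self, req):
        # Reject requests from a controller older than one we have already heard from.
        if req.controller_epoch < self.seen_epoch:
            return False
        self.seen_epoch = req.controller_epoch
        self.isr = list(req.isr)
        return True

ZK_ISR_AFTER_SHRINK = [1, 3]  # ISR that A wrote to ZK before its session expired
EPOCH_A, EPOCH_B = 1, 2       # B is elected after A's session expires

def case_1():
    # C hears from the new controller B first, then from the stale controller A.
    c = Broker()
    c.handle(IsrUpdate(EPOCH_B, ZK_ISR_AFTER_SHRINK))  # B read this ISR from ZK on init
    accepted = c.handle(IsrUpdate(EPOCH_A, ZK_ISR_AFTER_SHRINK))
    return accepted, c.isr

def case_2():
    # C hears from A first; B later sends the same ISR it read from ZK.
    c = Broker()
    accepted = c.handle(IsrUpdate(EPOCH_A, ZK_ISR_AFTER_SHRINK))
    c.handle(IsrUpdate(EPOCH_B, ZK_ISR_AFTER_SHRINK))
    return accepted, c.isr

print(case_1())  # (False, [1, 3])
print(case_2())  # (True, [1, 3])
```

In both orderings C ends up with the ISR that A wrote to ZK, which is the point of the argument: A's write happened before B's election, so B re-sends it anyway.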
The problem in the description is really caused by broker A changing ZK after its session expires. So the fix could be the following. If the controller (say A) hits a ZK ConnectionLoss event while reading from or writing to ZK, it pauses the operation. Two things can then happen. If controller A's ZK session expires, it simply ignores all outstanding ZK events. This guarantees that controller A can't touch ZK any more once a new controller is elected (which has to happen after controller A's SessionExpiration event). So the new controller is guaranteed to read the latest ZK data, act on it, and send the latest info to the brokers. This would avoid the issue in the description.
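A minimal sketch of the proposed rule, as a toy state machine: operations are held while the connection is lost, and everything outstanding is dropped once the session expires, after which no further ZK writes are allowed. The class `ControllerZkGuard` and its methods are hypothetical; a list stands in for writes that actually reach ZK.

```python
from enum import Enum

class ZkState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"  # ConnectionLoss observed; session may still be alive
    EXPIRED = "expired"            # session gone; a new controller may already exist

class ControllerZkGuard:
    """Toy model: touch ZK only while connected; drop everything after expiration."""

    def __init__(self):
        self.state = ZkState.CONNECTED
        self.pending = []    # operations held while the connection is down
        self.zk_writes = []  # stands in for writes that actually reached ZK

    def submit(self, op):
        if self.state is ZkState.EXPIRED:
            return False             # never touch ZK after the session expires
        if self.state is ZkState.DISCONNECTED:
            self.pending.append(op)  # pause: hold the operation
            return True
        self.zk_writes.append(op)
        return True

    def on_connection_loss(self):
        if self.state is ZkState.CONNECTED:
            self.state = ZkState.DISCONNECTED

    def on_session_expired(self):
        self.state = ZkState.EXPIRED
        self.pending.clear()         # ignore all outstanding ZK events

guard = ControllerZkGuard()
guard.submit("shrink ISR")
guard.on_connection_loss()
guard.submit("expand ISR")  # paused, not written
guard.on_session_expired()  # the held operation is dropped
print(guard.submit("late write"), guard.zk_writes, guard.pending)
# False ['shrink ISR'] []
```

The design choice is that expiration is terminal for this controller instance: nothing queued before it can leak into ZK after a new controller has been elected.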
In the second case, controller A gets a SyncConnected event. Does controller A then just resume from where it left off? Or does it ignore all outstanding events and re-read all subscribed ZK paths (since events could have been missed between the ConnectionLoss event and the SyncConnected event)?
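To make the second option in the question concrete, here is a sketch of the "re-read everything" approach; it illustrates that option only and is not a claim about which one is required. A dict stands in for ZooKeeper, and `ResyncingController` and the path `/partition/isr` are hypothetical.

```python
class ResyncingController:
    """Toy model: on SyncConnected, drop queued events and rebuild state
    from the current contents of every subscribed path."""

    def __init__(self, zk, paths):
        self.zk = zk              # dict standing in for ZooKeeper: path -> data
        self.paths = list(paths)  # subscribed paths (hypothetical)
        self.cache = {p: zk.get(p) for p in self.paths}
        self.queued_events = []
        self.connected = True

    def on_event(self, path):
        # Process a change notification now, or hold it while disconnected.
        if self.connected:
            self.cache[path] = self.zk.get(path)
        else:
            self.queued_events.append(path)

    def on_connection_loss(self):
        self.connected = False

    def on_sync_connected(self):
        self.connected = True
        self.queued_events.clear()  # may be incomplete, so don't replay it
        for p in self.paths:        # authoritative re-read of every subscribed path
            self.cache[p] = self.zk.get(p)

zk = {"/partition/isr": [1, 2, 3]}
ctl = ResyncingController(zk, ["/partition/isr"])
ctl.on_connection_loss()
zk["/partition/isr"] = [1, 3]  # change while disconnected; assume its event never arrives
ctl.on_sync_connected()
print(ctl.cache)  # {'/partition/isr': [1, 3]}
```

The trade-off: resuming is cheaper, while re-reading costs one read per subscribed path but does not depend on every intermediate event having been delivered.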
Finally, ZkClient actually hides the ZK ConnectionLoss event and only informs the application when the ZK session expires. To pursue this, we will have to access ZK directly.
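Accessing ZK directly means the controller sees every session-state change and can route each one to the pause/resume/expire handling discussed above. The sketch below mirrors the names of ZooKeeper's Java `Watcher.Event.KeeperState` values (`Disconnected`, `SyncConnected`, `Expired`) in a plain Python enum. `dispatch`, `zkclient_like_filter`, and `observed` are hypothetical helpers, and the filter only models the behavior described above, not ZkClient's actual code.

```python
from enum import Enum

class KeeperState(Enum):
    # Names mirror ZooKeeper's Java Watcher.Event.KeeperState values used here.
    Disconnected = "Disconnected"
    SyncConnected = "SyncConnected"
    Expired = "Expired"

def dispatch(state, on_pause, on_resume, on_expire):
    """Route a raw session-state notification to the controller's handlers."""
    if state is KeeperState.Disconnected:
        on_pause()
    elif state is KeeperState.SyncConnected:
        on_resume()
    elif state is KeeperState.Expired:
        on_expire()

def zkclient_like_filter(states):
    # Models a wrapper that only surfaces session expiration to the application.
    return [s for s in states if s is KeeperState.Expired]

def observed(states):
    # Record which controller handlers fire for a sequence of states.
    calls = []
    for s in states:
        dispatch(s,
                 on_pause=lambda: calls.append("pause"),
                 on_resume=lambda: calls.append("resume"),
                 on_expire=lambda: calls.append("expire"))
    return calls

raw = [KeeperState.Disconnected, KeeperState.SyncConnected,
       KeeperState.Disconnected, KeeperState.Expired]
print(observed(raw))                        # ['pause', 'resume', 'pause', 'expire']
print(observed(zkclient_like_filter(raw)))  # ['expire']
```

With only expiration visible, the controller never gets the signal to pause, so it could still be mid-write when the session is lost.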