When comparing zxids for leader election, the current epoch of the peer needs to be taken into account.
Can you please elaborate on the situations where the epoch is not correctly accounted for?
This is actually part of the changes we are proposing to the Zab implementation, detailed in the wiki; this particular sub-task covers the change to leader election. The election needs to take into account not only the highest zxid but also the current epoch of the proposed leader, which might not be the same as the epoch of its last zxid.
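To make the ordering concrete, here is a minimal Java sketch of a vote comparison that checks the proposed leader's current epoch before its zxid, with the server id as a final tie-breaker. It is not the code from the patch: the class and method names (`VoteOrder`, `Vote`, `isBetter`, `makeZxid`, `epochOf`) are hypothetical, and it assumes the usual zxid layout with the epoch in the high 32 bits and a counter in the low 32 bits.

```java
public final class VoteOrder {

    // Assumed zxid layout: epoch in the high 32 bits, counter in the low 32 bits.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // Hypothetical vote: proposed leader id, its last zxid, and its current epoch.
    static final class Vote {
        final long id;
        final long zxid;
        final long peerEpoch;

        Vote(long id, long zxid, long peerEpoch) {
            this.id = id;
            this.zxid = zxid;
            this.peerEpoch = peerEpoch;
        }
    }

    // True if candidate should replace current: compare the current epoch first,
    // then the last zxid, and finally the server id as a tie-breaker.
    static boolean isBetter(Vote candidate, Vote current) {
        if (candidate.peerEpoch != current.peerEpoch) {
            return candidate.peerEpoch > current.peerEpoch;
        }
        if (candidate.zxid != current.zxid) {
            return candidate.zxid > current.zxid;
        }
        return candidate.id > current.id;
    }

    public static void main(String[] args) {
        // Illustrative scenario: peer 1 has moved to epoch 5 but its last zxid is
        // still from epoch 4; peer 2 stayed in epoch 4 with a higher counter.
        Vote a = new Vote(1, makeZxid(4, 10), 5);
        Vote b = new Vote(2, makeZxid(4, 12), 4);
        System.out.println(b.zxid > a.zxid);  // true: zxid-only ordering prefers peer 2
        System.out.println(isBetter(a, b));   // true: epoch 5 beats epoch 4
        System.out.println(isBetter(b, a));   // false
        System.out.println(epochOf(a.zxid));  // 4
    }
}
```

In this made-up scenario a zxid-only comparison would pick peer 2 (4:12 > 4:10), while the epoch-first ordering picks peer 1, whose current epoch is newer than the epoch of its last zxid.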
Preliminary patch. It is backward compatible, and tests pass for me. Also, FLELostMessageTest hardcodes notification messages, which shows that the patch is backward compatible, but we probably want to add a bit more testing.
New version of the patch. I have cleaned it up a bit and added an explicit FLEBackwardCompatibilityTest class. It is essentially a copy of FLELostMessageTest, but with hardcoded messages that look like the notifications in the current trunk code.
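As a rough illustration of how a decoder can stay compatible with old-style notifications, the Java sketch below accepts both a shorter message without the peer epoch and a longer one that appends it, falling back to the epoch of the sender's last zxid when the field is missing. The byte layout (an int state followed by leader, zxid, and election epoch as longs, with the peer epoch appended as a final long) and all names here are assumptions for this sketch, not taken from the patch.

```java
import java.nio.ByteBuffer;

public final class NotificationCodec {
    // Assumed sizes: old format = int state + 3 longs; new format appends one long.
    static final int OLD_SIZE = 4 + 8 + 8 + 8; // 28 bytes
    static final int NEW_SIZE = OLD_SIZE + 8;  // 36 bytes

    // Hypothetical decoded notification.
    static final class Notification {
        final int state;
        final long leader;
        final long zxid;
        final long electionEpoch;
        final long peerEpoch;

        Notification(int state, long leader, long zxid, long electionEpoch, long peerEpoch) {
            this.state = state;
            this.leader = leader;
            this.zxid = zxid;
            this.electionEpoch = electionEpoch;
            this.peerEpoch = peerEpoch;
        }
    }

    // Assumed zxid layout: epoch in the high 32 bits.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Decodes either format; old-format messages get the epoch of their last zxid.
    static Notification decode(ByteBuffer buf) {
        if (buf.remaining() < OLD_SIZE) {
            throw new IllegalArgumentException("notification too short: " + buf.remaining() + " bytes");
        }
        boolean oldFormat = buf.remaining() < NEW_SIZE;
        int state = buf.getInt();
        long leader = buf.getLong();
        long zxid = buf.getLong();
        long electionEpoch = buf.getLong();
        long peerEpoch = oldFormat ? epochOf(zxid) : buf.getLong();
        return new Notification(state, leader, zxid, electionEpoch, peerEpoch);
    }

    // Builds an old-style message, as an un-upgraded peer would send it.
    static ByteBuffer encodeOld(int state, long leader, long zxid, long electionEpoch) {
        ByteBuffer buf = ByteBuffer.allocate(OLD_SIZE);
        buf.putInt(state).putLong(leader).putLong(zxid).putLong(electionEpoch);
        buf.flip();
        return buf;
    }

    // Builds a new-style message carrying the sender's current epoch.
    static ByteBuffer encodeNew(int state, long leader, long zxid, long electionEpoch, long peerEpoch) {
        ByteBuffer buf = ByteBuffer.allocate(NEW_SIZE);
        buf.putInt(state).putLong(leader).putLong(zxid).putLong(electionEpoch).putLong(peerEpoch);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        long zxid = (4L << 32) | 10; // epoch 4, counter 10
        Notification fromOld = decode(encodeOld(0, 1, zxid, 7));
        Notification fromNew = decode(encodeNew(0, 1, zxid, 7, 5));
        System.out.println(fromOld.peerEpoch); // 4 (derived from the zxid)
        System.out.println(fromNew.peerEpoch); // 5 (sent explicitly)
    }
}
```

Appending the new field at the end, rather than inserting it in the middle, is what lets the shorter message still parse field by field; a test with hardcoded old-style messages exercises the fallback branch.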
Also note that there are no modifications to LeaderElection, so LETest passing indicates that the patch is backward compatible with that implementation of leader election.
Integrated in ZooKeeper-trunk #1215 (See https://builds.apache.org/job/ZooKeeper-trunk/1215/)