Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
3.8.0, 3.7.1, 3.8.1, 3.7.2, 3.8.2, 3.9.1
Description
We had a txn loss incident in production recently. After investigation, we found it was caused by a race condition in the Learner.syncWithLeader() method: the follower writes the current epoch and sends ACK_LD before it has successfully persisted all the txns from the DIFF sync.
    case Leader.NEWLEADER:
        ...
        self.setCurrentEpoch(newEpoch);
        writeToTxnLog = true; // Anything after this needs to go to the transaction log, not applied directly in memory
        isPreZAB1_0 = false;
        // ZOOKEEPER-3911: make sure sync the uncommitted logs before commit them (ACK NEWLEADER).
        sock.setSoTimeout(self.tickTime * self.syncLimit);
        self.setSyncMode(QuorumPeer.SyncMode.NONE);
        zk.startupWithoutServing();
        if (zk instanceof FollowerZooKeeperServer) {
            FollowerZooKeeperServer fzk = (FollowerZooKeeperServer) zk;
            for (PacketInFlight p : packetsNotCommitted) {
                fzk.logRequest(p.hdr, p.rec, p.digest);
            }
            packetsNotCommitted.clear();
        }
        writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
        break;
    }
In this method, when the follower receives the NEWLEADER message, the current epoch is updated before the uncommitted txns are written to disk, and those writes are performed asynchronously by the SyncThread. If the follower crashes after setting the current epoch and sending ACK_LD but before all transactions have been successfully written to disk, transactions can be lost.
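The window described above can be modeled with a small sketch. This is a hypothetical model, not ZooKeeper code: the class FollowerDiskSketch and its fields are invented for illustration, and it assumes only what the description states, namely that the epoch is durable as soon as it is set while txns are queued for a background writer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of a follower's durable state; not ZooKeeper code.
final class FollowerDiskSketch {
    long persistedEpoch;                                  // survives a crash
    final List<Long> persistedZxids = new ArrayList<>();  // survives a crash
    final Queue<Long> pendingZxids = new ArrayDeque<>();  // in memory only

    // The epoch is written through to disk immediately.
    void setCurrentEpoch(long epoch) {
        persistedEpoch = epoch;
    }

    // The txn is only queued; a background thread would persist it later.
    void logRequest(long zxid) {
        pendingZxids.add(zxid);
    }

    // Stands in for the background thread finishing its writes.
    void flush() {
        persistedZxids.addAll(pendingZxids);
        pendingZxids.clear();
    }

    // A crash discards everything that was not yet flushed.
    void crash() {
        pendingZxids.clear();
    }

    public static void main(String[] args) {
        FollowerDiskSketch f = new FollowerDiskSketch();
        f.persistedEpoch = 1;
        f.setCurrentEpoch(2);        // epoch becomes durable first
        f.logRequest(0x100000001L);  // txns from the DIFF sync are only queued
        f.logRequest(0x100000002L);
        // The ACK would be sent here; the process then dies before flush() runs.
        f.crash();
        // prints "epoch=2 txns=0"
        System.out.println("epoch=" + f.persistedEpoch + " txns=" + f.persistedZxids.size());
    }
}
```

After the simulated crash the server holds the new epoch but none of the txns it acknowledged, which is the inconsistent state the report describes.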
The loss occurs because leader election compares the epoch first and the transaction id second. When the follower becomes leader because it has the highest epoch, it asks the other followers to truncate their txns even if those txns have already been written to disk, causing data loss.
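A minimal sketch of that ordering follows. It is a simplified comparison for illustration only, not ZooKeeper's election code: the names VoteOrderSketch, Vote and outranks are hypothetical, and the server id tie-breaker is an assumption added so that the ordering is total.

```java
// Simplified vote comparison for illustration; not ZooKeeper's election code.
final class VoteOrderSketch {
    static final class Vote {
        final long epoch;     // epoch the server has accepted
        final long lastZxid;  // last transaction id in its log
        final long serverId;  // assumed tie-breaker

        Vote(long epoch, long lastZxid, long serverId) {
            this.epoch = epoch;
            this.lastZxid = lastZxid;
            this.serverId = serverId;
        }
    }

    // True if a outranks b: epoch first, then zxid, then server id.
    static boolean outranks(Vote a, Vote b) {
        if (a.epoch != b.epoch) {
            return a.epoch > b.epoch;
        }
        if (a.lastZxid != b.lastZxid) {
            return a.lastZxid > b.lastZxid;
        }
        return a.serverId > b.serverId;
    }

    public static void main(String[] args) {
        Vote bounced = new Vote(2, 0x100000002L, 1);  // higher epoch, shorter log
        Vote healthy = new Vote(1, 0x100000005L, 2);  // lower epoch, longer log
        System.out.println(outranks(bounced, healthy)); // prints "true"
    }
}
```

Because the epoch is compared before the zxid, the server with the shorter log wins as long as its epoch is higher.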
The scenario is as follows:
1. A leader election happened
2. A follower synced with the leader via DIFF, received committed proposals from the leader and kept them in memory
3. The follower received the NEWLEADER message
4. The follower updated its currentEpoch to newEpoch
5. The follower was bounced before writing all the uncommitted txns to disk
6. The leader shut down and a new election was triggered
7. The follower became the new leader because it had the largest currentEpoch
8. The new leader asked the other followers to truncate their committed txns, and those transactions were lost
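Step 8 can be illustrated with a short sketch. It is a hypothetical model rather than ZooKeeper's sync code: TruncationSketch and lostOnTruncate are invented names, and it assumes truncation drops every txn whose zxid is greater than the new leader's last zxid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of step 8; not ZooKeeper's sync code.
final class TruncationSketch {
    // Returns the zxids a follower would drop when truncating to the leader's last zxid.
    static List<Long> lostOnTruncate(List<Long> followerLog, long leaderLastZxid) {
        List<Long> lost = new ArrayList<>();
        for (long zxid : followerLog) {
            if (zxid > leaderLastZxid) {
                lost.add(zxid);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // A healthy follower that persisted three committed txns.
        List<Long> followerLog = Arrays.asList(0x100000001L, 0x100000002L, 0x100000003L);
        // The bounced server, now leader, persisted only the first one.
        long newLeaderLastZxid = 0x100000001L;
        System.out.println(lostOnTruncate(followerLog, newLeaderLastZxid).size()); // prints "2"
    }
}
```

In this example two txns that were already on the follower's disk are discarded because the new leader never persisted them.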
Issue Links
- causes
  - ZOOKEEPER-4808 Fix the log statement in FastLeaderElection (Resolved)
- supercedes
  - ZOOKEEPER-3023 Flaky test: org.apache.zookeeper.server.quorum.Zab1_0Test.testNormalFollowerRunWithDiff (Resolved)
  - ZOOKEEPER-4643 Committed txns may be improperly truncated if follower crashes right after updating currentEpoch but before persisting txns to disk (Resolved)
  - ZOOKEEPER-4646 Committed txns may still be lost if followers crash after replying ACK of NEWLEADER but before writing txns to disk (Resolved)
  - ZOOKEEPER-4541 Ephemeral znode owned by closed session visible in 1 of 3 servers (Resolved)