  1. ZooKeeper
  2. ZOOKEEPER-4785

Txn loss due to race condition in Learner.syncWithLeader() during DIFF sync

Details

    Description

      We recently had a txn loss incident in production. After investigation, we found it was caused by a race condition in the Learner.syncWithLeader() method: the follower writes the current epoch and sends the ACK_LD before it has successfully persisted all the txns from the DIFF sync.

      case Leader.NEWLEADER: 
              ...
              self.setCurrentEpoch(newEpoch);
              writeToTxnLog = true;
              //Anything after this needs to go to the transaction log, not applied directly in memory
              isPreZAB1_0 = false;
      
              // ZOOKEEPER-3911: make sure sync the uncommitted logs before commit them (ACK NEWLEADER).
              sock.setSoTimeout(self.tickTime * self.syncLimit);
              self.setSyncMode(QuorumPeer.SyncMode.NONE);
              zk.startupWithoutServing();
              if (zk instanceof FollowerZooKeeperServer) {
                  FollowerZooKeeperServer fzk = (FollowerZooKeeperServer) zk;
                  for (PacketInFlight p : packetsNotCommitted) {
                      fzk.logRequest(p.hdr, p.rec, p.digest);
                  }
                  packetsNotCommitted.clear();
              }
      
              writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
              break;
          }
      

      In this method, when the follower receives the NEWLEADER message, the current epoch is updated before the uncommitted txns are written to disk, and those writes are performed asynchronously by the SyncThread. If the follower crashes after setting the current epoch and sending the ACK_LD but before all txns have been successfully written to disk, txn loss can happen.
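
      The window described above can be illustrated with a small, self-contained model. This is not ZooKeeper code: ToyFollower, durableLog, pendingWrites, and flush() are hypothetical stand-ins for the follower, the on-disk txn log, the queue drained by the asynchronous writer, and a blocking flush. The sketch only shows why persisting the epoch before the queued txns leaves an inconsistent durable state if the process dies in between, and how flushing first would close that window.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these names exist in ZooKeeper.
class ToyFollower {
    long durableEpoch;                                   // stands in for the persisted current epoch
    final List<Long> durableLog = new ArrayList<>();     // zxids already on disk
    final List<Long> pendingWrites = new ArrayList<>();  // zxids queued for the async writer

    ToyFollower(long epoch) {
        this.durableEpoch = epoch;
    }

    // Stands in for waiting until the async writer has persisted everything queued.
    void flush() {
        durableLog.addAll(pendingWrites);
        pendingWrites.clear();
    }

    // Ordering described in the report: the epoch is persisted first, txns are only queued.
    void onNewLeaderUnsafe(long newEpoch, List<Long> notCommitted) {
        durableEpoch = newEpoch;
        pendingWrites.addAll(notCommitted);
        // The ACK would be sent here, while pendingWrites may still be non-empty.
    }

    // Alternative ordering (an assumption, not the actual fix): queue, flush, then persist the epoch.
    void onNewLeaderSafe(long newEpoch, List<Long> notCommitted) {
        pendingWrites.addAll(notCommitted);
        flush();
        durableEpoch = newEpoch;
        // The ACK would be sent here.
    }

    // A crash loses everything that was not yet durable.
    void crash() {
        pendingWrites.clear();
    }

    static String describe(ToyFollower f) {
        return "epoch=" + f.durableEpoch + " txnsOnDisk=" + f.durableLog.size();
    }

    public static void main(String[] args) {
        // Made-up zxids for three txns delivered via DIFF.
        List<Long> diff = Arrays.asList(0x100000001L, 0x100000002L, 0x100000003L);

        ToyFollower unsafe = new ToyFollower(1);
        unsafe.onNewLeaderUnsafe(2, diff);
        unsafe.crash();
        System.out.println("unsafe: " + describe(unsafe)); // unsafe: epoch=2 txnsOnDisk=0

        ToyFollower safe = new ToyFollower(1);
        safe.onNewLeaderSafe(2, diff);
        safe.crash();
        System.out.println("safe: " + describe(safe));     // safe: epoch=2 txnsOnDisk=3
    }
}
```

      In the unsafe ordering the restarted follower advertises the new epoch with none of the DIFF txns on disk, which is the state that makes the later steps of the incident possible.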

      This is because leader election compares epochs first and transaction ids second. When the follower becomes the leader because it has the highest epoch, it asks the other followers to truncate txns even though they have already been written to disk, causing data loss.
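
      As a rough illustration of that ordering, the sketch below compares two candidates by epoch, then zxid, then server id. The class VoteOrder, the method beats(), and all values are made up for illustration; this is a simplified model of the comparison, not the election code itself.

```java
// Simplified sketch of an "epoch first, then zxid, then server id" vote ordering.
class VoteOrder {
    // Returns true if candidate A should be preferred over candidate B.
    static boolean beats(long epochA, long zxidA, long sidA,
                         long epochB, long zxidB, long sidB) {
        if (epochA != epochB) {
            return epochA > epochB;   // a higher epoch wins regardless of zxid
        }
        if (zxidA != zxidB) {
            return zxidA > zxidB;     // the zxid only matters when epochs tie
        }
        return sidA > sidB;           // the server id breaks remaining ties
    }

    public static void main(String[] args) {
        // Made-up values: server 1 has the newer epoch but fewer txns on disk.
        long staleZxid = 0x100000002L;
        long fullZxid = 0x100000005L;
        System.out.println(beats(2, staleZxid, 1, 1, fullZxid, 2)); // true
    }
}
```

      With these values the server holding fewer txns still wins, because the epoch comparison decides the outcome before the zxids are ever looked at.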

      The following is the scenario:

      1. A leader election happened.
      2. A follower synced with the leader via DIFF, received committed proposals from the leader, and kept them in memory.
      3. The follower received the NEWLEADER message.
      4. The follower updated its current epoch to newEpoch.
      5. The follower was bounced before writing all the uncommitted txns to disk.
      6. The leader shut down and a new election was triggered.
      7. The follower became the new leader because it had the largest currentEpoch.
      8. The new leader asked the other followers to truncate their committed txns, and those txns were lost.
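
      The effect of the last step can be sketched with a toy log. TruncDemo, zxid(), and truncateTo() are hypothetical helpers, and the zxids are made up; the real truncation logic is more involved. The sketch assumes a zxid packs the epoch into the high 32 bits and a counter into the low 32 bits, and that truncation drops every entry beyond the new leader's last zxid.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a follower log being truncated to the new leader's last zxid.
class TruncDemo {
    // Assumed layout: epoch in the high 32 bits, counter in the low 32 bits.
    static long zxid(long epoch, long counter) {
        return (epoch << 32) | counter;
    }

    // Keeps only the entries that are not beyond the leader's last zxid.
    static List<Long> truncateTo(List<Long> log, long leaderLastZxid) {
        List<Long> kept = new ArrayList<>();
        for (long z : log) {
            if (z <= leaderLastZxid) {
                kept.add(z);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A healthy follower has five committed txns from epoch 1 on disk.
        List<Long> followerLog = new ArrayList<>();
        for (long c = 1; c <= 5; c++) {
            followerLog.add(zxid(1, c));
        }

        // The bounced follower, now leader, only persisted the first two.
        long newLeaderLast = zxid(1, 2);

        List<Long> kept = truncateTo(followerLog, newLeaderLast);
        System.out.println("kept=" + kept.size()
                + " lost=" + (followerLog.size() - kept.size())); // kept=2 lost=3
    }
}
```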

            People

              li4wang Li Wang

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h