  1. ZooKeeper
  2. ZOOKEEPER-3104

Potential data inconsistency due to NEWLEADER packet being sent too early during SNAP sync

Details

    Description

      Currently, in SNAP sync, the leader starts queuing proposals/commits and the NEWLEADER packet before it sends the snapshot over the wire. As a result, the zxid associated with the snapshot may be higher than that of every packet queued before NEWLEADER.
       
      When the follower receives the snapshot, it applies all the txns queued before NEWLEADER, which may not cover every txn up to the zxid of the snapshot. It then writes the snapshot out to disk, tagged with the zxid associated with the snapshot. If the server crashes after writing the snapshot out, it loads the data from disk on restart and uses the zxid of the snapshot file to sync with the leader. This can leave the data inconsistent, because only part of the historical data was replayed during the previous sync.
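
      The gap described above can be illustrated with a toy model. This is only a sketch, not ZooKeeper code: the class SnapSyncGap, the method missingAfterRestart, and the idea of representing state as a plain set of zxids are all assumptions made for illustration. It assumes a snapshot tagged with zxid 8 that reflects only some of the txns up to 8, and a follower that applied just the txns queued before NEWLEADER.

```java
import java.util.List;
import java.util.TreeSet;

public class SnapSyncGap {
    /**
     * Toy model (hypothetical, not ZooKeeper code): state is just the set of
     * zxids whose txns it reflects. Returns the zxids at or below the
     * snapshot's zxid that the follower never applied. After a restart the
     * follower would report snapshotZxid as its last zxid, so a leader syncing
     * from that point would not resend them.
     */
    static TreeSet<Long> missingAfterRestart(List<Long> reflectedInSnapshot,
                                             long snapshotZxid,
                                             List<Long> queuedBeforeNewLeader) {
        TreeSet<Long> applied = new TreeSet<>(reflectedInSnapshot);
        applied.addAll(queuedBeforeNewLeader);

        TreeSet<Long> missing = new TreeSet<>();
        for (long zxid = 1; zxid <= snapshotZxid; zxid++) {
            if (!applied.contains(zxid)) {
                missing.add(zxid);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed snapshot contents: txns 1-5 and 7, tagged with zxid 8.
        List<Long> snapshot = List.of(1L, 2L, 3L, 4L, 5L, 7L);

        // NEWLEADER queued early: only txn 6 precedes it.
        System.out.println(missingAfterRestart(snapshot, 8L, List.of(6L)));          // prints [8]

        // NEWLEADER queued late: txns 6-8 all precede it.
        System.out.println(missingAfterRestart(snapshot, 8L, List.of(6L, 7L, 8L))); // prints []
    }
}
```

      In the first call, txn 8 is never applied, yet the snapshot file on disk claims zxid 8, which is the inconsistency this issue describes.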
       
      The NEWLEADER packet means the learner now has the correct and almost up-to-date state of the leader, so it makes more sense to send the NEWLEADER packet after the snapshot has been sent over. That is what the fix does.
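
      The effect of the ordering can be sketched from the leader's side. This is a hypothetical model, not the actual leader code: the class NewLeaderOrdering, the string packet encoding, and both method names are assumptions for illustration. It checks which zxid is the last one queued ahead of NEWLEADER, depending on when NEWLEADER is enqueued.

```java
import java.util.ArrayList;
import java.util.List;

public class NewLeaderOrdering {
    static final String NEWLEADER = "NEWLEADER";
    static final String TXN_PREFIX = "TXN ";

    /**
     * Hypothetical model of the leader's per-learner queue: txns 1..lastCommitted
     * are queued in order, and NEWLEADER is queued right after the txn whose
     * zxid equals newLeaderQueuedAt.
     */
    static List<String> buildQueue(long newLeaderQueuedAt, long lastCommitted) {
        List<String> queue = new ArrayList<>();
        for (long zxid = 1; zxid <= lastCommitted; zxid++) {
            queue.add(TXN_PREFIX + zxid);
            if (zxid == newLeaderQueuedAt) {
                queue.add(NEWLEADER);
            }
        }
        return queue;
    }

    /** Highest zxid the follower would apply before it sees NEWLEADER. */
    static long lastZxidBeforeNewLeader(List<String> queue) {
        long last = 0;
        for (String packet : queue) {
            if (packet.equals(NEWLEADER)) {
                break;
            }
            last = Long.parseLong(packet.substring(TXN_PREFIX.length()));
        }
        return last;
    }

    public static void main(String[] args) {
        long snapshotZxid = 8; // assumed zxid associated with the snapshot

        // NEWLEADER queued before the snapshot is sent (at zxid 6).
        long early = lastZxidBeforeNewLeader(buildQueue(6, 10));
        // NEWLEADER queued after the snapshot is sent (at zxid 8).
        long late = lastZxidBeforeNewLeader(buildQueue(8, 10));

        System.out.println(early >= snapshotZxid); // prints false
        System.out.println(late >= snapshotZxid);  // prints true
    }
}
```

      Under these assumptions, queuing NEWLEADER only after the snapshot has been sent guarantees that the txns ahead of it reach at least the snapshot's zxid.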
       
      In addition, the socket timeout switches to the smaller sync timeout once the NEWLEADER ack is received. In ensembles with high write traffic and a large snapshot, the follower might be timed out by the leader before the queued txns have all been sent over after the snapshot is written out, which could leave the follower stuck in the syncing state forever. Moving the NEWLEADER packet after the snapshot has been sent avoids this issue as well.

          People

            lvfangmin Fangmin Lv
            lvfangmin Fangmin Lv
            Votes: 0
            Watchers: 13

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h 10m