ZOOKEEPER-1154

Data inconsistency when the node(s) with the highest zxid is not present at the time of leader election

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.3.3
    • Fix Version/s: 3.3.4, 3.4.0
    • Component/s: quorum
    • Labels:
      None

      Description

      If the participant with the highest zxid (let's call it A) isn't present during leader election, a participant with a lower zxid (say B) might be chosen as leader. When A comes up, it replays its log, including the transaction with that higher zxid. The change made by that transaction is then visible only to clients connected to A, and not to clients connected to the other participants.

      I was able to reproduce this problem as follows:
      1. Connect a debugger to B and C and suspend them, so they don't write anything.
      2. Issue an update to the leader A.
      3. After a few seconds, crash all servers (A, B, C).
      4. Start B and C, and let the leader election take place.
      5. Start A.
      6. The update done in step 2 is now visible on A but not on B or C, hence the inconsistency.

      Below is a more detailed analysis of what is happening in the code.

      Initial Condition
      1. Let's say there are three nodes in the ensemble (A, B, C), with A being the leader.
      2. The current epoch is 7.
      3. For simplicity of the example, let's say a zxid is a two-digit number, with the epoch being the first digit.
      4. The latest zxid is 73.
      5. All the nodes have seen change 73 and have persistently logged it.
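
      The two-digit notation above stands in for the real layout, in which a zxid is a 64-bit value with the epoch in the high 32 bits and a per-epoch counter in the low 32 bits. The following is a minimal sketch of that packing; the class and method names (ZxidSketch, makeZxid, epochOf, counterOf) are hypothetical and chosen for illustration, not taken from the ZooKeeper code base.

```java
// Illustrative helper only; the names here are hypothetical, not ZooKeeper's own API.
final class ZxidSketch {
    private ZxidSketch() {}

    // Pack an epoch (high 32 bits) and a per-epoch counter (low 32 bits) into one zxid.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // Extract the epoch from the high 32 bits.
    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    // Extract the counter from the low 32 bits.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long z74 = makeZxid(7, 4); // "74" in the two-digit notation
        long z81 = makeZxid(8, 1); // "81" in the two-digit notation
        // Any zxid from a later epoch compares greater than any zxid from an earlier one.
        System.out.println("74 < 81 ? " + (z74 < z81));
        System.out.println("epoch of 74 = " + epochOf(z74) + ", counter = " + counterOf(z74));
    }
}
```

      Because the epoch occupies the high bits, a plain numeric comparison orders zxids by epoch first, which is why 74 falls between 71 and 81 in the range check discussed later even though B never logged it.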

      Step 1
      A request with zxid 74 is issued. The leader A writes it to its log, but the entire ensemble crashes before B and C write change 74 to their logs.

      Step 3
      B and C restart; A is still down.
      B and C form the quorum.
      B is the new leader. Let's say B's minCommitLog is 71 and its maxCommitLog is 73.
      The epoch is now 8 and the zxid is 80.
      A request with zxid 81 succeeds. On B, minCommitLog is still 71 and maxCommitLog is now 81.

      Step 4
      A starts up. It applies the change from the request with zxid 74 to its in-memory data tree.
      A contacts B to registerAsFollower and provides 74 as its zxid.
      Since 71 <= 74 <= 81, B decides to send A a diff. B sends A proposal 81.

      Problem:
      The problem with the above sequence is that A's data tree still contains the update from request 74, which was never committed by a quorum and so should not be there. Before getting proposal 81, A should have received a trunc to 73. I don't see that in the code. If maxCommitLog on B had stayed at 73 instead of being bumped to 81, that case seems to be handled correctly.
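
      The missing check can be sketched as follows. This is a simplified illustration of the behavior the analysis calls for, not the attached patch and not the actual leader code: the class SyncPlanSketch, its Plan type and its plan method are hypothetical names, and the leader's committed log is modeled as a plain ascending list of the two-digit zxids used in the example. The key point is that a follower zxid lying inside [minCommitLog, maxCommitLog] is not enough to justify a plain diff; the leader also has to check that the zxid actually appears in its own log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the sync decision; not ZooKeeper's actual implementation.
final class SyncPlanSketch {
    enum Action { DIFF, TRUNC_THEN_DIFF, TRUNC, SNAP }

    static final class Plan {
        final Action action;
        final long truncTo;      // zxid the follower must truncate to, or -1 if none
        final List<Long> toSend; // committed proposals the follower still needs

        Plan(Action action, long truncTo, List<Long> toSend) {
            this.action = action;
            this.truncTo = truncTo;
            this.toSend = toSend;
        }
    }

    // committed: the leader's committed zxids in ascending order.
    static Plan plan(List<Long> committed, long peerLastZxid) {
        // Follower is behind the retained log (or there is no log): fall back to a snapshot.
        if (committed.isEmpty() || peerLastZxid < committed.get(0)) {
            return new Plan(Action.SNAP, -1, new ArrayList<>());
        }
        long max = committed.get(committed.size() - 1);
        // Follower is ahead of everything the leader committed: truncate it back.
        if (peerLastZxid > max) {
            return new Plan(Action.TRUNC, max, new ArrayList<>());
        }
        // Follower is inside the window: find the last zxid both sides share.
        long lastShared = -1;
        List<Long> toSend = new ArrayList<>();
        for (long zxid : committed) {
            if (zxid <= peerLastZxid) {
                lastShared = zxid;
            } else {
                toSend.add(zxid);
            }
        }
        if (lastShared == peerLastZxid) {
            return new Plan(Action.DIFF, -1, toSend);
        }
        // The follower's last zxid is not in the leader's log (the 74 case).
        return new Plan(Action.TRUNC_THEN_DIFF, lastShared, toSend);
    }
}
```

      With B's log modeled as [71, 72, 73, 81] and A reporting 74, this sketch yields a truncation to 73 followed by proposal 81, whereas a follower reporting 73 gets a plain diff containing 81.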

      1. ZOOKEEPER-1154.patch
        15 kB
        Camille Fournier
      2. ZOOKEEPER-1154.patch
        12 kB
        Vishal Kathuria
      3. ZOOKEEPER-1154.patch
        12 kB
        Vishal Kathuria
      4. ZOOKEEPER-1154.patch
        13 kB
        Vishal Kathuria

        Activity

        Vishal Kathuria created issue -
        Mahadev konar made changes -
        Field Original Value New Value
        Priority Critical [ 2 ] Blocker [ 1 ]
        Vishal Kathuria made changes -
        Attachment ZOOKEEPER-1154.patch [ 12491016 ]
        Vishal Kathuria made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Vishal Kathuria made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Vishal Kathuria made changes -
        Attachment ZOOKEEPER-1154.patch [ 12491350 ]
        Vishal Kathuria made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Patrick Hunt made changes -
        Assignee Vishal Kathuria [ vishal.k ]
        Patrick Hunt made changes -
        Fix Version/s 3.3.4 [ 12316276 ]
        Vishal Kathuria made changes -
        Attachment ZOOKEEPER-1154.patch [ 12492776 ]
        Camille Fournier made changes -
        Attachment ZOOKEEPER-1154.patch [ 12492846 ]
        Camille Fournier made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Mahadev konar made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Vishal Kathuria
            Reporter:
            Vishal Kathuria
          • Votes:
            0
            Watchers:
            1

              Time Tracking

              Estimated:
              Original Estimate - 504h
              Remaining:
              Remaining Estimate - 504h
              Logged:
              Time Spent - Not Specified