  1. ZooKeeper
  2. ZOOKEEPER-313

Problem with successive leader failures when no client is connected

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0, 3.0.1
    • Fix Version/s: 3.1.1
    • Component/s: server
    • Labels:
      None
    • Environment:

      all

      Description

      Steps to reproduce:

      Create a 3-node cluster. Run some transactions and then stop all clients. Make sure no other clients connect for the duration of the test.

      Let L1 be the current leader. Bring down L1. Let L2 be the newly elected leader and N3 the third node. Note that this election increases the transaction id (zxid) of N3's snapshot without any transaction being logged. Now bring L1 back up; the same happens to L1's snapshot. Now bring down L2.

      Both N3 and L1 now have snapshots with a transaction id greater than the last logged transaction. Whoever is elected leader will try to restore its state from the filesystem and fail.
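
      The sequence above can be modelled with a small toy program. This is only a sketch: the Node class, syncWithLeader, canRestore and firstZxidOfNextEpoch are hypothetical names and not ZooKeeper classes or methods. It assumes that a follower stamps its snapshot with the new leader's zxid, that a new leader starts a new epoch in the high 32 bits of the zxid, and that restore fails whenever the snapshot zxid is ahead of the last logged zxid.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the reported scenario; these are not ZooKeeper's real classes. */
final class SuccessiveLeaderFailureSketch {

    /** Hypothetical stand-in for one server's on-disk state. */
    static final class Node {
        final String name;
        long lastLoggedZxid; // highest zxid present in the transaction log
        long snapshotZxid;   // zxid stamped on the newest snapshot

        Node(String name, long zxid) {
            this.name = name;
            this.lastLoggedZxid = zxid;
            this.snapshotZxid = zxid;
        }

        /** Assumed behaviour: following a leader saves a snapshot at the leader's zxid, logging nothing. */
        void syncWithLeader(long leaderZxid) {
            snapshotZxid = leaderZxid;
        }

        /** Assumed restore check: the snapshot must not be ahead of the log. */
        boolean canRestore() {
            return snapshotZxid <= lastLoggedZxid;
        }
    }

    /** Assumes the zxid layout epoch (high 32 bits) + counter (low 32 bits). */
    static long firstZxidOfNextEpoch(long zxid) {
        return ((zxid >>> 32) + 1) << 32;
    }

    /** Replays the steps from the description; returns the nodes that cannot restore. */
    static List<String> runScenario() {
        long lastTxn = (1L << 32) | 5L; // last client transaction, seen by all three nodes
        Node l1 = new Node("L1", lastTxn);
        Node n3 = new Node("N3", lastTxn);

        // L1 goes down, L2 is elected and starts a new epoch; N3 follows L2.
        long l2Zxid = firstZxidOfNextEpoch(lastTxn);
        n3.syncWithLeader(l2Zxid);

        // L1 comes back up and follows L2 as well.
        l1.syncWithLeader(l2Zxid);

        // L2 goes down: N3 or L1 must now restore from disk to lead.
        List<String> stuck = new ArrayList<>();
        for (Node n : new Node[] {n3, l1}) {
            if (!n.canRestore()) {
                stuck.add(n.name);
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        System.out.println("Nodes unable to restore: " + runScenario());
    }
}
```

      In this model both remaining nodes end up in the stuck list, which matches the failure described above: no client transaction was logged in the new epoch, so the log never catches up with the snapshot.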

      One easy workaround is to change FileTxnSnapLog so that it does not save a snapshot when the snapshot's zxid is greater than the last logged zxid. The correct solution is possibly to log a transaction for leader election as well.
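
      The two options can be compared in a minimal sketch. TxnStoreSketch and its methods are hypothetical and do not reflect the real FileTxnSnapLog API; the sketch only assumes that a store is restorable when its snapshot zxid does not exceed its last logged zxid.

```java
/** Hypothetical in-memory stand-in for a snapshot + transaction log store. */
final class TxnStoreSketch {
    private long lastLoggedZxid;
    private long snapshotZxid;

    TxnStoreSketch(long zxid) {
        this.lastLoggedZxid = zxid;
        this.snapshotZxid = zxid;
    }

    /** Workaround: refuse to save a snapshot that is ahead of the log. */
    boolean saveSnapshotIfConsistent(long zxid) {
        if (zxid > lastLoggedZxid) {
            return false; // skip: the log has no entry up to this zxid
        }
        snapshotZxid = zxid;
        return true;
    }

    /** Alternative: record the election as a logged transaction, then snapshot. */
    void logElectionAndSnapshot(long electionZxid) {
        lastLoggedZxid = Math.max(lastLoggedZxid, electionZxid);
        snapshotZxid = electionZxid;
    }

    /** Assumed restore invariant. */
    boolean canRestore() {
        return snapshotZxid <= lastLoggedZxid;
    }

    public static void main(String[] args) {
        long lastTxn = (1L << 32) | 5L;
        long newEpoch = 2L << 32;

        TxnStoreSketch guarded = new TxnStoreSketch(lastTxn);
        System.out.println("snapshot saved: " + guarded.saveSnapshotIfConsistent(newEpoch));
        System.out.println("guarded store restorable: " + guarded.canRestore());

        TxnStoreSketch logged = new TxnStoreSketch(lastTxn);
        logged.logElectionAndSnapshot(newEpoch);
        System.out.println("logged store restorable: " + logged.canRestore());
    }
}
```

      Under these assumptions both approaches keep the store restorable. The first does so by dropping the snapshot, while the second keeps the snapshot and also leaves a record of the election in the log.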

            People

            • Assignee:
              mahadev Mahadev konar
              Reporter:
              sbera Sunanda Bera
