  1. ZooKeeper
  2. ZOOKEEPER-2033

ZooKeeper follower fails to start after a restart immediately following a new epoch

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.6
    • Fix Version/s: 3.4.7
    • Component/s: quorum
    • Labels:
      None

      Description

      The following issue was seen when adding a new node to a ZooKeeper cluster.

      Reproduction steps
      1. Create a 2 node ensemble. Write some keys.
      2. Add another node to the ensemble by modifying the config, then restart the entire cluster.
      3. Restart the new node before writing any new keys.

      The new node is too far behind, so it receives a SNAP from the newly elected leader. The zxid of this snapshot is from the new epoch, but that zxid is not in the leader's committed log cache.

      When the new node restarts, the follower sends this new-epoch zxid to the leader. The leader compares it against its maxCommitted log, sees that the maxCommitted log is not from the newest epoch, and therefore sends a TRUNC.

      The follower receives the TRUNC, but it has only a snapshot, so it cannot truncate!
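
      The sequence above can be illustrated with a small Java sketch. It is a simplified model under stated assumptions, not the actual ZooKeeper sync code: the class, enum, and method names are hypothetical, the three-way decision is a reduction of the behaviour described in this report, and the concrete zxid values are made up for illustration. The one detail taken from ZooKeeper itself is that a zxid carries the epoch in its high 32 bits and a counter in its low 32 bits.

```java
// Simplified sketch of the leader's sync decision as described in this report.
// NOT the real ZooKeeper code; all names here are hypothetical.
class SyncDecisionSketch {
    enum SyncType { DIFF, TRUNC, SNAP }

    // A zxid packs the epoch into the high 32 bits and a counter into the low 32 bits.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Assumed decision rule: a peer ahead of the leader's committed log cache is
    // told to truncate, a peer inside the cached range gets a diff, and a peer
    // behind the cached range gets a full snapshot.
    static SyncType chooseSync(long peerLastZxid, long minCommittedLog, long maxCommittedLog) {
        if (peerLastZxid > maxCommittedLog) {
            return SyncType.TRUNC;
        }
        if (peerLastZxid >= minCommittedLog) {
            return SyncType.DIFF;
        }
        return SyncType.SNAP;
    }

    public static void main(String[] args) {
        // Illustrative values: the committed log cache only holds epoch-1 transactions.
        long minCommittedLog = makeZxid(1, 1);
        long maxCommittedLog = makeZxid(1, 5);

        // First join: the new node has no data, so it is too far behind.
        System.out.println(chooseSync(0L, minCommittedLog, maxCommittedLog)); // prints SNAP

        // The snapshot it received is stamped with a zxid from the new epoch (2).
        long snapshotZxid = makeZxid(2, 0);

        // Restart before any new write: the follower reports the new-epoch zxid,
        // which is greater than maxCommittedLog, so the leader answers TRUNC.
        System.out.println(chooseSync(snapshotZxid, minCommittedLog, maxCommittedLog)); // prints TRUNC
    }
}
```

      In this model the second call returns TRUNC even though the follower is fully up to date, which matches the failure described above: the follower holds only a snapshot and has no log to truncate.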

        Attachments

        1. ZOOKEEPER-2033-3.4.patch
          4 kB
          Asad Saeed
        2. ZOOKEEPER-2033.patch
          3 kB
          Asad Saeed

            People

            • Assignee: Asad Saeed (asadpanda)
            • Reporter: Asad Saeed (asadpanda)