ZooKeeper / ZOOKEEPER-2791

Quorum doesn't recover after zxid rollover

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.6, 3.4.8
    • Fix Version/s: None
    • Component/s: leaderElection, quorum
    • Labels: None
    • Environment: Ubuntu 14.04.4 LTS, AWS EC2, 5-node ensembles

      Description

      When the zxid rolls over, the ensemble is unable to recover without a manual restart of the cluster. The leader enters the shutdown() state at rollover, but the remaining four nodes in the ensemble are not able to elect a new leader. This state persisted for at least 15 minutes, until an operator manually restarted the cluster and the ensemble recovered.
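
      For context, a zxid is a 64-bit id whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch counter. "Rollover" here means the counter has run out, so the next transaction needs a new epoch and therefore a new election. The sketch below only illustrates that layout; the helper names are hypothetical and are not ZooKeeper APIs.

```python
# Illustrative sketch of the zxid layout (hypothetical helpers, not ZooKeeper code).
# A zxid is a 64-bit value: high 32 bits = epoch, low 32 bits = counter.
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # 0xffffffff


def make_zxid(epoch, counter):
    """Pack an epoch and a counter into a single 64-bit zxid."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def epoch_of(zxid):
    """Return the epoch stored in the high 32 bits."""
    return zxid >> COUNTER_BITS


def counter_of(zxid):
    """Return the counter stored in the low 32 bits."""
    return zxid & COUNTER_MASK


def counter_exhausted(zxid):
    """True when the low 32 bits are all ones, so no further id fits in this epoch."""
    return counter_of(zxid) == COUNTER_MASK


def first_zxid_of_next_epoch(zxid):
    """The zxid a newly elected leader would start from: next epoch, counter 0."""
    return make_zxid(epoch_of(zxid) + 1, 0)


last = make_zxid(5, COUNTER_MASK)
print(hex(last), counter_exhausted(last), hex(first_zxid_of_next_epoch(last)))
```

      Under this layout, an election that fails to complete after the counter is exhausted leaves the ensemble with no leader to start the next epoch, which matches the stuck state described above.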

      Config:
      --------
      tickTime=2000
      initLimit=10
      syncLimit=5
      dataDir=/raid0/zookeeper
      clientPort=2181
      maxClientCnxns=100
      autopurge.snapRetainCount=14
      autopurge.purgeInterval=24
      leaderServes: True
      server.7=172.26.134.88:2888:3888
      server.6=172.26.136.143:2888:3888
      server.5=172.26.135.103:2888:3888
      server.4=172.26.134.16:2888:3888
      server.9=172.26.135.19:2888:3888
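
      As a rough aid for reading the config above, the following sketch derives the timeouts and quorum size it implies. It assumes the default majority quorum and the usual meaning of initLimit and syncLimit as multiples of tickTime; the parse and summarize helpers are hypothetical names introduced for illustration.

```python
# Illustrative sketch: derive timeouts and quorum size from the reported config.
# Assumes majority quorums (the default); helper names are hypothetical.
CONFIG = """\
tickTime=2000
initLimit=10
syncLimit=5
server.7=172.26.134.88:2888:3888
server.6=172.26.136.143:2888:3888
server.5=172.26.135.103:2888:3888
server.4=172.26.134.16:2888:3888
server.9=172.26.135.19:2888:3888
"""


def parse(text):
    """Parse key=value lines, skipping blanks, comments, and malformed lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def summarize(props):
    """Compute the derived timeouts (in ms) and the majority quorum size."""
    tick_ms = int(props["tickTime"])
    servers = [key for key in props if key.startswith("server.")]
    return {
        "init_timeout_ms": tick_ms * int(props["initLimit"]),
        "sync_timeout_ms": tick_ms * int(props["syncLimit"]),
        "ensemble_size": len(servers),
        "quorum_size": len(servers) // 2 + 1,
    }


summary = summarize(parse(CONFIG))
print(summary)
```

      With these values a majority is 3 of 5 servers, so the four nodes left after the leader shuts down are still enough to form a quorum, and the derived timeouts (20 s and 10 s) are far shorter than the 15 minutes the ensemble stayed leaderless.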

      Logs:

      https://gist.github.com/mheffner/d615d358d4a360ae56a0d0a280040640

              People

              • Assignee: Abraham Fine (abrahamfine)
              • Reporter: Mike Heffner (mheffner)
              • Votes: 1
              • Watchers: 5
