Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-2791

Quorum doesn't recover after zxid rollover

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsAdd voteVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.3.6, 3.4.8
    • None
    • leaderElection, quorum
    • None
    • Ubuntu 14.04.4 LTS, AWS EC2, 5 node ensembles

    Description

      When zxid rolls over the ensemble is unable to recover without manually restarting the cluster. The leader enters shutdown() state when zxid rolls over, but the remaining four nodes in the ensemble are not able to re-elect a new leader. This state has persisted for at least 15 minutes before an operator manually restarted the cluster and the ensemble recovered.

      Config:
      --------
      tickTime=2000
      initLimit=10
      syncLimit=5
      dataDir=/raid0/zookeeper
      clientPort=2181
      maxClientCnxns=100
      autopurge.snapRetainCount=14
      autopurge.purgeInterval=24
      leaderServes: True
      server.7=172.26.134.88:2888:3888
      server.6=172.26.136.143:2888:3888
      server.5=172.26.135.103:2888:3888
      server.4=172.26.134.16:2888:3888
      server.9=172.26.135.19:2888:3888

      Logs:

      https://gist.github.com/mheffner/d615d358d4a360ae56a0d0a280040640

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            abrahamfine Abraham Fine Assign to me
            mheffner Mike Heffner

            Dates

              Created:
              Updated:

              Slack

                Issue deployment