Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.4.6
    • Fix Version/s: None
    • Component/s: leaderElection
    • Labels:
      None
    • Environment:

      Ubuntu 12.04, OpenJDK 1.6

      Description

      In a 3-node cluster, if 2 nodes crash and reboot during leader election, the system can end up with 2 leaders at the same time. Eventually, the leader without follower support gives up leadership, but in the meantime we lose some availability.

      I am building a tool that can reorder messages and disk writes and inject node crashes into the system, and it found this bug.
      Below is the sequence of events my tool executed that ends with 2 leaders.
      My ZooKeeper nodes have ids 0, 1, and 2.

      packetsend from=0 to=1 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=0 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=1 to=0 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=1 to=2 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=1 to=0 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=0 to=1 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=1 to=2 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=0 to=2 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
      diskwrite nodeId=0 write=currentEpoch
      nodecrash id=0
      nodecrash id=1
      nodestart id=0
      nodestart id=1
      diskwrite nodeId=2 write=currentEpoch
      packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=0 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=0 to=1 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=1 to=0 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=1 to=2 state=0 leader=1 zxid=0 electionEpoch=1 peerEpoch=0
      packetsend from=2 to=0 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=2 to=1 state=2 leader=2 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=1 to=0 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=1 to=2 state=0 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
      packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
      packetsend from=0 to=2 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
      packetsend from=1 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
      packetsend from=1 to=2 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
      packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
      packetsend from=0 to=2 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=2 to=0 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
      packetsend from=0 to=1 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      packetsend from=2 to=1 state=0 leader=2 zxid=0 electionEpoch=2 peerEpoch=1
      packetsend from=0 to=2 state=2 leader=0 zxid=0 electionEpoch=1 peerEpoch=1
      diskwrite nodeId=2 write=currentEpoch
      diskwrite nodeId=1 write=currentEpoch
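To make the trace easier to follow: the `state` field appears to carry the sender's server state as an ordinal (0 = LOOKING, 2 = LEADING, following the declaration order of `QuorumPeer.ServerState` in 3.4.x), and a LOOKING node replaces its current vote whenever a received vote ranks higher. The Java sketch below is a simplified model of that ranking, assuming the 3.4.x fast leader election ordering (higher `peerEpoch` first, then higher `zxid`, then higher server id) and ignoring quorum weights; the class and method names are hypothetical. It reproduces the key switch in the trace: after the restart, node 0 advertises `peerEpoch=1` (presumably because it persisted `currentEpoch` before crashing), so its vote outranks node 2's `peerEpoch=0` vote and node 1 moves to `leader=0`.

```java
// Hypothetical, simplified model of the vote ranking used during fast leader election.
// Assumption: ordering is peerEpoch, then zxid, then server id (quorum weights ignored).
public class VoteOrderSketch {

    // Assumed to match the ordinals printed as "state=" in the trace.
    enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING }

    // One proposed vote, as carried in a "packetsend" line.
    static final class Vote {
        final long leader;
        final long zxid;
        final long peerEpoch;

        Vote(long leader, long zxid, long peerEpoch) {
            this.leader = leader;
            this.zxid = zxid;
            this.peerEpoch = peerEpoch;
        }

        @Override
        public String toString() {
            return "leader=" + leader + " zxid=" + zxid + " peerEpoch=" + peerEpoch;
        }
    }

    // True if a LOOKING node currently holding `current` should switch to `received`.
    static boolean receivedVoteWins(Vote received, Vote current) {
        if (received.peerEpoch != current.peerEpoch) {
            return received.peerEpoch > current.peerEpoch;
        }
        if (received.zxid != current.zxid) {
            return received.zxid > current.zxid;
        }
        return received.leader > current.leader;
    }

    // Folds the received votes, in order, into the vote the node ends up proposing.
    static Vote bestVote(Vote own, Vote... received) {
        Vote best = own;
        for (Vote v : received) {
            if (receivedVoteWins(v, best)) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Round 1: every node starts at peerEpoch=0, so the highest id (2) wins.
        Vote round1 = bestVote(new Vote(1, 0, 0), new Vote(0, 0, 0), new Vote(2, 0, 0));
        System.out.println("node 1, round 1 -> " + round1);

        // After the restarts: node 0 advertises peerEpoch=1, node 2 still peerEpoch=0.
        Vote afterRestart = bestVote(new Vote(1, 0, 0), new Vote(2, 0, 0), new Vote(0, 0, 1));
        System.out.println("node 1, after restart -> " + afterRestart);

        System.out.println("state=2 means " + ServerState.values()[2]);
    }
}
```

This only covers the ranking step; the real election also waits until a quorum agrees on the winning vote before a node changes its state.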

      Attachments

      1. conf.zip (0.9 kB), Tanakorn Leesatapornwongsa
      2. log.zip (15 kB), Tanakorn Leesatapornwongsa

        Activity

        Tanakorn Leesatapornwongsa added a comment -

        Attach log files and config files

        Flavio Junqueira added a comment -

        Thanks for reporting this. Having two servers thinking they could be leaders is not really a bug. It is a bug if we have two servers leading and both having quorums of supporters concurrently, which your case doesn't seem to imply.

        If you can find a way of reducing the amount of time we have concurrent leaders, then fine with me, but in my experience it doesn't happen often.

        Your tool sounds like a cool one!
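The distinction in the comment can be checked mechanically. Assuming plain majority quorums (a set of servers is a quorum when it has more than n/2 members, the rule ZooKeeper's majority-based `QuorumMaj` verifier applies), any two quorums of a 3-node ensemble share at least one server, so as long as each server follows at most one leader at a time, two concurrent leaders cannot both hold a quorum. The Java sketch below uses hypothetical names, and the supporter sets in `main` are one reading of the end of the trace (node 1 following node 2, node 0 still claiming LEADING with no followers), not something the logs state explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: majority quorums always overlap, so two leaders
// cannot both be backed by a quorum at the same time.
public class MajorityQuorumSketch {

    // Majority rule: strictly more than half of the ensemble.
    static boolean containsQuorum(Set<Long> servers, int ensembleSize) {
        return servers.size() > ensembleSize / 2;
    }

    // All subsets of the server ids {0, ..., n-1}.
    static List<Set<Long>> allSubsets(int n) {
        List<Set<Long>> subsets = new ArrayList<Set<Long>>();
        for (int mask = 0; mask < (1 << n); mask++) {
            Set<Long> s = new HashSet<Long>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    s.add((long) i);
                }
            }
            subsets.add(s);
        }
        return subsets;
    }

    // True if every pair of quorums in an n-server ensemble shares at least one server.
    static boolean quorumsAlwaysIntersect(int n) {
        List<Set<Long>> quorums = new ArrayList<Set<Long>>();
        for (Set<Long> s : allSubsets(n)) {
            if (containsQuorum(s, n)) {
                quorums.add(s);
            }
        }
        for (Set<Long> a : quorums) {
            for (Set<Long> b : quorums) {
                Set<Long> common = new HashSet<Long>(a);
                common.retainAll(b);
                if (common.isEmpty()) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed end state of the trace: node 2 leads with node 1 following;
        // node 0 still claims LEADING but has no followers.
        Set<Long> node2Supporters = new HashSet<Long>(Arrays.asList(2L, 1L));
        Set<Long> node0Supporters = new HashSet<Long>(Arrays.asList(0L));

        System.out.println("node 2 has quorum: " + containsQuorum(node2Supporters, 3));
        System.out.println("node 0 has quorum: " + containsQuorum(node0Supporters, 3));
        System.out.println("3-node quorums always intersect: " + quorumsAlwaysIntersect(3));
    }
}
```

Under these assumptions node 2 holds a quorum and node 0 does not, which is the "two servers think they are leaders" situation rather than the "two leaders with concurrent quorums" one.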


          People

          • Assignee: Unassigned
          • Reporter: Tanakorn Leesatapornwongsa
          • Votes: 0
          • Watchers: 3

            Dates

            • Created:
              Updated:
