Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-4708

ZooKeeper 3.6.4 quorum failing due to <unresolved> address

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.6.4, 3.8.1
    • None
    • None

    Description

      We work on the Strimzi project which is about deploying an Apache Kafka cluster on Kubernetes together with a ZooKeeper ensamble.

      Until ZooKeeper version 3.6.3 (brought by Kafka 3.4.0), there were no issues when running on minikube for development purposes.

      With using ZooKeeper version 3.6.4 (brought by Kafka 3.4.1), we started to have issues during the quorum formation and leader election.

      The first one was about ZooKeeper pods not able to bind the quorum port 3888 to the Cluster IP but during the DNS resolution they get the loopback address instead.
      Following a possible log at ZooKeeper startup where you can see the binding at 127.0.0.1:3888 instead of something like 172.17.0.4:3888 (so getting a valid not loopback IP address).

       

      INFO 3 is accepting connections now, my election bind port: my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/127.0.0.1:3888]

      This specific issue had two solutions: using quorumListenOnAllIPs=true on ZooKeeper configuration or binding to 0.0.0.0 address. 

      Anyway it is actually not clear why it wasn't needed until 3.6.3, but needed for getting 3.6.4 working. What is changed from this perspective?

      Said that, While binding to 0.0.0.0 seems to work fine, using the quorumListenOnAllIPs=true doesn't.

      Assuming a ZooKeeper ensamble with 3 nodes, Getting the log of the current ZooKeeper leader (ID=3) we see the following.
      (Starting with ** you can see some additional logs added to org.apache.zookeeper.server.quorum.Leader#getDesignatedLeader in order to get more information.)

      2023-06-19 12:32:51,990 INFO Have quorum of supporters, sids: [[1, 3],[1, 3]]; starting up and setting last processed zxid: 0x100000000 (org.apache.zookeeper.server.quorum.Leader) [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
      2023-06-19 12:32:51,990 INFO ** newQVAcksetPair.getQuorumVerifier().getVotingMembers().get(self.getId()).addr = my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888 (org.apache.zookeeper.server.quorum.Leader) [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
      2023-06-19 12:32:51,990 INFO ** self.getQuorumAddress() = my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/<unresolved>:2888 (org.apache.zookeeper.server.quorum.Leader) [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
      2023-06-19 12:32:51,992 INFO ** qs.addr my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:2888, qs.electionAddr my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/172.17.0.6:3888, qs.clientAddr/127.0.0.1:12181 (org.apache.zookeeper.server.quorum.QuorumPeer) [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
      2023-06-19 12:32:51,992 DEBUG zookeeper (org.apache.zookeeper.common.PathTrie) [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
      2023-06-19 12:32:51,993 WARN Restarting Leader Election (org.apache.zookeeper.server.quorum.QuorumPeer) [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)] 

      So the leader is ZooKeeper with ID=3 and it was ACKed by the ZooKeeper node ID=1.
      As you can see we are in the Leader#startZkServer method, and because of the reconfiguration enabled, the designatedLeader is processed. The problem is that the Leader#getDesignatedLeader is not returning “self” as leader but another one (ID=1), because of the difference in the quorum address.
      From the above log, it’s not an actual difference in terms of addresses but the self.getQuorumAddress() is returning an <unresolved> (even if it’s still the same hostname related to ZooKeeper-2 instance). This difference causes the allowedToCommit=false, meanwhile the ZooKeeper-2 is still reported as leader but it’s not able to commit, so prevents any requests and the ZooKeeper ensemble gets stuck.

      2023-06-19 12:32:51,996 WARN Suggested leader: 1 (org.apache.zookeeper.server.quorum.QuorumPeer) [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
      2023-06-19 12:32:51,996 WARN This leader is not the designated leader, it will be initialized with allowedToCommit = false (org.apache.zookeeper.server.quorum.Leader) [QuorumPeer[myid=3](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)] 

      The overall issue could be related to DNS problems, with DNS records not registered yet during pod initialization (where ZooKeeper is running on Kubernetes). But we don’t understand why it’s not able to recover somehow.

      What we don't get a reason is why ZooKeeper 3.6.3 didn't need any binding specific configuration and was working just fine, while the new 3.6.4 needs that. Other than that, why we see the above described issue after the binding works fine.

      We also tried by using latest ZooKeeper 3.8.1 facing the same issue.

      It's worth mentioning that the issue wasn't reproducible in environment like AKS or OpenShift, but if it's somehow slow DNS related, it could happen anyway (even not on minikube as in our use case).

      FYI, the discussion we had in the upstream Strimzi community https://github.com/strimzi/strimzi-kafka-operator/issues/8675

      There is also a thread on the ZooKeeper users mailing list  https://lists.apache.org/thread/vsff39wpsrxb431hdw9db4np12vytdpm

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              ppatierno Paolo Patierno
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 5.5h
                  5.5h