  1. ZooKeeper
  2. ZOOKEEPER-4708 ZooKeeper 3.6.4 quorum failing due to <unresolved> address
  3. ZOOKEEPER-4728

ZooKeeper can never bind to its own address if DNS is not ready at startup


Details

    Description

      Note: This issue also occurs on the latest `master` branch.

       

      When the leader tries to bind its host/IP to accept connections from followers and DNS is not ready at that moment, the address stays in the <unresolved> state forever, even after DNS becomes available. The error log looks like this:

       

      2023-07-26 00:25:25,251 ERROR Couldn't bind to localhost1/<unresolved>:2888 (org.apache.zookeeper.server.quorum.Leader) [QuorumPeer[myid=1]]
      java.net.SocketException: Unresolved address
          at java.base/java.net.ServerSocket.bind(ServerSocket.java:380)
          at java.base/java.net.ServerSocket.bind(ServerSocket.java:342)
          at org.apache.zookeeper.server.quorum.Leader.createServerSocket(Leader.java:315)
          at org.apache.zookeeper.server.quorum.Leader.lambda$new$0(Leader.java:294)
          at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
          at java.base/java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3573)
          at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
          at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
          at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
          at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
          at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
          at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
          at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:297)
          at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1272)
          at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1479)
      2023-07-26 00:25:25,252 WARN Unexpected exception (org.apache.zookeeper.server.quorum.QuorumPeer) [QuorumPeer[myid=1]]
      java.io.IOException: Leader failed to initialize any of the following sockets: [localhost1/<unresolved>:2888]
          at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:300)
          at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1272)
          at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1479)

       

       

      These errors repeat indefinitely: the leader never manages to bind to the address, so the quorum is never formed.
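      The failure mode can be reproduced outside ZooKeeper with plain `java.net` classes. The following is a minimal sketch, not ZooKeeper code: the class name `UnresolvedBindDemo` and the helper `tryBind` are made up for illustration, and `InetSocketAddress.createUnresolved` is used as a stand-in for a lookup that failed at startup. An `InetSocketAddress` does not resolve again on its own, so an object created while DNS was down keeps failing at bind time.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class UnresolvedBindDemo {
    /** Tries to bind a ServerSocket to addr; returns null on success, else the error message. */
    static String tryBind(InetSocketAddress addr) {
        try (ServerSocket ss = new ServerSocket()) {
            ss.bind(addr);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // createUnresolved skips the DNS lookup, simulating a lookup that failed at startup.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("localhost1", 2888);

        // The object stays unresolved no matter how often it is reused,
        // even if "localhost1" becomes resolvable later.
        System.out.println("unresolved: " + stale.isUnresolved());
        System.out.println("bind error: " + tryBind(stale));
    }
}
```

      Reusing the same address object on every retry therefore cannot succeed, which matches the repeated `Couldn't bind` errors in the log above.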

       

      Steps to reproduce:

      1. Set up a single ZooKeeper node and configure its server entry as:

      server.1=localhost1:2888:3888

      Note that the hostname is "localhost1", which does not resolve yet.

      2. Start the ZooKeeper node. It logs the `Exception while listening` error as well as the `Couldn't bind to localhost1/<unresolved>:2888` error shown above. This simulates DNS not being ready when ZooKeeper starts, which is quite common in k8s environments.

      3. Edit /etc/hosts and map `localhost1` to `127.0.0.1`.

      4. Check the log: the `Exception while listening` error is gone, but `Couldn't bind to localhost1/<unresolved>:2888` keeps appearing and the quorum is never formed.

       

      Note: The `Exception while listening` error heals itself because that code path re-resolves the hostname each time it tries to bind. The same solution should be applied to the leader binding (i.e. ZOOKEEPER-3991).
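      The re-resolve idea from the note can be sketched as follows. This is only an illustration under assumptions, not the actual ZooKeeper patch: the class `ReResolvingBind`, the helpers `reResolve` and `bindWithRetry`, and the retry parameters are all hypothetical. The key point is that constructing a new `InetSocketAddress` from the host string and port triggers a fresh lookup, so a hostname that becomes resolvable later is picked up on the next attempt.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReResolvingBind {
    /** Builds a fresh address from host and port, which triggers a new lookup. */
    static InetSocketAddress reResolve(InetSocketAddress addr) {
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    /** Re-resolves before each attempt; returns the bound socket or rethrows the last error. */
    static ServerSocket bindWithRetry(InetSocketAddress configured, int maxAttempts, long delayMs)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            ServerSocket ss = new ServerSocket();
            try {
                // Never reuse the possibly stale, unresolved address object.
                ss.bind(reResolve(configured));
                return ss;
            } catch (IOException e) {
                last = e;
                ss.close();
                Thread.sleep(delayMs);
            }
        }
        throw last != null ? last : new IOException("no bind attempts made");
    }
}
```

      A caller would pass the configured quorum address, for example `bindWithRetry(addr, 10, 1000L)`, and close the returned socket when done. The retry count and delay here are arbitrary placeholders.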

            People

              showuon Luke Chen