[KAFKA-9531] java.net.UnknownHostException loop on VM rolling update using CNAME


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Labels: None

      Description

      Hello,

      My cluster setup is based on VMs behind a DNS CNAME.

      For example, node.internal is a CNAME that points to either nodeA.internal or nodeB.internal.
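For reference, a CNAME setup like this can be probed from the JVM with a small diagnostic. This is a hypothetical helper (CnameCheck is not part of Kafka) that assumes only the JDK: it resolves a name through InetAddress.getAllByName, the same call that appears in the stack trace below, and prints each address with its reverse-DNS name. The JDK resolver does not expose the CNAME chain itself, so the reverse name is only a hint about whether node.internal currently points at nodeA.internal or nodeB.internal.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Kafka.
// Resolves a hostname via InetAddress.getAllByName (as the Kafka client does)
// and reports each resolved address together with its reverse-DNS name.
class CnameCheck {

    // Returns one "ip -> reverse-name" line per resolved address.
    // Throws UnknownHostException if the name does not resolve at all.
    static List<String> describe(String host) throws UnknownHostException {
        List<String> lines = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            // getCanonicalHostName() does a reverse lookup on the IP; it returns
            // the IP string itself if no reverse name can be found.
            lines.add(addr.getHostAddress() + " -> " + addr.getCanonicalHostName());
        }
        return lines;
    }

    public static void main(String[] args) {
        // "node.internal" is the example CNAME from this report; pass a real name as args[0].
        String host = args.length > 0 ? args[0] : "node.internal";
        try {
            for (String line : describe(host)) {
                System.out.println(line);
            }
        } catch (UnknownHostException e) {
            // Same failure mode as in the client log: the name does not resolve.
            System.out.println("unresolvable: " + e.getMessage());
        }
    }
}
```

Running it with the CNAME as the argument (for example `java CnameCheck node.internal`) before and after a switch-over should show the resolved address change.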

      Since kafka-client 1.2.1, we have observed that Kafka clients sometimes get stuck in a loop that repeatedly throws the following exception. This example was captured after nodeB.internal was replaced by nodeA.internal:

      2020-02-10T12:11:28.181Z o.a.k.c.NetworkClient [WARN]    - [Consumer clientId=consumer-6, groupId=consumer.group] Error connecting to node nodeB.internal:9092 (id: 2 rack: null)
      java.net.UnknownHostException: nodeB.internal:9092
      	at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_222]
      	at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_222]
      	at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_222]
      	at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:104) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:403) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:363) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:151) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:943) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:68) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1114) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1005) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:537) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:366) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1220) ~[stormjar.jar:?]
      	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1159) ~[stormjar.jar:?]
      	at org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:365) ~[stormjar.jar:?]
      	at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:294) ~[stormjar.jar:?]
      	at org.apache.storm.daemon.executor$fn__10715$fn__10730$fn__10761.invoke(executor.clj:649) ~[storm-core-1.1.3.jar:1.1.3]
      	at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) ~[storm-core-1.1.3.jar:1.1.3]
      	at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.7.0.jar:?]
      	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
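The trace shows the failing lookup going through InetAddress.getAllByName, so the JVM's own DNS cache sits between the client and DNS. A minimal sketch, assuming it is worth ruling out stale cache entries during a rolling update: it sets the standard JDK security properties networkaddress.cache.ttl and networkaddress.cache.negative.ttl from code. The class name and the 30 s / 5 s values are illustrative, and nothing in this report shows that changing them ends the loop.

```java
import java.security.Security;

// Illustrative sketch: inspect and set the JDK's DNS cache TTLs.
// These are standard java.security properties used by InetAddress; set them
// before the JVM performs its first name lookup so they reliably take effect.
class DnsCacheSettings {

    static final String POSITIVE_TTL = "networkaddress.cache.ttl";
    static final String NEGATIVE_TTL = "networkaddress.cache.negative.ttl";

    // Cache successful lookups for positiveSeconds and failed lookups for
    // negativeSeconds, so a changed CNAME target is picked up sooner.
    static void apply(int positiveSeconds, int negativeSeconds) {
        Security.setProperty(POSITIVE_TTL, Integer.toString(positiveSeconds));
        Security.setProperty(NEGATIVE_TTL, Integer.toString(negativeSeconds));
    }

    public static void main(String[] args) {
        System.out.println("before: ttl=" + Security.getProperty(POSITIVE_TTL)
                + " negative=" + Security.getProperty(NEGATIVE_TTL));
        apply(30, 5); // example values only
        System.out.println("after:  ttl=" + Security.getProperty(POSITIVE_TTL)
                + " negative=" + Security.getProperty(NEGATIVE_TTL));
    }
}
```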
      

       

      The time spent in this loop varies, and the client appears to stop making progress for as long as it lasts.
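One plausible reason the client keeps retrying instead of failing outright: each failed connection attempt is retried after a backoff that grows and then stays at a cap. The sketch below is a simplified, jitter-free model of capped exponential backoff using the documented client defaults reconnect.backoff.ms=50 and reconnect.backoff.max.ms=1000. It is an illustration, not the Kafka client's code (which also applies some random jitter), and the class and method names are hypothetical.

```java
// Hypothetical, simplified model of capped exponential reconnect backoff.
// Assumes the documented defaults reconnect.backoff.ms=50 and
// reconnect.backoff.max.ms=1000; jitter is omitted so results are deterministic.
class ReconnectBackoff {

    // Wait time before the next attempt after `failedAttempts` consecutive failures.
    static long backoffMs(int failedAttempts, long initialMs, long maxMs) {
        if (failedAttempts < 1) return 0; // no failure yet, no wait
        long delay = initialMs;
        // Double once per additional failure; stop early once the cap is reached
        // so the value can never overflow.
        for (int i = 1; i < failedAttempts && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 7; attempt++) {
            System.out.println("attempt " + attempt + ": wait "
                    + backoffMs(attempt, 50, 1000) + " ms");
        }
    }
}
```

Under these assumptions the wait settles at 1000 ms from the sixth consecutive failure onward, so a broker hostname that never resolves again would produce a warning roughly once per second for as long as the client keeps selecting that node.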

      By contrast, in other instances the client recovers on its own within a few seconds:

      2020-02-08T01:15:37.390Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Group coordinator nodeA.internal:9092 (id: 2147483645 rack: null) is unavailable or invalid, will attempt rediscovery
       
      2020-02-08T01:15:37.885Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Discovered group coordinator nodeB.internal:9092 (id: 2147483646 rack: null)
      
      2020-02-08T01:15:37.885Z o.a.k.c.ClusterConnectionStates [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Hostname for node 2147483646 changed from nodeA.internal to nodeB.internal
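The large node IDs in these lines are not broker IDs. As far as the Kafka consumer source shows, AbstractCoordinator gives the coordinator connection the ID Integer.MAX_VALUE minus the broker's node ID, so coordinator traffic gets a connection separate from the regular one to the same broker. Under that assumption, a small (hypothetical) helper maps the logged IDs back to brokers:

```java
// Maps a coordinator connection ID back to the broker ID, assuming the
// consumer derives it as Integer.MAX_VALUE - brokerId.
class CoordinatorId {

    static int brokerIdFromCoordinatorId(int coordinatorId) {
        return Integer.MAX_VALUE - coordinatorId;
    }

    public static void main(String[] args) {
        // IDs taken from the log lines above.
        int[] ids = {2147483645, 2147483646};
        for (int id : ids) {
            System.out.println(id + " -> broker " + brokerIdFromCoordinatorId(id));
        }
    }
}
```

Read this way, the coordinator moved from broker 2 (at nodeA.internal) to broker 1 (at nodeB.internal), and the last line records that the client updated broker 1's coordinator hostname from nodeA.internal to nodeB.internal.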
      

         

    People

    • Assignee: Unassigned
    • Reporter: Rui Abreu (RAbreu)
    • Votes: 1
    • Watchers: 4
