Kafka / KAFKA-7017

GroupCoordinator response error: Broker: Group coordinator not available

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • 1.1.0
    • 1.1.0
    • None

    Description

      1. Most of the consumers got stuck while reading data from a Kafka topic; the stack trace of a stuck consumer is given below. After a certain timeout the application was restarted and tried to connect with the same consumer group, but it got stuck again with the same stack trace.
       
       "main" #1 prio=5 os_prio=0 tid=0x0000000001811800 nid=0x194 runnable [0x00007ffe513bd000]
         java.lang.Thread.State: RUNNABLE
              at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104)
              at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:122)
              at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
              at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235)
              at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196)
              at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:557)
              at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:495)
              at org.apache.kafka.common.network.Selector.poll(Selector.java:424)
              at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
              at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
              at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
              at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
              at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156)
              at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228)
              - locked <0x00000000ae7acf08> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
              at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205)
              - locked <0x00000000ae7acf08> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
              at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.fetchCommittedOffsets(ConsumerCoordinator.java:465)
              at org.apache.kafka.clients.consumer.KafkaConsumer.committed(KafkaConsumer.java:1461)
       
       
      2. To debug further, we installed kafkacat and tried to consume the data, first with the consumer group that is getting stuck and then with a new consumer group. With the stuck consumer group we were not able to consume data; with the new consumer group we were. The errors logged for the stuck consumer group are as follows:
       
      %7|1528304675.172|COMMIT|rdkafka#consumer-1| OffsetCommit for -1 partition(s) returned: Local: No offset stored
      %7|1528304675.172|UNASSIGN|rdkafka#consumer-1| Group "agent.defaultagent": unassign done in state wait-broker (join state init): without new assignment: OffsetCommit done (__NO_OFFSET)
      %7|1528304675.223|CGRPQUERY|rdkafka#consumer-1| broker:9092/bootstrap: Group "agent.defaultagent": querying for coordinator: intervaled in state wait-broker
      %7|1528304675.244|SEND|rdkafka#consumer-1| broker:9092/bootstrap: Sent GroupCoordinatorRequest (v0, 41 bytes @ 0, CorrId 25)
      %7|1528304675.255|RECV|rdkafka#consumer-1| broker:9092/bootstrap: Received GroupCoordinatorResponse (v0, 12 bytes, CorrId 25, rtt 10.91ms)
      %7|1528304675.326|CGRPCOORD|rdkafka#consumer-1| broker:9092/bootstrap: Group "agent.defaultagent" GroupCoordinator response error: Broker: Group coordinator not available
      %7|1528304676.226|CGRPQUERY|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Group "agent.defaultagent": querying for coordinator: intervaled in state wait-broker
      %7|1528304676.330|SEND|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Sent GroupCoordinatorRequest (v0, 41 bytes @ 0, CorrId 33)
      %7|1528304676.350|RECV|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Received GroupCoordinatorResponse (v0, 12 bytes, CorrId 33, rtt 19.93ms)
      %7|1528304676.430|CGRPCOORD|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Group "agent.defaultagent" GroupCoordinator response error: Broker: Group coordinator not available
      %7|1528304677.226|CGRPQUERY|rdkafka#consumer-1| broker:9092/bootstrap: Group "agent.defaultagent": querying for coordinator: intervaled in state wait-broker
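
To see at a glance whether every queried broker keeps returning the coordinator error, a debug log like the one above can be tallied per broker. The following is a minimal sketch, assuming the pipe-separated layout visible in these lines (level, timestamp, tag, client name, message); the class and method names are hypothetical and not part of kafkacat or librdkafka.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CoordinatorErrorCount {

    // Counts CGRPCOORD lines that report a GroupCoordinator response error,
    // keyed by the broker name that precedes the first ": " in the message.
    static Map<String, Integer> countCoordinatorErrors(String log) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : log.split("\\R")) {
            // Assumed layout: level|timestamp|tag|client| message
            String[] fields = line.trim().split("\\|", 5);
            if (fields.length < 5 || !fields[2].equals("CGRPCOORD")) {
                continue;
            }
            String message = fields[4].trim();
            if (!message.contains("GroupCoordinator response error")) {
                continue;
            }
            int sep = message.indexOf(": ");
            String broker = sep > 0 ? message.substring(0, sep) : "unknown";
            counts.merge(broker, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two lines copied from the log in this report.
        String log = String.join("\n",
            "%7|1528304675.326|CGRPCOORD|rdkafka#consumer-1| broker:9092/bootstrap: Group \"agent.defaultagent\" GroupCoordinator response error: Broker: Group coordinator not available",
            "%7|1528304676.430|CGRPCOORD|rdkafka#consumer-1| broker-0.broker.default.svc.cluster.local:9092/0: Group \"agent.defaultagent\" GroupCoordinator response error: Broker: Group coordinator not available");
        countCoordinatorErrors(log)
            .forEach((broker, n) -> System.out.println(broker + " -> " + n));
    }
}
```

If every broker that answers the coordinator query appears in the tally, the problem is more likely on the broker side than with a single connection.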
       
       
      3. We tried to delete the stuck consumer group; however, the deletion fails with the same error:
       
      Error: Deletion of some consumer groups failed:

      • Group 'agent.defaultagent' could not be deleted due to: COORDINATOR_NOT_AVAILABLE
         
        4. According to http://home.apache.org/~ewencp/kafka-0.10.2.0-rc1/javadoc/org/apache/kafka/common/errors/GroupCoordinatorNotAvailableException.html this is a temporary condition that resolves once the offsets topic has been created. In our case it has not recovered, although consumption from the same topic with a different consumer group works.
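
One possible reason a single group is affected while others work is that each group is served by the broker leading one specific partition of the offsets topic. As far as I understand the broker code, the partition is chosen as the absolute value of the group id's hash code modulo the offsets topic partition count. The sketch below reproduces that mapping so the relevant `__consumer_offsets` partition can be inspected; the partition count of 50 is the default for `offsets.topic.num.partitions` and is an assumption about this cluster, and the class name is hypothetical.

```java
public class OffsetsPartition {

    // Maps a group id to an offsets-topic partition:
    // abs(hashCode) % numPartitions, with Integer.MIN_VALUE treated as 0
    // because Math.abs(Integer.MIN_VALUE) is still negative.
    static int partitionFor(String groupId, int numPartitions) {
        int hash = groupId.hashCode();
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % numPartitions;
    }

    public static void main(String[] args) {
        // 50 is assumed here; use the cluster's actual offsets.topic.num.partitions.
        int partition = partitionFor("agent.defaultagent", 50);
        System.out.println("__consumer_offsets-" + partition);
    }
}
```

Checking whether the printed partition currently has a leader and in-sync replicas would show whether the coordinator for this group can be elected at all.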
         
         
        Can you let me know how to recover the system without restarting the broker or ZooKeeper? How can this race condition be avoided, and is this a bug in Kafka?
         
        Let me know if any other details are required.

          People

            Assignee: Unassigned
            Reporter: Sampath Kumar (sakumar)
            Votes: 0
            Watchers: 1
