Kafka / KAFKA-10643

Static membership - repetitive PreparingRebalance with updating metadata for member reason

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: streams
    • Labels: None

    Description

      Kafka Streams 2.6.0, broker version 2.6.0. The Kafka nodes are healthy and the Kafka Streams app is healthy.

      The app is configured with static membership.
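
      For readers unfamiliar with the setup, the sketch below shows how static membership is commonly enabled for a Kafka Streams app. It is not the reporter's actual configuration: the application id, bootstrap address, instance id, and the session timeout value are all hypothetical. It uses plain string keys so that it runs without the Kafka client on the classpath; `consumer.` is the Kafka Streams prefix for consumer-level settings, and `group.instance.id` is the consumer setting that turns on static membership.

```java
import java.util.Properties;

public class StaticMembershipConfigSketch {

    // Builds a Properties object of the kind passed to a KafkaStreams instance.
    // All concrete values supplied by callers here are illustrative assumptions.
    static Properties build(String applicationId, String bootstrapServers, String instanceId) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // A stable, unique id per application instance enables static membership.
        props.put("consumer.group.instance.id", instanceId);
        // Static members are usually paired with a longer session timeout so that a
        // quick restart does not trigger a rebalance. 60000 is only an example value.
        props.put("consumer.session.timeout.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical names, chosen only for the example.
        Properties props = build("example-stream", "localhost:9092", "example-stream-11");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

      The instance id must differ between pods and stay the same across restarts of a given pod, otherwise the coordinator cannot recognise a returning member.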

      Every 10 minutes (I assume because of topic.metadata.refresh.interval.ms), I see the following group coordinator log for different stream consumers:

      INFO [GroupCoordinator 2]: Preparing to rebalance group *--*-stream in state PreparingRebalance with old generation 12244 (__consumer_offsets-45) (reason: Updating metadata for member ****-stream-11-1-013edd56-ed93-4370-b07c-1c29fbe72c9a) (kafka.coordinator.group.GroupCoordinator)

      and right after that, the following log:

      INFO [GroupCoordinator 2]: Assignment received from leader for group *-*-stream for generation 12246 (kafka.coordinator.group.GroupCoordinator)
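
      To check how often each member triggers these rebalances, the coordinator log lines can be parsed and grouped by reason. The sketch below is one possible way to pull the fields out of a "Preparing to rebalance" line. The class name and the regular expression are assumptions based only on the log format quoted above, not on any Kafka tooling.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RebalanceLogLine {
    // Pattern derived from the single log line quoted in this report (an assumption,
    // other broker versions may format the message differently).
    private static final Pattern PREPARING = Pattern.compile(
        "Preparing to rebalance group (\\S+) in state (\\S+) with old generation (\\d+) "
        + "\\(([^)]*)\\) \\(reason: ([^)]*)\\)");

    final String group;
    final String state;
    final int oldGeneration;
    final String offsetsPartition;
    final String reason;

    private RebalanceLogLine(String group, String state, int oldGeneration,
                             String offsetsPartition, String reason) {
        this.group = group;
        this.state = state;
        this.oldGeneration = oldGeneration;
        this.offsetsPartition = offsetsPartition;
        this.reason = reason;
    }

    // Returns the parsed fields, or Optional.empty() if the line is not a
    // "Preparing to rebalance" message in the expected format.
    static Optional<RebalanceLogLine> parse(String line) {
        Matcher m = PREPARING.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new RebalanceLogLine(
            m.group(1), m.group(2), Integer.parseInt(m.group(3)), m.group(4), m.group(5)));
    }

    public static void main(String[] args) {
        String line = "INFO [GroupCoordinator 2]: Preparing to rebalance group *--*-stream "
            + "in state PreparingRebalance with old generation 12244 (__consumer_offsets-45) "
            + "(reason: Updating metadata for member ****-stream-11-1-013edd56-ed93-4370-b07c-1c29fbe72c9a) "
            + "(kafka.coordinator.group.GroupCoordinator)";
        parse(line).ifPresent(p ->
            System.out.println(p.group + " gen=" + p.oldGeneration + " reason=" + p.reason));
    }
}
```

      Counting the parsed reasons per member over a day would show whether the rebalances really follow a 10-minute cadence and whether they come from one member or from all of them.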

       

      I looked at the Kafka code and I'm not sure I understand why this is happening. Does this line describe the situation here, given the "reason:" in the log? https://github.com/apache/kafka/blob/7ca299b8c0f2f3256c40b694078e422350c20d19/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L311

      I also don't see it happening as often in the other Kafka Streams applications that we have.

      The only suspicious thing I see is that, around every hour, different pods of that Kafka Streams application throw this exception:

      {"timestamp":"2020-10-25T06:44:20.414Z","level":"INFO","thread":"**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1","logger":"org.apache.kafka.clients.FetchSessionHandler","message":"[Consumer clientId=**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1-restore-consumer, groupId=null] Error sending fetch request (sessionId=34683236, epoch=2872) to node 3:","context":"default","exception":"org.apache.kafka.common.errors.DisconnectException: null\n"}

      I came across this strange behaviour after I started to investigate a rebalance that got stuck after one of the members left the group. The only explanation I found is that, because the group enters the PreparingRebalance state so often, the app might be affected by this bug: KAFKA-9752?

      I don't understand why it happens; it did not happen before I enabled static membership for that Kafka Streams application (around 2 weeks ago).

      I would be happy if you could help me.

      Attachments

        1. broker-4-11.csv
          86 kB
          Eran Levy
        2. client-4-11.csv
          34 kB
          Eran Levy
        3. client-d-9-11-11-2020.csv
          8.44 MB
          Eran Levy

          People

            Assignee: Unassigned
            Reporter: Eran Levy (eran-levy)
            Votes: 0
            Watchers: 7
