Kafka / KAFKA-7888

kafka cluster not recovering - Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition) continuously


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Labels:
      None
    • Environment:

      Description

      From time to time we see the following log lines repeating on our Kafka cluster. This seems to cause messages to expire on the producers and to put the cluster into a non-recoverable state. The only fix seems to be restarting the brokers.

      Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition)
      Cached zkVersion [21] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
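
To see which replica each shrink message drops, the line can be parsed mechanically. The following is a minimal sketch that assumes the message text looks exactly like the excerpt above; the helper name `dropped_replicas` is hypothetical and not part of Kafka.

```python
import re

# Matches the "Shrinking ISR from A to B" text quoted in this report.
SHRINK_RE = re.compile(r"Shrinking ISR from ([\d,]+) to ([\d,]+)")


def parse_ids(csv):
    """Turn a comma-separated broker id list such as "14,13" into a set of ints."""
    return {int(part) for part in csv.split(",") if part}


def dropped_replicas(line):
    """Return the sorted broker ids removed from the ISR, or None if the line does not match."""
    match = SHRINK_RE.search(line)
    if match is None:
        return None
    before = parse_ids(match.group(1))
    after = parse_ids(match.group(2))
    return sorted(before - after)


line = "Shrinking ISR from 14,13 to 13 (kafka.cluster.Partition)"
print(dropped_replicas(line))  # [14]
```

Run over a whole log file, this would show whether the same broker is being dropped every time.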

      Later on, the following log line is repeated:

      Got user-level KeeperException when processing sessionid:0xe046aa4f8e60000 type:setData cxid:0x2df zxid:0xa000001fd txntype:-1 reqpath:n/a Error Path:/brokers/topics/ucTrade/partitions/6/state Error:KeeperErrorCode = BadVersion for /brokers/topics/ucTrade/partitions/6/state
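
The `BadVersion` error and the "Cached zkVersion ... not equal" message are both consistent with a conditional (versioned) write being rejected. The sketch below is a toy in-memory model of that pattern, not Kafka or ZooKeeper code; `VersionedNode` and `try_shrink_isr` are hypothetical names, and the "other writer" is an assumption made only to produce a stale cache. It does not claim to show the root cause of this issue.

```python
class BadVersion(Exception):
    """Raised when the caller's expected version does not match the stored one."""


class VersionedNode:
    """Toy stand-in for a znode: data plus a version that is bumped on every write."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version):
        # Conditional write: only succeeds if the caller knows the current version.
        if expected_version != self.version:
            raise BadVersion(f"expected {expected_version}, actual {self.version}")
        self.data = data
        self.version += 1
        return self.version


def try_shrink_isr(node, cached_version, new_isr):
    """Attempt a conditional ISR update; return (updated, version to keep cached)."""
    try:
        return True, node.set_data(new_isr, cached_version)
    except BadVersion:
        # Analogous to "skip updating ISR": the write is dropped, the cache stays stale.
        return False, cached_version


node = VersionedNode([14, 13])
cached = node.version            # the broker caches version 0
node.set_data([14, 13], 0)       # assumed other writer bumps the version to 1

results = []
for _ in range(3):
    ok, cached = try_shrink_isr(node, cached, [13])
    results.append(ok)
print(results)                   # [False, False, False]

# Only a write that uses the current version goes through.
ok, cached = try_shrink_isr(node, node.version, [13])
print(ok, node.data)             # True [13]
```

In this model, a writer that never refreshes its cached version fails on every retry, which is the shape of the repeating log lines above.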

      We haven't interfered with any of the brokers/zookeepers whilst this happened.

      I've attached a combined log containing the controller, server and state change logs from each broker (ids 13, 14 and 15; the log files have the suffixes b13, b14 and b15 respectively).

      Since this happened we have increased the heaps from 1g to 6g for the brokers and from 512m to 4g for the ZooKeeper nodes, but we are not sure whether that is relevant. The ZK logs have unfortunately been overwritten, so we can't provide those.

      We produce messages of varying sizes. Some are relatively large (6 MB), but we use compression on the producers (set to gzip).

      I've attached some logs from one of our producers as well.

      producer.properties that we've changed:

      spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
      spring.kafka.producer.compression-type=gzip
      spring.kafka.producer.retries=5
      spring.kafka.producer.acks=-1
      spring.kafka.producer.batch-size=1048576

      spring.kafka.producer.properties.linger.ms=200
      spring.kafka.producer.properties.request.timeout.ms=600000
      spring.kafka.producer.properties.max.block.ms=240000
      spring.kafka.producer.properties.max.request.size=104857600
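
For readers comparing these values against the 6 MB messages mentioned above, here is a small sketch that converts the byte and millisecond settings into more readable units. The dictionary keys are the plain Kafka producer names that the Spring properties map to; the helper names `to_mib` and `to_seconds` are hypothetical.

```python
# Values copied from the settings listed above.
settings = {
    "batch.size": 1048576,           # bytes
    "max.request.size": 104857600,   # bytes
    "linger.ms": 200,                # milliseconds
    "request.timeout.ms": 600000,    # milliseconds
    "max.block.ms": 240000,          # milliseconds
}


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


def to_seconds(millis):
    """Convert milliseconds to seconds."""
    return millis / 1000


print(to_mib(settings["batch.size"]))              # 1.0
print(to_mib(settings["max.request.size"]))        # 100.0
print(to_seconds(settings["request.timeout.ms"]))  # 600.0
print(to_seconds(settings["max.block.ms"]))        # 240.0
```

So the batch size is 1 MiB, the maximum request size is 100 MiB, the request timeout is 10 minutes and the maximum block time is 4 minutes. An uncompressed 6 MB message is larger than the configured batch size.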


        Attachments

        1. combined.log
          650 kB
          Kemal ERDEN
        2. producer.log
          252 kB
          Kemal ERDEN

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              kemalerden Kemal ERDEN
            • Votes:
              2
              Watchers:
              11

              Dates

              • Created:
                Updated: