  1. Kafka
  2. KAFKA-7417

Some topics lost / cannot recover their ISR status following broker crash

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.1, 2.0.0
    • Fix Version/s: None
    • Component/s: replication
    • Labels: None

      Description

      Hi,
      we are facing the following issue: some replicas cannot become in-sync. Which partitions are affected looks random across topics. For instance:

      $ kafka-topics --zookeeper 1.2.3.4:8181 --describe --topic TEST
      Topic:TEST PartitionCount:8 ReplicationFactor:3 Configs:
      Topic: TEST Partition: 0 Leader: 2 Replicas: 0,2,1 Isr: 0,1,2
      Topic: TEST Partition: 1 Leader: 1 Replicas: 1,0,2 Isr: 0,1,2
      Topic: TEST Partition: 2 Leader: 2 Replicas: 2,1,0 Isr: 0,1,2
      Topic: TEST Partition: 3 Leader: 2 Replicas: 0,1,2 Isr: 0,1,2
      Topic: TEST Partition: 4 Leader: 1 Replicas: 1,2,0 Isr: 0,1,2
      Topic: TEST Partition: 5 Leader: 2 Replicas: 2,0,1 Isr: 0,1,2
      Topic: TEST Partition: 6 Leader: 0 Replicas: 0,2,1 Isr: 0,1,2
      Topic: TEST Partition: 7 Leader: 0 Replicas: 1,0,2 Isr: 0,2
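
      To spot such partitions without reading the output by eye, a minimal Python sketch is given below. It assumes the `--describe` output format shown above (Kafka 2.0 style); the names DESCRIBE_OUTPUT and find_under_replicated are illustrative and not part of any Kafka tool. The sample is an excerpt of the output above. kafka-topics also has a --under-replicated-partitions option for --describe that filters the listing directly.

```python
import re

# Excerpt of the `kafka-topics --describe` output from this report.
DESCRIBE_OUTPUT = """\
Topic:TEST PartitionCount:8 ReplicationFactor:3 Configs:
Topic: TEST Partition: 0 Leader: 2 Replicas: 0,2,1 Isr: 0,1,2
Topic: TEST Partition: 6 Leader: 0 Replicas: 0,2,1 Isr: 0,1,2
Topic: TEST Partition: 7 Leader: 0 Replicas: 1,0,2 Isr: 0,2
"""

# Matches per-partition lines only; the summary line has "PartitionCount:" instead.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def find_under_replicated(describe_output):
    """Return (topic, partition, missing_broker_ids) for every partition
    whose ISR does not contain all assigned replicas."""
    result = []
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # summary line or unrelated output
        replicas = {int(b) for b in match.group("replicas").split(",")}
        isr = {int(b) for b in match.group("isr").split(",") if b}
        missing = sorted(replicas - isr)
        if missing:
            result.append(
                (match.group("topic"), int(match.group("partition")), missing)
            )
    return result


if __name__ == "__main__":
    for topic, partition, missing in find_under_replicated(DESCRIBE_OUTPUT):
        # prints: TEST-7: replicas not in ISR: [1]
        print(f"{topic}-{partition}: replicas not in ISR: {missing}")
```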

      The segment files in the TEST-7 directory are identical (same md5sum) on all 3 brokers. We also checked them with kafka.tools.DumpLogSegments: the messages are the same.
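
      The checksum comparison described above can be repeated with a small script. This is only a sketch under assumptions: it expects each broker's TEST-7 directory (under log.dirs, i.e. /var/lib/kafka/data/TEST-7) to be reachable from one machine, for example copied or mounted locally, and the helper names are hypothetical. Only .log files are compared by default, since index files are not guaranteed to be byte-identical across brokers, and the active segment can differ briefly while it is being appended to.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_partition_dir(partition_dir, suffixes=(".log",)):
    """Map file name -> md5 for the matching files in one partition directory."""
    return {
        p.name: md5_of_file(p)
        for p in sorted(Path(partition_dir).iterdir())
        if p.is_file() and p.suffix in suffixes
    }


def diff_checksums(per_broker):
    """per_broker maps a broker label to the dict returned by
    checksum_partition_dir. Returns the sorted file names whose digest
    differs between brokers or that are missing on at least one broker."""
    all_names = set().union(*(c.keys() for c in per_broker.values()))
    mismatched = []
    for name in sorted(all_names):
        digests = {c.get(name) for c in per_broker.values()}
        if len(digests) != 1:
            mismatched.append(name)
    return mismatched
```

      An empty list from diff_checksums means the compared files are byte-identical on all brokers, which is the result reported here.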

      We run a 3-broker cluster on Confluent Platform 5.0.0 (i.e. Apache Kafka 2.0.0).
      Each broker has the following configuration:

      advertised.host.name = null
      advertised.listeners = PLAINTEXT://1.2.3.4:9200
      advertised.port = null
      alter.config.policy.class.name = null
      alter.log.dirs.replication.quota.window.num = 11
      alter.log.dirs.replication.quota.window.size.seconds = 1
      authorizer.class.name = 
      auto.create.topics.enable = true
      auto.leader.rebalance.enable = true
      background.threads = 10
      broker.id = 1
      broker.id.generation.enable = true
      broker.interceptor.class = class org.apache.kafka.server.interceptor.DefaultBrokerInterceptor
      broker.rack = null
      client.quota.callback.class = null
      compression.type = producer
      connections.max.idle.ms = 600000
      controlled.shutdown.enable = true
      controlled.shutdown.max.retries = 3
      controlled.shutdown.retry.backoff.ms = 5000
      controller.socket.timeout.ms = 30000
      create.topic.policy.class.name = null
      default.replication.factor = 3
      delegation.token.expiry.check.interval.ms = 3600000
      delegation.token.expiry.time.ms = 86400000
      delegation.token.master.key = null
      delegation.token.max.lifetime.ms = 604800000
      delete.records.purgatory.purge.interval.requests = 1
      delete.topic.enable = true
      fetch.purgatory.purge.interval.requests = 1000
      group.initial.rebalance.delay.ms = 3000
      group.max.session.timeout.ms = 300000
      group.min.session.timeout.ms = 6000
      host.name = 
      inter.broker.listener.name = null
      inter.broker.protocol.version = 2.0
      leader.imbalance.check.interval.seconds = 300
      leader.imbalance.per.broker.percentage = 10
      listener.security.protocol.map = PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
      listeners = PLAINTEXT://0.0.0.0:9200
      log.cleaner.backoff.ms = 15000
      log.cleaner.dedupe.buffer.size = 134217728
      log.cleaner.delete.retention.ms = 86400000
      log.cleaner.enable = true
      log.cleaner.io.buffer.load.factor = 0.9
      log.cleaner.io.buffer.size = 524288
      log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
      log.cleaner.min.cleanable.ratio = 0.5
      log.cleaner.min.compaction.lag.ms = 0
      log.cleaner.threads = 1
      log.cleanup.policy = [delete]
      log.dir = /tmp/kafka-logs
      log.dirs = /var/lib/kafka/data
      log.flush.interval.messages = 9223372036854775807
      log.flush.interval.ms = null
      log.flush.offset.checkpoint.interval.ms = 60000
      log.flush.scheduler.interval.ms = 9223372036854775807
      log.flush.start.offset.checkpoint.interval.ms = 60000
      log.index.interval.bytes = 4096
      log.index.size.max.bytes = 10485760
      log.message.downconversion.enable = true
      log.message.format.version = 2.0
      log.message.timestamp.difference.max.ms = 9223372036854775807
      log.message.timestamp.type = CreateTime
      log.preallocate = false
      log.retention.bytes = -1
      log.retention.check.interval.ms = 300000
      log.retention.hours = 8760
      log.retention.minutes = null
      log.retention.ms = null
      log.roll.hours = 168
      log.roll.jitter.hours = 0
      log.roll.jitter.ms = null
      log.roll.ms = null
      log.segment.bytes = 1073741824
      log.segment.delete.delay.ms = 60000
      max.connections.per.ip = 2147483647
      max.connections.per.ip.overrides = 
      max.incremental.fetch.session.cache.slots = 1000
      message.max.bytes = 1000012
      metric.reporters = []
      metrics.num.samples = 2
      metrics.recording.level = INFO
      metrics.sample.window.ms = 30000
      min.insync.replicas = 2
      num.io.threads = 8
      num.network.threads = 8
      num.partitions = 8
      num.recovery.threads.per.data.dir = 1
      num.replica.alter.log.dirs.threads = null
      num.replica.fetchers = 4
      offset.metadata.max.bytes = 4096
      offsets.commit.required.acks = -1
      offsets.commit.timeout.ms = 5000
      offsets.load.buffer.size = 5242880
      offsets.retention.check.interval.ms = 600000
      offsets.retention.minutes = 525600
      offsets.topic.compression.codec = 0
      offsets.topic.num.partitions = 50
      offsets.topic.replication.factor = 3
      offsets.topic.segment.bytes = 104857600
      password.encoder.cipher.algorithm = AES/CBC/PKCS5Padding
      password.encoder.iterations = 4096
      password.encoder.key.length = 128
      password.encoder.keyfactory.algorithm = null
      password.encoder.old.secret = null
      password.encoder.secret = null
      port = 9092
      principal.builder.class = null
      producer.purgatory.purge.interval.requests = 1000
      queued.max.request.bytes = -1
      queued.max.requests = 500
      quota.consumer.default = 9223372036854775807
      quota.producer.default = 9223372036854775807
      quota.window.num = 11
      quota.window.size.seconds = 1
      replica.fetch.backoff.ms = 1000
      replica.fetch.max.bytes = 1048576
      replica.fetch.min.bytes = 1
      replica.fetch.response.max.bytes = 10485760
      replica.fetch.wait.max.ms = 5000
      replica.high.watermark.checkpoint.interval.ms = 5000
      replica.lag.time.max.ms = 30000
      replica.socket.receive.buffer.bytes = 65536
      replica.socket.timeout.ms = 30000
      replication.quota.window.num = 11
      replication.quota.window.size.seconds = 1
      request.timeout.ms = 30000
      reserved.broker.max.id = 1000
      sasl.client.callback.handler.class = null
      sasl.enabled.mechanisms = [GSSAPI]
      sasl.jaas.config = null
      sasl.kerberos.kinit.cmd = /usr/bin/kinit
      sasl.kerberos.min.time.before.relogin = 60000
      sasl.kerberos.principal.to.local.rules = [DEFAULT]
      sasl.kerberos.service.name = null
      sasl.kerberos.ticket.renew.jitter = 0.05
      sasl.kerberos.ticket.renew.window.factor = 0.8
      sasl.login.callback.handler.class = null
      sasl.login.class = null
      sasl.login.refresh.buffer.seconds = 300
      sasl.login.refresh.min.period.seconds = 60
      sasl.login.refresh.window.factor = 0.8
      sasl.login.refresh.window.jitter = 0.05
      sasl.mechanism.inter.broker.protocol = GSSAPI
      sasl.server.callback.handler.class = null
      security.inter.broker.protocol = PLAINTEXT
      socket.receive.buffer.bytes = 102400
      socket.request.max.bytes = 104857600
      socket.send.buffer.bytes = 102400
      ssl.cipher.suites = []
      ssl.client.auth = none
      ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
      ssl.endpoint.identification.algorithm = https
      ssl.key.password = null
      ssl.keymanager.algorithm = SunX509
      ssl.keystore.location = null
      ssl.keystore.password = null
      ssl.keystore.type = JKS
      ssl.protocol = TLS
      ssl.provider = null
      ssl.secure.random.implementation = null
      ssl.trustmanager.algorithm = PKIX
      ssl.truststore.location = null
      ssl.truststore.password = null
      ssl.truststore.type = JKS
      transaction.abort.timed.out.transaction.cleanup.interval.ms = 60000
      transaction.max.timeout.ms = 900000
      transaction.remove.expired.transaction.cleanup.interval.ms = 3600000
      transaction.state.log.load.buffer.size = 5242880
      transaction.state.log.min.isr = 2
      transaction.state.log.num.partitions = 50
      transaction.state.log.replication.factor = 3
      transaction.state.log.segment.bytes = 104857600
      transactional.id.expiration.ms = 604800000
      unclean.leader.election.enable = false
      zookeeper.connect = 1.2.3.4:8181,1.2.3.5:8181,1.2.3.6:8181
      zookeeper.connection.timeout.ms = null
      zookeeper.max.in.flight.requests = 10
      zookeeper.session.timeout.ms = 60000
      zookeeper.set.acl = false
      zookeeper.sync.time.ms = 2000
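
      For context on why a shrunken ISR matters with these settings: with min.insync.replicas = 2, a partition whose ISR has only 2 members (like TEST-7 above) has no spare replica left, and produce requests with acks=all are rejected once the ISR drops below that minimum. The sketch below parses a "key = value" dump like the one above and computes that margin. It is an illustration only; parse_config_dump and isr_headroom are hypothetical helper names, and the excerpt repeats a few values from the dump.

```python
# A few replication-related values copied from the broker configuration above.
CONFIG_EXCERPT = """\
default.replication.factor = 3
min.insync.replicas = 2
num.replica.fetchers = 4
replica.lag.time.max.ms = 30000
unclean.leader.election.enable = false
"""


def parse_config_dump(text):
    """Parse 'key = value' lines into a dict of strings.
    Lines without '=' are skipped; empty values are kept as ''."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        config[key.strip()] = value.strip()
    return config


def isr_headroom(isr_size, min_insync_replicas):
    """How many more ISR members can drop out before the ISR falls below
    min.insync.replicas. Zero means no margin; negative means the ISR is
    already below the minimum."""
    return isr_size - min_insync_replicas


if __name__ == "__main__":
    config = parse_config_dump(CONFIG_EXCERPT)
    min_isr = int(config["min.insync.replicas"])
    # TEST-7 has ISR 0,2 (2 members); healthy partitions have 3.
    print("headroom for TEST-7:", isr_headroom(2, min_isr))
    print("headroom for a fully in-sync partition:", isr_headroom(3, min_isr))
```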

      History:

      • initially the cluster ran Confluent version 3.2.1 (Kafka 0.10.2)
      • we upgraded the Confluent image to 4.1.1 (Kafka 1.1.1), following https://docs.confluent.io/4.1.1/installation/upgrade.html
      • a few days later one of the Kafka brokers was restarted. Since then the cluster has been behaving strangely: broker 0 was often missing from the ISR.
        All topics have RF=3; most of them had only 2 replicas in the ISR, while some had all 3.

      Unfortunately, we cannot pinpoint exactly when this started.

      Steps taken so far to try to fix the issue:

      • restarted all 3 brokers in a rolling manner. Each time, the cluster controller was restarted as well. After that, the issue moved from broker 0 to broker 1
      • changed replica.lag.time.max.ms: 10s -> 30s
      • changed num.replica.fetchers: 1 -> 4
      • changed num.network.threads: 3 -> 8
      • because the preferred replica was often not the leader, we ran kafka-preferred-replica-election for all topics. This was done a few times
      • CP version was upgraded to 5.0.0 (Kafka 2.0.0)
      • changed zookeeper.session.timeout.ms: 6000 -> 60000
      • changed replica.fetch.wait.max.ms: 500 -> 5000

       

      Any ideas on how to fix this (other than restarting the brokers)?

      Many thanks in advance!

            People

            • Assignee: Unassigned
            • Reporter: mvhoma Mikhail Khomenko