  Kafka / KAFKA-1825

leadership election state is stale and never recovers without all brokers restarting


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Auto Closed
    • Affects Version/s: 0.8.1.1, 0.8.2.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      I am not sure what the cause is, but I can reproduce this issue easily and repeatedly. I tried both 0.8.1.1 and 0.8.2-beta, and both behave the same way.

      The code to reproduce this is here: https://github.com/stealthly/go_kafka_client/tree/wipAsyncSaramaProducer/producers

      Scenario: 3 brokers, 1 ZooKeeper node, 1 client (each an AWS c3.2xlarge instance).

      Create the topic, then run the producer client (attached executable), which sends 380,000 messages/sec.

      Everything is fine until you kill broker #2 with SIGTERM (kill -SIGTERM).

      At that point the state of that topic goes bad. Even the console producer (with the Sarama producer off) doesn't work.

      Describing the yoyoma topic looks fine. Running a preferred replica election showed lots of issues (output below), and I still can't produce. The only resolution is bouncing all brokers.

      root@ip-10-233-52-139:/opt/kafka_2.10-0.8.1.1# bin/kafka-topics.sh --zookeeper 10.218.189.234:2181 --describe
      Topic:yoyoma PartitionCount:36 ReplicationFactor:3 Configs:
      Topic: yoyoma Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 1 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 2 Leader: 1 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 3 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 4 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 5 Leader: 1 Replicas: 3,2,1 Isr: 1,3
      Topic: yoyoma Partition: 6 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 7 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 8 Leader: 1 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 9 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 10 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 11 Leader: 1 Replicas: 3,2,1 Isr: 1,3
      Topic: yoyoma Partition: 12 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 13 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 14 Leader: 1 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 15 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 16 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 17 Leader: 1 Replicas: 3,2,1 Isr: 1,3
      Topic: yoyoma Partition: 18 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 19 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 20 Leader: 1 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 21 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 22 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 23 Leader: 1 Replicas: 3,2,1 Isr: 1,3
      Topic: yoyoma Partition: 24 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 25 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 26 Leader: 1 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 27 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 28 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 29 Leader: 1 Replicas: 3,2,1 Isr: 1,3
      Topic: yoyoma Partition: 30 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 31 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 32 Leader: 1 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 33 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 34 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 35 Leader: 1 Replicas: 3,2,1 Isr: 1,3
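      The listing above is easier to read with a quick check for replicas missing from the ISR and for partitions not led by their preferred (first-listed) replica. Below is a minimal Python sketch that assumes the exact 0.8.x `kafka-topics.sh --describe` line format shown here; `parse_describe` and `summarize` are hypothetical helpers, not part of Kafka's tooling.

```python
import re

# One partition line of 0.8.x `kafka-topics.sh --describe` output, e.g.
# "Topic: yoyoma Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,3"
PARTITION_RE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+"
    r"Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def parse_describe(text):
    """Return (topic, partition, leader, replicas, isr) for each partition line."""
    rows = []
    for line in text.splitlines():
        m = PARTITION_RE.search(line)
        if m is None:
            continue  # summary line ("PartitionCount:...") or shell prompt
        topic, part, leader, replicas, isr = m.groups()
        rows.append((
            topic,
            int(part),
            int(leader),
            [int(b) for b in replicas.split(",")],
            [int(b) for b in isr.split(",") if b],
        ))
    return rows


def summarize(rows):
    """Find assigned replicas missing from the ISR and partitions whose
    leader is not their preferred (first-listed) replica."""
    out_of_isr = {}      # broker id -> partitions where it is assigned but not in sync
    not_preferred = []   # partitions whose leader != replicas[0]
    for _topic, part, leader, replicas, isr in rows:
        for broker in replicas:
            if broker not in isr:
                out_of_isr.setdefault(broker, []).append(part)
        if leader != replicas[0]:
            not_preferred.append(part)
    return out_of_isr, not_preferred


# First three partition lines of the listing above.
sample = """Topic:yoyoma PartitionCount:36 ReplicationFactor:3 Configs:
Topic: yoyoma Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,3
Topic: yoyoma Partition: 1 Leader: 1 Replicas: 2,3,1 Isr: 1,3
Topic: yoyoma Partition: 2 Leader: 1 Replicas: 3,1,2 Isr: 1,3"""

out_of_isr, not_preferred = summarize(parse_describe(sample))
print(out_of_isr)     # {2: [0, 1, 2]}
print(not_preferred)  # [1, 2]
```

      Run against the full listing, it reports broker 2 missing from the ISR of all 36 partitions, and 24 partitions (those whose preferred replica is 2 or 3) led by broker 1 instead of their preferred replica.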
      root@ip-10-233-52-139:/opt/kafka_2.10-0.8.1.1# bin/kafka-preferred-replica-election.sh --zookeeper 10.218.189.234:2181
      Successfully started preferred replica election for partitions Set([yoyoma,29], [yoyoma,14], [yoyoma,22], [yoyoma,15], [yoyoma,3], [yoyoma,11], [yoyoma,32], [yoyoma,23], [yoyoma,18], [yoyoma,25], [yoyoma,26], [yoyoma,1], [yoyoma,9], [yoyoma,33], [yoyoma,5], [yoyoma,12], [yoyoma,20], [yoyoma,4], [yoyoma,7], [yoyoma,24], [yoyoma,35], [yoyoma,10], [yoyoma,8], [yoyoma,2], [yoyoma,21], [yoyoma,31], [yoyoma,28], [yoyoma,19], [yoyoma,16], [yoyoma,13], [yoyoma,34], [yoyoma,0], [test-1210,0], [yoyoma,30], [yoyoma,27], [yoyoma,17], [yoyoma,6])
      [2014-12-19 18:33:56,228] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions [yoyoma,29],[yoyoma,14],[yoyoma,11],[yoyoma,32],[yoyoma,23],[yoyoma,26],[yoyoma,5],[yoyoma,20],[yoyoma,35],[yoyoma,8],[yoyoma,2],[yoyoma,17] (kafka.server.ReplicaFetcherManager)
      [2014-12-19 18:33:56,229] INFO Truncating log yoyoma-29 to offset 6481451. (kafka.log.Log)
      [2014-12-19 18:33:56,229] INFO Truncating log yoyoma-14 to offset 6469671. (kafka.log.Log)
      [2014-12-19 18:33:56,229] INFO Truncating log yoyoma-11 to offset 6472578. (kafka.log.Log)
      [2014-12-19 18:33:56,229] INFO Truncating log yoyoma-32 to offset 6481923. (kafka.log.Log)
      [2014-12-19 18:33:56,230] INFO Truncating log yoyoma-23 to offset 6473039. (kafka.log.Log)
      [2014-12-19 18:33:56,230] INFO Truncating log yoyoma-26 to offset 6478089. (kafka.log.Log)
      [2014-12-19 18:33:56,230] INFO Truncating log yoyoma-5 to offset 6473159. (kafka.log.Log)
      [2014-12-19 18:33:56,230] INFO Truncating log yoyoma-20 to offset 6474790. (kafka.log.Log)
      [2014-12-19 18:33:56,230] INFO Truncating log yoyoma-35 to offset 6482661. (kafka.log.Log)
      [2014-12-19 18:33:56,230] INFO Truncating log yoyoma-8 to offset 6467814. (kafka.log.Log)
      [2014-12-19 18:33:56,231] INFO Truncating log yoyoma-2 to offset 6477942. (kafka.log.Log)
      [2014-12-19 18:33:56,231] INFO Truncating log yoyoma-17 to offset 6476136. (kafka.log.Log)
      [2014-12-19 18:33:56,241] INFO [ReplicaFetcherThread-2-3], Starting (kafka.server.ReplicaFetcherThread)
      [2014-12-19 18:33:56,243] INFO [ReplicaFetcherThread-1-3], Starting (kafka.server.ReplicaFetcherThread)
      [2014-12-19 18:33:56,244] INFO [ReplicaFetcherThread-3-3], Starting (kafka.server.ReplicaFetcherThread)
      [2014-12-19 18:33:56,245] INFO [ReplicaFetcherThread-0-3], Starting (kafka.server.ReplicaFetcherThread)
      [2014-12-19 18:33:56,245] INFO [ReplicaFetcherManager on broker 1] Added fetcher for partitions ArrayBuffer([[yoyoma,23], initOffset 6473039 to broker id:3,host:10.51.176.70,port:9092] , [[yoyoma,17], initOffset 6476136 to broker id:3,host:10.51.176.70,port:9092] , [[yoyoma,32], initOffset 6481923 to broker id:3,host:10.51.176.70,port:9092] , [[yoyoma,14], initOffset 6469671 to broker id:3,host:10.51.176.70,port:9092] , [[yoyoma,20], initOffset 6474790 to broker id:3,host:10.51.176.70,port:9092] , [[yoyoma,8], initOffset 6467814 to broker id:3,host:10.51.176.70,port:9092] , [[yoyoma,5], initOffset 6473159 to broker id:3,host:10.51.176.70,port:9092] , [[yoyoma,35], initOffset 6482661 to broker id:3,host:10.51.176.70,port:9092] , [[yoyoma,2], initOffset 6477942 to broker id:3,host:10.51.176.70,port:9092] , [[yoyoma,11], initOffset 6472578 to broker id:3,host:10.51.176.70,port:9092] , [[yoyoma,26], initOffset 6478089 to broker id:3,host:10.51.176.70,port:9092] , [[yoyoma,29], initOffset 6481451 to broker id:3,host:10.51.176.70,port:9092] ) (kafka.server.ReplicaFetcherManager)
      [2014-12-19 18:33:56,289] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-1-1 on partition [yoyoma,29] failed due to Leader not local for partition [yoyoma,29] on broker 1 (kafka.server.KafkaApis)
      [2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-1-1 on partition [yoyoma,5] failed due to Leader not local for partition [yoyoma,5] on broker 1 (kafka.server.KafkaApis)
      [2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-1-1 on partition [yoyoma,17] failed due to Leader not local for partition [yoyoma,17] on broker 1 (kafka.server.KafkaApis)
      [2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-3-1 on partition [yoyoma,11] failed due to Leader not local for partition [yoyoma,11] on broker 1 (kafka.server.KafkaApis)
      [2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-3-1 on partition [yoyoma,23] failed due to Leader not local for partition [yoyoma,23] on broker 1 (kafka.server.KafkaApis)
      [2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-3-1 on partition [yoyoma,35] failed due to Leader not local for partition [yoyoma,35] on broker 1 (kafka.server.KafkaApis)
      [2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-2-1 on partition [yoyoma,14] failed due to Leader not local for partition [yoyoma,14] on broker 1 (kafka.server.KafkaApis)
      [2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-2-1 on partition [yoyoma,26] failed due to Leader not local for partition [yoyoma,26] on broker 1 (kafka.server.KafkaApis)
      [2014-12-19 18:33:56,291] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-2-1 on partition [yoyoma,2] failed due to Leader not local for partition [yoyoma,2] on broker 1 (kafka.server.KafkaApis)
      [2014-12-19 18:33:56,334] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-0-1 on partition [yoyoma,32] failed due to Leader not local for partition [yoyoma,32] on broker 1 (kafka.server.KafkaApis)
      [2014-12-19 18:33:56,334] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-0-1 on partition [yoyoma,20] failed due to Leader not local for partition [yoyoma,20] on broker 1 (kafka.server.KafkaApis)
      [2014-12-19 18:33:56,334] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-0-1 on partition [yoyoma,8] failed due to Leader not local for partition [yoyoma,8] on broker 1 (kafka.server.KafkaApis)
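      The "Leader not local" warnings name the partitions that broker 1 had just handed over to broker 3. A minimal Python sketch for checking that against the log, assuming the 0.8.x message formats shown here; the sample is an excerpt of the log above with the "Removed fetcher" line trimmed to three partitions, and `removed_fetchers` and `not_local_fetches` are hypothetical helpers.

```python
import re

# "[topic,partition]" pairs as Kafka 0.8.x prints them in these log messages.
TP_RE = re.compile(r"\[([\w.-]+),(\d+)\]")
# Fetcher client id and partition from a KafkaApis "Leader not local" warning.
NOT_LOCAL_RE = re.compile(
    r"from client (\S+) on partition \[([\w.-]+),(\d+)\] failed due to Leader not local"
)
MARKER = "Removed fetcher for partitions"


def removed_fetchers(log_text):
    """Partitions listed on 'Removed fetcher for partitions' lines."""
    parts = set()
    for line in log_text.splitlines():
        if MARKER in line:
            tail = line.split(MARKER, 1)[1]  # skip the timestamp and logger prefix
            parts.update((t, int(p)) for t, p in TP_RE.findall(tail))
    return parts


def not_local_fetches(log_text):
    """Map each fetcher client id to the partitions refused with 'Leader not local'."""
    by_client = {}
    for line in log_text.splitlines():
        m = NOT_LOCAL_RE.search(line)
        if m:
            client, topic, part = m.groups()
            by_client.setdefault(client, set()).add((topic, int(part)))
    return by_client


log = """[2014-12-19 18:33:56,228] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions [yoyoma,29],[yoyoma,14],[yoyoma,5] (kafka.server.ReplicaFetcherManager)
[2014-12-19 18:33:56,289] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-1-1 on partition [yoyoma,29] failed due to Leader not local for partition [yoyoma,29] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-1-1 on partition [yoyoma,5] failed due to Leader not local for partition [yoyoma,5] on broker 1 (kafka.server.KafkaApis)
[2014-12-19 18:33:56,290] WARN [KafkaApi-1] Fetch request with correlation id 1845 from client ReplicaFetcherThread-2-1 on partition [yoyoma,14] failed due to Leader not local for partition [yoyoma,14] on broker 1 (kafka.server.KafkaApis)"""

removed = removed_fetchers(log)
refused = not_local_fetches(log)
refused_all = set().union(*refused.values())
print(sorted(refused))         # ['ReplicaFetcherThread-1-1', 'ReplicaFetcherThread-2-1']
print(refused_all == removed)  # True
```

      On the full log above, the 12 partitions refused with "Leader not local" are exactly the 12 listed on the "Removed fetcher" line. Assuming 0.8.x names replica fetcher threads ReplicaFetcherThread-<fetcherId>-<sourceBrokerId>, the refused requests came from threads fetching from broker 1, which is consistent with a replica that had not yet processed the leadership change.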
      root@ip-10-233-52-139:/opt/kafka_2.10-0.8.1.1# bin/kafka-topics.sh --zookeeper 10.218.189.234:2181 --describe
      Topic:yoyoma PartitionCount:36 ReplicationFactor:3 Configs:
      Topic: yoyoma Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 1 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 2 Leader: 3 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 3 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 4 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 5 Leader: 3 Replicas: 3,2,1 Isr: 1,3
      Topic: yoyoma Partition: 6 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 7 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 8 Leader: 3 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 9 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 10 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 11 Leader: 3 Replicas: 3,2,1 Isr: 1,3
      Topic: yoyoma Partition: 12 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 13 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 14 Leader: 3 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 15 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 16 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 17 Leader: 3 Replicas: 3,2,1 Isr: 1,3
      Topic: yoyoma Partition: 18 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 19 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 20 Leader: 3 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 21 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 22 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 23 Leader: 3 Replicas: 3,2,1 Isr: 1,3
      Topic: yoyoma Partition: 24 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 25 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 26 Leader: 3 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 27 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 28 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 29 Leader: 3 Replicas: 3,2,1 Isr: 1,3
      Topic: yoyoma Partition: 30 Leader: 1 Replicas: 1,2,3 Isr: 1,3
      Topic: yoyoma Partition: 31 Leader: 1 Replicas: 2,3,1 Isr: 1,3
      Topic: yoyoma Partition: 32 Leader: 3 Replicas: 3,1,2 Isr: 1,3
      Topic: yoyoma Partition: 33 Leader: 1 Replicas: 1,3,2 Isr: 1,3
      Topic: yoyoma Partition: 34 Leader: 1 Replicas: 2,1,3 Isr: 1,3
      Topic: yoyoma Partition: 35 Leader: 3 Replicas: 3,2,1 Isr: 1,3
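      The second listing matches what a simple model of preferred replica election predicts: in 0.8.x the controller only moves leadership to a partition's preferred (first-listed) replica when that replica is alive and in the ISR. Since broker 2 is in no ISR, partitions preferring broker 2 stay on broker 1, partitions preferring broker 3 move to it, and broker 2 never becomes a leader. The Python sketch below models that rule under stated assumptions (only brokers 1 and 3 live; replica assignments repeating every six partitions, as in the listing); `preferred_election` is a hypothetical helper, not the controller's code.

```python
# Replica assignments repeat every six partitions in the listing above.
ASSIGNMENTS = [[1, 2, 3], [2, 3, 1], [3, 1, 2], [1, 3, 2], [2, 1, 3], [3, 2, 1]]


def preferred_election(partitions, live_brokers):
    """Simplified model: move leadership to the preferred (first-listed) replica
    only if it is alive and in the ISR; otherwise keep the current leader."""
    new_leaders = {}
    for pid, (leader, replicas, isr) in partitions.items():
        preferred = replicas[0]
        if preferred in live_brokers and preferred in isr:
            new_leaders[pid] = preferred
        else:
            new_leaders[pid] = leader  # election does not move this partition
    return new_leaders


# State from the first describe: every partition led by broker 1, ISR = [1, 3].
before = {pid: (1, ASSIGNMENTS[pid % 6], [1, 3]) for pid in range(36)}
# Assumption: broker 2 is down; it would not be chosen anyway, being outside every ISR.
after = preferred_election(before, live_brokers={1, 3})

moved = sorted(pid for pid, leader in after.items() if leader != 1)
print(moved)  # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]
```

      The predicted set of moved partitions is exactly the set showing "Leader: 3" in the second describe.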

      Attachments

        1. KAFKA-1825.executable.tgz (2.12 MB, Joe Stein)


          People

            Assignee: Unassigned
            Reporter: Joe Stein
            Votes: 0
            Watchers: 5
