  Kafka / KAFKA-1600

Controller failover not working correctly.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.8.2.0
    • Component/s: controller
    • Labels: None
    • Environment: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux
      java version "1.7.0_03"

    Description

      We are running a 10-node Kafka 0.8.1 cluster and experienced the following failure.
      At some point, broker A stopped acting as the controller. We could see this because the JMX metric kafka.controller - KafkaController - ActiveControllerCount dropped from 1 to 0.
      Meanwhile, broker A was still running and remained registered in the ZooKeeper /kafka/controller node, so no other broker could be elected as the new controller.
      From then on, the cluster ran without a controller. Producers and consumers still worked, but functions that require a controller, such as leader election for new topics and leader failover, no longer worked.
      A forced restart of broker A led to a new controller election and brought the cluster back to a correct state.
      These are our brief observations; I can provide more information if needed.
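The report turns on how controller election uses ZooKeeper. In the 0.8.x line, each broker tries to create an ephemeral controller znode (here /kafka/controller, presumably because the cluster uses a /kafka chroot); the first broker to succeed becomes the controller, and the znode disappears only when its owner's ZooKeeper session ends. The sketch below is a simplified in-memory simulation, not Kafka code: `FakeZooKeeper`, `Broker`, and `active_controller_total` are hypothetical names, and summing the per-broker ActiveControllerCount across the cluster is assumed as a monitoring check (a healthy cluster sums to 1). It shows why a broker that stops acting as controller while keeping its session alive blocks every other broker, and why restarting it helps.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper ephemeral znodes (not a real client)."""

    def __init__(self) -> None:
        # path -> (owning session id, node data)
        self.nodes: Dict[str, Tuple[int, str]] = {}

    def create_ephemeral(self, path: str, data: str, session: int) -> bool:
        """Create the node if absent; return False if some session already holds it."""
        if path in self.nodes:
            return False
        self.nodes[path] = (session, data)
        return True

    def expire_session(self, session: int) -> None:
        """Ephemeral nodes are removed only when their owning session expires."""
        self.nodes = {p: v for p, v in self.nodes.items() if v[0] != session}


@dataclass
class Broker:
    broker_id: int
    session: int
    is_active_controller: bool = False  # models this broker's ActiveControllerCount (1 or 0)

    def try_elect(self, zk: FakeZooKeeper, path: str) -> None:
        # Whoever creates the znode first wins; everyone else stays a follower.
        self.is_active_controller = zk.create_ephemeral(path, str(self.broker_id), self.session)


def active_controller_total(brokers) -> int:
    """Cluster-wide sum of ActiveControllerCount; anything other than 1 is unhealthy."""
    return sum(1 for b in brokers if b.is_active_controller)


CONTROLLER_PATH = "/kafka/controller"
zk = FakeZooKeeper()
brokers = [Broker(broker_id=i, session=100 + i) for i in range(10)]

# Initial election: broker 0 (playing "broker A") wins.
for b in brokers:
    b.try_elect(zk, CONTROLLER_PATH)
print(active_controller_total(brokers))  # 1

# Reported failure: broker 0 stops acting as controller, but its session is
# still alive, so the znode stays and even explicit re-election attempts fail.
brokers[0].is_active_controller = False
for b in brokers[1:]:
    b.try_elect(zk, CONTROLLER_PATH)
print(active_controller_total(brokers))  # 0

# Restarting broker 0 ends its session, the znode is deleted, and a new
# controller can be elected.
zk.expire_session(brokers[0].session)
for b in brokers[1:]:
    b.try_elect(zk, CONTROLLER_PATH)
print(active_controller_total(brokers))  # 1
```

In real Kafka the other brokers do not poll; they watch the controller znode and start an election only when it is deleted, which is exactly what never happens while broker A's session stays alive.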

      Attachments

        1. kafka_failure_logs.tar.gz (833 kB, uploaded by Ding Haifeng)


            People

              Assignee: Unassigned
              Reporter: Ding Haifeng (dinghaifeng)
              Neha Narkhede
              Votes: 0
              Watchers: 5
