Kafka / KAFKA-1600

Controller failover not working correctly.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.8.2.0
    • Component/s: controller
    • Labels:
      None
    • Environment:
      Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux
      java version "1.7.0_03"

      Description

      We are running a 10-node Kafka 0.8.1 cluster and experienced the following failure.
      At some point, broker A stopped acting as controller. We noticed this because the JMX metric kafka.controller - KafkaController - ActiveControllerCount dropped from 1 to 0.
      Meanwhile, broker A was still running and still registered in the ZooKeeper /kafka/controller node, so no other broker could be elected as the new controller.
      From then on, the cluster ran without a controller. Producers and consumers still worked, but functions that require a controller, such as leader election for new topics and leader failover, no longer worked.
      Force-restarting broker A triggers a controller election and brings the cluster back to a correct state.
      These are our brief observations. I can provide more information if needed.
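
The inconsistent state described above (a broker still registered in the controller znode while no broker reports itself as active controller) could be detected with a check along the following lines. This is only a sketch: the function name `diagnose_controller` and the status strings are hypothetical, the znode payload is assumed to be JSON with a `brokerid` field, and the per-broker ActiveControllerCount values are assumed to have been collected from JMX already by some other means.

```python
import json


def diagnose_controller(znode_payload, active_controller_counts):
    """Classify the controller state of a cluster.

    znode_payload: JSON string read from the controller znode, or None if
        the znode does not exist. Assumed to contain a "brokerid" field.
    active_controller_counts: dict mapping broker id to that broker's
        ActiveControllerCount metric value (0 or 1).
    """
    # Brokers that currently believe they are the controller.
    active = sorted(b for b, c in active_controller_counts.items() if c == 1)

    if znode_payload is None:
        # Nobody holds the znode, so an election can proceed.
        return "unregistered"

    registered = json.loads(znode_payload)["brokerid"]

    if not active:
        # The case from this report: the znode is held, but no broker
        # is actually doing the controller's work.
        return "stale"
    if active == [registered]:
        return "healthy"
    # The znode and the metrics disagree about who the controller is.
    return "inconsistent"


if __name__ == "__main__":
    payload = '{"version": 1, "brokerid": 1, "timestamp": "0"}'
    counts = {1: 0, 2: 0, 3: 0}  # no broker reports itself as controller
    print(diagnose_controller(payload, counts))
```

A result of "stale" corresponds to the situation we observed: the znode blocks a new election, yet nothing in the cluster performs controller duties.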

              People

              • Assignee: Unassigned
              • Reporter: Ding Haifeng (dinghaifeng)
              • Reviewer: Neha Narkhede
              • Votes: 0
              • Watchers: 5
