  Kafka / KAFKA-3215

controller may not be started when there are multiple ZK session expirations

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.9.0.0
    • Component/s: core

    Description

      Suppose that broker 1 is the controller and it experiences two consecutive ZK session expirations. In this case, two ZK session expiration events are fired.

      1. When handling the first ZK session expiration event, SessionExpirationListener.handleNewSession() can elect broker 1 itself as the new controller and initialize the states properly.

      2. When handling the second ZK session expiration event, SessionExpirationListener.handleNewSession() first calls onControllerResignation(), which sets ReplicaStateMachine.hasStarted to false. It then proceeds with controller election in ZookeeperLeaderElector.elect() and tries to create the controller node in ZK. This fails because broker 1 has already registered itself as the controller in ZK. The failure to create the controller node is simply ignored, on the assumption that the controller must be another broker. Here, however, the controller is broker 1 itself, and ReplicaStateMachine.hasStarted remains false.

      3. If a new broker event is now fired, BrokerChangeListener.handleChildChange ignores it because ReplicaStateMachine.hasStarted is false. The result is a controller that is alive but does not react to any broker change event.
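
      The sequence above can be reproduced with a small simulation. This is only a sketch, not Kafka code: FakeZk, Broker, expire_session and run_scenario are hypothetical names, the ZK controller node is modeled as a single field, and the assumption that the node is gone before the first handler runs is ours. Only the names mentioned in the comments (handleNewSession, onControllerResignation, elect, hasStarted, handleChildChange) come from the description.

```python
class FakeZk:
    """Hypothetical stand-in for ZooKeeper: holds at most one controller node."""

    def __init__(self):
        self.controller_id = None

    def create_controller_node(self, broker_id):
        # Mimics a create that fails when the node already exists.
        if self.controller_id is not None:
            return False
        self.controller_id = broker_id
        return True

    def expire_session(self):
        # Assumption: the controller node disappears with the expired session.
        self.controller_id = None


class Broker:
    """Hypothetical model of the controller logic described in the issue."""

    def __init__(self, broker_id, zk):
        self.broker_id = broker_id
        self.zk = zk
        self.has_started = False  # models ReplicaStateMachine.hasStarted
        self.handled_events = []

    def on_controller_resignation(self):
        self.has_started = False

    def elect(self):
        # Models ZookeeperLeaderElector.elect(): a failed create is ignored,
        # assuming some other broker must be the controller.
        if self.zk.create_controller_node(self.broker_id):
            self.has_started = True

    def handle_new_session(self):
        # Models SessionExpirationListener.handleNewSession().
        self.on_controller_resignation()
        self.elect()

    def handle_child_change(self, brokers):
        # Models BrokerChangeListener.handleChildChange: events are dropped
        # while the state machine is not started.
        if not self.has_started:
            return False
        self.handled_events.append(list(brokers))
        return True


def run_scenario():
    zk = FakeZk()
    broker = Broker(1, zk)
    broker.elect()               # broker 1 starts out as the controller
    zk.expire_session()          # two expirations happen back to back,
    zk.expire_session()          # queuing two expiration events
    broker.handle_new_session()  # step 1: re-elected, state initialized
    broker.handle_new_session()  # step 2: resigns, create fails, failure ignored
    return zk, broker


if __name__ == "__main__":
    zk, broker = run_scenario()
    print("controller in ZK:", zk.controller_id)
    print("hasStarted:", broker.has_started)
    print("broker change handled:", broker.handle_child_change([1, 2]))
```

      In this model, broker 1 still owns the controller node after the second handler runs, yet has_started is false, so the broker change in step 3 is dropped.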

          People

            fpj Flavio Paiva Junqueira
            junrao Jun Rao