Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
Suppose broker 1 is the controller and it goes through two consecutive ZK session expirations. Two ZK session expiration events are then fired.
1. When handling the first ZK session expiration event, SessionExpirationListener.handleNewSession() can elect broker 1 itself as the new controller and initialize the states properly.
2. When handling the second ZK session expiration event, SessionExpirationListener.handleNewSession() first calls onControllerResignation(), which sets ReplicaStateMachine.hasStarted to false. It then proceeds with controller election in ZookeeperLeaderElector.elect() and tries to create the controller node in ZK. This fails because broker 1 has already registered itself as the controller in ZK. The failure to create the controller node is simply ignored, on the assumption that the controller must be on another broker. Here, however, the controller is broker 1 itself, and ReplicaStateMachine.hasStarted is still false.
3. If a new broker event is now fired, BrokerChangeListener.handleChildChange ignores it because ReplicaStateMachine.hasStarted is false. The result is a controller that is alive but does not react to any broker change event.
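
The sequence above can be reproduced with a small toy model. This is only an illustrative sketch in Python, not Kafka code: ToyZk, ToyBroker, reproduce() and all method names are hypothetical stand-ins for the ZK controller node, SessionExpirationListener.handleNewSession(), ZookeeperLeaderElector.elect() and BrokerChangeListener.handleChildChange. It assumes that the old controller node is already gone when the first event is handled, and that the node created in step 1 still exists when the second event is handled.

```python
class ToyZk:
    """Hypothetical stand-in for the controller node in ZK."""

    def __init__(self):
        self.controller_id = None  # no controller node registered yet

    def create_controller_node(self, broker_id):
        # Mimics node creation: fails if the node already exists.
        if self.controller_id is not None:
            return False
        self.controller_id = broker_id
        return True


class ToyBroker:
    """Hypothetical model of the controller logic described in the report."""

    def __init__(self, broker_id, zk):
        self.broker_id = broker_id
        self.zk = zk
        self.has_started = False  # models ReplicaStateMachine.hasStarted
        self.handled_events = []

    def on_controller_resignation(self):
        self.has_started = False

    def elect(self):
        if self.zk.create_controller_node(self.broker_id):
            self.has_started = True  # controller state is initialized
        # Otherwise the failure is ignored, assuming the controller
        # is some other broker. That assumption is the bug.

    def handle_new_session(self):
        self.on_controller_resignation()
        self.elect()

    def handle_child_change(self, event):
        if not self.has_started:
            return False  # event silently dropped
        self.handled_events.append(event)
        return True


def reproduce():
    zk = ToyZk()
    broker = ToyBroker(1, zk)
    broker.handle_new_session()   # step 1: broker 1 elects itself
    after_first = broker.has_started
    broker.handle_new_session()   # step 2: resigns, create fails, stays stopped
    after_second = broker.has_started
    handled = broker.handle_child_change("broker change")  # step 3
    return after_first, after_second, zk.controller_id, handled
```

In this model, calling reproduce() leaves broker 1 registered as the controller in ToyZk while its has_started flag is false, so the broker change event in step 3 is dropped.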