KAFKA-17333 (Kafka, sub-task of KAFKA-17241 KIP-853 follow-ups)

KRaft notifies the controller of leadership too early


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.9.0
    • Component/s: None
    • Labels: None

    Description

      After a crash, the old leader was still recorded as the leader on disk:

      $ cat ../__cluster_metadata-0/quorum-state
      {"leaderId":2,"leaderEpoch":77,"votedId":-1,"votedDirectoryId":"AAAAAAAAAAAAAAAAAAAAAA","data_version":1}
      

      Meanwhile, the rest of the quorum moved on from epoch 77:

      $ bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:9093,localhost:9094,localhost:9095 describe --status
      [2024-08-16 14:03:14,897] WARN [AdminClient clientId=adminclient-1] Connection to node -2 (localhost/127.0.0.1:9094) could not be established. Node may not be available. (org.apache.kafka.clients.NetworkClient)
      ClusterId:              kfalSizvRGOry-gExUTS5A
      LeaderId:               1
      LeaderEpoch:            78
      HighWatermark:          98479
      ...
      

      After restarting the failed controller, it looks like the state machine is notified that it is the leader. This should not happen.

      [2024-08-16 14:22:22,502] DEBUG [RaftManager id=2] Notifying listener org.apache.kafka.controller.QuorumController$QuorumMetaLogListener@22148336 of leader change LeaderAndEpoch(leaderId=OptionalInt[2], epoch=77) (org.apache.kafka.raft.KafkaRaftClient)
      [2024-08-16 14:22:22,508] INFO [controller-2-ThrottledChannelReaper-Fetch]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
      [2024-08-16 14:22:22,508] INFO [controller-2-ThrottledChannelReaper-Produce]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
      [2024-08-16 14:22:22,508] TRACE [RaftManager id=2] Received inbound message InboundResponse(correlationId=0, data=EndQuorumEpochResponseData(errorCode=0, topics=[TopicData(topicName='__cluster_metadata', partitions=[PartitionData(partitionIndex=0, errorCode=74, leaderId=3, leaderEpoch=79)])], nodeEndpoints=[NodeEndpoint(nodeId=3, host='localhost', port=9095)]), source=localhost:9093 (id: 1 rack: null)) (org.apache.kafka.raft.KafkaRaftClient)
      [2024-08-16 14:22:22,509] TRACE Writing tmp quorum state /tmp/kraft-controller-2-logs/__cluster_metadata-0/quorum-state.tmp (org.apache.kafka.raft.FileQuorumStateStore)
      [2024-08-16 14:22:22,510] ERROR Encountered fatal fault: exception while renouncing leadership (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
      java.lang.IllegalStateException: Attempt to resign by a non-voter
              at org.apache.kafka.raft.KafkaRaftClient.resign(KafkaRaftClient.java:3359)
              at org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1263)
              at org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:544)
              at org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:179)
              at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:874)
              at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:864)
              at org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:153)
              at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:142)
              at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:215)
              at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:186)
              at java.base/java.lang.Thread.run(Thread.java:840)  

      When a controller that was the leader is restarted after a crash, the controller gets notified of leadership. This is not correct. The controller should only be notified of leadership once it has reached the high-watermark.
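
      The expected behavior can be illustrated with a minimal sketch. This is not the KafkaRaftClient implementation or the actual fix; the class LeaderNotificationGate, the method shouldNotify, and its parameters are hypothetical names chosen for illustration. It assumes a listener is told about a remote leader right away, but is told that the local node is leader only once a high-watermark is known and the listener has read up to it.

```java
import java.util.OptionalInt;
import java.util.OptionalLong;

// Hypothetical sketch, not Kafka code: decides whether a listener may be
// told about a leader change.
final class LeaderNotificationGate {
    private LeaderNotificationGate() {}

    /**
     * @param localId            id of the local node
     * @param leaderId           leader in the current election state, if any
     * @param highWatermark      high-watermark, empty if not yet established
     * @param listenerNextOffset next offset the listener expects to read
     */
    static boolean shouldNotify(
        int localId,
        OptionalInt leaderId,
        OptionalLong highWatermark,
        long listenerNextOffset
    ) {
        boolean localIsLeader = leaderId.isPresent() && leaderId.getAsInt() == localId;
        if (!localIsLeader) {
            // Assumption: a remote (or unknown) leader can be reported immediately.
            return true;
        }
        // Local leadership is reported only after the listener has caught up
        // to a known high-watermark.
        return highWatermark.isPresent()
            && listenerNextOffset >= highWatermark.getAsLong();
    }

    public static void main(String[] args) {
        // Node 2 just restarted with leaderId=2 loaded from disk; no
        // high-watermark has been established yet, so do not notify.
        System.out.println(
            shouldNotify(2, OptionalInt.of(2), OptionalLong.empty(), 0L)); // false

        // Node 2 is leader and the listener has reached the high-watermark
        // (100 is an illustrative value).
        System.out.println(
            shouldNotify(2, OptionalInt.of(2), OptionalLong.of(100L), 100L)); // true

        // Another node is leader; the change can be reported right away.
        System.out.println(
            shouldNotify(2, OptionalInt.of(3), OptionalLong.empty(), 0L)); // true
    }
}
```

      With a gate like this, the "Notifying listener ... of leader change LeaderAndEpoch(leaderId=OptionalInt[2], epoch=77)" line in the log above would not be emitted right after startup, because no high-watermark is known at that point.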

            People

              Assignee: José Armando García Sancio (jsancio)
              Reporter: José Armando García Sancio (jsancio)