Kafka / KAFKA-16563

migration to KRaft hanging after MigrationClientException


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.7.0
    • Fix Version/s: 3.8.0, 3.7.1
    • Component/s: None
    • Labels: None

    Description

      While migrating a cluster from ZooKeeper (ZK) to KRaft, we found that the migration hung and the `ZkMigrationState` never moved to the `MIGRATION` state. After investigation, the root cause is that the `PollEvent` is not retried when it hits a retriable `MigrationClientException` (i.e. a retriable ZK client error), although it should be. Once this happens, the poll event never polls again, so the `KRaftMigrationDriver` cannot work as expected.

       

      2024-04-11 21:27:55,393 INFO [KRaftMigrationDriver id=5] Encountered ZooKeeper error during event PollEvent. Will retry. (org.apache.kafka.metadata.migration.KRaftMigrationDriver) [controller-5-migration-driver-event-handler]
      org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /migration
          at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
          at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
          at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:570)
          at kafka.zk.KafkaZkClient.createInitialMigrationState(KafkaZkClient.scala:1701)
          at kafka.zk.KafkaZkClient.getOrCreateMigrationState(KafkaZkClient.scala:1689)
          at kafka.zk.ZkMigrationClient.$anonfun$getOrCreateMigrationRecoveryState$1(ZkMigrationClient.scala:109)
          at kafka.zk.ZkMigrationClient.getOrCreateMigrationRecoveryState(ZkMigrationClient.scala:69)
          at org.apache.kafka.metadata.migration.KRaftMigrationDriver.applyMigrationOperation(KRaftMigrationDriver.java:248)
          at org.apache.kafka.metadata.migration.KRaftMigrationDriver.recoverMigrationStateFromZK(KRaftMigrationDriver.java:169)
          at org.apache.kafka.metadata.migration.KRaftMigrationDriver.access$1900(KRaftMigrationDriver.java:62)
          at org.apache.kafka.metadata.migration.KRaftMigrationDriver$PollEvent.run(KRaftMigrationDriver.java:794)
          at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
          at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
          at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
          at java.base/java.lang.Thread.run(Thread.java:840)
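
      To illustrate the failure mode described above, here is a minimal, single-threaded sketch. It is not the Kafka code and not the actual patch: `PollRetrySketch`, `Driver`, `RetriableMigrationException`, and `simulate` are hypothetical names, and the sketch assumes the hang comes from a self-rescheduling poll event that throws before it re-enqueues itself. Under that assumption, one retriable error is enough to stop polling for good, whereas catching the error and always rescheduling lets the next poll succeed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PollRetrySketch {
    /** Hypothetical stand-in for a retriable ZK client error; not the real Kafka class. */
    static class RetriableMigrationException extends RuntimeException {
        RetriableMigrationException(String msg) { super(msg); }
    }

    /** Tiny single-threaded event loop standing in for the driver's event queue. */
    static class Driver {
        final Deque<Runnable> queue = new ArrayDeque<>();
        final boolean rescheduleOnError;
        int failuresLeft;          // number of retriable errors to inject
        boolean recovered = false; // stands in for "migration state recovered from ZK"

        Driver(boolean rescheduleOnError, int failuresToInject) {
            this.rescheduleOnError = rescheduleOnError;
            this.failuresLeft = failuresToInject;
        }

        void recoverStateFromZk() {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new RetriableMigrationException("NodeExists for /migration");
            }
            recovered = true;
        }

        void pollEvent() {
            if (rescheduleOnError) {
                try {
                    if (!recovered) recoverStateFromZk();
                } catch (RetriableMigrationException e) {
                    // Retriable: swallow and try again on the next poll.
                } finally {
                    queue.add(this::pollEvent); // always schedule the next poll
                }
            } else {
                // Assumed buggy shape: the exception escapes before the reschedule.
                if (!recovered) recoverStateFromZk();
                queue.add(this::pollEvent);
            }
        }

        void run(int maxEvents) {
            queue.add(this::pollEvent);
            for (int i = 0; i < maxEvents && !queue.isEmpty(); i++) {
                try {
                    queue.poll().run();
                } catch (RetriableMigrationException e) {
                    // The loop may log "Will retry", but nothing re-enqueues the poll.
                }
            }
        }
    }

    /** Returns true if the simulated driver recovered its state within maxEvents events. */
    static boolean simulate(boolean rescheduleOnError, int failuresToInject, int maxEvents) {
        Driver driver = new Driver(rescheduleOnError, failuresToInject);
        driver.run(maxEvents);
        return driver.recovered;
    }

    public static void main(String[] args) {
        System.out.println("without reschedule: recovered=" + simulate(false, 1, 10));
        System.out.println("with reschedule:    recovered=" + simulate(true, 1, 10));
    }
}
```

      With one injected error, the run without rescheduling ends with an empty queue and never recovers, while the run that reschedules in `finally` recovers on the second poll.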


            People

              Assignee: Luke Chen (showuon)
              Reporter: Luke Chen (showuon)
              Votes: 0
              Watchers: 3
