KAFKA-16431: Handle log dir failure in hybrid mode


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 3.7.0
    • Fix Version/s: None
    • Component/s: jbod
    • Labels: None

    Description

      As part of the KRaft migration, the KRaft Controller implements some of the ZK-mode controller functionality. This functionality is used during the migration, in what is known as "hybrid mode".

      In hybrid mode, some brokers may still be running in ZK-mode while others have already been restarted into KRaft mode.

      The ZK-mode Controller implementation in KRaft does not implement the ZK-based logic to handle directory failures, so it cannot re-elect leaders for partitions whose leader replica is in a failed directory.

      This leaves a gap for JBOD during the ZK-KRaft migration. There are two main ways to address it:

      1. Implement the ZK-mode functionality to handle failed directories. As in ZK-mode, the controller needs to subscribe to events in the `/log_dir_event_notification` ZNode, and rely on per-partition errors in full LeaderAndIsr responses to detect directory failures.
      2. A simpler alternative is to have a migrating ZK broker stop upon any directory failure. This would sacrifice some availability and operational flexibility, but it may be much more straightforward to implement.
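
To make option 1 more concrete, here is a minimal Java sketch of the controller-side bookkeeping it implies: per-partition storage errors from a full LeaderAndIsr response mark replicas as offline, and a new leader is picked from the remaining in-sync replicas. Every class, method, and enum name below is hypothetical and is not Kafka's actual code. The election rule (first replica in assignment order that is in the ISR and not offline) is a simplifying assumption, and the ZNode subscription and request plumbing are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model; none of these types are Kafka classes.
final class LogDirFailureSketch {
    // Stand-in for the per-partition error reported in a LeaderAndIsr response.
    enum PartitionError { NONE, STORAGE_ERROR }

    static final int NO_LEADER = -1;

    // Minimal view of the partition state a controller would track.
    static final class PartitionState {
        int leader;
        final List<Integer> replicas; // assignment order, used as election preference
        final List<Integer> isr;

        PartitionState(int leader, List<Integer> replicas, List<Integer> isr) {
            this.leader = leader;
            this.replicas = new ArrayList<>(replicas);
            this.isr = new ArrayList<>(isr);
        }
    }

    private final Map<String, PartitionState> partitions = new HashMap<>();
    // Replicas known to sit in a failed directory, keyed by partition.
    private final Map<String, Set<Integer>> offlineReplicas = new HashMap<>();

    void addPartition(String tp, int leader, List<Integer> replicas, List<Integer> isr) {
        partitions.put(tp, new PartitionState(leader, replicas, isr));
    }

    int leaderOf(String tp) { return partitions.get(tp).leader; }

    List<Integer> isrOf(String tp) { return List.copyOf(partitions.get(tp).isr); }

    // Takes the per-partition results of a full LeaderAndIsr request sent to brokerId,
    // for example after that broker signalled a log dir event.
    void handleLeaderAndIsrResponse(int brokerId, Map<String, PartitionError> errors) {
        for (Map.Entry<String, PartitionError> e : errors.entrySet()) {
            if (e.getValue() != PartitionError.STORAGE_ERROR) continue;
            PartitionState st = partitions.get(e.getKey());
            if (st == null) continue;
            offlineReplicas.computeIfAbsent(e.getKey(), k -> new HashSet<>()).add(brokerId);
            // Only partitions led by the failed replica need a new leader.
            if (st.leader == brokerId) electLeader(e.getKey(), st);
        }
    }

    private void electLeader(String tp, PartitionState st) {
        Set<Integer> offline = offlineReplicas.getOrDefault(tp, Set.of());
        for (int replica : st.replicas) {
            if (st.isr.contains(replica) && !offline.contains(replica)) {
                st.leader = replica;
                st.isr.removeIf(offline::contains); // drop offline replicas from the ISR
                return;
            }
        }
        // No eligible in-sync replica: the partition stays leaderless, ISR unchanged.
        st.leader = NO_LEADER;
    }

    public static void main(String[] args) {
        LogDirFailureSketch c = new LogDirFailureSketch();
        c.addPartition("t-0", 1, List.of(1, 2, 3), List.of(1, 2, 3));
        c.handleLeaderAndIsrResponse(1, Map.of("t-0", PartitionError.STORAGE_ERROR));
        System.out.println("leader=" + c.leaderOf("t-0") + " isr=" + c.isrOf("t-0"));
    }
}
```

Without the equivalent of `handleLeaderAndIsrResponse`, the leader recorded for such a partition would never change, which is the gap described above. Option 2 avoids this bookkeeping entirely, because a stopped broker is handled by the ordinary broker-failure path.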

      Without a solution, a directory failure during the migration may lead to indefinite partition unavailability.

            People

              Assignee: Igor Soarez (soarez)
              Reporter: Igor Soarez (soarez)
              Votes: 0
              Watchers: 3
