Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Duplicate
- 3.7.0
- None
- None
Description
As part of the KRaft migration, the KRaft controller implements some of the ZK-mode controller functionality, which is used during the migration in what is known as "hybrid mode".
In hybrid mode some brokers may still be running in ZK-mode and some brokers may have already been restarted into KRaft mode.
The ZK-mode controller implementation in KRaft does not implement the ZK-based logic to handle directory failures, so it is unable to re-elect leaders for partitions whose leader replicas reside in failed directories.
This leaves a gap for JBOD during the ZK to KRaft migration. There are two main ways to address it:
- Implement the ZK-mode functionality to handle failed directories. As in ZK mode, the controller needs to subscribe to events in the `/log_dir_event_notification` ZNode and rely on per-partition errors in responses to full LeaderAndIsr requests to detect directory failures.
- A simpler alternative is to have a migrating ZK broker stop upon any directory failure. This sacrifices some availability and operational flexibility, but may be much more straightforward to implement.
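The two options above can be sketched roughly as follows. This is only an illustration and not Kafka code: the class `HybridModeDirFailureSketch`, the enum `PartitionError`, and both methods are hypothetical names, and the rule that a non-migrating broker stops only once every directory has failed is an assumption made for contrast.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class HybridModeDirFailureSketch {
    /** Hypothetical stand-in for the per-partition error carried in a LeaderAndIsr response. */
    enum PartitionError { NONE, KAFKA_STORAGE_ERROR }

    /**
     * Option 1 (controller side): collect the partitions whose replica on the
     * responding broker reported a storage error, so that the controller can
     * treat those replicas as offline and re-elect leaders where needed.
     */
    static Set<String> offlinePartitions(Map<String, PartitionError> responseErrors) {
        Set<String> offline = new TreeSet<>();
        for (Map.Entry<String, PartitionError> entry : responseErrors.entrySet()) {
            if (entry.getValue() == PartitionError.KAFKA_STORAGE_ERROR) {
                offline.add(entry.getKey());
            }
        }
        return offline;
    }

    /**
     * Option 2 (broker side): a ZK-mode broker taking part in a migration stops
     * on any directory failure. Assumption: a broker that is not migrating
     * stops only when all of its directories have failed.
     */
    static boolean shouldShutDown(boolean migrationEnabled, int failedDirs, int totalDirs) {
        if (failedDirs <= 0) {
            return false;
        }
        return migrationEnabled || failedDirs >= totalDirs;
    }

    public static void main(String[] args) {
        Map<String, PartitionError> errors = new LinkedHashMap<>();
        errors.put("topic-a-0", PartitionError.NONE);
        errors.put("topic-a-1", PartitionError.KAFKA_STORAGE_ERROR);
        System.out.println(offlinePartitions(errors));
        System.out.println(shouldShutDown(true, 1, 3));
        System.out.println(shouldShutDown(false, 1, 3));
    }
}
```

The sketch is meant to show the trade-off: option 1 keeps the broker serving partitions on its healthy directories but requires the controller to interpret per-partition errors, whereas option 2 reduces the problem to an ordinary broker shutdown at the cost of availability.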
Without a solution, a directory failure during the migration may lead to indefinite partition unavailability.
Issue Links
- is fixed by: KAFKA-15357 Aggregate and propagate assignments (Resolved)