Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
In a similar way to HDDS-8535, if the cluster is small, say 4 nodes, and a Ratis container has 2 unhealthy replicas, the Replication Manager (RM) will currently recover one new replica, leaving all 4 nodes used with 2 healthy and 2 unhealthy replicas. Because unhealthy replicas are only removed after all over- and under-replication has been resolved, and no spare node is left to hold another healthy replica, the container will remain stuck in this state.
To avoid this, when there are insufficient spare nodes and the container also has unhealthy replicas, the under-replication handler may need to call into the unhealthy handler to remove some of the unhealthy replicas so that progress can be made.
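The proposed decision could look roughly like the following sketch. It is only an illustration of the idea described above, not the actual Ozone ReplicationManager code: the class, enum, and method names (`UnderReplicationSketch`, `Action`, `nextAction`) are hypothetical, and it assumes each replica occupies its own node and that a replication factor of 3 is required.

```java
import java.util.List;

/** Hypothetical sketch only; these names are not part of the Ozone code base. */
final class UnderReplicationSketch {

    enum ReplicaState { HEALTHY, UNHEALTHY }

    enum Action {
        NONE,                      // enough healthy replicas, nothing for this handler to do
        REPLICATE_TO_SPARE_NODE,   // normal under-replication handling
        REMOVE_UNHEALTHY_REPLICA,  // free a node first so a new copy can be placed
        WAIT_FOR_NODES             // no spare node and nothing that can be removed
    }

    /**
     * Decides the next step for an under-replicated container.
     *
     * @param replicas          one entry per node currently holding a replica
     * @param clusterNodes      number of usable datanodes in the cluster
     * @param replicationFactor number of healthy replicas required (assumed 3 for Ratis)
     */
    static Action nextAction(List<ReplicaState> replicas, int clusterNodes, int replicationFactor) {
        long healthy = replicas.stream().filter(s -> s == ReplicaState.HEALTHY).count();
        long unhealthy = replicas.size() - healthy;

        if (healthy >= replicationFactor) {
            return Action.NONE;
        }
        // Assumption: a node that already holds any replica cannot take another one.
        int spareNodes = clusterNodes - replicas.size();
        if (spareNodes > 0) {
            return Action.REPLICATE_TO_SPARE_NODE;
        }
        if (unhealthy > 0) {
            // The stuck case from this issue: every node is used, so remove an
            // unhealthy replica before trying to create a healthy one.
            return Action.REMOVE_UNHEALTHY_REPLICA;
        }
        return Action.WAIT_FOR_NODES;
    }
}
```

With the 4-node example from the description (2 healthy and 2 unhealthy replicas, factor 3), this sketch returns `REMOVE_UNHEALTHY_REPLICA` instead of leaving the container blocked; while a spare node still exists it returns `REPLICATE_TO_SPARE_NODE` as before.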
Issue Links
- relates to: HDDS-9257 LegacyReplicationManager: Unhealthy replicas could block under replication handling (Resolved)