Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The initial problem was that LegacyReplicationManager did not consider that UNHEALTHY replicas could be on datanodes that are decommissioning or entering maintenance, so its logic for deciding whether a container with all UNHEALTHY replicas is under-replicated was flawed. This was fixed in HDDS-9652 by reusing existing logic in RatisContainerReplicaCount that can account for decommissioning UNHEALTHY replicas. However, that did not completely fix the problem, because DatanodeAdminMonitorImpl also needs to be updated.
RatisContainerReplicaCount (extended by LegacyRatisContainerReplicaCount, which is used exclusively by the legacy replication manager) is the interface between the replication manager and the decommissioning flow. Both use it to determine whether a container is under-replicated. This Jira should ensure that, when a container has all UNHEALTHY replicas, DatanodeAdminMonitor receives a LegacyRatisContainerReplicaCount object, which can handle decommissioning UNHEALTHY replicas.
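To make the intended behaviour concrete, here is a minimal sketch of the replica-counting rule described above. It is a simplified, hypothetical model and not the Ozone code: ReplicaCountSketch, Replica, ReplicaState, NodeState and isSufficientlyReplicated are names invented for illustration. It assumes that UNHEALTHY replicas count toward replication only when every replica of the container is UNHEALTHY, and it leaves out maintenance handling entirely.

```java
import java.util.List;

// Hypothetical, simplified model. None of these types are the real Ozone classes.
final class ReplicaCountSketch {
    enum ReplicaState { HEALTHY, UNHEALTHY }
    enum NodeState { IN_SERVICE, DECOMMISSIONING }

    static final class Replica {
        final ReplicaState state;
        final NodeState node;

        Replica(ReplicaState state, NodeState node) {
            this.state = state;
            this.node = node;
        }
    }

    // Returns true if enough replicas will remain once decommissioning nodes are gone.
    // Assumption: UNHEALTHY replicas are counted only when all replicas are UNHEALTHY.
    static boolean isSufficientlyReplicated(List<Replica> replicas, int replicationFactor) {
        boolean allUnhealthy = !replicas.isEmpty()
            && replicas.stream().allMatch(r -> r.state == ReplicaState.UNHEALTHY);

        long remaining = replicas.stream()
            // A replica on a decommissioning node will not be available afterwards.
            .filter(r -> r.node == NodeState.IN_SERVICE)
            // Normally only HEALTHY replicas count; in the all-UNHEALTHY case they all do.
            .filter(r -> allUnhealthy || r.state == ReplicaState.HEALTHY)
            .count();

        return remaining >= replicationFactor;
    }
}
```

Under this model, a container with three UNHEALTHY replicas, one of them on a decommissioning node, is reported as under-replicated for a replication factor of 3. That is the case the decommissioning flow needs to see so that the replica is copied elsewhere first.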