Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
With EC containers in a small cluster, say 6 nodes with EC-3-2, each container requires 5 nodes, one per replica. If 2 replicas become unhealthy, reconstruction is required to recover both, but there is only 1 spare node.
This means one replica will get recovered, leaving 4 "good" replicas and 2 "unhealthy" ones. The container will remain stuck in this state, because unhealthy replicas are only removed once the container has no over- or under-replication.
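The arithmetic behind the stuck state can be sketched as follows. This is a toy model, not Ozone code: the function name and its parameters are hypothetical, and it assumes each unhealthy replica keeps occupying its node, so a rebuilt replica can only be placed on a node that holds no replica of the container.

```python
def reconstruct_once(total_nodes, data, parity, unhealthy):
    """Toy model of one reconstruction pass (hypothetical, not an Ozone API)."""
    required = data + parity            # EC-3-2 needs 5 replicas on 5 distinct nodes
    if total_nodes < required:
        raise ValueError("cluster is too small for this EC scheme")
    healthy = required - unhealthy
    if healthy < data:
        raise ValueError("too few healthy replicas to reconstruct from")
    spare = total_nodes - required      # nodes holding no replica of the container
    rebuilt = min(unhealthy, spare)     # each rebuilt replica needs its own free node
    healthy_after = healthy + rebuilt
    return {
        "healthy": healthy_after,
        "unhealthy": unhealthy,         # unhealthy replicas are not removed yet
        "free_nodes": spare - rebuilt,
        "under_replicated": healthy_after < required,
    }


state = reconstruct_once(total_nodes=6, data=3, parity=2, unhealthy=2)
print(state)
# {'healthy': 4, 'unhealthy': 2, 'free_nodes': 0, 'under_replicated': True}
```

With no free node left and the container still under-replicated, neither reconstruction nor the cleanup of unhealthy replicas can run in this model.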
A similar problem was resolved previously: an EC container with both over- and under-replication could get stuck in the same way, because under-replication handling could not proceed due to insufficient spare nodes. In that case, the solution was to check for this condition and call the over-replication handler to clear up the excess replicas. A similar solution is required here: remove some unhealthy replicas so that progress can be made.
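A minimal sketch of the proposed behaviour, under the same toy assumptions (hypothetical names, not the actual handler code): when reconstruction has no free node to target, delete one unhealthy replica to free its node and try again. How a real implementation decides which unhealthy replica is safe to delete is not modelled here.

```python
def recover(total_nodes, data, parity, unhealthy):
    """Toy recovery loop that frees nodes by deleting unhealthy replicas."""
    required = data + parity
    healthy = required - unhealthy
    if healthy < data:
        raise ValueError("too few healthy replicas to reconstruct from")
    free = total_nodes - required
    steps = []
    while healthy < required:
        if free > 0:
            # A free node exists: rebuild one missing replica onto it.
            healthy += 1
            free -= 1
            steps.append("reconstruct")
        elif unhealthy > 0:
            # No free node: drop an unhealthy replica to make room.
            unhealthy -= 1
            free += 1
            steps.append("delete-unhealthy")
        else:
            steps.append("stuck")
            break
    return healthy, unhealthy, steps


result = recover(total_nodes=6, data=3, parity=2, unhealthy=2)
print(result)
# (5, 1, ['reconstruct', 'delete-unhealthy', 'reconstruct'])
```

In this model the container ends with all 5 replicas healthy and 1 unhealthy replica left over, which the normal cleanup can then remove because the container is no longer under-replicated.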