Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Currently, when a problem is found with a container, we must make a binary choice: mark it unhealthy or leave it as is. Once a container is marked unhealthy, it can never leave that state and will not be replicated. If containers never had bugs, this binary healthy/unhealthy state would probably suffice. However, as we saw with HDDS-6235, bugs can leave containers in a bad state that is recoverable with manual intervention. It would be useful for SCM to keep replicating and maintaining these containers while logging the issues it finds, so that we can identify and fix problems faster with less risk of data loss.
With this feature, we could make the container scrubber less aggressive and enable it by default to find bugs. We could also use it to flag non-fatal issues during container import/export.
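To make the proposal concrete, here is a minimal sketch of how an intermediate state between healthy and unhealthy might behave. It is not Ozone code: the class `ContainerHealthSketch`, the state name `SUSPECT`, and the methods `reportIssue` and `markRepaired` are all hypothetical names chosen for illustration. The only behavior taken from the description above is that an unhealthy container never leaves that state and is not replicated, while the new state stays replicable and records the issues found.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these names come from the Ozone code base.
public class ContainerHealthSketch {

    // Health levels. SUSPECT is an assumed name for the proposed middle state.
    enum Health {
        HEALTHY(true),
        SUSPECT(true),     // has known issues, but is still replicated
        UNHEALTHY(false);  // terminal, never replicated

        private final boolean replicable;

        Health(boolean replicable) {
            this.replicable = replicable;
        }

        boolean isReplicable() {
            return replicable;
        }
    }

    // A container replica that records every issue reported against it.
    static final class Replica {
        private final long containerId;
        private final List<String> issues = new ArrayList<>();
        private Health health = Health.HEALTHY;

        Replica(long containerId) {
            this.containerId = containerId;
        }

        // Record an issue. Fatal issues move the replica to UNHEALTHY;
        // non-fatal issues move a HEALTHY replica to SUSPECT.
        void reportIssue(String description, boolean fatal) {
            issues.add(description);
            if (fatal) {
                health = Health.UNHEALTHY;
            } else if (health == Health.HEALTHY) {
                health = Health.SUSPECT;
            }
        }

        // Manual intervention can clear SUSPECT; UNHEALTHY stays terminal.
        void markRepaired() {
            if (health == Health.SUSPECT) {
                health = Health.HEALTHY;
            }
        }

        Health getHealth() {
            return health;
        }

        List<String> getIssues() {
            return issues;
        }

        long getContainerId() {
            return containerId;
        }
    }

    public static void main(String[] args) {
        Replica replica = new Replica(42L);
        replica.reportIssue("metadata mismatch found by scrubber", false);
        System.out.println("container " + replica.getContainerId()
            + " state=" + replica.getHealth()
            + " replicable=" + replica.getHealth().isReplicable()
            + " issues=" + replica.getIssues());
    }
}
```

In this sketch a scrubber or import/export path would call `reportIssue(..., false)` for anything non-fatal, so the container keeps its replication guarantees while the recorded issues remain available for debugging.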