Currently, a container is marked UNHEALTHY for one of the following reasons:
- If an operation fails on an open or closing container, it is marked unhealthy so that subsequent write transactions also fail.
- If Container Scrubber is enabled and ContainerMetadataScanner detects an error during KeyValueContainerCheck#fastCheck():
  - Metadata path or Chunks path is not accessible as a directory
  - Container checksum verification fails
  - On-disk Container Yaml data does not match in-memory container data (ContainerType, ContainerID, Container DBType, Metadata Path)
- If Container Scrubber is enabled and ContainerDataScanner (runs only on closed and quasi-closed containers) detects any block with a missing or corrupted chunk file.
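The metadata checks listed above can be sketched roughly as follows. This is only an illustration and not the actual KeyValueContainerCheck code: the class name `FastCheckSketch`, both method names, and the use of a plain map for the Yaml fields are assumptions made for the example. The checksum verification step is omitted.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical illustration of the metadata checks; not Ozone code. */
final class FastCheckSketch {

  private FastCheckSketch() { }

  /** Returns the problems found; an empty list means both paths are directories. */
  static List<String> checkDirectories(Path metadataPath, Path chunksPath) {
    List<String> problems = new ArrayList<>();
    if (!Files.isDirectory(metadataPath)) {
      problems.add("Metadata path is not accessible as a directory: " + metadataPath);
    }
    if (!Files.isDirectory(chunksPath)) {
      problems.add("Chunks path is not accessible as a directory: " + chunksPath);
    }
    return problems;
  }

  /**
   * Compares the in-memory container fields (for example ContainerType,
   * ContainerID, Container DBType, Metadata Path) against the values read
   * from the on-disk Yaml file. The map representation is an assumption.
   */
  static List<String> compareMetadata(Map<String, String> onDisk,
                                      Map<String, String> inMemory) {
    List<String> problems = new ArrayList<>();
    for (Map.Entry<String, String> field : inMemory.entrySet()) {
      String diskValue = onDisk.get(field.getKey());
      if (!Objects.equals(field.getValue(), diskValue)) {
        problems.add(field.getKey() + " mismatch: on-disk=" + diskValue
            + ", in-memory=" + field.getValue());
      }
    }
    return problems;
  }
}
```

In this sketch, any non-empty result would correspond to the container being marked UNHEALTHY.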
If a container in “open” state in SCM is marked unhealthy (in the container report), SCM asks the DNs to close the container. But for a “closing” container with an “unhealthy” replica, SCM leaves the container replica as is.
Some of the issues with how unhealthy containers are handled:
- If ReplicationManager does not find a healthy replica for a container, it does not replicate that container. So if there is only 1 replica of a container and it is unhealthy, SCM will never replicate it and there is potential for data loss if that single replica is lost for any reason (for example: disk failure).
- If there is a Quasi-Closed replica and an Unhealthy container, SCM will delete the unhealthy container. In this scenario, SCM should not delete the unhealthy container if it can be recovered, as it is possible that the unhealthy container is ahead of the quasi-closed container.
- SCM should be more conservative with deleting unhealthy containers as they could possibly be recovered. This Jira proposes to let SCM replicate an unhealthy container if there is no other replica. Also, if there is only a quasi-closed replica and an unhealthy replica, SCM should not delete the unhealthy replica.
- Let’s say there are 3 quasi-closed replicas of a closed container, all of them having bcsId < container bcsId (the closed replica is lost and a quasi-closed replica is replicated). ReplicationManager will delete one of these quasi-closed replicas (handleUnstableContainer) and then, in the next cycle, replicate it again because the container would now be under-replicated (handleUnderreplicatedContainer). This becomes a loop of replicating and deleting the container replica.
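The more conservative handling proposed in the list above could look roughly like the decision function below. This is a minimal sketch, not ReplicationManager code: the class `UnhealthyReplicaPolicy`, its enums, and the `decide` method are all hypothetical names. The final branch (deleting an unhealthy replica when a closed replica exists) is an assumption added to make the example complete; the proposal itself only covers the first two cases.

```java
import java.util.List;

/** Hypothetical sketch of the proposed handling; not ReplicationManager code. */
final class UnhealthyReplicaPolicy {

  enum ReplicaState { CLOSED, QUASI_CLOSED, UNHEALTHY }

  enum Action { NO_ACTION, REPLICATE_UNHEALTHY, KEEP_UNHEALTHY, DELETE_UNHEALTHY }

  private UnhealthyReplicaPolicy() { }

  /** Decides what to do with the unhealthy replica(s) of one container. */
  static Action decide(List<ReplicaState> replicas) {
    boolean hasUnhealthy = replicas.contains(ReplicaState.UNHEALTHY);
    boolean hasClosed = replicas.contains(ReplicaState.CLOSED);
    boolean hasQuasiClosed = replicas.contains(ReplicaState.QUASI_CLOSED);

    if (!hasUnhealthy) {
      return Action.NO_ACTION;
    }
    if (!hasClosed && !hasQuasiClosed) {
      // No other replica exists: replicate the unhealthy one rather than
      // risk losing the only copy.
      return Action.REPLICATE_UNHEALTHY;
    }
    if (!hasClosed) {
      // Only quasi-closed replicas besides the unhealthy one: keep it,
      // since it may be ahead of the quasi-closed replicas.
      return Action.KEEP_UNHEALTHY;
    }
    // Assumption for this sketch: a closed replica makes the unhealthy
    // replica safe to delete.
    return Action.DELETE_UNHEALTHY;
  }
}
```

For example, a container whose only replica is unhealthy yields `REPLICATE_UNHEALTHY`, while one quasi-closed replica plus one unhealthy replica yields `KEEP_UNHEALTHY`.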