Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Consider a case where SCM has a CLOSED container whose replicas are in states CLOSED, CLOSED, QUASI_CLOSED. In the pull replication model, the ReplicationManager (RM) sends all three replicas to the target datanode (DN) as candidate replication sources. The DN does a random shuffle and picks one to replicate from. If it picks the QUASI_CLOSED replica, the next RM iteration sees replicas CLOSED, CLOSED, QUASI_CLOSED, QUASI_CLOSED. RM issues the same command again, since the CLOSED replicas are still under-replicated, but the odds of the DN's shuffle choosing a QUASI_CLOSED replica have now risen from 1/3 to 1/2. This process can repeat until every datanode in the cluster holds a QUASI_CLOSED replica, which can leave the cluster in the stuck state described in HDDS-8536.
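The feedback loop described above can be sketched as a small simulation. This is only an illustrative model, not Ozone code: the function names are hypothetical, and it assumes the DN picks uniformly among the offered sources, that a new replica inherits the state of its source, that each datanode holds at most one replica, and that three CLOSED replicas are required.

```python
import random

CLOSED = "CLOSED"
QUASI_CLOSED = "QUASI_CLOSED"


def quasi_closed_pick_probability(replicas):
    """Chance that a uniform random pick lands on a QUASI_CLOSED source."""
    return replicas.count(QUASI_CLOSED) / len(replicas)


def simulate(replicas, num_datanodes, required_closed=3, rng=None):
    """Repeat the replication round until enough CLOSED replicas exist
    or every datanode already holds a replica (hypothetical model)."""
    rng = rng or random.Random()
    replicas = list(replicas)
    history = [quasi_closed_pick_probability(replicas)]
    while (replicas.count(CLOSED) < required_closed
           and len(replicas) < num_datanodes):
        # Stands in for the DN's random shuffle over the offered sources.
        source = rng.choice(replicas)
        # Assumption: the copy ends up in the same state as its source.
        replicas.append(source)
        history.append(quasi_closed_pick_probability(replicas))
    return replicas, history


if __name__ == "__main__":
    final, history = simulate(
        [CLOSED, CLOSED, QUASI_CLOSED], num_datanodes=10,
        rng=random.Random(7),
    )
    print(final)
    print([round(p, 2) for p in history])
```

In this model, each unlucky pick raises the chance of the next one: after consecutive QUASI_CLOSED picks the probability moves from 1/3 to 1/2 to 3/5, and the loop only ends when a CLOSED source is finally chosen or no datanodes remain.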