Details
Description
We had an incident where, after resolving a missing-blocks incident by restarting two dead nodes, 8 blocks were still reported missing, but the missing-block list was empty. Metasave shows the 8 blocks are "orphaned", meaning the files they belonged to had already been deleted. It is unclear why they were left in the replication queue.
- The containing node was flaky and was started and stopped multiple times.
- Block allocation was not working well because storage space was exhausted cluster-wide.
- The NN was in safe mode.
Triggering a full block report from the node had no effect. The stale entries are cleared if a failover happens, because the replication queue is reinitialized.
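For anyone trying to reproduce the checks, the steps described above can be sketched roughly as the following script. This is an illustration, not the exact commands run during the incident. The datanode address, the NameNode service IDs, and the metasave file name are hypothetical placeholders, and using fsck to view the missing-block list is an assumption. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints each command by default (DRY_RUN=1); set DRY_RUN=0 to execute.
# DN_IPC, NN_ACTIVE and NN_STANDBY are hypothetical placeholders, not values from the incident.
DRY_RUN="${DRY_RUN:-1}"
DN_IPC="${DN_IPC:-dn1.example.com:9867}"   # datanode host:ipc_port (placeholder)
NN_ACTIVE="${NN_ACTIVE:-nn1}"              # HA service ID of the active NN (placeholder)
NN_STANDBY="${NN_STANDBY:-nn2}"            # HA service ID of the standby NN (placeholder)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

diagnose() {
  # Cluster summary, including the missing-block count.
  run hdfs dfsadmin -report
  # List files with missing/corrupt blocks (assumed to be "the list" that was empty).
  run hdfs fsck / -list-corruptfileblocks
  # Dump NN block state, including blocks waiting for replication, to a file in the NN log dir.
  run hdfs dfsadmin -metasave metasave.txt
  # Ask the datanode for a full block report (had no effect in this incident).
  run hdfs dfsadmin -triggerBlockReport "$DN_IPC"
}

workaround() {
  # A failover reinitializes the replication queue on the newly active NN.
  run hdfs haadmin -failover "$NN_ACTIVE" "$NN_STANDBY"
}

diagnose
```

Calling workaround afterwards prints (or, with DRY_RUN=0, performs) the failover. On an HA cluster this is disruptive, so it is kept out of the default path.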