Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Not A Problem
-
None
-
None
-
None
-
None
Description
Currently NameNode prepares recovering when finding an under replicated block group. This is inefficient and reduces resources for other operations. It would be better to postpone the recovery work for a period of time if only one internal block is corrupted considering points shown by papers such as [1][2]:
1. Transient errors in which no data are lost account for more than 90% of data center failures, owing to network partitions, software problems, or non-disk hardware faults.
2. Although erasure codes tolerate multiple simultaneous failures, single failures represent 99.75% of recoveries.
Different clusters may have different status, so we should allow user to configure the time for postponing the recoveries. Proper configuration will reduce a large proportion of unnecessary recoveries. When finding multiple internal blocks corrupted in a block group, we prepare the recovery work immediately because it’s very rare and we don’t want to increase the risk of losing data.
[1] Availability in globally distributed storage systems
http://static.usenix.org/events/osdi10/tech/full_papers/Ford.pdf
[2] Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads
http://static.usenix.org/events/fast/tech/full_papers/Khan.pdf