-
Type:
Improvement
-
Status: Open
-
Priority:
Minor
-
Resolution: Unresolved
-
Affects Version/s: 2.8.1
-
Fix Version/s: None
-
Component/s: block placement
-
Labels:None
Corrupt blocks currently need to be removed manually and effectively block the data node on which they reside from receiving a good copy of the same block. In small clusters (ie. node count == replication factor), this prevents the name node from finding a free data node to keep the desired replication level up until the user manually runs some fsck command to remove the corrupt block.
I suggest moving the corrupt block out of the way, like it's usually done by ext2-based filesystems, ie. move the block to /lost+found directory, such that the name node can replace it immediately. Or maybe simply add a configuration option that allows to fix corrupt blocks in-place because harddisks usually internally replace bad sectors on their own and a simple rewrite can often fix those issues.