Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.8.1
Fix Version/s: None
Component/s: None
Description
Corrupt blocks currently have to be removed manually, and until they are, they effectively prevent the data node on which they reside from receiving a good copy of the same block. In small clusters (i.e. node count == replication factor), this leaves the name node with no free data node on which to restore the desired replication level until the user manually runs an fsck command to remove the corrupt block.
I suggest moving the corrupt block out of the way, as is usually done on ext2-based filesystems, i.e. moving the block to a /lost+found directory so that the name node can replace it immediately. Alternatively, simply add a configuration option that allows corrupt blocks to be repaired in place: hard disks usually remap bad sectors internally, so a simple rewrite can often fix the problem.
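
To make the first suggestion concrete, here is a minimal sketch of what "moving the block out of the way" could look like on the data node's local filesystem. It is not taken from the HDFS code base: the class `BlockQuarantine`, the method `quarantine`, and the timestamp suffix are hypothetical names and choices made for illustration, and the sketch assumes the lost+found directory lives on the same volume as the block, so the move is a cheap rename.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not part of HDFS. */
final class BlockQuarantine {

    /**
     * Moves a corrupt block file and its checksum (meta) file into
     * lostFoundDir, freeing the original paths for a fresh replica.
     * Returns the new location of the block file.
     */
    static Path quarantine(Path blockFile, Path metaFile, Path lostFoundDir)
            throws IOException {
        Files.createDirectories(lostFoundDir);

        // A timestamp suffix (an assumption of this sketch) keeps repeated
        // quarantines of the same block ID from colliding.
        String suffix = "." + System.currentTimeMillis();

        Path blockTarget =
            lostFoundDir.resolve(blockFile.getFileName().toString() + suffix);
        Files.move(blockFile, blockTarget);

        // The meta file may already be missing; move it only if it exists.
        if (metaFile != null && Files.exists(metaFile)) {
            Path metaTarget =
                lostFoundDir.resolve(metaFile.getFileName().toString() + suffix);
            Files.move(metaFile, metaTarget);
        }
        return blockTarget;
    }
}
```

Once the files are out of the way, the data node could report the replica as gone and accept a good copy under the original name, while the quarantined bytes remain available for inspection.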