  Hadoop HDFS / HDFS-12662

lost+found strategy for bad/corrupt blocks to improve data replication 'SLA' for small clusters


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.8.1
    • Fix Version/s: None
    • Component/s: block placement
    • Labels:
      None

      Description

      Corrupt blocks currently have to be removed manually, and as long as a corrupt replica remains on a data node it effectively blocks that data node from receiving a good copy of the same block. In small clusters (i.e. node count == replication factor), this prevents the name node from finding a free data node to restore the desired replication level until the user manually runs an fsck command to remove the corrupt block.
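
      For reference, a minimal sketch of the manual workaround described above, using the public FileSystem API (this assumes fs.defaultFS points at the HDFS cluster; listCorruptFileBlocks is only implemented by DistributedFileSystem):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// List files that contain corrupt blocks and optionally delete them, so the
// name node stops waiting for a replica it cannot place on a free data node.
public class ListCorruptBlocks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf)) {
      RemoteIterator<Path> corrupt = fs.listCorruptFileBlocks(new Path("/"));
      while (corrupt.hasNext()) {
        Path p = corrupt.next();
        System.out.println("file with corrupt block: " + p);
        // The drastic manual step, roughly what `hdfs fsck <path> -delete` does:
        // fs.delete(p, false);
      }
    }
  }
}
{code}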

      I suggest moving the corrupt block out of the way, as ext2-based filesystems usually do, i.e. moving the block file into a lost+found directory so that the name node can replace it on that data node immediately. Alternatively, add a configuration option that allows corrupt blocks to be fixed in place: hard disks usually remap bad sectors internally on their own, so simply rewriting the block can often fix the problem.
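
      A minimal sketch of the proposed data-node-side behaviour (purely illustrative: the class, method name, and lost+found directory below do not exist in HDFS today):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: instead of leaving a corrupt replica in place, move the
// block file and its checksum .meta file into a per-volume lost+found
// directory, freeing the data node to accept a fresh, good replica.
final class CorruptReplicaQuarantine {
  static void quarantine(Path volumeRoot, Path blockFile, Path metaFile)
      throws IOException {
    Path lostFound = volumeRoot.resolve("lost+found");
    Files.createDirectories(lostFound);
    // Move rather than delete, so an operator can still inspect or salvage
    // the data; the move stays on the same volume, so it is cheap and atomic.
    Files.move(blockFile, lostFound.resolve(blockFile.getFileName()),
        StandardCopyOption.ATOMIC_MOVE);
    if (Files.exists(metaFile)) {
      Files.move(metaFile, lostFound.resolve(metaFile.getFileName()),
          StandardCopyOption.ATOMIC_MOVE);
    }
  }
}
{code}

      The in-place repair option from the second paragraph could reuse the same hook: rewrite the block from a good replica instead of moving it aside, letting the disk remap any bad sectors during the rewrite.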

            People

            • Assignee:
              Unassigned
            • Reporter:
              gruust Gruust
            • Votes:
              0
            • Watchers:
              2
